src.model.deeplearn.layer.submanifold_spconv3d_layer

Classes

SubmanifoldSpConv3DLayer(*args, **kwargs)

class src.model.deeplearn.layer.submanifold_spconv3d_layer.SubmanifoldSpConv3DLayer(*args, **kwargs)

Author: Alberto M. Esmoris Pena

Submanifold sparse 3D convolution at a single hierarchy depth, driven by a dense submanifold neighbor table emitted by the C++ pre-processor and concatenated across receptive fields by DLSparseConcatSequencer.

The layer’s call expects two inputs:

\(\pmb{F} \in \mathbb{R}^{(1 + R) \times n_f}\) — the concatenated per-cell features at the current depth, with row 0 reserved as the shared ground row (all zeros).
\(\pmb{S} \in \mathbb{Z}^{(1 + R) \times (2 w + 1)^3}\) — the submanifold neighbor table. Row 0 is the ground marker (all zeros). Row \(v \in [1, R]\) contains the one-based indices of the cells in the \((2 w + 1)^3\)-cell window centered on the \(v\)-th active cell, in z-inner / y-middle / x-outer ordering. Entries equal to 0 mark inactive neighbors — tf.gather() will then fetch the ground row.
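As an illustration of the table layout described above, here is a minimal NumPy sketch that assembles such a table from integer cell coordinates. The helper name build_neighbor_table, the coordinate input, and the column formula p = (dx + w)(2w + 1)^2 + (dy + w)(2w + 1) + (dz + w) are assumptions made for this example (one reading of "z-inner / y-middle / x-outer"); the real table is emitted by the C++ pre-processor.

```python
import numpy as np

def build_neighbor_table(coords, w):
    """Hypothetical helper: (R, 3) integer cell coordinates -> (1 + R, (2w+1)^3) table."""
    coords = np.asarray(coords, dtype=np.int64).tolist()
    k = 2 * w + 1
    # One-based index of every active cell; absent cells resolve to 0 (ground row).
    lookup = {tuple(c): i + 1 for i, c in enumerate(coords)}
    S = np.zeros((1 + len(coords), k ** 3), dtype=np.int32)  # row 0 stays all zeros
    for v, (x, y, z) in enumerate(coords, start=1):
        p = 0
        for dx in range(-w, w + 1):          # x is the outer loop
            for dy in range(-w, w + 1):      # y is the middle loop
                for dz in range(-w, w + 1):  # z is the inner loop
                    S[v, p] = lookup.get((x + dx, y + dy, z + dz), 0)
                    p += 1
    return S

coords = [(0, 0, 0), (0, 0, 1), (1, 0, 0)]
S = build_neighbor_table(coords, w=1)
```

With w = 1 the window center sits in column 13, so under these assumptions every active cell references itself there.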

The convolution is then a single fused operation:

\[\pmb{G}_{v*} = \sum_{p = 0}^{(2 w + 1)^3 - 1} \pmb{F}_{\pmb{S}_{v p} *} \pmb{W}_{p}\]

The sum is implemented as tf.einsum('ijk,jkm->im', tf.gather(F, S[1:]), W), where \(\pmb{W} \in \mathbb{R}^{(2 w + 1)^3 \times n_f \times n_g}\) is the kernel. The first axis of \(\pmb{W}\) indexes the kernel position in the same z-inner / y-middle / x-outer order as the columns of \(\pmb{S}\). The einsum binds the kernel-position axis j between the gathered features and the weights, so each kernel position has its own weight slice (see the explanatory comment in spconv3d_on_elem()).
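To make the correspondence between the summation formula and the fused einsum concrete, the following NumPy sketch evaluates both on random data. np.take and np.einsum stand in for tf.gather and tf.einsum here; the sizes and random inputs are arbitrary choices for the example, not values from the library.

```python
import numpy as np

rng = np.random.default_rng(0)
R, w, n_f, n_g = 5, 1, 4, 3
K = (2 * w + 1) ** 3                          # number of kernel positions

F = rng.normal(size=(1 + R, n_f))
F[0] = 0.0                                    # shared ground row
S = rng.integers(0, R + 1, size=(1 + R, K))   # values in [0, R]; 0 = inactive neighbor
S[0] = 0                                      # ground marker row
W = rng.normal(size=(K, n_f, n_g))

# Fused form: gather rows of F, then contract kernel position j and feature k.
gathered = np.take(F, S[1:], axis=0)          # (R, K, n_f)
G_fused = np.einsum('ijk,jkm->im', gathered, W)

# Explicit form: one (n_f,) @ (n_f, n_g) product per kernel position p.
G_loop = np.zeros((R, n_g))
for v in range(1, R + 1):
    for p in range(K):
        G_loop[v - 1] += F[S[v, p]] @ W[p]
```

Both arrays agree up to floating-point rounding, which is the sense in which the einsum is a fused version of the per-position sum.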

The dense-gather design trims the layer body to a few lines — a single tf.gather + matmul — and unblocks real batch normalization across all receptive fields by removing the per-element tf.map_fn that the earlier hash-table / SparseIndexingMapLayer path required.

Variables:

w (int) – Submanifold convolutional window half-size. The window covers \((2 w + 1)^3\) cells.
f (int) – Number of convolutional filters / kernel positions. Equals \((2 w + 1)^3\) by construction.
nf (int) – Input feature dimension.
ng (int) – Output feature dimension.
W (tf.Variable) – Kernel of shape \((f, n_f, n_g)\).

__init__(w, f, nf, ng, built_W=False, W_initializer=None, W_regularizer=None, W_constraint=None, **kwargs): See Layer and layer.Layer.__init__().

build(dim_in): Build the convolutional kernel weights. See Layer and layer.Layer.build().

call(inputs, training=False, mask=False)

Apply the submanifold convolution to the concatenated input.

Parameters: inputs – A two-element list / tuple. inputs[0] is \(\pmb{F}\) ((1 + R, n_f) float32); inputs[1] is the submanifold neighbor table \(\pmb{S}\) ((1 + R, (2 w + 1)^3) int32).
Returns: A (1 + R, n_g) tensor whose row 0 is a fresh ground row (all zeros) and rows \(v \in [1, R]\) hold the convolved per-cell features.
Return type: tf.Tensor
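The input/output contract of call() can be sketched in NumPy as follows. This is a stand-in written from the documented shapes only, not the layer's TensorFlow code; the function name call_sketch and the toy values are assumptions for illustration.

```python
import numpy as np

def call_sketch(F, S, W):
    """NumPy stand-in for the documented contract: (1 + R, n_f) in, (1 + R, n_g) out."""
    gathered = np.take(F, S[1:], axis=0)               # (R, K, n_f); index 0 fetches the zero row
    G = np.einsum('ijk,jkm->im', gathered, W)          # (R, n_g)
    ground = np.zeros((1, W.shape[-1]), dtype=G.dtype)
    return np.concatenate([ground, G], axis=0)         # row 0 is a fresh all-zero ground row

# Two active cells with n_f = 2, window half-size w = 1 (27 kernel positions).
F = np.array([[0.0, 0.0], [1.0, 2.0], [3.0, 4.0]], dtype=np.float32)
S = np.zeros((3, 27), dtype=np.int32)
S[1, 13] = 1                  # cell 1 only sees itself (center column)
S[2, 13] = 2                  # cell 2 sees itself ...
S[2, 12] = 1                  # ... and cell 1 as one neighbor
W = np.ones((27, 2, 1), dtype=np.float32)

out = call_sketch(F, S, W)
```

With an all-ones kernel each output row is the sum of the features of the active cells in its window, so the zero ground row contributes nothing for inactive neighbors.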

static spconv3d_on_elem(F, S_active, W)

Pad-based variant of the submanifold convolution kernel.

Parameters:

F – Padded feature tensor (1 + R_in, n_f) — the ground row at index 0 holds zeros and is gathered when S_active contains the sentinel value 0.
S_active – Neighbor index table (R, num_neighbors) with values in [0, R_in]; value 0 fetches the ground row.
W – Kernel (num_neighbors, n_f, n_g).

Returns:

Convolved features (R, n_g).

The 'ijk,jkm->im' einsum binds the kernel-position axis (j) of the gathered features to the kernel-position axis of W, giving each kernel position its own weight slice (Graham 2018 sparse-conv math).

Kept for callers that already hold a padded F. Callers that hold the active-cell view of F should prefer spconv3d_on_elem_active() (no per-call tf.pad allocation).

static spconv3d_on_elem_active(F_active, S_active, W)

Active-form variant of spconv3d_on_elem(). Same math, but operates directly on the active-cell view of F so the caller does not have to prepend a ground row with tf.pad() before every spconv call.

Parameters:

F_active – Active-cell feature tensor (R_in, n_f) — no ground row.
S_active – Neighbor index table (R, num_neighbors) with values in [0, R_in]; value 0 is the “missing neighbor” sentinel (it contributes zero to the convolved output at that (r, k) position).

Invariant (enforced by the C++ pre-processor and the sequencer’s offset arithmetic): S_active must satisfy 0 <= S_active <= R_in. The active-form path clamps via tf.maximum(S - 1, 0) and then multiplies by (S > 0) to zero the sentinel rows. As a side effect, out-of-range values (S > R_in or S < 0) are silently absorbed (clamped and masked away), whereas the pad-based path spconv3d_on_elem() would crash on them under eager mode (InvalidArgumentError) and produce undefined output under tf.function. The two paths therefore diverge in error behaviour for malformed input. If you suspect a pre-processor regression, switch to the pad path or call tf.debugging.enable_check_numerics().
W – Kernel (num_neighbors, n_f, n_g).

Returns:

Convolved features (R, n_g).
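Because the two paths react differently to malformed tables, one option when debugging is to test the invariant 0 <= S_active <= R_in explicitly before calling either of them. The helper below is a hypothetical NumPy sketch for that purpose; it is not part of the layer or of the pre-processor.

```python
import numpy as np

def check_neighbor_table(S_active, R_in):
    """Hypothetical debugging helper: raise if any index falls outside [0, R_in]."""
    S_active = np.asarray(S_active)
    bad = (S_active < 0) | (S_active > R_in)
    if bad.any():
        r, k = np.argwhere(bad)[0]            # report the first offending entry
        raise ValueError(
            f"S_active[{r}, {k}] = {S_active[r, k]} is outside [0, {R_in}]"
        )
    return S_active
```

A check of this kind fails loudly on the first offending entry regardless of which convolution path is used afterwards.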

Equivalence to the pad path: gather(F_pad, S)[r, k] = 0 when S[r, k] == 0 and = F_pad[S[r, k]] = F_active[S[r, k] - 1] otherwise. Here we compute gather(F_active, max(S - 1, 0)) * (S > 0), which evaluates to 0 exactly when the sentinel fires and to F_active[S - 1] elsewhere — bit-identical math.
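The equivalence can be checked numerically with a small NumPy sketch, where np.take stands in for tf.gather. The sizes and random data are assumptions chosen for the example, and S is drawn so that it respects the [0, R_in] invariant.

```python
import numpy as np

rng = np.random.default_rng(1)
R_in, R, K, n_f = 6, 4, 27, 3
F_active = rng.normal(size=(R_in, n_f))
S = rng.integers(0, R_in + 1, size=(R, K))     # 0 is the missing-neighbor sentinel

# Pad path: prepend the ground row, then gather with the raw one-based indices.
F_pad = np.concatenate([np.zeros((1, n_f)), F_active], axis=0)
gathered_pad = np.take(F_pad, S, axis=0)

# Active path: shift to zero-based, clamp the sentinel to 0, then mask it out.
idx = np.maximum(S - 1, 0)
real = (S > 0).astype(F_active.dtype)
gathered_active = np.take(F_active, idx, axis=0) * real[..., None]
```

Multiplying by a mask of exact ones and zeros introduces no rounding, so the two gathered tensors compare equal element by element.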

Memory: this variant avoids the per-call (1 + R_in) × n_f pad allocation. The broadcast mask it needs instead, of shape (R, num_neighbors, 1), is smaller than the gathered (R, num_neighbors, n_f) tensor whenever n_f > 1.

The underlying matmul is the reshape-then-matmul form (see spconv3d_on_idx_real()), not an einsum, so cuBLAS / OpenBLAS receive a clean (R, f·n_f) × (f·n_f, n_g) GEMM rather than the Reshape → Transpose → BatchMatMul → Reshape chain that TF lowered tf.einsum('ijk,jkm->im', ...) to.
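The reshape-then-matmul form and the einsum compute the same contraction, as the NumPy sketch below illustrates. It only demonstrates the algebraic identity on random data; it makes no claim about how TensorFlow lowers either expression.

```python
import numpy as np

rng = np.random.default_rng(2)
R, K, n_f, n_g = 4, 27, 3, 5
gathered = rng.normal(size=(R, K, n_f))   # stands in for the masked gather output
W = rng.normal(size=(K, n_f, n_g))

# Flatten (kernel position, feature) into one contraction axis on both operands.
G_matmul = gathered.reshape(R, K * n_f) @ W.reshape(K * n_f, n_g)

# Reference: contract j (kernel position) and k (feature) with an einsum.
G_einsum = np.einsum('ijk,jkm->im', gathered, W)
```

The row-major reshape pairs each (j, k) entry of the gathered features with the matching (j, k) row of the flattened kernel, which is why a single (R, f·n_f) × (f·n_f, n_g) product suffices.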

static spconv3d_on_idx_real(F_active, idx, real, W)

Variant of spconv3d_on_elem_active() that takes the precomputed clamped index idx = max(S_active - 1, 0) and sentinel mask real = cast(S_active > 0, dtype) directly.

Use this when the same depth-\(t\) S_active drives more than one spconv at a call site (the typical encoder / decoder block runs num_spconvs + 1 SSC3D invocations against the same table). The caller computes idx and real once and reuses them for every call, avoiding the redundant tf.maximum / tf.cast per kernel.

Parameters:

F_active – Active-cell feature tensor (R_in, n_f).
idx – Pre-clamped neighbor indices (R, num_neighbors) with values in [0, R_in).
real – Pre-cast sentinel mask (R, num_neighbors) with the same dtype as F_active.
W – Kernel (num_neighbors, n_f, n_g).
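The reuse pattern can be sketched in NumPy as follows: idx and real are derived once from the table and then shared by several convolutions. The function spconv_on_idx_real below is a stand-in written from the documented math, not the layer's TensorFlow code, and the three-kernel loop is an assumed example of a block that runs several convolutions against one table.

```python
import numpy as np

def spconv_on_idx_real(F_active, idx, real, W):
    """NumPy stand-in: masked gather followed by a reshape-then-matmul."""
    gathered = np.take(F_active, idx, axis=0) * real[..., None]    # (R, K, n_f)
    rows, num_neighbors, n_f = gathered.shape
    flat = gathered.reshape(rows, num_neighbors * n_f)
    return flat @ W.reshape(num_neighbors * n_f, -1)               # (R, n_g)

rng = np.random.default_rng(3)
R, K, n_f = 5, 27, 4
S_active = rng.integers(0, R + 1, size=(R, K))   # one-based indices, 0 = sentinel
F = rng.normal(size=(R, n_f))

# Computed once per neighbor table ...
idx = np.maximum(S_active - 1, 0)
real = (S_active > 0).astype(F.dtype)

# ... and reused by every convolution that shares the table.
kernels = [rng.normal(size=(K, n_f, n_f)) for _ in range(3)]
for W in kernels:
    F = spconv_on_idx_real(F, idx, real, W)
```

Only the feature tensor and the kernel change between iterations, so the clamp and the cast are paid once rather than once per kernel.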

get_config(): Return necessary data to serialize the layer.

classmethod from_config(config): Deserialize the layer from a config dict.