src.model.deeplearn.layer.submanifold_spconv3d_layer
Classes
- class src.model.deeplearn.layer.submanifold_spconv3d_layer.SubmanifoldSpConv3DLayer(*args, **kwargs)
- Author:
Alberto M. Esmoris Pena
Submanifold sparse 3D convolution at a single hierarchy depth, driven by a dense submanifold neighbor table emitted by the C++ pre-processor and concatenated across receptive fields by DLSparseConcatSequencer.
The layer’s call expects two inputs:
\(\pmb{F} \in \mathbb{R}^{(1 + R) \times n_f}\): the concatenated per-cell features at the current depth, with row 0 reserved as the shared ground row (all zeros).
\(\pmb{S} \in \mathbb{Z}^{(1 + R) \times (2 w + 1)^3}\): the submanifold neighbor table. Row 0 is the ground marker (all zeros). Row \(v \in [1, R]\) contains the one-based indices of the cells in the \((2 w + 1)^3\)-cell window centered on the \(v\)-th active cell, in z-inner / y-middle / x-outer ordering. Entries equal to 0 mark inactive neighbors, for which tf.gather() fetches the ground row.
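The layout of the neighbor table can be illustrated with a small pure-Python sketch. It is not the C++ pre-processor: build_neighbor_table and the integer cell coordinates are hypothetical names introduced here, and the sketch assumes that "z-inner / y-middle / x-outer" means nested loops with x outermost and z innermost.

```python
import itertools
import numpy as np

def build_neighbor_table(coords, w):
    """Hypothetical stand-in for the table emitted by the C++ pre-processor."""
    # One-based index of every active cell; 0 is reserved for "inactive".
    index_of = {c: i + 1 for i, c in enumerate(coords)}
    # Assumed ordering: x is the outer loop, y the middle loop, z the inner loop.
    offsets = list(itertools.product(range(-w, w + 1), repeat=3))
    # Row 0 is the ground marker and stays all zeros.
    S = np.zeros((1 + len(coords), len(offsets)), dtype=np.int32)
    for v, (x, y, z) in enumerate(coords, start=1):
        for p, (dx, dy, dz) in enumerate(offsets):
            S[v, p] = index_of.get((x + dx, y + dy, z + dz), 0)
    return S

# Three active cells: two adjacent along z, one isolated.
coords = [(0, 0, 0), (0, 0, 1), (5, 5, 5)]
S = build_neighbor_table(coords, w=1)
```

Under that ordering assumption and with w = 1, the window center is column 13, so each active cell finds its own one-based index there, and the isolated cell has no other nonzero entry in its row.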
The convolution is then a single fused operation:
\[\pmb{G}_{v*} = \sum_{p = 0}^{(2 w + 1)^3 - 1} \pmb{F}_{\pmb{S}_{v p} *} \pmb{W}_{p}\]
It is implemented as tf.einsum('ijk,jkm->im', tf.gather(F, S[1:]), W), where \(\pmb{W} \in \mathbb{R}^{(2 w + 1)^3 \times n_f \times n_g}\) is the kernel and the first axis of \(\pmb{W}\) indexes the kernel position in the same z-inner / y-middle / x-outer order as the columns of \(\pmb{S}\). The kernel-position axis j is bound between the gathered features and the weights so each kernel position has its own weight slice (see the explanatory comment in spconv3d_on_elem()).
The dense-gather design trims the layer body to a few lines (a single tf.gather + matmul) and unblocks real batch normalization across all receptive fields by removing the per-element tf.map_fn that the earlier hash-table / SparseIndexingMapLayer path required.
- Variables:
w (int) – Submanifold convolutional window half-size. The window covers \((2 w + 1)^3\) cells.
f (int) – Number of convolutional filters / kernel positions. Equals \((2 w + 1)^3\) by construction.
nf (int) – Input feature dimension.
ng (int) – Output feature dimension.
W (tf.Variable) – Kernel of shape \((f, n_f, n_g)\).
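The fused gather-and-contract operation described for this class can be sketched in NumPy. This is an illustration under stated assumptions, not the layer's code: ssc3d_dense is a hypothetical name, NumPy fancy indexing stands in for tf.gather, and np.einsum stands in for tf.einsum.

```python
import numpy as np

def ssc3d_dense(F, S, W):
    """Hypothetical NumPy stand-in for the fused gather + einsum."""
    # F: (1 + R, n_f), row 0 is the all-zero ground row.
    # S: (1 + R, K), one-based neighbor indices, 0 marks an inactive neighbor.
    # W: (K, n_f, n_g), one (n_f, n_g) weight slice per kernel position.
    gathered = F[S[1:]]                          # (R, K, n_f), like tf.gather(F, S[1:])
    G = np.einsum('ijk,jkm->im', gathered, W)    # (R, n_g)
    # Prepend a fresh all-zero ground row so the output is (1 + R, n_g).
    ground = np.zeros((1, W.shape[-1]), dtype=G.dtype)
    return np.vstack([ground, G])

rng = np.random.default_rng(0)
R, K, n_f, n_g = 4, 27, 3, 2
F = np.vstack([np.zeros((1, n_f)), rng.normal(size=(R, n_f))])
S = rng.integers(0, R + 1, size=(1 + R, K))
S[0] = 0                                          # ground marker row
W = rng.normal(size=(K, n_f, n_g))
G = ssc3d_dense(F, S, W)
```

Because index 0 fetches the zero ground row, inactive neighbors add nothing to the sum over kernel positions, which is what lets the whole convolution run as one dense contraction.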
- __init__(w, f, nf, ng, built_W=False, W_initializer=None, W_regularizer=None, W_constraint=None, **kwargs)
See Layer and layer.Layer.__init__().
- call(inputs, training=False, mask=False)
Apply the submanifold convolution to the concatenated input.
- Parameters:
inputs – A two-element list / tuple. inputs[0] is \(\pmb{F}\) ((1 + R, n_f) float32); inputs[1] is the submanifold neighbor table \(\pmb{S}\) ((1 + R, (2 w + 1)^3) int32).
- Returns:
A (1 + R, n_g) tensor whose row 0 is a fresh ground row (all zeros) and rows \(v \in [1, R]\) hold the convolved per-cell features.
- Return type:
tf.Tensor
- static spconv3d_on_elem(F, S_active, W)
Pad-based variant of the submanifold convolution kernel.
- Parameters:
F – Padded feature tensor (1 + R_in, n_f). The ground row at index 0 holds zeros and is gathered when S_active contains the sentinel value 0.
S_active – Neighbor index table (R, num_neighbors) with values in [0, R_in]; value 0 fetches the ground row.
W – Kernel (num_neighbors, n_f, n_g).
- Returns:
Convolved features (R, n_g).
The 'ijk,jkm->im' einsum binds the kernel-position axis (j) of the gathered features to the kernel-position axis of W, giving each kernel position its own weight slice (Graham 2018 sparse-conv math).
Kept for callers that already hold a padded F. Callers that hold the active-cell view of F should prefer spconv3d_on_elem_active() (no per-call tf.pad allocation).
- static spconv3d_on_elem_active(F_active, S_active, W)
Active-form variant of spconv3d_on_elem(). Same math, but operates directly on the active-cell view of F so the caller does not have to prepend a ground row with tf.pad() before every spconv call.
- Parameters:
F_active – Active-cell feature tensor (R_in, n_f), with no ground row.
S_active – Neighbor index table (R, num_neighbors) with values in [0, R_in]; value 0 is the “missing neighbor” sentinel (the contribution of that (r, k) position to the convolved output is zero).
Invariant (enforced by the C++ pre-processor and the sequencer’s offset arithmetic): S_active must satisfy 0 <= S_active <= R_in. The active-form path clamps via tf.maximum(S - 1, 0), then multiplies by (S > 0) to zero the sentinel rows. As a side effect, out-of-range values (S > R_in or S < 0) are silently absorbed (clamped and masked away), whereas the pad-based path spconv3d_on_elem() would crash on them under eager mode (InvalidArgumentError) and produce undefined output under tf.function. The two paths therefore diverge in error behaviour for malformed input. If you suspect a pre-processor regression, switch to the pad path or enable tf.debugging.enable_check_numerics.
W – Kernel (num_neighbors, n_f, n_g).
- Returns:
Convolved features (R, n_g).
Equivalence to the pad path: gather(F_pad, S)[r, k] = 0 when S[r, k] == 0 and = F_pad[S[r, k]] = F_active[S[r, k] - 1] otherwise. Here we compute gather(F_active, max(S - 1, 0)) * (S > 0), which evaluates to 0 exactly when the sentinel fires and to F_active[S - 1] elsewhere, so the math is bit-identical.
Memory: this variant avoids the per-call (1 + R_in) × n_f pad allocation. The mask broadcast (R, num_neighbors, 1) is smaller whenever n_f > 1.
The underlying matmul is the reshape-then-matmul form (see spconv3d_on_idx_real()), not an einsum, so cuBLAS / OpenBLAS receive a clean (R, f·n_f) × (f·n_f, n_g) GEMM rather than the Reshape → Transpose → BatchMatMul → Reshape chain that TF lowered tf.einsum('ijk,jkm->im', ...) to.
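The equivalence between the pad path and the active path, and the reshape-then-matmul form, can be checked with a NumPy sketch. The function names spconv_pad and spconv_active are hypothetical stand-ins for the two static methods, np.vstack stands in for tf.pad, and only well-formed tables (0 <= S <= R_in) are exercised.

```python
import numpy as np

def spconv_pad(F_pad, S, W):
    # Pad path: row 0 of F_pad is the all-zero ground row fetched by sentinel 0.
    return np.einsum('ijk,jkm->im', F_pad[S], W)

def spconv_active(F_active, S, W):
    # Active path: clamp to a valid zero-based index, then mask the sentinel.
    idx = np.maximum(S - 1, 0)                          # (R, K)
    real = (S > 0).astype(F_active.dtype)[..., None]    # (R, K, 1)
    gathered = F_active[idx] * real                     # (R, K, n_f)
    R, K, n_f = gathered.shape
    # Reshape-then-matmul: a single (R, K * n_f) x (K * n_f, n_g) product.
    return gathered.reshape(R, K * n_f) @ W.reshape(K * n_f, -1)

rng = np.random.default_rng(1)
R_in, R, K, n_f, n_g = 6, 5, 27, 3, 2
F_active = rng.normal(size=(R_in, n_f))
F_pad = np.vstack([np.zeros((1, n_f)), F_active])       # stand-in for tf.pad
S = rng.integers(0, R_in + 1, size=(R, K))              # values in [0, R_in]
W = rng.normal(size=(K, n_f, n_g))

out_pad = spconv_pad(F_pad, S, W)
out_active = spconv_active(F_active, S, W)
```

The gathered tensors of the two paths agree exactly; the final products are compared with a tolerance in practice because the einsum and the flattened matmul may accumulate in a different order.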
- static spconv3d_on_idx_real(F_active, idx, real, W)
Variant of spconv3d_on_elem_active() that takes the precomputed clamped index idx = max(S_active - 1, 0) and sentinel mask real = cast(S_active > 0, dtype) directly.
Use this when the same depth-\(t\) S_active drives more than one spconv at a call site (the typical encoder / decoder block runs num_spconvs + 1 SSC3D invocations against the same table). The caller computes idx and real once and reuses them for every call, avoiding the redundant tf.maximum / tf.cast per kernel.
- Parameters:
F_active – Active-cell feature tensor (R_in, n_f).
idx – Pre-clamped neighbor indices (R, num_neighbors) with values in [0, R_in).
real – Pre-cast sentinel mask (R, num_neighbors) with the same dtype as F_active.
W – Kernel (num_neighbors, n_f, n_g).
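The reuse pattern can be sketched in NumPy as follows. spconv_on_idx_real is a hypothetical counterpart of the static method, not its actual body, and the example assumes R == R_in (the output keeps the same active cells as the input), so that two convolutions can be chained against the same table.

```python
import numpy as np

def spconv_on_idx_real(F_active, idx, real, W):
    """Hypothetical NumPy counterpart taking precomputed idx and real."""
    gathered = F_active[idx] * real[..., None]           # (R, K, n_f)
    R, K, n_f = gathered.shape
    return gathered.reshape(R, K * n_f) @ W.reshape(K * n_f, -1)

rng = np.random.default_rng(2)
R, K, n_f, n_g, n_h = 5, 27, 3, 4, 2
F_active = rng.normal(size=(R, n_f))
S_active = rng.integers(0, R + 1, size=(R, K))           # values in [0, R]
W1 = rng.normal(size=(K, n_f, n_g))
W2 = rng.normal(size=(K, n_g, n_h))

# Computed once per neighbor table ...
idx = np.maximum(S_active - 1, 0)
real = (S_active > 0).astype(F_active.dtype)

# ... and reused by every convolution driven by that table.
H1 = spconv_on_idx_real(F_active, idx, real, W1)         # (R, n_g)
H2 = spconv_on_idx_real(H1, idx, real, W2)               # (R, n_h)
```

Only the feature tensor and the kernel change between the two calls; the clamp and the cast are paid once.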
- get_config()
Return necessary data to serialize the layer.
- classmethod from_config(config)
Deserialize the layer from a config dict.