src.model.deeplearn.sequencer.dl_sparse_concat_sequencer

Classes

DLSparseConcatSequencer(X, y, batch_size, ...)

class src.model.deeplearn.sequencer.dl_sparse_concat_sequencer.DLSparseConcatSequencer(X, y, batch_size, **kwargs)
Author:

Alberto M. Esmoris Pena

A deep learning sequencer that feeds receptive fields into a sparse 3D convolutional neural network by concatenating per-batch-element receptive fields into a single global tensor.

Each batch produces:

  1. A single feature tensor \(\pmb{F} \in \mathbb{R}^{(1 + \Sigma_k R_{k0}) \times n_f}\) where row 0 is the shared ground row and rows \([1 + \mathrm{off}_{k0}, R_{k0} + \mathrm{off}_{k0}]\) hold the level-0 features of receptive field \(k\) (with \(\mathrm{off}_{k0} = \sum_{j < k} R_{j0}\) being the cumulative number of active cells from preceding receptive fields). Inactive cells are never present.

  2. A submanifold neighbor table \(\pmb{S}_t \in \mathbb{Z}^{(1 + \Sigma_k R_{kt}) \times (2 w_t + 1)^3}\) per depth \(t \in [0, t^*)\). Row 0 is the global ground row (its entries are all zero, so looking it up returns the ground row of \(\pmb{F}\)). For receptive field \(k\), rows \(v \in [1 + \mathrm{off}_{kt}, R_{kt} + \mathrm{off}_{kt}]\), with \(\mathrm{off}_{kt} = \sum_{j < k} R_{jt}\), hold the one-based sequential indices of the cells in the submanifold convolutional window centered on the \(v\)-th active cell, offset so that the indices reference the correct row in the concatenated feature tensor.

  3. A downsampling neighbor table \(\pmb{D}_t \in \mathbb{Z}^{(1 + \Sigma_k R_{k(t+1)}) \times (w_t^{D})^3}\) per depth \(t \in [0, t^* - 1)\). Indices point into depth \(t\)’s row space; row indices are at depth \(t + 1\).

  4. An upsampling neighbor table \(\pmb{U}_t \in \mathbb{Z}^{(1 + \Sigma_k R_{kt}) \times (w_t^{U})^3}\) per depth \(t \in [0, t^* - 1)\). Indices point into depth \(t + 1\)’s row space; row indices are at depth \(t\).
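
The row layout of the feature tensor in item 1 can be sketched with NumPy. This is a minimal illustration rather than the sequencer's actual code: the helper name concat_features and the toy arrays are hypothetical, and the ground row is assumed to be all zeros.

```python
import numpy as np

def concat_features(F_per_rf):
    """Stack per-RF level-0 features below a shared ground row (row 0).

    Hypothetical helper: F_per_rf is a list of (R_k0, n_f) arrays.
    Returns the concatenated tensor and the per-RF offsets off_k0.
    """
    R = np.array([F_k.shape[0] for F_k in F_per_rf])
    # off[k] = sum of R_j0 for j < k; off[-1] is the total number of cells
    off = np.concatenate(([0], np.cumsum(R)))
    n_f = F_per_rf[0].shape[1]
    # Assumption: the ground row is all zeros
    F = np.zeros((1 + off[-1], n_f), dtype=np.float32)
    for k, F_k in enumerate(F_per_rf):
        # Receptive field k occupies rows [1 + off_k0, R_k0 + off_k0]
        F[1 + off[k]: 1 + off[k + 1]] = F_k
    return F, off[:-1]

F_a = np.full((2, 3), 1.0, dtype=np.float32)  # RF 0 with 2 active cells
F_b = np.full((3, 3), 2.0, dtype=np.float32)  # RF 1 with 3 active cells
F, off = concat_features([F_a, F_b])
```

With these toy inputs, RF 0 lands in rows 1 to 2 and RF 1 in rows 3 to 5, while row 0 stays reserved for the ground row.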

Every batch is statically padded along its row axis, so all batches emitted by this sequencer present the same shape to the model. This is what eliminates tf.function retracing on the forward pass. Concretely, the row axis of each emitted depth-t tensor has length 1 + pad_R_per_t[t] instead of the batch-dependent 1 + Σ_k R_{kt}. F is padded with zero rows; the neighbor tables S/D/U are padded with all-zero rows, so the downstream tf.gather fetches the ground row and the gather + matmul outputs zero features for padded cells. Labels are padded with class 0, and the loss / metric receives a matching sample_weight vector that is 1.0 over the real cells and 0.0 over the padded tail (and also over real cells whose label is in ignore_labels).
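
The claim that all-zero neighbor rows produce zero features can be checked with a small NumPy sketch. It assumes a zero ground row and a bias-free convolution with one weight matrix per window position; the window size, weights, and variable names are made up for illustration, and F[S] stands in for tf.gather(F, S).

```python
import numpy as np

rng = np.random.default_rng(0)
n_real, pad_R, n_f, n_out = 4, 6, 3, 2

# Feature tensor: row 0 is the ground row (assumed all zeros here),
# rows 1..n_real are real cells, the remaining rows are zero padding.
F = np.zeros((1 + pad_R, n_f), dtype=np.float32)
F[1:1 + n_real] = rng.normal(size=(n_real, n_f))

# Toy neighbor table with 3 window entries per cell (one-based indices,
# 0 meaning "no neighbor"). Rows beyond the real cells stay all-zero.
S = np.zeros((1 + pad_R, 3), dtype=np.int32)
S[1:1 + n_real] = [[0, 1, 2], [1, 2, 3], [2, 3, 4], [3, 4, 0]]

# One weight matrix per window position, shape (window, n_f, n_out)
W = rng.normal(size=(3, n_f, n_out)).astype(np.float32)

gathered = F[S]  # NumPy counterpart of tf.gather(F, S)
out = np.einsum("rwf,wfo->ro", gathered, W)  # gather + matmul, no bias
```

Every padded row gathers only the ground row, so its output is exactly zero regardless of the weights. A bias term would break this property, which is why the sketch leaves it out.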

Every batch also emits per-depth real-cell masks M_t at the end of the input list. Each mask is a 1-D boolean tensor of length pad_R_per_t[t] with True over the real cells and False over the padded tail. The SpConv architecture forwards these masks to every MaskedBatchNormalization so the batch statistics ignore the padded zero rows and the running mean / variance converge to the real-cell distribution (not biased by the padding ratio).
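
The effect of the masks on batch statistics can be illustrated with NumPy. This sketch does not reproduce MaskedBatchNormalization, whose internals are not described here; it only compares statistics computed with and without a real-cell mask on made-up values, leaving the ground row out for brevity.

```python
import numpy as np

pad_R, n_real = 8, 3
x = np.zeros((pad_R, 2), dtype=np.float64)
x[:n_real] = [[2.0, 4.0], [4.0, 8.0], [6.0, 12.0]]

# M_t: True over the real cells, False over the padded tail
mask = np.arange(pad_R) < n_real

naive_mean = x.mean(axis=0)         # diluted by the padded zero rows
masked_mean = x[mask].mean(axis=0)  # statistics over real cells only
masked_var = x[mask].var(axis=0)
```

Here the naive mean shrinks by the factor n_real / pad_R, which is the padding-ratio bias that the masks are meant to remove.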

Predictions come back stacked at n_batches × pad_R_per_t[0] rows; post_process_output() strips the padded tail of every batch and splits the real-cell rows back into per-receptive-field arrays using the cumulative-offset bookkeeping computed at prepare_data() time.

The sequencer expects the input list to be the HierarchicalSGPreProcessorPP pyout extended with the dense neighbor tables emitted by the C++ pre-processor. Specifically, X[0] is Fout, X[6] is S, X[7] is D, X[8] is U.

Variables:
  • total_elems (int) – Total number of receptive fields in the dataset.

  • max_depth (int) – Hierarchy depth \(t^*\).

  • R_per_rf_t (np.ndarray) – Number of active cells per receptive field per depth. Shape (total_elems, max_depth).

  • cum_offset_per_rf_t (np.ndarray) – Cumulative active-cell counts per depth, used to derive per-batch offsets. Shape (total_elems + 1, max_depth); cum_offset_per_rf_t[k, t] is the row position in the global depth-\(t\) tensor at which receptive field \(k\)’s first active cell sits.

  • pad_R_per_t (np.ndarray) – Per-depth static pad budget = sum of the top batch_size values of R_per_rf_t[:, t]. Bounds the worst-case batch sum regardless of which RFs the random shuffle groups together. Shape (max_depth,).

  • ignore_labels (np.ndarray or None) – Optional 1-D array of label values whose cells are masked out of the training loss / metric (sample_weight = 0). None disables masking.
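
As a rough sketch of how cum_offset_per_rf_t relates to R_per_rf_t, the cumulative table can be built with a zero row followed by a cumulative sum along the receptive-field axis. The counts below are made up, and the subtraction used to obtain a per-batch offset is an assumption about how the table is consumed.

```python
import numpy as np

# Toy counts: 3 receptive fields, 2 depths (values are made up)
R_per_rf_t = np.array([[5, 2],
                       [3, 1],
                       [4, 2]])

# Prepend a zero row and accumulate along the RF axis, giving shape
# (total_elems + 1, max_depth)
cum_offset_per_rf_t = np.vstack([
    np.zeros((1, R_per_rf_t.shape[1]), dtype=R_per_rf_t.dtype),
    np.cumsum(R_per_rf_t, axis=0),
])

# Assumed usage: offset of RF k inside a batch that starts at RF start_idx
start_idx, k, t = 1, 2, 0
batch_offset = cum_offset_per_rf_t[k, t] - cum_offset_per_rf_t[start_idx, t]
```

The extra last row holds the total number of active cells per depth, which is why the table has total_elems + 1 rows.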

__init__(X, y, batch_size, **kwargs)

Initialize the sequencer. See DLAbstractSequencer.__init__() for the base contract.

ignore_labels (optional kwarg, list of int): labels that should be excluded from the training loss / metric. Real cells whose label is in this list get sample_weight = 0.0 so they do not contribute to gradient updates. This is the cleanest way to keep noisy “unclassified” or domain-irrelevant cells out of training without dropping them entirely from the receptive field.

property batch_size
set_input_data(X, y)

Bind the input data and (re)compute per-RF active-cell counts and cumulative offsets. See DLAbstractSequencer.set_input_data().

Parameters:
  • X – The input data. Expected to be the HierarchicalSGPreProcessorPP pyout extended with the dense neighbor tables — a list of length 9 where X[0] is Fout and X[6:9] are S, D, U.

  • y – Reference labels as a list of per-RF 1-D arrays of shape (R_k0,).

prepare_data()

Compute per-RF active-cell counts, cumulative offsets, and the static per-depth pad sizes from the dense neighbor tables. Called every time the input data is rebound (e.g., when DLOfflineSequencer loads a new point cloud).

pad_R_per_t[t] is the smallest depth-\(t\) row count large enough to accommodate any batch this sequencer can possibly emit. It is computed as the sum of the top batch_size values of R_per_rf_t[:, t] — that bounds the worst-case batch sum regardless of which RFs random-shuffle groups together. Every batch is then padded to this fixed shape, eliminating per-step tf.function retracing.
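
The pad budget described above can be sketched in a few lines of NumPy. The helper name static_pad_budget and the toy counts are hypothetical; the computation follows the description (sum of the batch_size largest counts per depth).

```python
import numpy as np

def static_pad_budget(R_per_rf_t, batch_size):
    """Sum of the batch_size largest counts per depth (hypothetical helper)."""
    # Sort each depth column in ascending order and keep the last rows
    top = np.sort(R_per_rf_t, axis=0)[-batch_size:]
    return top.sum(axis=0)

R_per_rf_t = np.array([[5, 2], [3, 1], [4, 2], [9, 3]])
pad_R_per_t = static_pad_budget(R_per_rf_t, batch_size=2)
```

Any group of batch_size receptive fields has a per-depth sum no larger than this budget, so no shuffle can produce a batch that overflows the padded shape.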

getitem_training(idx)

See DLAbstractSequencer.getitem_training().

Returns a 3-tuple (batch_X, batch_y, sample_weight). Static-shape padding adds masked rows to every batch so all batches present an identical input shape to the model (avoiding tf.function retracing). sample_weight is 1.0 for the real cells of the batch and 0.0 for the padded tail (and for real cells whose label is in ignore_labels); the loss and the metrics therefore ignore the padded rows entirely.

getitem_predict(idx)

See DLAbstractSequencer.getitem_predict().

on_epoch_end_training()

See DLAbstractSequencer.on_epoch_end_training().

init_random_indices()

See DLAbstractSequencer.init_random_indices().

apply_random_indices()

See DLAbstractSequencer.apply_random_indices(). Reorders every per-RF list in self.X and self.y according to self.Irandom, then recomputes the cumulative offsets.

extract_input_batch(start_idx, end_idx)

Build the concatenated input tensor list for the receptive fields in [start_idx, end_idx). The output ordering matches SpConv3DPwiseClassif.build_input():

[F, S_0, S_1, ..., S_{t*-1}, D_0, ..., D_{t*-2}, U_0, ..., U_{t*-2}]

Every tensor is padded along its row axis so that all batches produced by this sequencer share an identical shape, which lets tf.function reuse its cached trace instead of retracing. F is padded with 0.0 rows; the neighbor tables (S, D, U) are padded with all-zero rows so that a downstream tf.gather fetches the ground row of F and the downstream gather + matmul outputs zeros for the padded cells. post_process_output() and the sample_weight returned by extract_reference_batch() strip the padded rows out of the predictions and the loss, respectively.

Every output tensor is allocated once at its final padded shape, and per-RF blocks are written into it in place. The batch offset is added through an offset * (entry != 0) expression, so zero entries keep pointing at the ground row. This avoids the double allocation of a “concat then pad” approach and the per-RF np.where round-trip.
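
The allocate-once pattern with the offset * (entry != 0) expression might look like the following NumPy sketch. The helper name write_neighbor_blocks and the toy tables are hypothetical, and only a single neighbor table at one depth is shown.

```python
import numpy as np

def write_neighbor_blocks(S_per_rf, pad_R):
    """Write per-RF neighbor tables into one preallocated padded table.

    Hypothetical helper: S_per_rf is a list of (R_k, n_window) int arrays
    with one-based local indices, where 0 means "no neighbor".
    """
    n_window = S_per_rf[0].shape[1]
    # Allocate once at the final padded shape; row 0 and the tail stay zero
    S = np.zeros((1 + pad_R, n_window), dtype=np.int32)
    offset = 0
    for S_k in S_per_rf:
        R_k = S_k.shape[0]
        block = S[1 + offset: 1 + offset + R_k]  # view into S, no copy
        block[:] = S_k
        # Shift only the non-zero entries so 0 keeps pointing at the ground row
        block += offset * (S_k != 0)
        offset += R_k
    return S

S_a = np.array([[0, 1, 2], [1, 2, 0]], dtype=np.int32)
S_b = np.array([[0, 1, 2], [1, 2, 3], [2, 3, 0]], dtype=np.int32)
S = write_neighbor_blocks([S_a, S_b], pad_R=7)
```

Multiplying the offset by the boolean mask keeps the shift branch-free, and writing through a view means no intermediate concatenated array is created.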

Returns:

A list of int32 / float32 numpy arrays ready to be fed into Keras’s predict_on_batch() / train_on_batch().

Return type:

list

extract_reference_batch(start_idx, end_idx)

Flat concatenation of per-RF labels over the batch interval, padded along the row axis to pad_R_per_t[0] so that the loss / metric receives a tensor of the same shape on every step. Returns (y_padded, sample_weight) where sample_weight is 1.0 for the real-cell rows and 0.0 for the padded tail.
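
A minimal sketch of the padding and weighting described above, assuming integer labels. The helper name pad_reference_batch is hypothetical, and applying ignore_labels in the same step is an assumption made for illustration.

```python
import numpy as np

def pad_reference_batch(y_per_rf, pad_R0, ignore_labels=None):
    """Concatenate per-RF labels and pad them to pad_R0 rows (hypothetical)."""
    y_real = np.concatenate(y_per_rf)
    n_real = y_real.shape[0]
    y_padded = np.zeros(pad_R0, dtype=y_real.dtype)  # pad with class 0
    y_padded[:n_real] = y_real
    sample_weight = np.zeros(pad_R0, dtype=np.float32)
    sample_weight[:n_real] = 1.0
    if ignore_labels is not None:
        # Assumption: real cells with an ignored label also get zero weight
        sample_weight[:n_real][np.isin(y_real, ignore_labels)] = 0.0
    return y_padded, sample_weight

y_per_rf = [np.array([1, 2]), np.array([0, 3, 1])]
y_padded, sample_weight = pad_reference_batch(y_per_rf, pad_R0=7,
                                              ignore_labels=[3])
```

A real cell labeled 0 keeps weight 1.0 while the padded class-0 rows get weight 0.0, so the weights are what distinguish padding from genuine class-0 cells.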

post_process_output(z_rf)

Split the flat model output back into a per-receptive-field list. Called by the model handler after stacking every batch’s output into a single (n_batches * pad_R_per_t[0], num_classes) tensor. Each batch contributes pad_R_per_t[0] rows, but only the first sum(R_k0) rows of each batch slice carry real predictions. This method strips the padded tail off every batch and then splits the resulting flat tensor by the per-RF cumulative offsets stored at prepare_data() time.

Note

This method assumes batches were processed in sequential idx = 0, 1, ..., n_batches - 1 order and that the sequencer state (self.X, cum_offset_per_rf_t) has not been mutated between the prediction loop and this call. VL3D’s prediction handler satisfies both invariants by building a fresh sequencer for the prediction pass; callers that reuse a sequencer across train and predict must avoid triggering a shuffle in between.

Parameters:

z_rf (np.ndarray) – The stacked per-batch model output.

Returns:

A list of per-RF arrays, each of shape (R_k0, num_classes).

Return type:

list of np.ndarray
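
The strip-and-split step can be sketched as follows, assuming batches were emitted in sequential order. The helper name split_padded_output and its arguments are hypothetical; it uses per-RF counts directly instead of the stored cumulative offsets.

```python
import numpy as np

def split_padded_output(z, R0_per_rf, batch_size, pad_R0):
    """Strip each batch's padded tail and split rows per receptive field.

    Hypothetical helper: z has shape (n_batches * pad_R0, num_classes) and
    R0_per_rf holds the level-0 active-cell count of every RF, in the order
    the batches were emitted.
    """
    out = []
    n_rf = len(R0_per_rf)
    for b, start in enumerate(range(0, n_rf, batch_size)):
        row = b * pad_R0  # first row of this batch's slice
        for R_k in R0_per_rf[start:start + batch_size]:
            out.append(z[row:row + R_k])
            row += R_k
        # Rows from `row` up to (b + 1) * pad_R0 are padding and are skipped
    return out

R0_per_rf = [2, 3, 1]
# Two batches of pad_R0 = 6 rows each; column values encode the row index
z = np.repeat(np.arange(12, dtype=np.float32)[:, None], 4, axis=1)
per_rf = split_padded_output(z, R0_per_rf, batch_size=2, pad_R0=6)
```

In this toy case the third receptive field starts at row 6 (the beginning of the second batch slice), not at row 5, because row 5 is the padded tail of the first batch.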