src.model.deeplearn.sequencer.transf_octorf_sequencer

Classes

TransfOctoRFSequencer(storage_path, ...[, ...])

class src.model.deeplearn.sequencer.transf_octorf_sequencer.TransfOctoRFSequencer(storage_path, batch_size, num_classes, chunk_randomization=True, batch_randomization=True, augmentor=None, point_wise_labels=False, **kwargs)

Author: Alberto M. Esmoris Pena

Disk-based sequencer for TransfOctoRF neural network training. Reads compact 2D arrays from HDF5 chunks and performs the KNN gather per mini-batch at training time.

Each chunk in the HDF5 file represents one point cloud’s worth of centroid-level data in compact 2D form. Features and coordinates cover the full centroid set (S_full), so every neighbor index (a row index into the S_full rows) remains valid. Neighbors, mask, and labels cover only the selected training centers (S_sel ≤ S_full), as determined by the training input strategy.

storage.h5
├── chunk_0/
│   ├── features     (S_full, n_f)  float32
│   ├── coordinates  (S_full, 3)    float32
│   ├── neighbors    (S_sel, K)     int32
│   ├── mask         (S_sel, K)     bool
│   └── labels       (S_sel,)       int32
├── chunk_1/
│   └── ...
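The layout implies a few invariants that can be checked before training. The following is a minimal sketch, assuming each chunk has already been read into NumPy arrays. check_chunk_layout is a hypothetical helper, not part of TransfOctoRFSequencer, and it assumes that masked-out neighbor slots may hold padding values, so only entries with mask == True are range-checked.

```python
import numpy as np


def check_chunk_layout(features, coordinates, neighbors, mask, labels):
    """Validate one chunk's compact 2D arrays against the storage layout.

    Hypothetical helper (not part of TransfOctoRFSequencer). Returns
    (S_full, S_sel, K, n_f) or raises ValueError on the first violation.
    """
    s_full, n_f = features.shape
    s_sel, k = neighbors.shape
    # Features and coordinates cover the full centroid set S_full.
    if coordinates.shape != (s_full, 3):
        raise ValueError("coordinates must have shape (S_full, 3)")
    # Neighbors, mask and labels cover only the S_sel selected centers.
    if mask.shape != (s_sel, k) or labels.shape != (s_sel,):
        raise ValueError("mask must be (S_sel, K) and labels (S_sel,)")
    if s_sel > s_full:
        raise ValueError("S_sel must not exceed S_full")
    if mask.dtype != np.bool_ or not np.issubdtype(neighbors.dtype, np.integer):
        raise ValueError("mask must be bool and neighbors integer")
    # Neighbor indices are row indices into S_full. Assumption: padded
    # (mask == False) slots may hold arbitrary values, so they are skipped.
    valid = neighbors[mask]
    if valid.size and (valid.min() < 0 or valid.max() >= s_full):
        raise ValueError("neighbor index out of range [0, S_full)")
    return s_full, s_sel, k, n_f
```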

The KNN gather (expanding the 2D features into 3D neighbor tensors) is deferred to __getitem__, where it operates on small mini-batches (~100 KB at batch_size=32). This avoids materializing the full (S_sel, K, n_f) tensor (~16 GB) in memory or on disk.
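The deferred gather reduces to NumPy fancy indexing on the cached chunk: indexing the (S_full, n_f) feature array with a (B, K) slice of the neighbor array yields a (B, K, n_f) tensor occupying B·K·n_f·4 bytes in float32, so memory scales with the batch size rather than with S_sel. The sketch below assumes batch_idx holds row indices into the S_sel selected centers and that padded neighbor slots still hold in-range indices. gather_batch is a hypothetical name, and zeroing masked-out neighbors is an assumption about how padding is treated, not necessarily what __getitem__ does.

```python
import numpy as np


def gather_batch(features, coordinates, neighbors, mask, labels, batch_idx):
    """Expand one mini-batch from compact 2D chunk arrays to 3D tensors.

    Hypothetical sketch of the deferred KNN gather. batch_idx indexes the
    S_sel selected centers; neighbor entries index the S_full rows.
    """
    nbr = neighbors[batch_idx]            # (B, K) indices into S_full
    batch_mask = mask[batch_idx]          # (B, K) valid-neighbor flags
    x = features[nbr]                     # (B, K, n_f) fancy-index gather
    xyz = coordinates[nbr]                # (B, K, 3) neighbor coordinates
    # Assumption: masked-out (padding) neighbors are zeroed.
    keep = batch_mask[..., None]
    x = np.where(keep, x, 0).astype(features.dtype, copy=False)
    xyz = np.where(keep, xyz, 0).astype(coordinates.dtype, copy=False)
    return x, xyz, batch_mask, labels[batch_idx]
```

Only the B selected rows are ever expanded, which is what keeps the per-batch footprint small compared with the full (S_sel, K, n_f) tensor.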

During each epoch, chunks are loaded into memory one at a time (~1.5 GB per chunk instead of ~17 GB), and batches are extracted from the cached chunk. Chunk order and the order of samples within each chunk can be randomized per epoch through chunk_randomization and batch_randomization.
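One way to realize this schedule is to precompute, per epoch, a list of (chunk, sample indices) batches in which all batches of a chunk are contiguous, so that a single cached chunk serves them all. This is a sketch under assumptions: build_epoch_schedule, iterate_epoch and load_chunk are hypothetical names, and trailing partial batches are kept rather than dropped or padded.

```python
import numpy as np


def build_epoch_schedule(chunk_sizes, batch_size, chunk_randomization=True,
                         batch_randomization=True, rng=None):
    """Return one epoch as a list of (chunk_id, sample_indices) batches."""
    rng = np.random.default_rng() if rng is None else rng
    chunk_order = np.arange(len(chunk_sizes))
    if chunk_randomization:
        rng.shuffle(chunk_order)          # new chunk order each epoch
    schedule = []
    for cid in chunk_order:
        samples = np.arange(chunk_sizes[cid])
        if batch_randomization:
            rng.shuffle(samples)          # shuffle samples inside the chunk
        # All batches of this chunk are appended back to back.
        for start in range(0, len(samples), batch_size):
            schedule.append((int(cid), samples[start:start + batch_size]))
    return schedule


def iterate_epoch(schedule, load_chunk):
    """Yield (chunk_data, sample_indices), loading each chunk only once."""
    cached_id, cached = None, None
    for cid, idx in schedule:
        if cid != cached_id:
            # Replacing the reference releases the previous chunk so it
            # can be freed before the next one is used.
            cached_id, cached = cid, load_chunk(cid)
        yield cached, idx
```

Regenerating the schedule at the end of every epoch would give fresh chunk and sample orders, and passing a seeded numpy.random.Generator as rng makes the order reproducible.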

Variables:

storage_path (str) – Path to the HDF5 file.
batch_size (int) – Number of samples per batch.
num_classes (int) – Number of output classes.
chunk_randomization (bool) – Shuffle chunk order per epoch.
batch_randomization (bool) – Shuffle samples within chunks.

__init__(storage_path, batch_size, num_classes, chunk_randomization=True, batch_randomization=True, augmentor=None, point_wise_labels=False, **kwargs)

static create_storage(storage_path, chunks_data)

Create the HDF5 storage file from a list of chunk dicts.

Each chunk dict must contain compact 2D arrays: 'features' (S_full, n_f), 'coordinates' (S_full, 3), 'neighbors' (S_sel, K), 'mask' (S_sel, K), 'labels' (S_sel,). Features/coordinates may have more rows than neighbors/mask/labels when a training input strategy filters the training centers.

Parameters:

storage_path – Output HDF5 file path.
chunks_data – List of chunk dicts.
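A minimal sketch of the work create_storage has to do, assuming h5py is used for writing: check that every chunk dict carries the five required keys, cast each array to the dtype listed in the storage.h5 layout, and store it under a chunk_i group. prepare_chunks and write_storage are hypothetical names; this is not the module's actual implementation.

```python
import numpy as np

# Dtypes for each per-chunk dataset, as listed in the storage.h5 layout.
CHUNK_DTYPES = {
    "features": np.float32,
    "coordinates": np.float32,
    "neighbors": np.int32,
    "mask": np.bool_,
    "labels": np.int32,
}


def prepare_chunks(chunks_data):
    """Map a list of chunk dicts to {"chunk_i": {name: array}}.

    Hypothetical pre-write step: checks keys and row counts, casts dtypes.
    """
    prepared = {}
    for i, chunk in enumerate(chunks_data):
        missing = set(CHUNK_DTYPES) - set(chunk)
        if missing:
            raise KeyError(f"chunk {i} is missing {sorted(missing)}")
        arrays = {key: np.asarray(chunk[key], dtype=dt)
                  for key, dt in CHUNK_DTYPES.items()}
        s_full = arrays["features"].shape[0]
        s_sel = arrays["labels"].shape[0]
        # S_full rows may exceed S_sel rows, but never the other way round.
        if s_sel > s_full or arrays["neighbors"].shape[0] != s_sel:
            raise ValueError(f"chunk {i} has inconsistent row counts")
        prepared[f"chunk_{i}"] = arrays
    return prepared


def write_storage(storage_path, chunks_data):
    """Write prepared chunks to an HDF5 file (requires h5py)."""
    import h5py  # imported lazily so prepare_chunks works without h5py

    with h5py.File(storage_path, "w") as f:
        for name, arrays in prepare_chunks(chunks_data).items():
            group = f.create_group(name)
            for key, arr in arrays.items():
                group.create_dataset(key, data=arr)
```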

on_epoch_end(): Reshuffle chunk and batch order for next epoch.

close(): Close the HDF5 file.