src.model.deeplearn.sequencer.transf_octorf_sequencer
Classes
- class src.model.deeplearn.sequencer.transf_octorf_sequencer.TransfOctoRFSequencer(storage_path, batch_size, num_classes, chunk_randomization=True, batch_randomization=True, augmentor=None, point_wise_labels=False, **kwargs)
- Author:
Alberto M. Esmoris Pena
Disk-based sequencer for TransfOctoRF neural network training. Reads compact 2D arrays from HDF5 chunks and performs the KNN gather per mini-batch at training time.
Each chunk in the HDF5 file represents one point cloud’s worth of centroid-level data in compact 2D form. Features and coordinates cover the full centroid set (S_full) so that neighbor indices remain valid. Neighbors, mask, and labels cover only the selected training centers (S_sel <= S_full), as determined by the training input strategy.
storage.h5
├── chunk_0/
│   ├── features      (S_full, n_f)  float32
│   ├── coordinates   (S_full, 3)    float32
│   ├── neighbors     (S_sel, K)     int32
│   ├── mask          (S_sel, K)     bool
│   └── labels        (S_sel,)       int32
├── chunk_1/
│   └── ...
The KNN gather (expanding 2D features into 3D neighbor tensors) is deferred to __getitem__, where it operates on tiny mini-batches (~100 KB at batch_size=32). This avoids materializing the full (S, K, n_f) tensor (~16 GB) in memory or on disk.
During each epoch, chunks are loaded into memory one at a time (~1.5 GB per chunk instead of ~17 GB). Batches are extracted from the cached chunk. Chunk and batch order can be randomized per epoch.
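The per-batch gather described above can be sketched with NumPy fancy indexing. This is a minimal illustration, not the class's actual implementation: the function name gather_batch, the demo sizes, and the convention that mask is True for valid neighbors (with padded slots zeroed out) are all assumptions.

```python
import numpy as np

def gather_batch(features, coordinates, neighbors, mask, labels, batch_idx):
    """Expand compact 2D arrays into 3D neighbor tensors for one mini-batch.

    Hypothetical helper: the real __getitem__ may differ.
    """
    nbr = neighbors[batch_idx]   # (B, K) row indices into the full centroid set
    msk = mask[batch_idx]        # (B, K) assumed True where the neighbor is valid
    # Fancy indexing gathers one feature row per neighbor: (B, K, n_f).
    # Multiplying by the mask zeroes padded slots (an assumed convention).
    feats = features[nbr] * msk[..., None]
    coords = coordinates[nbr] * msk[..., None]   # (B, K, 3)
    return feats, coords, msk, labels[batch_idx]

# Tiny synthetic chunk: 10 centroids, 6 selected centers, 3 neighbors each.
rng = np.random.default_rng(0)
S_full, S_sel, K, n_f = 10, 6, 3, 4
features = rng.random((S_full, n_f), dtype=np.float32)
coordinates = rng.random((S_full, 3), dtype=np.float32)
neighbors = rng.integers(0, S_full, size=(S_sel, K), dtype=np.int32)
mask = np.ones((S_sel, K), dtype=bool)
mask[0, -1] = False              # pretend the last neighbor of center 0 is padding
labels = rng.integers(0, 5, size=S_sel, dtype=np.int32)

feats, coords, msk, y = gather_batch(
    features, coordinates, neighbors, mask, labels, np.array([0, 1])
)
```

Only the (B, K, n_f) slice for the current batch is ever materialized, which is why features and coordinates must keep all S_full rows: the neighbor indices point into the full set.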
- Variables:
storage_path (str) – Path to the HDF5 file.
batch_size (int) – Number of samples per batch.
num_classes (int) – Number of output classes.
chunk_randomization (bool) – Shuffle chunk order per epoch.
batch_randomization (bool) – Shuffle samples within chunks.
- __init__(storage_path, batch_size, num_classes, chunk_randomization=True, batch_randomization=True, augmentor=None, point_wise_labels=False, **kwargs)
- static create_storage(storage_path, chunks_data)
Create the HDF5 storage file from a list of chunk dicts.
Each chunk dict must contain compact 2D arrays:
'features' (S_full, n_f), 'coordinates' (S_full, 3), 'neighbors' (S_sel, K), 'mask' (S_sel, K), 'labels' (S_sel,). Features/coordinates may have more rows than neighbors/mask/labels when a training input strategy filters the training centers.
- Parameters:
storage_path – Output HDF5 file path.
chunks_data – List of chunk dicts.
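As an illustration of the expected input, the sketch below builds a list of chunk dicts with the documented keys, shapes, and dtypes, and checks their consistency before storage. The helpers make_chunk and check_chunk are hypothetical and not part of the module; the random data is a placeholder.

```python
import numpy as np

def make_chunk(s_full, s_sel, k, n_f, num_classes, seed=0):
    """Build one synthetic chunk dict (hypothetical helper, random data)."""
    rng = np.random.default_rng(seed)
    return {
        "features": rng.random((s_full, n_f), dtype=np.float32),
        "coordinates": rng.random((s_full, 3), dtype=np.float32),
        "neighbors": rng.integers(0, s_full, size=(s_sel, k), dtype=np.int32),
        "mask": np.ones((s_sel, k), dtype=bool),
        "labels": rng.integers(0, num_classes, size=s_sel, dtype=np.int32),
    }

def check_chunk(chunk):
    """Validate the documented shape relations; return (S_full, S_sel, K)."""
    s_full = chunk["features"].shape[0]
    s_sel, k = chunk["neighbors"].shape
    if chunk["coordinates"].shape != (s_full, 3):
        raise ValueError("coordinates must have shape (S_full, 3)")
    if chunk["mask"].shape != (s_sel, k):
        raise ValueError("mask must have shape (S_sel, K)")
    if chunk["labels"].shape != (s_sel,):
        raise ValueError("labels must have shape (S_sel,)")
    if s_sel > s_full:
        raise ValueError("S_sel must not exceed S_full")
    nbr = chunk["neighbors"]
    if nbr.size and (nbr.min() < 0 or nbr.max() >= s_full):
        raise ValueError("neighbor indices must point into the full centroid set")
    return s_full, s_sel, k

# Two chunks where only a subset of centroids are training centers.
chunks_data = [
    make_chunk(s_full=100, s_sel=60, k=8, n_f=5, num_classes=4, seed=0),
    make_chunk(s_full=80, s_sel=80, k=8, n_f=5, num_classes=4, seed=1),
]
shapes = [check_chunk(c) for c in chunks_data]
```

A list validated this way could then be passed as the chunks_data argument of create_storage, together with the output path.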
- on_epoch_end()
Reshuffle chunk and batch order for next epoch.
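One way to picture the per-epoch reshuffle is as a plan of (chunk, sample indices) pairs that keeps all batches of a chunk contiguous, so only one chunk needs to be cached at a time. The sketch below is an assumption about how such a plan could be built, not the class's code; epoch_plan and its seed argument are hypothetical, and whether the real sequencer keeps a final partial batch is not stated in the source.

```python
import numpy as np

def epoch_plan(chunk_sizes, batch_size, chunk_randomization=True,
               batch_randomization=True, seed=None):
    """Return a list of (chunk_id, sample_indices) batches for one epoch."""
    rng = np.random.default_rng(seed)
    chunk_order = np.arange(len(chunk_sizes))
    if chunk_randomization:
        rng.shuffle(chunk_order)          # new chunk order each epoch
    plan = []
    for c in chunk_order:
        idx = np.arange(chunk_sizes[c])
        if batch_randomization:
            rng.shuffle(idx)              # shuffle samples within the chunk
        # Batches of one chunk stay adjacent, so the chunk is loaded only once.
        for start in range(0, len(idx), batch_size):
            plan.append((int(c), idx[start:start + batch_size]))
    return plan

# Two chunks holding 5 and 3 selected centers, batches of 2.
plan = epoch_plan([5, 3], batch_size=2, seed=42)
```

Calling the function again with a different seed at the end of each epoch would yield a fresh order while preserving the one-chunk-at-a-time access pattern.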
- close()
Close the HDF5 file.