src.model.transf_octorf_classification_model
- src.model.transf_octorf_classification_model.build_decimation_spec_for_torf(entry, m)
Translate fps_decorator / mindist_decorator entries to the normalized dict consumed by pyvl3dpp.octree_mine_leaves_*_ff.
Returns {} (empty dict, dispatched to the C++ nullptr fast path) when:
- neither key is present in entry,
- both keys are explicitly None / JSON null, or
- either key is present as an empty dict {}.
Raises MinerException when:
- both keys are non-empty on the same entry (mirrors the Python decorator class separation between FPSDecoratedMiner and MinDistDecoratedMiner); or
- a required sub-key is missing (num_points for fps_decorator, min_distance for mindist_decorator).
Logs a single warning line for the ignored sub-keys (per entry). The C++ adapter flow has no encode/decode step, so num_encoding_neighbors and friends are accepted (for copy-paste compatibility with Python decorator JSON) but not consumed.
- Parameters:
entry (dict) – A single mining_config entry (dict).
m (int) – Size of the input source cloud, used to resolve num_points expressions like "m/2".
- Returns:
Normalized dict for the C++ binding, or {}.
- Return type:
dict
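The rules above can be restated as a minimal sketch. This is not the module's code: the MinerException class below is a local stand-in, the layout of the returned dict is hypothetical, and the num_points resolver only handles plain integers and the forms "m" and "m/<int>". Ignored sub-keys are dropped silently here, whereas the real function logs a warning.

```python
class MinerException(Exception):
    """Hypothetical stand-in for the project's MinerException."""


# Required sub-key for each decorator, as stated in the documentation.
REQUIRED = {"fps_decorator": "num_points", "mindist_decorator": "min_distance"}


def resolve_num_points(value, m):
    # Assumption: only integers, "m", and "m/<int>" are supported in this sketch.
    if isinstance(value, str):
        expr = value.replace(" ", "")
        if expr == "m":
            return m
        if expr.startswith("m/"):
            return m // int(expr[2:])
        return int(expr)
    return int(value)


def decimation_spec_sketch(entry, m):
    fps = entry.get("fps_decorator")
    mindist = entry.get("mindist_decorator")
    # Absent keys and explicit None are treated alike: nothing to decimate.
    if fps is None and mindist is None:
        return {}
    # Either key present as an empty dict also yields the empty spec.
    if fps == {} or mindist == {}:
        return {}
    if fps is not None and mindist is not None:
        raise MinerException("fps_decorator and mindist_decorator are exclusive")
    key = "fps_decorator" if fps is not None else "mindist_decorator"
    sub = entry[key]
    required = REQUIRED[key]
    if required not in sub:
        raise MinerException(f"{key} requires the sub-key {required!r}")
    value = sub[required]
    if key == "fps_decorator":
        value = resolve_num_points(value, m)
    # Hypothetical output layout; other sub-keys are ignored.
    return {"decorator": key, required: value}
```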
- class src.model.transf_octorf_classification_model.FeaturePreprocessor(iqr_multiplier=1.5, eps=1e-08)
Streaming IQR clamping + standardization for NN input features.
Designed for the TransfOctoRF offline sequencing pattern where training data is spread across multiple files (one per point cloud). The preprocessor is fitted once after the RF stage and before the NN stage, using a two-pass streaming approach:
Pass 1 (percentiles): Accumulate all feature values per column across all files to compute exact \(Q_1\) and \(Q_3\). Only one column’s values are held in memory at a time.
Pass 2 (mean/std): Stream files again, clamp each chunk using the bounds from Pass 1, and accumulate running mean and variance via Welford’s online algorithm.
After fitting, transform() is called per-chunk during NN training (the preprocessor parameters are frozen).
Pass 1: Percentiles and bounds, where \(\alpha\) is iqr_multiplier:
\[Q_{1,j}, Q_{3,j} = \text{percentile}_{25}(\mathbf{f}_j),\; \text{percentile}_{75}(\mathbf{f}_j)\]
\[\text{IQR}_j = Q_{3,j} - Q_{1,j}, \quad L_j = Q_{1,j} - \alpha \cdot \text{IQR}_j, \quad U_j = Q_{3,j} + \alpha \cdot \text{IQR}_j\]
Pass 2: Online mean and standard deviation (Welford’s algorithm) on clamped values:
\[\bar{f}_j^{(n)} = \bar{f}_j^{(n-1)} + \frac{\tilde{f}_{nj} - \bar{f}_j^{(n-1)}}{n}, \quad M_{2,j}^{(n)} = M_{2,j}^{(n-1)} + (\tilde{f}_{nj} - \bar{f}_j^{(n-1)}) (\tilde{f}_{nj} - \bar{f}_j^{(n)})\]
where \(\tilde{f}_{nj} = \text{clamp}(f_{nj}, L_j, U_j)\). After processing all \(N\) samples:
\[\mu_j = \bar{f}_j^{(N)}, \quad \sigma_j = \sqrt{M_{2,j}^{(N)} / N}\]
Transform (applied per-chunk, frozen parameters), where \(\epsilon\) is eps:
\[\hat{f}_{ij} = \frac{\text{clamp}(f_{ij}, L_j, U_j) - \mu_j} {\sigma_j + \epsilon}\]
- Variables:
iqr_multiplier – IQR multiplier for clamping bounds (default 1.5).
Q1 – 25th percentile per feature.
Q3 – 75th percentile per feature.
lower – Lower clamping bound per feature.
upper – Upper clamping bound per feature.
mu – Mean per feature (after clamping).
sigma – Std per feature (after clamping).
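As an in-memory reference for the formulas above, the following sketch computes the same bounds, mean, and standard deviation with plain NumPy on a single matrix. The function names fit_reference and transform_reference are illustrative and not part of the class; the streaming machinery is deliberately left out.

```python
import numpy as np


def fit_reference(F, iqr_multiplier=1.5):
    # Pass 1: per-column quartiles and clamping bounds.
    q1 = np.percentile(F, 25, axis=0)
    q3 = np.percentile(F, 75, axis=0)
    iqr = q3 - q1
    lower = q1 - iqr_multiplier * iqr
    upper = q3 + iqr_multiplier * iqr
    # Pass 2: mean and population std (sqrt(M2 / N)) of the clamped values.
    clamped = np.clip(F, lower, upper)
    mu = clamped.mean(axis=0)
    sigma = clamped.std(axis=0)
    return lower, upper, mu, sigma


def transform_reference(F, lower, upper, mu, sigma, eps=1e-8):
    # Clamp with the frozen bounds, then standardize.
    return (np.clip(F, lower, upper) - mu) / (sigma + eps)


rng = np.random.default_rng(0)
F = rng.normal(size=(1000, 3))
F[0, 0] = 1e6  # an extreme outlier that the clamping should neutralize
params = fit_reference(F)
F_hat = transform_reference(F, *params)
```

After the transform, the outlier in F[0, 0] sits at the upper clamping bound of its column instead of dominating the column statistics.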
- __init__(iqr_multiplier=1.5, eps=1e-08)
- fit(chunk_factory)
Fit preprocessing parameters from a chunk factory.
The chunk_factory is a callable that returns a fresh iterator of 2D numpy arrays each time it is called. This allows multiple passes over the data without holding all chunks in memory simultaneously. A plain list also works (it is wrapped automatically).
At no point is data from more than one chunk held in memory simultaneously. Percentiles are computed by appending each chunk’s column values to a temporary binary file (one chunk at a time), then computing percentiles from a memory-mapped view. Note that np.percentile may materialize one column’s data into RAM for sorting, but this is a single column (not a full chunk or multiple chunks). Mean and standard deviation are computed via a vectorized Welford algorithm, one chunk at a time.
- Parameters:
chunk_factory – Callable returning an iterator of (n_i, n_f) arrays, or a list of such arrays.
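A minimal sketch of the second pass, assuming that "vectorized Welford" means merging per-chunk statistics into running totals (the batched form of Welford's update). The helper names as_factory and streaming_mean_std are hypothetical, and the clamping bounds are passed in rather than computed from a memory-mapped file.

```python
import numpy as np


def as_factory(chunks):
    # A plain list is wrapped so every call returns a fresh iterator.
    return chunks if callable(chunks) else (lambda: iter(chunks))


def streaming_mean_std(chunk_factory, lower, upper):
    factory = as_factory(chunk_factory)
    n, mean, m2 = 0, None, None
    for chunk in factory():
        c = np.clip(np.asarray(chunk, dtype=np.float64), lower, upper)
        n_b = c.shape[0]
        if n_b == 0:
            continue
        mean_b = c.mean(axis=0)
        m2_b = ((c - mean_b) ** 2).sum(axis=0)
        if n == 0:
            n, mean, m2 = n_b, mean_b, m2_b
            continue
        # Merge the chunk statistics into the running statistics.
        delta = mean_b - mean
        total = n + n_b
        mean = mean + delta * n_b / total
        m2 = m2 + m2_b + delta ** 2 * n * n_b / total
        n = total
    return mean, np.sqrt(m2 / n)


rng = np.random.default_rng(1)
chunks = [rng.normal(size=(rows, 2)) for rows in (5, 7, 3)]
lower, upper = np.array([-1.0, -1.0]), np.array([1.0, 1.0])
mean, std = streaming_mean_std(chunks, lower, upper)
# A callable that returns a fresh iterator gives the same result.
mean_c, std_c = streaming_mean_std(lambda: iter(chunks), lower, upper)
```

Only one chunk is alive inside the loop at any time, which is the property the method description relies on.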
- fit_single(F)
Convenience method: fit from a single in-memory matrix.
Equivalent to fit([F]).
- Parameters:
F – Feature matrix (N x n_f).
- transform(F, dtype=None)
Apply IQR clamping + standardization to F (N x n_f).
Parameters are frozen from the last fit() call.
- Parameters:
F – Feature matrix to transform.
dtype – Output dtype (e.g., np.float32). When given, the preprocessor parameters are cast to this dtype before computation, so all intermediates stay in the requested precision. Default is None (preserve the input dtype).
- Returns:
Standardized feature matrix.
- fit_transform(F)
Fit and transform in one step.
- class src.model.transf_octorf_classification_model.TransfOctoRFClassificationModel(**kwargs)
- Author:
Alberto M. Esmoris Pena
Three-stage classification model for 3D point clouds:
Octree stage: Voxelizes point clouds, computes leaf centroids, extracts multi-scale features via C++ miner adapters.
Random Forest stage: C++ RF trained on centroid features. Outputs pseudoprobabilities, entropy, class ambiguity.
Neural Network stage: Transformer or SharedMLP consumes K-neighbor centroids with features + RF outputs. Outputs final predictions.
The model is NOT a deep learning model (is_deep_learning_model() returns False). It orchestrates the C++ RF and Keras NN internally.
See ClassificationModel and Model.
- Variables:
leaf_voxel_length (float) – Octree voxel side length.
ro (float) – Neighborhood radius (0 = auto).
rf_model (RandomForestPPClassificationModel) – The C++ Random Forest wrapper.
nn_handler (TransfOctoRFHandler) – The neural network handler.
preprocessor (FeaturePreprocessor) – IQR + standardization fitted on NN training data.
nn_train_on_pcloud (bool) – Whether the pipeline’s input point cloud data (X, F, y) is also used to train the neural network. When False, the pipeline data only trains the RF; the NN must be trained from a separate source (e.g., offline sequencing). Set to False to avoid data leakage between the RF and NN stages.
lowest_uncertainty_prediction (bool) – When True, the final prediction for each sample is selected from the RF or NN output based on which has the lowest class ambiguity. When False (default), NN predictions are always used.
- static extract_model_args(spec)
Extract arguments from a pipeline specification.
- __init__(**kwargs)
Initialize a TransfOctoRFClassificationModel.
- is_deep_learning_model()
TransfOctoRF is NOT a pure deep learning model.
- overwrite_pretrained_model(spec)
Continue-training entry point for TORF. Called by the sequential pipeline after loading a previously trained TORF model via the pretrained_model key. Updates the orchestrator attributes that govern further NN training so the user can override them in the JSON spec (e.g., raise the number of epochs, attach a new transfer/freeze spec).
- Parameters:
spec (dict) – The training-stage spec dict from the JSON pipeline.
- prepare_model()
Prepare the C++ RF and NN handler.
- build_arch_and_handler(n_f, num_classes)
Shared helper that builds a TransfOctoRFPwiseClassif architecture and a TransfOctoRFHandler from self.nn_hparams.
Used by both prepare_nn_handler() (training) and __setstate__() (deserialization) to avoid duplicating the parameter mapping logic.
- Parameters:
n_f (int) – Number of input features per centroid.
num_classes (int) – Number of output classes.
- Returns:
(arch, handler) tuple.
- Return type:
tuple
- prepare_nn_handler(n_f, num_classes)
Build and compile the NN handler.
When this model was loaded from a pickled checkpoint via the pretrained_model pipeline mechanism, __setstate__ has already restored a fitted nn_handler whose arch.nn carries the learned weights. In that case the handler must not be rebuilt, because doing so would silently discard those weights. The rebuild path is taken only when no usable handler is present.
- Parameters:
n_f – Number of NN input features.
num_classes – Number of output classes.
- mine_centroids(X_pts, centroids, F_pts=None)
Run all miners in mining_config on the given centroid positions, using the original point cloud as support.
Dispatches each entry to the corresponding C++ miner adapter:
- "GeometricFeatures++": multi-scale geometric descriptors.
- "HeightFeatures++": height statistics.
- "SmoothFeatures++": smoothed point cloud features.
- "Recount++": counting-based features.
Each entry may include a "frenames" list to override the default output feature names.
- Parameters:
X_pts (np.ndarray) – Original point cloud coordinates (N, 3).
centroids (np.ndarray) – Octree centroid coordinates (S, 3).
F_pts (np.ndarray or None) – Point cloud features (N, n_f_pcloud), required by SmoothFeatures++ and Recount++.
- Returns:
(features, fnames) — mined feature matrix (S, n_f) and corresponding feature names.
- Return type:
tuple[np.ndarray, list]
- static mine_geom(X_pts, centroids, entry)
Dispatch GeometricFeatures++ to C++ adapter.
The entry uses the same specification as the standard VL3D GeometricFeaturesPP miner: a single neighborhood.radius value. Use frenames to assign custom output names. For multi-scale features, add multiple entries in mining_config with different radii.
- Parameters:
X_pts – Point cloud coordinates (N, 3).
centroids – Centroid coordinates (S, 3).
entry – Mining config entry dict.
- Returns:
(features, names).
- static mine_height(X_pts, centroids, entry)
Dispatch HeightFeatures++ to C++ adapter.
- Parameters:
X_pts – Point cloud coordinates (N, 3).
centroids – Centroid coordinates (S, 3).
entry – Mining config entry dict.
- Returns:
(features, names).
- static mine_smooth(X_pts, centroids, F_pts, entry)
Dispatch SmoothFeatures++ to C++ adapter.
- Parameters:
X_pts – Point cloud coordinates (N, 3).
centroids – Centroid coordinates (S, 3).
F_pts – Point cloud features (N, n_f).
entry – Mining config entry dict.
- Returns:
(features, names).
- static mine_recount(X_pts, centroids, F_pts, entry)
Dispatch Recount++ to C++ adapter.
- Parameters:
X_pts – Point cloud coordinates (N, 3).
centroids – Centroid coordinates (S, 3).
F_pts – Point cloud features (N, n_f).
entry – Mining config entry dict.
- Returns:
(features, names).
- compute_rf_outputs(centroids, features, labels)
Train or predict with RF, compute entropy + class ambiguity.
- Parameters:
centroids – (S, 3) centroid coordinates.
features – (S, nf) mined features.
labels – (S,) class labels (for training) or None.
- Returns:
(proba, entropy, class_ambiguity) arrays.
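The entropy output can be illustrated with a short sketch. The exact definition used by the model (logarithm base, any normalization) is not stated here, so this assumes plain Shannon entropy with the natural logarithm; shannon_entropy is an illustrative name.

```python
import numpy as np


def shannon_entropy(proba, eps=1e-12):
    # Clip only inside the log so that zero probabilities contribute exactly 0.
    log_p = np.log(np.clip(proba, eps, 1.0))
    return -(proba * log_p).sum(axis=1)


proba = np.array([
    [0.25, 0.25, 0.25, 0.25],  # maximally uncertain
    [1.00, 0.00, 0.00, 0.00],  # fully confident
])
entropy = shannon_entropy(proba)
```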
- build_knn_neighbors(centroids, center_indices=None)
Build neighbor indices for centroids using the configured neighborhood_strategy.
When center_indices is given, neighborhoods are computed only for the selected centers (indices into centroids). The returned neighbor indices still reference the full centroids array. This avoids computing neighborhoods for centroids that won’t be used as training samples.
Supported strategies:
- "knn" (default): K-nearest neighbors via scipy KDTree.
- "spherical_fps": sphere query (radius neighborhood_radius) + FPS subsampling to K points via C++ alg_spherical_fps_neighbors.
- Parameters:
centroids (np.ndarray) – (S, 3) full centroid array.
center_indices (np.ndarray or None) – Indices of centroids to use as neighborhood centers. When None, all centroids are used.
- Returns:
(neighbors, mask): indices (S_sel, K) int32 and validity mask (S_sel, K) bool, where S_sel is len(center_indices), or S if center_indices is None.
- Return type:
tuple
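A minimal sketch of the "knn" strategy using scipy.spatial.KDTree. The function name knn_neighbors_sketch and the explicit K argument are illustrative (the model reads K from its configuration), and replacing padded slots with index 0 is an assumption about how invalid neighbors are stored.

```python
import numpy as np
from scipy.spatial import KDTree


def knn_neighbors_sketch(centroids, K, center_indices=None):
    tree = KDTree(centroids)
    centers = centroids if center_indices is None else centroids[center_indices]
    _, idx = tree.query(centers, k=K)
    idx = np.asarray(idx).reshape(len(centers), K)
    # KDTree marks missing neighbors (fewer than K points) with index S.
    mask = idx < len(centroids)
    neighbors = np.where(mask, idx, 0).astype(np.int32)
    return neighbors, mask


# Five centroids on a line; only the two end points act as centers.
centroids = np.array([[float(x), 0.0, 0.0] for x in range(5)])
neighbors, mask = knn_neighbors_sketch(centroids, K=2, center_indices=np.array([0, 4]))
```

The returned indices refer to rows of the full centroids array, even though only two neighborhoods were computed.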
- build_knn_handle(centroids)
Build a persistent C++ KNN handle on the given centroid set. The octree is built once and can be queried repeatedly via handle.query() (KNN by index) or handle.query_nearest() (k=1 nearest for external points).
- Parameters:
centroids (np.ndarray) – Centroid coordinates (S, 3).
- Returns:
Persistent KNN handle.
- should_use_chunked_knn(S, center_indices=None)
Decide whether to use chunked per-batch KNN instead of materializing the full (S, K) neighbors array. The decision depends on the centroid count, the predict_chunked_knn_threshold config, and the neighborhood strategy.
- Parameters:
S (int) – Number of centroids.
center_indices (np.ndarray or None) – Selected center indices, or None for full evaluation.
- Returns:
True if chunked KNN should be used.
- Return type:
bool
- build_spherical_fps_neighbors(centroids, center_indices=None)
Build neighbors via sphere query + FPS subsampling using C++ alg_sphere_fps_neighbors.
- Parameters:
centroids – (S, 3) full centroid array.
center_indices – Indices of centroids to query. When None, all centroids are queried.
- Returns:
(neighbors, mask) — (S_sel, K) int32/bool.
- build_nn_features(X, rf_proba, entropy, class_ambiguity)
Build the NN feature matrix from input features and RF outputs, selecting columns according to nn_fnames.
All features are returned in a single matrix. The preprocessor (IQR clamping + standardization) is applied uniformly to all columns, including RF probabilities and class ambiguity. Standardizing all inputs to zero-mean ensures mixed-sign gradients in the first layer, avoiding the zig-zag convergence problem that arises when all inputs are non-negative.
- Parameters:
X (np.ndarray) – Full feature matrix (S, n_f).
rf_proba (np.ndarray) – RF pseudoprobabilities (S, n_c).
entropy (np.ndarray) – Prediction entropy (S,).
class_ambiguity (np.ndarray) – Class ambiguity (S,).
- Returns:
Concatenated NN feature matrix (S, n_f_nn).
- Return type:
np.ndarray
- static class_ambiguity(proba)
Compute class ambiguity from a probability matrix.
\[a = 1 - p_{\max} + p_{\text{second}}\]
- Parameters:
proba (np.ndarray) – Probability matrix (S, n_c).
- Returns:
Class ambiguity (S,).
- Return type:
np.ndarray
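The formula can be evaluated row by row with NumPy, reading p_max and p_second as the largest and second-largest probability in each row. This is a sketch under that reading; class_ambiguity_sketch is an illustrative name and it assumes at least two classes.

```python
import numpy as np


def class_ambiguity_sketch(proba):
    # After the partition, the last two columns hold the two largest values,
    # with the largest in the final column.
    top2 = np.partition(proba, -2, axis=1)[:, -2:]
    return 1.0 - top2[:, 1] + top2[:, 0]


proba = np.array([
    [0.7, 0.2, 0.1],  # clear winner: low ambiguity
    [0.5, 0.5, 0.0],  # tie between two classes: maximal ambiguity
    [1.0, 0.0, 0.0],  # certain prediction: zero ambiguity
])
a = class_ambiguity_sketch(proba)
```

The value ranges from 0 (one class has all the mass) to 1 (the two leading classes are tied).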
- expand_nn_fnames(n_c)
Build expanded NN feature names, replacing the 'rf_proba' shorthand with n_c per-class column names. When class_names is available, the columns are named rf_<class_name>; otherwise they fall back to rf_proba_0, …, rf_proba_{n_c-1}.
- Parameters:
n_c (int) – Number of classes.
- Returns:
List of expanded feature names.
- Return type:
list
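The naming rule can be sketched as a free function. In the model, nn_fnames and class_names are attributes of the instance; here they are passed explicitly, and the example feature names are made up for illustration.

```python
def expand_nn_fnames_sketch(nn_fnames, n_c, class_names=None):
    expanded = []
    for name in nn_fnames:
        if name != "rf_proba":
            # Ordinary feature names pass through unchanged.
            expanded.append(name)
        elif class_names is not None:
            expanded.extend(f"rf_{c}" for c in class_names[:n_c])
        else:
            expanded.extend(f"rf_proba_{i}" for i in range(n_c))
    return expanded


names = ["linearity", "rf_proba", "entropy"]
with_classes = expand_nn_fnames_sketch(names, 2, class_names=["ground", "tree"])
without_classes = expand_nn_fnames_sketch(names, 2)
```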
- batched_nn_predict(nn_features_std, coords, neighbors, mask, rf_fallback=None, knn_handle=None, center_indices=None)
Run batched NN prediction using gather_and_center() per batch to avoid GPU OOM.
When knn_handle is provided, KNN is computed per-batch via the persistent handle instead of indexing a pre-built neighbors array. This avoids materializing the full (S, K) array (~40 GB for large point clouds).
- Parameters:
nn_features_std (np.ndarray) – Preprocessed features (S, nf) float32.
coords (np.ndarray) – Centroid coordinates (S, 3) float32.
neighbors (np.ndarray or None) – KNN indices (S_sel, K) int32. None when chunked KNN is used.
mask (np.ndarray or None) – Validity mask (S_sel, K) bool. None when chunked KNN is used.
rf_fallback (np.ndarray or None) – RF probabilities (S, n_c) used for centroids not reached by any selected center (only when nn_point_wise_labels=True).
knn_handle – Persistent KNN handle from build_knn_handle(). When provided, KNN is queried per-batch instead of using neighbors.
center_indices (np.ndarray or None) – (S,) int32 center indices for the chunked path. Required when knn_handle is not None.
- Returns:
NN probabilities (S, n_c).
- Return type:
np.ndarray
- static gather_and_center(features, coords, neighbors, mask, feat_out=None, coord_out=None)
Gather KNN neighbor features and coordinates, center coordinates on the neighborhood center, and zero out invalid neighbors. This is the single entry point for assembling the (B, K, nf) and (B, K, 3) tensors that the NN receives as input.
Uses a fused C++ implementation (alg_gather_center_mask_fs32) that performs the gather, centering, and masking in a single OpenMP-parallelized pass, avoiding the multi-GB intermediate arrays that numpy fancy indexing would create.
When feat_out and coord_out are provided (pre-allocated buffers of the correct shape), the C++ function writes into them directly, avoiding repeated ~7 GB allocations across batches.
- Parameters:
features (np.ndarray) – Preprocessed features (S, nf) f32.
coords (np.ndarray) – Centroid coordinates (S, 3) f32.
neighbors (np.ndarray) – KNN indices (B, K) int32.
mask (np.ndarray) – Neighbor validity mask (B, K) bool.
feat_out (np.ndarray or None) – Pre-allocated (B, K, nf) f32 or None.
coord_out (np.ndarray or None) – Pre-allocated (B, K, 3) f32 or None.
- Returns:
(feat_t, coord_t, mask) — (B, K, nf), (B, K, 3), (B, K) all float32/bool.
- Return type:
tuple
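For intuition, the gather, centering, and masking can be written with NumPy fancy indexing, which is exactly the memory-hungry formulation the fused C++ routine avoids. This sketch assumes that the first column of neighbors is the neighborhood center; the documentation does not state which point is used for centering, so treat that as a guess. The function name is illustrative.

```python
import numpy as np


def gather_and_center_reference(features, coords, neighbors, mask):
    feat_t = features[neighbors]   # (B, K, nf), materialized in full
    coord_t = coords[neighbors]    # (B, K, 3)
    # Assumption: column 0 of `neighbors` is the neighborhood center.
    coord_t = coord_t - coord_t[:, :1, :]
    # Zero out the slots flagged as invalid.
    valid = mask[..., None]
    feat_t = np.where(valid, feat_t, 0).astype(np.float32)
    coord_t = np.where(valid, coord_t, 0).astype(np.float32)
    return feat_t, coord_t, mask


features = np.arange(8, dtype=np.float32).reshape(4, 2)
coords = np.array([[0, 0, 0], [1, 0, 0], [0, 2, 0], [5, 5, 5]], dtype=np.float32)
neighbors = np.array([[0, 1, 2], [3, 2, 0]], dtype=np.int32)
mask = np.array([[True, True, True], [True, True, False]])
feat_t, coord_t, mask_t = gather_and_center_reference(features, coords, neighbors, mask)
```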
- assemble_nn_input(centroids, features, neighbors, mask)
Assemble the NN input tensors. Delegates to gather_and_center().
- Parameters:
centroids – Centroid coordinates (S, 3).
features – Preprocessed features (S, nf).
neighbors – KNN indices (S, K).
mask – Neighbor validity mask (S, K).
- Returns:
(feat_t, coord_t, mask, n_f_nn).
- apply_training_input_strategy(centroids, labels, strategy_spec)
Select which centroids serve as neighborhood centers for NN training, according to the given strategy.
- Parameters:
centroids (np.ndarray) – Centroid coordinates (S, 3).
labels (np.ndarray) – Centroid labels (S,) int32.
strategy_spec (dict or None) – Strategy specification dict.
- Returns:
Index array of selected centroid indices.
- Return type:
np.ndarray
- apply_predictive_input_strategy(centroids)
Select which centroids serve as neighborhood centers for NN prediction. Only active when nn_point_wise_labels=True: the scatter-accumulate ensures every centroid receives a prediction as a neighbor of the selected centers.
Supported strategies:
- "full" (default): all centroids are centers.
- "fps": furthest point sampling. Controlled by predictive_K (target count) and predictive_fps_fast (mode 0–4).
- "mindist_decimation": min distance decimation. Controlled by predictive_min_distance.
- Parameters:
centroids (np.ndarray) – Centroid coordinates (S, 3).
- Returns:
Index array of selected centroid indices, or None for full selection.
- Return type:
np.ndarray or None
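To make the "fps" strategy concrete, here is a plain greedy furthest point sampling in NumPy. It is a textbook version, not the C++ routine: the fast modes selected by predictive_fps_fast are not modeled, the start index is an arbitrary choice, and k is assumed to be at least 1.

```python
import numpy as np


def fps_indices(points, k, start=0):
    k = min(k, len(points))
    selected = np.empty(k, dtype=np.int64)
    selected[0] = start
    # Distance from every point to the closest already-selected point.
    dist = np.linalg.norm(points - points[start], axis=1)
    for i in range(1, k):
        selected[i] = int(np.argmax(dist))
        new_dist = np.linalg.norm(points - points[selected[i]], axis=1)
        dist = np.minimum(dist, new_dist)
    return selected


# Points on a line at x = 0, 1, 2, 3, 10.
points = np.array([[0.0, 0, 0], [1.0, 0, 0], [2.0, 0, 0], [3.0, 0, 0], [10.0, 0, 0]])
centers = fps_indices(points, 3)
```

Each step picks the point that is furthest from everything selected so far, so the chosen centers spread out over the cloud.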
- static match_coords_to_indices(original, selected, eps=1e-07)
Match selected coordinates back to original indices via KDTree nearest-neighbor query.
- Parameters:
original – (S, 3) original centroid coords.
selected – (M, 3) selected coords from C++.
eps – Maximum distance tolerance.
- Returns:
Index array of matched original indices.
- Return type:
np.ndarray
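A minimal sketch of the matching step with scipy.spatial.KDTree. How the real method treats coordinates that have no match within eps is not documented here; raising an error is an assumption made for this example, and match_coords_sketch is an illustrative name.

```python
import numpy as np
from scipy.spatial import KDTree


def match_coords_sketch(original, selected, eps=1e-7):
    tree = KDTree(original)
    dist, idx = tree.query(selected, k=1, distance_upper_bound=eps)
    # Queries without a neighbor inside eps come back with infinite distance.
    if not np.all(np.isfinite(dist)):
        raise ValueError("some selected coordinates have no match within eps")
    return np.asarray(idx, dtype=np.int64)


original = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [2.0, 2.0, 2.0]])
selected = original[[2, 0]]
matched = match_coords_sketch(original, selected)
```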
- preprocess_pcloud(X_coords, y, F_pts=None)
Build octree centroids from a raw point cloud and mine features using mining_config.
- Parameters:
X_coords (np.ndarray) – Point cloud coordinates (N, 3).
y (np.ndarray) – Point-wise class labels (N,).
F_pts (np.ndarray or None) – Point cloud features (N, n_f_pcloud), needed by SmoothFeatures++ and Recount++.
- Returns:
(features, labels, X_centered): mined feature matrix at centroids (S, n_f_mined), centroid labels (S,), and globally centered point cloud coordinates (N, 3) float32. Callers that don’t need X_centered can use [:2] to unpack only the first two values. Also updates self.fnames with the mined feature names and stores centroid coordinates in self.centroids_.
- Return type:
tuple[np.ndarray, np.ndarray, np.ndarray]
- train_rf_from_pclouds(paths)
Train the C++ Random Forest from a pool of point cloud files using offline (serialized) training.
Each file is loaded, octree-voxelized, and feature-mined one at a time. The mined features and labels are written to a C++ data store file, then freed. After all files are processed, data stores are merged and the RF is trained from the merged store. At no point are multiple point clouds’ data held in memory simultaneously.
- Parameters:
paths (list of str) – List of LAS/LAZ file paths.
- train_base(pcloud)
Override Model.train_base() to extract both the structure space (coordinates) and the feature space (fnames) from the point cloud.
Coordinates are always used for the octree and KNN. Features from fnames are available to data miners (e.g., smooth and recount) and, when mining_config is empty, serve as the direct RF/NN input.
- Parameters:
pcloud (PointCloud) – Input point cloud.
- Returns:
The trained model.
- Return type:
- training(X_coords, y, info=True, F_pts=None)
Train the full TransfOctoRF pipeline.
- Parameters:
X_coords – Point cloud coordinates (N, 3). When mining_config is empty, this is the pre-mined feature matrix instead.
y – Class labels (N,).
F_pts (np.ndarray or None) – Point cloud features (N, n_f_pcloud) from fnames. Available to data miners that need input features (smooth, recount). Can be None if no features are needed.
- fit_nn_sequencer(storage_path, n_f_nn)
Shared helper: build the NN handler, create a sequencer on the HDF5 cache, train, clean up.
Used by _train_nn_from_hdf5(), train_nn_from_pclouds() (Stage 4), and train_nn().
- Parameters:
storage_path (str) – Path to the HDF5 training file.
n_f_nn (int) – Number of NN input features.
- Returns:
Keras training history.
- Return type:
keras.callbacks.History
- train_nn_from_hdf5(storage_path)
Train the NN directly from an existing HDF5 cache file, skipping all preprocessing stages. Used when disable_nn_offline_storage_writing is True.
- Parameters:
storage_path (str) – Path to the HDF5 cache file.
- Returns:
Keras training history.
- Return type:
keras.callbacks.History
- train_nn_from_pclouds(paths)
Train the NN stage from a list of point cloud files.
Only one point cloud’s data is held in memory at a time. Three passes over the files are performed:
Preprocess pass: Load each pcloud, run octree + mining + RF outputs, build NN features, feed them to the FeaturePreprocessor fit, then discard. Only per-column statistics are retained.
Serialize pass: Reload each pcloud, recompute mined features and NN features, transform with the fitted preprocessor, build KNN + assemble tensors, append chunk to HDF5, then discard.
Train: Open the HDF5 via the sequencer and train the NN.
- Parameters:
paths (list of str) – List of LAS/LAZ file paths.
- Returns:
Keras training history.
- Return type:
keras.callbacks.History
- train_nn(X_coords, y, F_pts=None, _skip_mining=False)
Train the neural network stage on the given data.
Runs stages 2–6 of the pipeline: compute RF outputs, preprocess features, build K-NN neighbors, assemble tensors, and train the NN via disk-based sequencing.
The RF must already be trained before calling this method. Can be called with data different from the RF training set to avoid data leakage.
- Parameters:
X_coords (np.ndarray) – Point cloud coordinates (N, 3). When mining_config is empty or _skip_mining is True, this is the pre-mined feature matrix.
y (np.ndarray) – Class labels (N,).
F_pts (np.ndarray or None) – Point cloud features (N, n_f_pcloud).
- Returns:
Keras training history.
- Return type:
keras.callbacks.History
- prepare_nn_input(X, fit_preprocessor=False)
Shared stages 2–5 of the TransfOctoRF pipeline: compute RF outputs, build NN features, preprocess, build KNN neighbors, and assemble the NN input tensors.
Used by both train_nn (with fit_preprocessor=True) and _predict (with fit_preprocessor=False).
- Parameters:
X (np.ndarray) – Feature matrix (S, n_f).
fit_preprocessor (bool) – If True, fit a new preprocessor on the data. If False, use the existing one (frozen from training).
- Returns:
(nn_input, rf_proba, rf_ambiguity, nn_features_std) where nn_input is [feat_t, coord_t, mask_t] and nn_features_std is the flat (S, n_f_nn) preprocessed features.
- Return type:
tuple
- run_rfvsnn_evaluation(X, y)
Run the RF vs NN comparison evaluation and produce report and/or plot.
Center selection for the NN evaluation (requires nn_point_wise_labels=True):
- When a predictive input strategy is configured, its centers are used. The scatter-accumulate mechanism fills all centroids, so the evaluation covers the full set.
- When no predictive strategy is configured but a training input strategy is available, the training strategy centers are used. Scatter-accumulate still ensures full coverage.
- When no strategy is configured, or nn_point_wise_labels=False, the NN evaluates all centroids directly. Predictive strategies are meaningless without point-wise scatter because unselected centroids would have no NN predictions.
- Parameters:
X (np.ndarray) – Centroid feature matrix (S, n_f).
y (np.ndarray) – Centroid labels (S,).
- export_support_points(path, out_prefix=None, centroids=None)
Export centroids as a LAS/LAZ point cloud.
- Parameters:
path (str) – Output file path (LAS/LAZ).
out_prefix (str or None) – Output prefix for * expansion.
centroids (np.ndarray or None) – Centroid coordinates to export. When None, uses self.centroids_ (all centroids).
- export_receptive_fields(y, neighbors, mask, nn_feat_std, coords, rf_dir=None, dist_report_path=None, dist_plot_path=None, y_all=None)
Export receptive field reports, distribution reports, and distribution plots. Reuses the existing ReceptiveFieldsReport, ReceptiveFieldsDistributionReport, and ReceptiveFieldsDistributionPlot classes.
The method is isolated from the training/prediction logic and only called when at least one output path is configured. Paths must already be resolved (no * expansion here).
- Parameters:
y – Centroid labels (S,). Can be None.
neighbors – KNN indices (S, K).
mask – Validity mask (S, K).
nn_feat_std – Preprocessed NN features (S, nf_nn) float32.
coords – Centroid coordinates (S, 3).
rf_dir – Directory for RF point clouds.
dist_report_path – Path for distribution CSV.
dist_plot_path – Path for distribution plot.
y_all – Full centroid labels (S_full,) for per-neighbor label gathering when nn_point_wise_labels=True. Can be None.
- on_training_finished(X, y, yhat=None)
See
model.Model.on_training_finished().
- predict(pcloud, X=None)
Override Model.predict() to handle internal octree + mining when mining_config is set.
Always extracts coordinates from the point cloud for the octree. Features from fnames are passed to data miners that need them (smooth, recount).
- Parameters:
pcloud (PointCloud) – Input point cloud.
X (np.ndarray or None) – Pre-computed feature matrix (optional).
- Returns:
Point-wise predicted class labels.
- Return type:
np.ndarray
- run_centroid_pipeline(X, coords=None, knn_handle=None)
Shared centroid-level prediction pipeline: RF outputs -> NN features -> preprocess -> KNN -> batched NN predict -> select predictions.
Used by both predict() (with mining) and predict_centroids() (without mining).
- Parameters:
X (np.ndarray) – Centroid feature matrix (S, nf).
coords (np.ndarray or None) – Centroid coordinates (S, 3). When None, resolved from self.centroids_ or X.
knn_handle – Persistent KNN handle from build_knn_handle(). When the centroid count exceeds the chunked KNN threshold, KNN is computed per-batch to avoid materializing the full (S, K) neighbors array. If a handle is provided it is reused; if None one is built internally and released after use.
- Returns:
(preds, proba, rf_proba, rf_ambiguity, nn_features_std, coords_f32, neighbors, knn_mask) — predictions + intermediate data needed by callers. When chunked KNN is used, neighbors and knn_mask are None.
- Return type:
tuple
- predict_centroids(X, zout=None)
Centroid-level prediction with the full TransfOctoRF pipeline (RF + NN). Predicts in batches to avoid GPU OOM on large point clouds.
- Parameters:
X – Centroid feature matrix (S, n_f).
- Returns:
Predicted class labels (S,).
- static select_lowest_ambiguity(rf_proba, rf_ambiguity, nn_proba)
Select predictions from the source (RF or NN) with the lowest class ambiguity for each sample.
Class ambiguity is defined as \(1 - p_{\text{max}} + p_{\text{second}}\).
- Parameters:
rf_proba (np.ndarray) – RF probabilities (S, n_c).
rf_ambiguity (np.ndarray) – RF class ambiguity (S,).
nn_proba (np.ndarray) – NN probabilities (S, n_c) or (S, 1).
- Returns:
(proba, preds) — selected probabilities and predicted class labels.
- Return type:
tuple[np.ndarray, np.ndarray]
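The selection rule can be sketched in a few lines of NumPy. This version only covers the (S, n_c) case with at least two classes; the (S, 1) binary layout is not handled, and sending ties to the NN is an assumption. The function names are illustrative.

```python
import numpy as np


def ambiguity(proba):
    # 1 - p_max + p_second, computed per row.
    top2 = np.partition(proba, -2, axis=1)[:, -2:]
    return 1.0 - top2[:, 1] + top2[:, 0]


def select_lowest_ambiguity_sketch(rf_proba, rf_ambiguity, nn_proba):
    nn_ambiguity = ambiguity(nn_proba)
    # Assumption: the NN wins ties; the RF is used only when strictly less ambiguous.
    use_rf = rf_ambiguity < nn_ambiguity
    proba = np.where(use_rf[:, None], rf_proba, nn_proba)
    return proba, proba.argmax(axis=1)


rf_proba = np.array([[0.9, 0.1], [0.5, 0.5]])
nn_proba = np.array([[0.6, 0.4], [0.2, 0.8]])
proba, preds = select_lowest_ambiguity_sketch(rf_proba, ambiguity(rf_proba), nn_proba)
```

In the example, the RF is confident on the first sample and undecided on the second, so the first row comes from the RF and the second from the NN.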