src.model.transf_octorf_classification_model

Functions

build_decimation_spec_for_torf(entry, m)

Translate fps_decorator / mindist_decorator entries to the normalized dict consumed by pyvl3dpp.octree_mine_leaves_*_ff.

Classes

FeaturePreprocessor([iqr_multiplier, eps])

Streaming IQR clamping + standardization for NN input features.

TransfOctoRFClassificationModel(**kwargs)

src.model.transf_octorf_classification_model.build_decimation_spec_for_torf(entry, m)

Translate fps_decorator / mindist_decorator entries to the normalized dict consumed by pyvl3dpp.octree_mine_leaves_*_ff.

Returns {} (empty dict, dispatched to the C++ nullptr fast path) when:

  • neither key is present in entry,

  • both keys are explicitly None / JSON null, or

  • either key is present as an empty dict {}.

Raises MinerException when:

  • both keys are non-empty on the same entry (mirrors the Python decorator class separation between FPSDecoratedMiner and MinDistDecoratedMiner); or

  • a required sub-key is missing (num_points for fps_decorator, min_distance for mindist_decorator).

Logs a single warning line per entry listing the ignored sub-keys. The C++ adapter flow has no encode/decode step, so num_encoding_neighbors and related sub-keys are accepted (for copy-paste compatibility with Python decorator JSON) but not consumed.

Parameters:
  • entry (dict) – A single mining_config entry.

  • m (int) – Size of the input source cloud, used to resolve num_points expressions like "m/2".

Returns:

Normalized dict for the C++ binding, or {}.

Return type:

dict
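
The rules above can be sketched in plain Python. This is a minimal illustration, not the module's code: normalize_decimation_entry and resolve_num_points are hypothetical names, ValueError stands in for MinerException, the output layout (the "kind" key) is invented for the example, and only the "m" and "m/<int>" expression forms are handled. The empty-dict rule is read literally, so an empty dict on either key disables decimation.

```python
def resolve_num_points(value, m):
    """Resolve an int or an "m/<int>" expression (hypothetical helper)."""
    if isinstance(value, str):
        text = value.replace(" ", "")
        if text == "m":
            return m
        if text.startswith("m/"):
            return m // int(text[2:])
        raise ValueError(f"unsupported num_points expression: {value!r}")
    return int(value)


def normalize_decimation_entry(entry, m):
    """Hypothetical stand-in that follows the documented return/raise rules."""
    fps = entry.get("fps_decorator")
    mindist = entry.get("mindist_decorator")
    # An explicitly empty dict on either key disables decimation.
    if fps == {} or mindist == {}:
        return {}
    # Absent keys and JSON null both show up here as None.
    if fps is None and mindist is None:
        return {}
    if fps is not None and mindist is not None:
        raise ValueError("fps_decorator and mindist_decorator are exclusive")
    if fps is not None:
        if "num_points" not in fps:
            raise ValueError("fps_decorator requires num_points")
        # Illustrative output layout; the real binding format is not shown here.
        return {"kind": "fps",
                "num_points": resolve_num_points(fps["num_points"], m)}
    if "min_distance" not in mindist:
        raise ValueError("mindist_decorator requires min_distance")
    return {"kind": "mindist", "min_distance": float(mindist["min_distance"])}


spec = normalize_decimation_entry({"fps_decorator": {"num_points": "m/2"}}, 100)
```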

class src.model.transf_octorf_classification_model.FeaturePreprocessor(iqr_multiplier=1.5, eps=1e-08)

Streaming IQR clamping + standardization for NN input features.

Designed for the TransfOctoRF offline sequencing pattern where training data is spread across multiple files (one per point cloud). The preprocessor is fitted once after the RF stage and before the NN stage, using a two-pass streaming approach:

  • Pass 1 (percentiles): Accumulate all feature values per column across all files to compute exact \(Q_1\) and \(Q_3\). Only one column’s values are held in memory at a time.

  • Pass 2 (mean/std): Stream files again, clamp each chunk using the bounds from Pass 1, and accumulate running mean and variance via Welford’s online algorithm.

After fitting, transform() is called per-chunk during NN training (the preprocessor parameters are frozen).

Pass 1 — Percentiles and bounds:

\[Q_{1,j}, Q_{3,j} = \text{percentile}_{25}(\mathbf{f}_j),\; \text{percentile}_{75}(\mathbf{f}_j)\]
\[\text{IQR}_j = Q_{3,j} - Q_{1,j}, \quad L_j = Q_{1,j} - \alpha \cdot \text{IQR}_j, \quad U_j = Q_{3,j} + \alpha \cdot \text{IQR}_j\]

Pass 2 — Online mean and standard deviation (Welford’s algorithm) on clamped values:

\[\bar{f}_j^{(n)} = \bar{f}_j^{(n-1)} + \frac{\tilde{f}_{nj} - \bar{f}_j^{(n-1)}}{n}, \quad M_{2,j}^{(n)} = M_{2,j}^{(n-1)} + (\tilde{f}_{nj} - \bar{f}_j^{(n-1)}) (\tilde{f}_{nj} - \bar{f}_j^{(n)})\]

where \(\tilde{f}_{nj} = \text{clamp}(f_{nj}, L_j, U_j)\). After processing all \(N\) samples:

\[\mu_j = \bar{f}_j^{(N)}, \quad \sigma_j = \sqrt{M_{2,j}^{(N)} / N}\]

Transform (applied per-chunk, frozen parameters):

\[\hat{f}_{ij} = \frac{\text{clamp}(f_{ij}, L_j, U_j) - \mu_j} {\sigma_j + \epsilon}\]
Variables:
  • iqr_multiplier – IQR multiplier for clamping bounds (default 1.5).

  • Q1 – 25th percentile per feature.

  • Q3 – 75th percentile per feature.

  • lower – Lower clamping bound per feature.

  • upper – Upper clamping bound per feature.

  • mu – Mean per feature (after clamping).

  • sigma – Std per feature (after clamping).

__init__(iqr_multiplier=1.5, eps=1e-08)
fit(chunk_factory)

Fit preprocessing parameters from a chunk factory.

The chunk_factory is a callable that returns a fresh iterator of 2D numpy arrays each time it is called. This allows multiple passes over the data without holding all chunks in memory simultaneously. A plain list also works (it is wrapped automatically).

Chunks are processed one at a time, so no two chunks are held in memory simultaneously. Percentiles are computed by appending each chunk’s column values to a temporary binary file, then computing percentiles from a memory-mapped view. The one exception to chunk-sized memory use is np.percentile, which may materialize a single column’s data into RAM for sorting (one column across all chunks, never a full chunk or several chunks). Mean and standard deviation are computed via a vectorized Welford algorithm, one chunk at a time.

Parameters:

chunk_factory – Callable returning an iterator of (n_i, n_f) arrays, or a list of such arrays.
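
A minimal sketch of the two-pass fit, under simplifying assumptions: fit_streaming is a hypothetical name, the factory must be a callable (the list shorthand is not handled), chunks are assumed non-empty, and Pass 1 concatenates each column in RAM rather than going through a temporary memory-mapped file. Pass 2 uses the chunk-merge form of the Welford update, which should give the same mean and population standard deviation as the per-sample recurrence.

```python
import numpy as np


def fit_streaming(chunk_factory, iqr_multiplier=1.5):
    """Two-pass fit: IQR bounds first, then mean/std of the clamped values."""
    n_f = next(iter(chunk_factory())).shape[1]

    # Pass 1: exact quartiles, gathering one column at a time.
    q1 = np.empty(n_f)
    q3 = np.empty(n_f)
    for j in range(n_f):
        column = np.concatenate([chunk[:, j] for chunk in chunk_factory()])
        q1[j], q3[j] = np.percentile(column, [25, 75])
    iqr = q3 - q1
    lower = q1 - iqr_multiplier * iqr
    upper = q3 + iqr_multiplier * iqr

    # Pass 2: merge per-chunk statistics into running mean and M2.
    n = 0
    mean = np.zeros(n_f)
    m2 = np.zeros(n_f)
    for chunk in chunk_factory():
        clamped = np.clip(chunk, lower, upper)
        n_b = clamped.shape[0]
        mean_b = clamped.mean(axis=0)
        m2_b = ((clamped - mean_b) ** 2).sum(axis=0)
        delta = mean_b - mean
        total = n + n_b
        mean = mean + delta * n_b / total
        m2 = m2 + m2_b + delta ** 2 * n * n_b / total
        n = total

    sigma = np.sqrt(m2 / n)  # population std, matching sqrt(M2 / N)
    return lower, upper, mean, sigma


rng = np.random.default_rng(0)
chunks = [rng.normal(size=(50, 3)), rng.normal(size=(70, 3))]
# The factory returns a fresh iterator on every call, enabling both passes.
lower, upper, mu, sigma = fit_streaming(lambda: iter(chunks))
```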

fit_single(F)

Convenience method: fit from a single in-memory matrix.

Equivalent to fit([F]).

Parameters:

F – Feature matrix (N x n_f).

transform(F, dtype=None)

Apply IQR clamping + standardization to F (N x n_f).

Parameters are frozen from the last fit() call.

Parameters:
  • F – Feature matrix to transform.

  • dtype – Output dtype (e.g., np.float32). When given, the preprocessor parameters are cast to this dtype before computation, so all intermediates stay in the requested precision. Default is None (preserve the input dtype).

Returns:

Standardized feature matrix.
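
The clamp-then-standardize step, including the dtype handling described above, might look like the following sketch. transform_sketch is a hypothetical name, the fitted parameters are passed in explicitly instead of being read from the preprocessor, and F is assumed to be a floating-point array.

```python
import numpy as np


def transform_sketch(F, lower, upper, mu, sigma, eps=1e-8, dtype=None):
    """Clamp to [lower, upper], then standardize, in the requested dtype."""
    out_dtype = F.dtype if dtype is None else np.dtype(dtype)
    # Cast the frozen parameters first so intermediates stay in out_dtype.
    lower, upper, mu, sigma = (
        np.asarray(p, dtype=out_dtype) for p in (lower, upper, mu, sigma)
    )
    clamped = np.clip(F.astype(out_dtype, copy=False), lower, upper)
    return (clamped - mu) / (sigma + out_dtype.type(eps))


F_demo = np.array([[0.0], [10.0], [100.0]])
out_demo = transform_sketch(F_demo, [0.0], [20.0], [10.0], [5.0],
                            dtype=np.float32)
```

In this example the value 100.0 is clamped to the upper bound 20.0 before standardization, so the outlier cannot dominate the scaled output.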

fit_transform(F)

Fit and transform in one step.

class src.model.transf_octorf_classification_model.TransfOctoRFClassificationModel(**kwargs)
Author:

Alberto M. Esmoris Pena

Three-stage classification model for 3D point clouds:

  1. Octree stage: Voxelizes point clouds, computes leaf centroids, extracts multi-scale features via C++ miner adapters.

  2. Random Forest stage: C++ RF trained on centroid features. Outputs pseudoprobabilities, entropy, class ambiguity.

  3. Neural Network stage: Transformer or SharedMLP consumes K-neighbor centroids with features + RF outputs. Outputs final predictions.

The model is NOT a deep learning model (is_deep_learning_model() returns False). It orchestrates the C++ RF and Keras NN internally.

See ClassificationModel and Model.

Variables:
  • leaf_voxel_length (float) – Octree voxel side length.

  • ro (float) – Neighborhood radius (0 = auto).

  • rf_model (RandomForestPPClassificationModel) – The C++ Random Forest wrapper.

  • nn_handler (TransfOctoRFHandler) – The neural network handler.

  • preprocessor (FeaturePreprocessor) – IQR + standardization fitted on NN training data.

  • nn_train_on_pcloud (bool) – Whether the pipeline’s input point cloud data (X, F, y) is also used to train the neural network. When False, the pipeline data only trains the RF; the NN must be trained from a separate source (e.g., offline sequencing). Set to False to avoid data leakage between the RF and NN stages.

  • lowest_uncertainty_prediction (bool) – When True, the final prediction for each sample is selected from the RF or NN output based on which has the lowest class ambiguity. When False (default), NN predictions are always used.

static extract_model_args(spec)

Extract arguments from a pipeline specification.

__init__(**kwargs)

Initialize a TransfOctoRFClassificationModel.

is_deep_learning_model()

TransfOctoRF is NOT a pure deep learning model.

overwrite_pretrained_model(spec)

Continue-training entry point for TORF. Called by the sequential pipeline after loading a previously trained TORF model via the pretrained_model key. Updates the orchestrator attributes that govern further NN training so the user can override them in the JSON spec (e.g., raise the number of epochs, attach a new transfer/freeze spec).

Parameters:

spec (dict) – The training-stage spec dict from the JSON pipeline.

prepare_model()

Prepare the C++ RF and NN handler.

build_arch_and_handler(n_f, num_classes)

Shared helper that builds a TransfOctoRFPwiseClassif architecture and a TransfOctoRFHandler from self.nn_hparams.

Used by both prepare_nn_handler() (training) and __setstate__() (deserialization) to avoid duplicating the parameter mapping logic.

Parameters:
  • n_f (int) – Number of input features per centroid.

  • num_classes (int) – Number of output classes.

Returns:

(arch, handler) tuple.

Return type:

tuple

prepare_nn_handler(n_f, num_classes)

Build and compile the NN handler.

When this model was loaded from a pickled checkpoint via the pretrained_model pipeline mechanism, __setstate__ has already restored a fitted nn_handler whose arch.nn carries the learned weights. In that case we must not rebuild the handler — doing so would silently discard those weights. The rebuild path is taken only when no usable handler is present.

Parameters:
  • n_f – Number of NN input features.

  • num_classes – Number of output classes.

mine_centroids(X_pts, centroids, F_pts=None)

Run all miners in mining_config on the given centroid positions, using the original point cloud as support.

Dispatches each entry to the corresponding C++ miner adapter:

  • "GeometricFeatures++": multi-scale geometric descriptors.

  • "HeightFeatures++": height statistics.

  • "SmoothFeatures++": smoothed point cloud features.

  • "Recount++": counting-based features.

Each entry may include a "frenames" list to override the default output feature names.

Parameters:
  • X_pts (np.ndarray) – Original point cloud coordinates (N, 3).

  • centroids (np.ndarray) – Octree centroid coordinates (S, 3).

  • F_pts (np.ndarray or None) – Point cloud features (N, n_f_pcloud), required by SmoothFeatures++ and Recount++.

Returns:

(features, fnames) — mined feature matrix (S, n_f) and corresponding feature names.

Return type:

tuple[np.ndarray, list]

static mine_geom(X_pts, centroids, entry)

Dispatch GeometricFeatures++ to C++ adapter.

The entry uses the same specification as the standard VL3D GeometricFeaturesPP miner: a single neighborhood.radius value. Use frenames to assign custom output names. For multi-scale features, add multiple entries in mining_config with different radii.

Parameters:
  • X_pts – Point cloud coordinates (N, 3).

  • centroids – Centroid coordinates (S, 3).

  • entry – Mining config entry dict.

Returns:

(features, names).

static mine_height(X_pts, centroids, entry)

Dispatch HeightFeatures++ to C++ adapter.

Parameters:
  • X_pts – Point cloud coordinates (N, 3).

  • centroids – Centroid coordinates (S, 3).

  • entry – Mining config entry dict.

Returns:

(features, names).

static mine_smooth(X_pts, centroids, F_pts, entry)

Dispatch SmoothFeatures++ to C++ adapter.

Parameters:
  • X_pts – Point cloud coordinates (N, 3).

  • centroids – Centroid coordinates (S, 3).

  • F_pts – Point cloud features (N, n_f).

  • entry – Mining config entry dict.

Returns:

(features, names).

static mine_recount(X_pts, centroids, F_pts, entry)

Dispatch Recount++ to C++ adapter.

Parameters:
  • X_pts – Point cloud coordinates (N, 3).

  • centroids – Centroid coordinates (S, 3).

  • F_pts – Point cloud features (N, n_f).

  • entry – Mining config entry dict.

Returns:

(features, names).

compute_rf_outputs(centroids, features, labels)

Train or predict with RF, compute entropy + class ambiguity.

Parameters:
  • centroids – (S, 3) centroid coordinates.

  • features – (S, nf) mined features.

  • labels – (S,) class labels (for training) or None.

Returns:

(proba, entropy, class_ambiguity) arrays.

build_knn_neighbors(centroids, center_indices=None)

Build neighbor indices for centroids using the configured neighborhood_strategy.

When center_indices is given, neighborhoods are computed only for the selected centers (indices into centroids). The returned neighbor indices still reference the full centroids array. This avoids computing neighborhoods for centroids that won’t be used as training samples.

When "knn" (default): K-nearest neighbors via scipy KDTree.

When "spherical_fps": sphere query (radius neighborhood_radius) + FPS subsampling to K points via C++ alg_spherical_fps_neighbors.

Parameters:
  • centroids (np.ndarray) – (S, 3) full centroid array.

  • center_indices (np.ndarray or None) – Indices of centroids to use as neighborhood centers. When None, all centroids are used.

Returns:

(neighbors, mask) — indices (S_sel, K) int32 and validity mask (S_sel, K) bool, where S_sel is len(center_indices) or S if None.

Return type:

tuple
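
For the "knn" strategy, the query can be sketched with SciPy's KDTree. knn_neighbors_sketch is a hypothetical name and K is passed explicitly here, whereas the model reads it from its configuration. Padding invalid slots with index 0 is an assumption of this sketch.

```python
import numpy as np
from scipy.spatial import KDTree


def knn_neighbors_sketch(centroids, K, center_indices=None):
    """Return (neighbors, mask) with indices into the full centroid array."""
    tree = KDTree(centroids)
    centers = centroids if center_indices is None else centroids[center_indices]
    dist, idx = tree.query(centers, k=K)
    dist = dist.reshape(len(centers), -1)
    idx = idx.reshape(len(centers), -1)
    # When fewer than K points exist, SciPy pads with infinite distances.
    mask = np.isfinite(dist)
    neighbors = np.where(mask, idx, 0).astype(np.int32)
    return neighbors, mask


line = np.stack([np.arange(5.0), np.zeros(5), np.zeros(5)], axis=1)
nbrs, valid = knn_neighbors_sketch(line, K=3, center_indices=np.array([0, 4]))
```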

build_knn_handle(centroids)

Build a persistent C++ KNN handle on the given centroid set. The octree is built once and can be queried repeatedly via handle.query() (KNN by index) or handle.query_nearest() (k=1 nearest for external points).

Parameters:

centroids (np.ndarray) – Centroid coordinates (S, 3).

Returns:

Persistent KNN handle.

should_use_chunked_knn(S, center_indices=None)

Decide whether to use chunked per-batch KNN instead of materializing the full (S, K) neighbors array. The decision depends on the centroid count, the predict_chunked_knn_threshold config, and the neighborhood strategy.

Parameters:
  • S (int) – Number of centroids.

  • center_indices (np.ndarray or None) – Selected center indices, or None for full evaluation.

Returns:

True if chunked KNN should be used.

Return type:

bool

build_spherical_fps_neighbors(centroids, center_indices=None)

Build neighbors via sphere query + FPS subsampling using C++ alg_sphere_fps_neighbors.

Parameters:
  • centroids – (S, 3) full centroid array.

  • center_indices – Indices of centroids to query. When None, all centroids are queried.

Returns:

(neighbors, mask) — (S_sel, K) int32/bool.

build_nn_features(X, rf_proba, entropy, class_ambiguity)

Build the NN feature matrix from input features and RF outputs, selecting columns according to nn_fnames.

All features are returned in a single matrix. The preprocessor (IQR clamping + standardization) is applied uniformly to all columns, including RF probabilities and class ambiguity. Standardizing all inputs to zero-mean ensures mixed-sign gradients in the first layer, avoiding the zig-zag convergence problem that arises when all inputs are non-negative.

Parameters:
  • X (np.ndarray) – Full feature matrix (S, n_f).

  • rf_proba (np.ndarray) – RF pseudoprobabilities (S, n_c).

  • entropy (np.ndarray) – Prediction entropy (S,).

  • class_ambiguity (np.ndarray) – Class ambiguity (S,).

Returns:

Concatenated NN feature matrix (S, n_f_nn).

Return type:

np.ndarray

static class_ambiguity(proba)

Compute class ambiguity from a probability matrix.

\[a = 1 - p_{\max} + p_{\text{second}}\]
Parameters:

proba (np.ndarray) – Probability matrix (S, n_c).

Returns:

Class ambiguity (S,).

Return type:

np.ndarray
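
A short NumPy sketch of the formula above, assuming at least two classes per row. class_ambiguity_sketch is a hypothetical name and need not match how the static method is implemented.

```python
import numpy as np


def class_ambiguity_sketch(proba):
    """a = 1 - p_max + p_second, evaluated row by row."""
    # After partitioning, the two largest values sit in the last two columns.
    top2 = np.partition(proba, -2, axis=1)[:, -2:]
    p_second, p_max = top2[:, 0], top2[:, 1]
    return 1.0 - p_max + p_second


proba_demo = np.array([[0.7, 0.2, 0.1],
                       [0.4, 0.4, 0.2]])
amb_demo = class_ambiguity_sketch(proba_demo)
```

A confident row (0.7 vs 0.2) yields a low value, while a tie between the two top classes yields the maximum of 1.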

expand_nn_fnames(n_c)

Build expanded NN feature names, replacing the 'rf_proba' shorthand with n_c per-class column names. When class_names is available, the columns are named rf_<class_name>; otherwise they fall back to rf_proba_0, …, rf_proba_{n_c-1}.

Parameters:

n_c (int) – Number of classes.

Returns:

List of expanded feature names.

Return type:

list
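
The expansion rule can be illustrated as follows. expand_nn_fnames_sketch is a hypothetical free function that takes nn_fnames and class_names as arguments (the method reads them from the model), and the feature names in the example are made up.

```python
def expand_nn_fnames_sketch(nn_fnames, n_c, class_names=None):
    """Replace the 'rf_proba' shorthand with one name per class."""
    expanded = []
    for name in nn_fnames:
        if name != "rf_proba":
            expanded.append(name)
        elif class_names is not None:
            expanded.extend(f"rf_{c}" for c in class_names[:n_c])
        else:
            expanded.extend(f"rf_proba_{i}" for i in range(n_c))
    return expanded


named = expand_nn_fnames_sketch(["linearity", "rf_proba", "entropy"], 2,
                                class_names=["ground", "vegetation"])
fallback = expand_nn_fnames_sketch(["rf_proba"], 3)
```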

batched_nn_predict(nn_features_std, coords, neighbors, mask, rf_fallback=None, knn_handle=None, center_indices=None)

Run batched NN prediction using gather_and_center() per batch to avoid GPU OOM.

When knn_handle is provided, KNN is computed per-batch via the persistent handle instead of indexing a pre-built neighbors array. This avoids materializing the full (S, K) array (~40 GB for large point clouds).

Parameters:
  • nn_features_std (np.ndarray) – Preprocessed features (S, nf) float32.

  • coords (np.ndarray) – Centroid coordinates (S, 3) float32.

  • neighbors (np.ndarray or None) – KNN indices (S_sel, K) int32. None when chunked KNN is used.

  • mask (np.ndarray or None) – Validity mask (S_sel, K) bool. None when chunked KNN is used.

  • rf_fallback (np.ndarray or None) – RF probabilities (S, n_c) used for centroids not reached by any selected center (only when nn_point_wise_labels=True).

  • knn_handle – Persistent KNN handle from build_knn_handle(). When provided, KNN is queried per-batch instead of using neighbors.

  • center_indices (np.ndarray or None) – (S,) int32 center indices for the chunked path. Required when knn_handle is not None.

Returns:

NN probabilities (S, n_c).

Return type:

np.ndarray
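
When only a subset of centers is evaluated, per-neighbor outputs have to be scattered back onto the full centroid set. The sketch below shows one plausible way to do that with np.add.at. It assumes the per-neighbor probabilities are averaged over all centers that reach a centroid, which this page does not state, and scatter_accumulate_sketch is a hypothetical name.

```python
import numpy as np


def scatter_accumulate_sketch(per_neighbor_proba, neighbors, mask, S,
                              rf_fallback=None):
    """Average (S_sel, K, n_c) outputs onto S centroids."""
    n_c = per_neighbor_proba.shape[-1]
    acc = np.zeros((S, n_c))
    hits = np.zeros(S)
    flat_idx = neighbors[mask]               # valid neighbor indices, 1D
    # np.add.at handles repeated indices, unlike acc[flat_idx] += ...
    np.add.at(acc, flat_idx, per_neighbor_proba[mask])
    np.add.at(hits, flat_idx, 1.0)
    reached = hits > 0
    acc[reached] /= hits[reached, None]
    if rf_fallback is not None:
        # Centroids not reached by any center keep the RF probabilities.
        acc[~reached] = rf_fallback[~reached]
    return acc


nb = np.array([[0, 1], [1, 0]])
mk = np.ones((2, 2), dtype=bool)
pn = np.array([[[0.8, 0.2], [0.3, 0.7]],
               [[0.5, 0.5], [0.6, 0.4]]])
rf = np.full((3, 2), 0.5)
scattered = scatter_accumulate_sketch(pn, nb, mk, S=3, rf_fallback=rf)
```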

static gather_and_center(features, coords, neighbors, mask, feat_out=None, coord_out=None)

Gather KNN neighbor features and coordinates, center coordinates on the neighborhood center, and zero out invalid neighbors. This is the single entry point for assembling the (B, K, nf) and (B, K, 3) tensors that the NN receives as input.

Uses a fused C++ implementation (alg_gather_center_mask_fs32) that performs the gather, centering, and masking in a single OpenMP-parallelized pass, avoiding the multi-GB intermediate arrays that numpy fancy indexing would create.

When feat_out and coord_out are provided (pre-allocated buffers of the correct shape), the C++ function writes into them directly, avoiding repeated ~7 GB allocations across batches.

Parameters:
  • features (np.ndarray) – Preprocessed features (S, nf) f32.

  • coords (np.ndarray) – Centroid coordinates (S, 3) f32.

  • neighbors (np.ndarray) – KNN indices (B, K) int32.

  • mask (np.ndarray) – Neighbor validity mask (B, K) bool.

  • feat_out (np.ndarray or None) – Pre-allocated (B, K, nf) f32 or None.

  • coord_out (np.ndarray or None) – Pre-allocated (B, K, 3) f32 or None.

Returns:

(feat_t, coord_t, mask) — (B, K, nf), (B, K, 3), (B, K) all float32/bool.

Return type:

tuple
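
For reference, an unfused NumPy equivalent of the gather, center, and mask steps could look like this. It allocates the intermediate arrays that the fused C++ pass avoids, so it is only meant to make the semantics concrete. Treating the first neighbor column as the neighborhood center is an assumption of this sketch, and gather_and_center_sketch is a hypothetical name.

```python
import numpy as np


def gather_and_center_sketch(features, coords, neighbors, mask):
    """Build (B, K, nf) and (B, K, 3) tensors with invalid slots zeroed."""
    feat_t = features[neighbors]              # (B, K, nf)
    coord_t = coords[neighbors]               # (B, K, 3)
    # Assumption: neighbor 0 of each row is the neighborhood center.
    coord_t = coord_t - coord_t[:, :1, :]
    keep = mask[..., None]
    feat_t = np.where(keep, feat_t, 0).astype(np.float32)
    coord_t = np.where(keep, coord_t, 0).astype(np.float32)
    return feat_t, coord_t, mask


feats = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]], dtype=np.float32)
xyz = np.array([[0, 0, 0], [1, 0, 0], [0, 2, 0]], dtype=np.float32)
nbr = np.array([[0, 1], [2, 0]], dtype=np.int32)
msk = np.array([[True, True], [True, False]])
feat_t, coord_t, _ = gather_and_center_sketch(feats, xyz, nbr, msk)
```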

assemble_nn_input(centroids, features, neighbors, mask)

Assemble the NN input tensors. Delegates to gather_and_center().

Parameters:
  • centroids – Centroid coordinates (S, 3).

  • features – Preprocessed features (S, nf).

  • neighbors – KNN indices (S, K).

  • mask – Neighbor validity mask (S, K).

Returns:

(feat_t, coord_t, mask, n_f_nn).

apply_training_input_strategy(centroids, labels, strategy_spec)

Select which centroids serve as neighborhood centers for NN training, according to the given strategy.

Parameters:
  • centroids (np.ndarray) – Centroid coordinates (S, 3).

  • labels (np.ndarray) – Centroid labels (S,) int32.

  • strategy_spec (dict or None) – Strategy specification dict.

Returns:

Index array of selected centroid indices.

Return type:

np.ndarray

apply_predictive_input_strategy(centroids)

Select which centroids serve as neighborhood centers for NN prediction. Only active when nn_point_wise_labels=True — the scatter-accumulate ensures every centroid receives a prediction as a neighbor of the selected centers.

Supported strategies:

  • "full" (default): all centroids are centers.

  • "fps": furthest point sampling. Controlled by predictive_K (target count) and predictive_fps_fast (mode 0–4).

  • "mindist_decimation": min distance decimation. Controlled by predictive_min_distance.

Parameters:

centroids (np.ndarray) – Centroid coordinates (S, 3).

Returns:

Index array of selected centroid indices, or None for full selection.

Return type:

np.ndarray or None
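
To make the "fps" strategy concrete, here is a plain greedy furthest point sampling loop in NumPy. It is a textbook version, not the C++ routine, and it does not model the predictive_fps_fast modes. fps_indices is a hypothetical name, and k is assumed to be at least 1.

```python
import numpy as np


def fps_indices(points, k, start=0):
    """Greedy FPS: repeatedly pick the point furthest from the chosen set."""
    n = points.shape[0]
    k = min(k, n)
    selected = np.empty(k, dtype=np.int64)
    selected[0] = start
    # Squared distance from every point to its nearest selected point.
    min_d2 = np.sum((points - points[start]) ** 2, axis=1)
    for i in range(1, k):
        nxt = int(np.argmax(min_d2))
        selected[i] = nxt
        d2 = np.sum((points - points[nxt]) ** 2, axis=1)
        min_d2 = np.minimum(min_d2, d2)
    return selected


pts = np.stack([np.arange(5.0), np.zeros(5), np.zeros(5)], axis=1)
centers = fps_indices(pts, k=3)
```

On five collinear points starting from index 0, the loop picks the far end first and then the midpoint, which is the spread-out coverage that makes FPS centers useful for scatter-accumulate prediction.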

static match_coords_to_indices(original, selected, eps=1e-07)

Match selected coordinates back to original indices via KDTree nearest-neighbor query.

Parameters:
  • original – (S, 3) original centroid coords.

  • selected – (M, 3) selected coords from C++.

  • eps – Maximum distance tolerance.

Returns:

Index array of matched original indices.

Return type:

np.ndarray
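
A possible shape for the nearest-neighbor matching, using SciPy's KDTree. match_coords_sketch is a hypothetical name, and raising ValueError when a selected point has no original within eps is an assumption, since the behavior on a miss is not documented here.

```python
import numpy as np
from scipy.spatial import KDTree


def match_coords_sketch(original, selected, eps=1e-7):
    """Map each selected coordinate to the index of its original point."""
    dist, idx = KDTree(original).query(selected, k=1)
    if np.any(dist > eps):
        # Assumed failure mode; the actual method may handle this differently.
        raise ValueError("selected point without a match within eps")
    return np.asarray(idx, dtype=np.int64)


orig_pts = np.array([[0.0, 0.0, 0.0],
                     [1.0, 0.0, 0.0],
                     [0.0, 1.0, 0.0],
                     [0.0, 0.0, 1.0]])
matched = match_coords_sketch(orig_pts, orig_pts[[2, 0]])
```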

preprocess_pcloud(X_coords, y, F_pts=None)

Build octree centroids from a raw point cloud and mine features using mining_config.

Parameters:
  • X_coords (np.ndarray) – Point cloud coordinates (N, 3).

  • y (np.ndarray) – Point-wise class labels (N,).

  • F_pts (np.ndarray or None) – Point cloud features (N, n_f_pcloud), needed by SmoothFeatures++ and Recount++.

Returns:

(features, labels, X_centered) — mined feature matrix at centroids (S, n_f_mined), centroid labels (S,), and globally centered point cloud coordinates (N, 3) float32. Callers that don’t need X_centered can use [:2] to unpack only the first two values. Also updates self.fnames with the mined feature names and stores centroid coordinates in self.centroids_.

Return type:

tuple[np.ndarray, np.ndarray, np.ndarray]

train_rf_from_pclouds(paths)

Train the C++ Random Forest from a pool of point cloud files using offline (serialized) training.

Each file is loaded, octree-voxelized, and feature-mined one at a time. The mined features and labels are written to a C++ data store file, then freed. After all files are processed, data stores are merged and the RF is trained from the merged store. At no point are multiple point clouds’ data held in memory simultaneously.

Parameters:

paths (list of str) – List of LAS/LAZ file paths.

train_base(pcloud)

Override Model.train_base() to extract both the structure space (coordinates) and the feature space (fnames) from the point cloud.

Coordinates are always used for the octree and KNN. Features from fnames are available to data miners (e.g., smooth and recount) and, when mining_config is empty, serve as the direct RF/NN input.

Parameters:

pcloud (PointCloud) – Input point cloud.

Returns:

The trained model.

Return type:

TransfOctoRFClassificationModel

training(X_coords, y, info=True, F_pts=None)

Train the full TransfOctoRF pipeline.

Parameters:
  • X_coords – Point cloud coordinates (N, 3). When mining_config is empty, this is the pre-mined feature matrix instead.

  • y – Class labels (N,).

  • F_pts (np.ndarray or None) – Point cloud features (N, n_f_pcloud) from fnames. Available to data miners that need input features (smooth, recount). Can be None if no features are needed.

fit_nn_sequencer(storage_path, n_f_nn)

Shared helper: build the NN handler, create a sequencer on the HDF5 cache, train, clean up.

Used by train_nn_from_hdf5(), train_nn_from_pclouds() (Stage 4), and train_nn().

Parameters:
  • storage_path (str) – Path to the HDF5 training file.

  • n_f_nn (int) – Number of NN input features.

Returns:

Keras training history.

Return type:

keras.callbacks.History

train_nn_from_hdf5(storage_path)

Train the NN directly from an existing HDF5 cache file, skipping all preprocessing stages. Used when disable_nn_offline_storage_writing is True.

Parameters:

storage_path (str) – Path to the HDF5 cache file.

Returns:

Keras training history.

Return type:

keras.callbacks.History

train_nn_from_pclouds(paths)

Train the NN stage from a list of point cloud files.

Only one point cloud’s data is held in memory at a time. Three stages are performed, the first two of which each make a full pass over the files:

  1. Preprocess pass: Load each pcloud, run octree + mining + RF outputs, build NN features, feed them to the FeaturePreprocessor fit, then discard. Only per-column statistics are retained.

  2. Serialize pass: Reload each pcloud, recompute mined features and NN features, transform with the fitted preprocessor, build KNN + assemble tensors, append chunk to HDF5, then discard.

  3. Train: Open the HDF5 via the sequencer and train the NN.

Parameters:

paths (list of str) – List of LAS/LAZ file paths.

Returns:

Keras training history.

Return type:

keras.callbacks.History

train_nn(X_coords, y, F_pts=None, _skip_mining=False)

Train the neural network stage on the given data.

Runs stages 2–6 of the pipeline: compute RF outputs, preprocess features, build K-NN neighbors, assemble tensors, and train the NN via disk-based sequencing.

The RF must already be trained before calling this method. Can be called with data different from the RF training set to avoid data leakage.

Parameters:
  • X_coords (np.ndarray) – Point cloud coordinates (N, 3). When mining_config is empty or _skip_mining is True, this is the pre-mined feature matrix.

  • y (np.ndarray) – Class labels (N,).

  • F_pts (np.ndarray or None) – Point cloud features (N, n_f_pcloud).

Returns:

Keras training history.

Return type:

keras.callbacks.History

prepare_nn_input(X, fit_preprocessor=False)

Shared stages 2–5 of the TransfOctoRF pipeline: compute RF outputs, build NN features, preprocess, build KNN neighbors, and assemble the NN input tensors.

Used by both train_nn (with fit_preprocessor=True) and _predict (with fit_preprocessor=False).

Parameters:
  • X (np.ndarray) – Feature matrix (S, n_f).

  • fit_preprocessor (bool) – If True, fit a new preprocessor on the data. If False, use the existing one (frozen from training).

Returns:

(nn_input, rf_proba, rf_ambiguity, nn_features_std) where nn_input is [feat_t, coord_t, mask_t] and nn_features_std is the flat (S, n_f_nn) preprocessed features.

Return type:

tuple

run_rfvsnn_evaluation(X, y)

Run the RF vs NN comparison evaluation and produce report and/or plot.

Center selection for the NN evaluation (requires nn_point_wise_labels=True):

  • When a predictive input strategy is configured, its centers are used. The scatter-accumulate mechanism fills all centroids, so the evaluation covers the full set.

  • When no predictive strategy is configured but a training input strategy is available, the training strategy centers are used. Scatter-accumulate still ensures full coverage.

  • When no strategy is configured, or nn_point_wise_labels=False, the NN evaluates all centroids directly. Predictive strategies are meaningless without point-wise scatter because unselected centroids would have no NN predictions.

Parameters:
  • X (np.ndarray) – Centroid feature matrix (S, n_f).

  • y (np.ndarray) – Centroid labels (S,).

export_support_points(path, out_prefix=None, centroids=None)

Export centroids as a LAS/LAZ point cloud.

Parameters:
  • path (str) – Output file path (LAS/LAZ).

  • out_prefix (str or None) – Output prefix for * expansion.

  • centroids (np.ndarray or None) – Centroid coordinates to export. When None, uses self.centroids_ (all centroids).

export_receptive_fields(y, neighbors, mask, nn_feat_std, coords, rf_dir=None, dist_report_path=None, dist_plot_path=None, y_all=None)

Export receptive field reports, distribution reports, and distribution plots. Reuses the existing ReceptiveFieldsReport, ReceptiveFieldsDistributionReport, and ReceptiveFieldsDistributionPlot classes.

The method is isolated from the training/prediction logic and is only called when at least one output path is configured. Paths must already be resolved (no * expansion here).

Parameters:
  • y – Centroid labels (S,). Can be None.

  • neighbors – KNN indices (S, K).

  • mask – Validity mask (S, K).

  • nn_feat_std – Preprocessed NN features (S, nf_nn) float32.

  • coords – Centroid coordinates (S, 3).

  • rf_dir – Directory for RF point clouds.

  • dist_report_path – Path for distribution CSV.

  • dist_plot_path – Path for distribution plot.

  • y_all – Full centroid labels (S_full,) for per-neighbor label gathering when nn_point_wise_labels=True. Can be None.

on_training_finished(X, y, yhat=None)

See model.Model.on_training_finished().

predict(pcloud, X=None)

Override Model.predict() to handle internal octree + mining when mining_config is set.

Always extracts coordinates from the point cloud for the octree. Features from fnames are passed to data miners that need them (smooth, recount).

Parameters:
  • pcloud (PointCloud) – Input point cloud.

  • X (np.ndarray or None) – Pre-computed feature matrix (optional).

Returns:

Point-wise predicted class labels.

Return type:

np.ndarray

run_centroid_pipeline(X, coords=None, knn_handle=None)

Shared centroid-level prediction pipeline: RF outputs -> NN features -> preprocess -> KNN -> batched NN predict -> select predictions.

Used by both predict() (with mining) and predict_centroids() (without mining).

Parameters:
  • X (np.ndarray) – Centroid feature matrix (S, nf).

  • coords (np.ndarray or None) – Centroid coordinates (S, 3). When None, resolved from self.centroids_ or X.

  • knn_handle – Persistent KNN handle from build_knn_handle(). When the centroid count exceeds the chunked KNN threshold, KNN is computed per-batch to avoid materializing the full (S, K) neighbors array. If a handle is provided it is reused; if None one is built internally and released after use.

Returns:

(preds, proba, rf_proba, rf_ambiguity, nn_features_std, coords_f32, neighbors, knn_mask) — predictions + intermediate data needed by callers. When chunked KNN is used, neighbors and knn_mask are None.

Return type:

tuple

predict_centroids(X, zout=None)

Centroid-level prediction with the full TransfOctoRF pipeline (RF + NN). Predicts in batches to avoid GPU OOM on large point clouds.

Parameters:

X – Centroid feature matrix (S, n_f).

Returns:

Predicted class labels (S,).

static select_lowest_ambiguity(rf_proba, rf_ambiguity, nn_proba)

Select predictions from the source (RF or NN) with the lowest class ambiguity for each sample.

Class ambiguity is defined as \(1 - p_{\text{max}} + p_{\text{second}}\).

Parameters:
  • rf_proba (np.ndarray) – RF probabilities (S, n_c).

  • rf_ambiguity (np.ndarray) – RF class ambiguity (S,).

  • nn_proba (np.ndarray) – NN probabilities (S, n_c) or (S, 1).

Returns:

(proba, preds) — selected probabilities and predicted class labels.

Return type:

tuple[np.ndarray, np.ndarray]
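
The per-sample selection can be sketched as below. The names are hypothetical, nn_proba is assumed to have shape (S, n_c) with at least two classes (the (S, 1) binary case is not handled), and resolving ties in favor of the NN is an assumption.

```python
import numpy as np


def ambiguity(proba):
    """1 - p_max + p_second per row."""
    top2 = np.partition(proba, -2, axis=1)[:, -2:]
    return 1.0 - top2[:, 1] + top2[:, 0]


def select_lowest_ambiguity_sketch(rf_proba, rf_ambiguity, nn_proba):
    """Keep, for each sample, the probabilities of the less ambiguous source."""
    nn_ambiguity = ambiguity(nn_proba)
    use_rf = rf_ambiguity < nn_ambiguity      # ties go to the NN (assumption)
    proba = np.where(use_rf[:, None], rf_proba, nn_proba)
    preds = np.argmax(proba, axis=1)
    return proba, preds


rf_p = np.array([[0.9, 0.1], [0.5, 0.5]])
nn_p = np.array([[0.4, 0.6], [0.2, 0.8]])
sel_proba, sel_preds = select_lowest_ambiguity_sketch(rf_p, ambiguity(rf_p), nn_p)
```

In the example, the first sample keeps the RF output (ambiguity 0.2 against 0.8) and the second keeps the NN output (0.4 against 1.0).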