src.utils.ctransf.directional_reclassifier

Classes

DirectionalReclassifier(**kwargs)

class src.utils.ctransf.directional_reclassifier.DirectionalReclassifier(**kwargs)
Author:

Alberto M. Esmoris Pena

Class to transform the classifications (or predictions) of a point cloud by labeling points as overhang or underhang with respect to a locally fitted plane. The plane is estimated by region-growing PCA seeded at every reclassifiable point reachable by a covering min-distance subsample.

The signed distance of a point \(\pmb{x} \in \mathbb{R}^{3}\) to the local plane (with centroid \(\pmb{\mu}\) and forward-aligned unit normal \(\pmb{n}\)) drives the label:

\[\begin{split}\begin{aligned} (\pmb{x} - \pmb{\mu})^{\intercal} \pmb{n} \geq +\epsilon &\;\Rightarrow\; \text{overhang} \\ (\pmb{x} - \pmb{\mu})^{\intercal} \pmb{n} \leq -\epsilon &\;\Rightarrow\; \text{underhang} \\ \text{otherwise} &\;\Rightarrow\; \text{unchanged} \end{aligned}\end{split}\]
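The labeling rule above can be sketched in NumPy as follows. This is a minimal illustration and not the library's C++ implementation: the function name `label_by_signed_distance` and the integer codes `UNCHANGED`, `OVERHANG`, and `UNDERHANG` are hypothetical, and the centroid and forward-aligned normal are assumed to be given.

```python
import numpy as np

# Hypothetical label codes, used only in this sketch.
UNCHANGED, OVERHANG, UNDERHANG = 0, 1, 2


def label_by_signed_distance(X, mu, n, eps):
    """Label points by their signed distance to the plane (mu, n)."""
    X = np.asarray(X, dtype=float)
    n = np.asarray(n, dtype=float)
    n = n / np.linalg.norm(n)  # make sure the normal has unit length
    d = (X - np.asarray(mu, dtype=float)) @ n  # signed plane distance
    labels = np.full(X.shape[0], UNCHANGED, dtype=int)
    labels[d >= eps] = OVERHANG
    # With eps = 0 a point exactly on the plane matches both rules; this
    # sketch lets the second assignment win (an arbitrary choice).
    labels[d <= -eps] = UNDERHANG
    return d, labels


X = np.array([[0.0, 0.0, 0.5], [0.0, 0.0, -0.5], [1.0, 1.0, 0.05]])
d, labels = label_by_signed_distance(X, mu=[0, 0, 0], n=[0, 0, 1], eps=0.1)
```

Here the first point lies above the plane by more than the tolerance, the second lies below it by more than the tolerance, and the third stays within the tolerance band and keeps its label.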

See ClassTransformer.

Variables:
  • output_class_names (list of str) – Names of the output classes.

  • reclassifications (list of dict) –

    Per-reclassification specifications. Each entry is a dict with the following keys:

    source_classes

    List of input-class names. Only points belonging to one of these classes may be reclassified by the entry.

    source_is_prediction

    True to read source classes from the "prediction" attribute, False (default) to read from the "classification" attribute.

    forward_direction

    Three-component vector \(\pmb{u}\) used to break the sign ambiguity of the fitted normal. It is internally normalized to unit length; a vector with norm \(\lVert \pmb{u} \rVert < 10^{-7}\) is rejected.

    target_overhang_class

    Name of the output class to assign to overhang points. If None, no overhang reclassification is performed.

    target_underhang_class

    Name of the output class to assign to underhang points. If None, no underhang reclassification is performed.

    conditions

    Optional list of relational filters defining the geometric-computation set \(P\). Same convention as PointCloudFilter (value_name may be "classification", "prediction", or any feature name; condition_type is one of "equals", "not_equals", "less_than", "less_than_or_equal_to", "greater_than", "greater_than_or_equal_to", "in", or "not_in"; action is "preserve" or "discard"). When conditions is None, all points participate (\(P\) is the full cloud).

    variety_distance_tolerance

    Plane-distance tolerance \(\epsilon \geq 0\).

    eigenthreshold

    Eigenvalue threshold \(\tau \geq 0\). A region-growing iteration is accepted when \(\lambda_3(\widetilde{N}) \leq \tau^{2}\). Note that the initial cluster is always accepted regardless of \(\lambda_3\).

    init_radius

    Initial sphere radius \(r_0 > 0\) for the seed query.

    step_radius

    Region-growing radius \(r_{\Delta} > 0\) for each iteration.

    min_distance

    Optional min-distance decimation radius \(d_{*} \geq 0\). When > 0, the input cloud is first subsampled via MinDistanceSubsampler at this radius; plane fitting and distance checks run on the decimated representation; non-decimated points in the reclassification domain (i.e., source-class points in \(C\)) inherit their label and every enabled extra (signed distance, smallest eigenvalue, cluster index, cluster radius) from their closest decimated neighbor. When that closest decimated neighbor is itself untouched (e.g., it was outside \(C \cap P\) on the decimated cloud, or skipped by the small-cluster guard, or no seed grew into it), the non-decimated point inherits the sentinel value of every extra (0 for distance / eigenmin / cluster_radius, -1 for cluster_idx) even though it is in \(C \cap P\) on the full cloud. When 0 (the default) or missing or null, no decimation is applied and the algorithm runs directly on the input cloud.

    degree

    Order of the local surface model fit to the cluster. Defaults to 1. The values 1, 2, and 3 are described here. With degree = 1 the cluster’s PCA plane is the surface model and the deviation thresholded against variety_distance_tolerance is the signed plane distance \(d = (\pmb{x} - \pmb{\mu})^{\intercal} \pmb{n}\).

    With degree = 2 a least-squares quadric \(h(u, v) = \theta_0 + \theta_1 u + \theta_2 v + \theta_3 u^{2} + \theta_4 u v + \theta_5 v^{2}\) is fit in the local PCA frame \((\pmb{e}_{1}, \pmb{e}_{2}, \pmb{n})\) and each cluster runs an AIC model selection between the plane fit (3 parameters) and the quadric fit (6 parameters) on its own points. Concretely, with \(\mathrm{RSS}_{1} = \sum_{i} h_{i}^{2}\), \(\mathrm{RSS}_{2} = \mathrm{RSS}_{1} - \pmb{\theta}^{\intercal} \pmb{b}\) (OLS identity, with \(\pmb{b}\) the right-hand side of the quadric normal equations), and MLE-style variance estimators \(\hat{\sigma}_{k}^{\,2} = \mathrm{RSS}_{k} / m\), the cluster picks the plane fit when \(m \, \ln (\hat{\sigma}_{1}^{\,2} / \hat{\sigma}_{2}^{\,2}) < 6\), where 6 is the AIC penalty gap \(2 \, (6 - 3)\) between the two models, and the quadric otherwise. Small clusters with \(m \leq 12\) default to the plane, which avoids the \(m = 6\) quadric-interpolation regime.

    For clusters where AIC picks the plane, the thresholded deviation is the literal signed plane distance \(d = (\pmb{x} - \pmb{\mu})^{\intercal} \pmb{n}\), byte-equivalent to degree = 1. For clusters where AIC picks the quadric, the thresholded deviation is a safeguarded geometric-distance estimate from \(\pmb{x}\) to the fitted quadric surface (rather than the algebraic vertical residual along \(\pmb{n}\), which can over- or under-estimate the true Euclidean distance whenever the local surface gradient is non-trivial). The estimate is the smaller-in-magnitude of (i) the first-order tangent-plane distance \(d_{1} = r_{0} / \lVert \pmb{N}_{0} \rVert\) with \(r_{0} = h_{P} - h(u_{P}, v_{P})\) and \(\lVert \pmb{N}_{0} \rVert = \sqrt{1 + (h_{u}^{(0)})^{2} + (h_{v}^{(0)})^{2}}\), and (ii) one Newton step of the foot-of-perpendicular problem on the quadric (see the C++ class doxygen for the Hessian and Cramer-step derivation). The safeguard guarantees \(|d_{\text{out}}| \leq |d_{1}|\) and falls back to \(d_{1}\) when the Newton step is singular, non-finite, or has larger magnitude. This removes intrinsic curvature from the deviation, so true protrusions stand out on curved walls.

    degree = 2 is therefore a capability upper bound rather than a forced model choice: the user accepts up to a quadric’s expressive power, and AIC picks the cheapest-fitting model up to that bound for every cluster.

    Note: when degree = 2 the variety_distance feature column carries the threshold-target value (the literal signed plane distance for plane-chosen clusters; the safeguarded geometric-distance estimate for quadric-chosen clusters), not the raw quadric residual. The variety_distance_tolerance semantics \(|d| \geq \epsilon \;\Rightarrow\; \text{label}\) are unchanged across modes.

    With degree = 3 a local cubic polynomial is fit (10 coefficients in the local PCA frame) and a 3-way AICc compares plane / quadric / cubic per cluster. AICc adds the small-sample correction \(+ 2 p (p+1) / (m - p - 1)\) to AIC, so it penalises the cubic’s \(p = 10\) parameters more strictly when the cluster is small. The same safeguarded geometric-distance machinery (first-order tangent-plane distance, one Newton step, and the smaller-magnitude safeguard) applies, with cubic-specific gradient and Hessian. See the C++ class doxygen for the full math.

    A degree below 1 raises at instantiation with "degree less than 1 does not make sense"; a degree above 5 raises with "degree > 5 is not currently supported".

    trimmed_refit

    (bool, default False.) When True, the reclassifier performs a one-pass trimmed-OLS refit after the initial fit at each AIC/AICc compare site. Cluster points whose first-pass residual exceeds trim_factor * variety_distance_tolerance are dropped from the running OLS sums (subtraction approach), the surface is re-solved on the kept subset, and the refit coefficients drive the AIC/AICc compare and the per-point distance loop. The trim is committed only if every sub-fit that the no-trim path actually scored at this compare site can be recomputed on the trimmed set (the rank gate passes, arma::solve succeeds, and the trimmed RSS satisfies monotonicity). Otherwise the trim is aborted and the no-trim path runs end-to-end. This targets the bump-bias failure mode, where strong bumps in a cluster pull the OLS fit toward themselves and produce wall-as-overhang false positives. See the C++ class doxygen for the full math and the path-aware semantics for degree = 3 (sub-cases 3A/3B/3C).

    trim_factor

    (float, default 1.0.) Strictly positive, finite multiplier on variety_distance_tolerance defining the trim threshold when trimmed_refit is True. The default 1.0 means “trim points whose first-pass residual exceeds the user’s overhang threshold”, which is the natural choice. Larger values keep more borderline points; smaller values trim more aggressively. Non-positive, non-finite, non-numeric, or boolean values are rejected at instantiation.

  • nthreads (int) – Number of threads for parallel C++ execution. -1 means as many threads as available cores.
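The degree = 2 model selection described for the degree key can be sketched as below. This is a hedged NumPy illustration under stated assumptions, not the C++ implementation: the function name `pick_surface_model` is hypothetical, the inputs are assumed to be cluster coordinates already expressed in the local PCA frame (with \(h\) the height along \(\pmb{n}\)), `np.linalg.lstsq` stands in for whatever solver the library uses, and the guards for degenerate residual sums are additions of this sketch.

```python
import numpy as np


def pick_surface_model(u, v, h, small_cluster=12):
    """Return ("plane", None) or ("quadric", theta) for one cluster."""
    m = h.size
    rss1 = float(h @ h)  # plane residual: RSS_1 = sum_i h_i^2
    if m <= small_cluster or rss1 <= 0.0:
        return "plane", None  # small (or perfectly flat) clusters keep the plane
    # Quadric design matrix: 1, u, v, u^2, uv, v^2
    A = np.column_stack([np.ones(m), u, v, u * u, u * v, v * v])
    theta, *_ = np.linalg.lstsq(A, h, rcond=None)
    b = A.T @ h  # right-hand side of the normal equations
    rss2 = rss1 - float(theta @ b)  # OLS identity: RSS_2 = RSS_1 - theta^T b
    rss2 = max(rss2, np.finfo(float).eps * rss1)  # sketch-only guard
    # sigma_k^2 = RSS_k / m, so the ratio of variances equals RSS_1 / RSS_2
    if m * np.log(rss1 / rss2) < 6.0:
        return "plane", None
    return "quadric", theta


# Synthetic 16 x 16 cluster with a deterministic checkerboard roughness.
g = np.linspace(-1.0, 1.0, 16)
u, v = (a.ravel() for a in np.meshgrid(g, g))
idx = np.arange(16)
rough = 0.01 * np.where((idx[:, None] + idx[None, :]) % 2 == 0, 1.0, -1.0).ravel()

flat_model, _ = pick_surface_model(u, v, rough)

curved = 0.5 * u**2
curved = curved - curved.mean() + rough
curved_model, curved_theta = pick_surface_model(u, v, curved)

tiny_model, _ = pick_surface_model(u[:10], v[:10], curved[:10])
```

On this synthetic data the rough but flat patch keeps the plane, the patch with a quadratic bend in \(u\) switches to the quadric (with \(\theta_3\) close to 0.5), and the 10-point cluster falls under the small-cluster rule.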

Per-reclassification output feature keys: every entry of reclassifications may include the following keys (all optional; the flags default to False and the names default to the canonical names listed with each key). Each enabled column is a separate per-reclassification output, so multi-entry pipelines can produce distinct, non-overlapping columns:

variety_distance_as_feature (bool, default False).

When True, the signed distance to the AICc-selected local surface for this reclassification’s winning seed is exposed as an extra float32 column.

variety_distance_feature_name (str, default "variety_distance").

Column name; must be a non-empty string ≤ 32 bytes (LAS extra-dim limit). Must be unique across every enabled output column of every reclassification; otherwise __init__ raises.

eigenmin_as_feature / eigenmin_feature_name (default False / "eigenmin").

Smallest covariance-eigenvalue \(\lambda_{3}\) of the winning seed’s cluster.

reclassification_cluster_as_feature / reclassification_cluster_name (default False / "reclassification_cluster").

Per-reclassification int32 cluster index, densified to a contiguous \([0, n-1]\) range within this reclassification. Untouched points carry the sentinel -1. There is no cross-reclassification stacking: each entry has its own column with its own id range.

reclassification_cluster_radius_as_feature / reclassification_cluster_radius_name (default False / "reclassification_cluster_radius").

Per-point \(r_0 + n_{\text{accepted}} \cdot r_{\Delta}\) of the winning seed.

fit_quality_as_feature / fit_quality_name (default False / "fit_quality").

RMSE (root mean square error) of the AICc-selected polynomial fit’s residuals on the cluster. Same length units as variety_distance_tolerance.

Untouched points carry 0 for every float column and -1 for reclassification_cluster. Each enabled column’s name must be unique across all reclassifications; cross-reclassification name collisions raise at instantiation.

Note that points in \(C\) (source-class points) but outside \(P\) (filtered out by conditions) are silently left unchanged: they cannot seed a region and never appear in any neighborhood.
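The column-name rules above (non-empty, at most 32 bytes, unique across all reclassifications) can be checked with a few lines of Python. This is only a sketch: the helper `collect_output_columns` and the use of `ValueError` are hypothetical, and applying the 32-byte limit to every column name (not only variety_distance_feature_name) is an assumption.

```python
# (flag key, name key, default name) as listed in the keys above.
FEATURE_KEYS = [
    ("variety_distance_as_feature", "variety_distance_feature_name",
     "variety_distance"),
    ("eigenmin_as_feature", "eigenmin_feature_name", "eigenmin"),
    ("reclassification_cluster_as_feature", "reclassification_cluster_name",
     "reclassification_cluster"),
    ("reclassification_cluster_radius_as_feature",
     "reclassification_cluster_radius_name",
     "reclassification_cluster_radius"),
    ("fit_quality_as_feature", "fit_quality_name", "fit_quality"),
]


def collect_output_columns(reclassifications):
    """Return the enabled column names, raising on invalid or duplicate names."""
    seen = {}
    for i, entry in enumerate(reclassifications):
        for flag_key, name_key, default in FEATURE_KEYS:
            if not entry.get(flag_key, False):
                continue  # column not enabled for this entry
            name = entry.get(name_key, default)
            if not isinstance(name, str) or not name:
                raise ValueError(f"entry {i}: {name_key} must be a non-empty str")
            if len(name.encode("utf-8")) > 32:
                raise ValueError(f"entry {i}: {name!r} exceeds 32 bytes")
            if name in seen:
                raise ValueError(f"{name!r} used by entries {seen[name]} and {i}")
            seen[name] = i
    return list(seen)


entries = [
    {"variety_distance_as_feature": True,
     "variety_distance_feature_name": "vd_walls"},
    {"variety_distance_as_feature": True,
     "variety_distance_feature_name": "vd_roofs",
     "eigenmin_as_feature": True},
]
columns = collect_output_columns(entries)
```

Two entries that both enable eigenmin_as_feature without renaming the column would collide on the default name "eigenmin", which is the situation the uniqueness rule is meant to catch.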

static extract_ctransf_args(spec)

Extract the arguments to initialize/instantiate a DirectionalReclassifier.

Parameters:

spec – The key-word specification containing the arguments.

Returns:

The arguments to initialize/instantiate a DirectionalReclassifier.
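For orientation, a specification covering the documented keys might look like the dictionary below. It is an illustrative sketch only: the class names and numeric values are made up, any surrounding pipeline keys that select this transformer are omitted because they are not documented here, and whether extract_ctransf_args reads exactly these keys is an assumption.

```python
# Hypothetical specification; class names and values are illustrative only.
spec = {
    "output_class_names": ["ground", "wall", "overhang", "underhang"],
    "reclassifications": [
        {
            "source_classes": ["wall"],
            "source_is_prediction": False,
            "forward_direction": [1.0, 0.0, 0.0],  # normalized internally
            "target_overhang_class": "overhang",
            "target_underhang_class": "underhang",
            "conditions": None,            # P is the full cloud
            "variety_distance_tolerance": 0.1,
            "eigenthreshold": 0.05,
            "init_radius": 0.5,
            "step_radius": 0.25,
            "min_distance": 0.0,           # no decimation
            "degree": 2,
            "trimmed_refit": False,
            "trim_factor": 1.0,
            "variety_distance_as_feature": True,
            "variety_distance_feature_name": "variety_distance",
        }
    ],
    "nthreads": -1,
}

# Sanity check: every non-None target class must be a declared output class.
targets = {
    entry[key]
    for entry in spec["reclassifications"]
    for key in ("target_overhang_class", "target_underhang_class")
    if entry.get(key) is not None
}
missing = targets - set(spec["output_class_names"])
```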

__init__(**kwargs)

Initialize/instantiate a DirectionalReclassifier.

Parameters:

kwargs – The attributes for the DirectionalReclassifier

transform(y, X=None, F=None, fnames=None, yclass=None, ypred=None, out_prefix=None)

The fundamental transformation logic defining the directional reclassifier.

Parameters:
  • y (np.ndarray) – The vector of classes (either reference classifications or predictions); this is the channel that the reclassifier rewrites and that the caller writes back to the cloud.

  • X (np.ndarray) – The structure space matrix representing the input point cloud whose classes must be transformed.

  • F (np.ndarray) – The feature space matrix representing the input point cloud whose classes must be transformed.

  • fnames (list of str) – The list with the name for each considered feature.

  • yclass (np.ndarray) – Reference-classification vector. Used by condition evaluation when value_name == "classification" so the lookup hits the actual classifications even when y holds the predictions (on_predictions=True). When None, defaults to y for backward compatibility.

  • ypred (np.ndarray) – Prediction vector. Used by condition evaluation when value_name == "prediction" and by source_is_prediction = true reclassifications. When None, the legacy F-lookup is attempted as a fallback (raises if predictions are needed but unavailable).

  • out_prefix – See class_transformer.ClassTransformer.transform()

Returns:

The transformed vector of classes.

Return type:

np.ndarray

transform_pcloud(pcloud, out_prefix=None)

See class_transformer.ClassTransformer.transform().

Parameters:
  • pcloud (PointCloud) – The point cloud whose classes/predictions must be transformed.

  • out_prefix (str) – Output prefix forwarded to the reports/plots.

Returns:

The point cloud with updated classes/predictions.

Return type:

PointCloud

apply_conditions(conditions, y, ypred, X, F, flut=None, yinlut=None)

Build a per-point boolean mask of the geometric-computation set \(P\) from the relational conditions of one reclassification entry. value_name may target the "classification" vector, the "prediction" vector, or any feature.

Parameters:
  • conditions (list of dict) – List of condition dicts (or None).

  • y (np.ndarray) – The classification vector.

  • ypred (np.ndarray) – The prediction vector (or None).

  • X (np.ndarray) – The structure space matrix.

  • F (np.ndarray) – The feature space matrix (or None).

  • flut (dict) – The feature look-up table.

  • yinlut (dict) – The input-class look-up table.

Returns:

The boolean per-point mask of points in \(P\).

Return type:

np.ndarray of bool
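A simplified picture of how such a mask could be built is sketched below. Several details are assumptions and not taken from this page: the comparison value is read from a `value_target` key, conditions are combined sequentially with a logical AND, "preserve" keeps the matching points while "discard" removes them, classes are compared as integer codes, and the helper `conditions_mask` with its reduced signature is hypothetical.

```python
import numpy as np

_OPS = {
    "equals": lambda v, t: v == t,
    "not_equals": lambda v, t: v != t,
    "less_than": lambda v, t: v < t,
    "less_than_or_equal_to": lambda v, t: v <= t,
    "greater_than": lambda v, t: v > t,
    "greater_than_or_equal_to": lambda v, t: v >= t,
    "in": lambda v, t: np.isin(v, t),
    "not_in": lambda v, t: ~np.isin(v, t),
}


def conditions_mask(conditions, y, ypred=None, F=None, fnames=None):
    """Boolean mask of the points that satisfy all conditions."""
    mask = np.ones(len(y), dtype=bool)
    if conditions is None:
        return mask  # no conditions: every point participates
    for cond in conditions:
        name = cond["value_name"]
        if name == "classification":
            values = y
        elif name == "prediction":
            if ypred is None:
                raise ValueError("predictions are needed but unavailable")
            values = ypred
        else:
            values = F[:, fnames.index(name)]  # feature column by name
        hit = _OPS[cond["condition_type"]](values, cond["value_target"])
        mask &= hit if cond["action"] == "preserve" else ~hit
    return mask


y = np.array([1, 2, 2, 6])
F = np.array([[0.1], [0.9], [0.4], [0.8]])
conditions = [
    {"value_name": "classification", "condition_type": "in",
     "value_target": [2, 6], "action": "preserve"},
    {"value_name": "verticality", "condition_type": "less_than",
     "value_target": 0.5, "action": "discard"},
]
mask = conditions_mask(conditions, y, F=F, fnames=["verticality"])
```

In this example the first condition keeps only classes 2 and 6, and the second removes points whose (hypothetical) verticality feature is below 0.5, leaving the second and fourth points in \(P\).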

needs_classification_channel()

Whether any reclassification’s conditions reference the "classification" channel. Used by transform_pcloud to decide whether the actual classification vector must be loaded as an auxiliary channel (separate from the channel being transformed). Returns False when no condition references classifications.

Returns:

True if any condition uses value_name == "classification".

Return type:

bool

needs_prediction_channel()

Whether any reclassification needs the "prediction" channel — either via a condition referencing value_name == "prediction" or via source_is_prediction = true on the entry. Used by transform_pcloud to decide whether the prediction vector must be loaded as an auxiliary channel.

Returns:

True if any reclassification needs predictions.

Return type:

bool
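The two channel checks above reduce to a scan over the reclassification entries. The sketch below shows one way to express that logic; the helper name `needs_channel` is hypothetical and the entries are plain dictionaries using the documented keys.

```python
def needs_channel(reclassifications, channel):
    """True if any entry needs `channel` ("classification" or "prediction")."""
    for entry in reclassifications:
        # Reading source classes from predictions also requires the channel.
        if channel == "prediction" and entry.get("source_is_prediction", False):
            return True
        for cond in entry.get("conditions") or []:
            if cond.get("value_name") == channel:
                return True
    return False


entries = [
    {"source_is_prediction": False, "conditions": None},
    {"source_is_prediction": True,
     "conditions": [{"value_name": "classification"}]},
]
needs_classification = needs_channel(entries, "classification")
needs_prediction = needs_channel(entries, "prediction")
needs_prediction_first_only = needs_channel(entries[:1], "prediction")
```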

determine_fnames()

Determine the names of the features involved in any condition across all reclassifications. Coordinates and the classification / prediction channels are excluded.

Returns:

The names of the features needed by the conditions.

Return type:

list of str