src.utils.ctransf.directional_reclassifier
Classes
- class src.utils.ctransf.directional_reclassifier.DirectionalReclassifier(**kwargs)
- Author:
Alberto M. Esmoris Pena
Class to transform the classifications (or predictions) of a point cloud by labeling points as overhang or underhang with respect to a locally fitted plane. The plane is estimated by region-growing PCA seeded at every reclassifiable point reachable by a covering min-distance subsample.
The signed distance of a point \(\pmb{x} \in \mathbb{R}^{3}\) to the local plane (with centroid \(\pmb{\mu}\) and forward-aligned unit normal \(\pmb{n}\)) drives the label:
\[\begin{split}\begin{aligned} (\pmb{x} - \pmb{\mu})^{\intercal} \pmb{n} \geq +\epsilon &\;\Rightarrow\; \text{overhang} \\ (\pmb{x} - \pmb{\mu})^{\intercal} \pmb{n} \leq -\epsilon &\;\Rightarrow\; \text{underhang} \\ \text{otherwise} &\;\Rightarrow\; \text{unchanged} \end{aligned}\end{split}\]
See ClassTransformer.
- Variables:
output_class_names (list of str) – Names of the output classes.
reclassifications (list of dict) –
Per-reclassification specifications. Each entry is a dict with the following keys:
- –
source_classes List of input-class names. Only points belonging to one of these classes may be reclassified by the entry.
- –
source_is_prediction True to read source classes from the "prediction" attribute, False (default) to read from the "classification" attribute.
- –
forward_direction Three-component vector \(\pmb{u}\) used to break the sign ambiguity of the fitted normal. Internally normalized to unit length; rejected when its norm satisfies \(\lVert \pmb{u} \rVert < 10^{-7}\).
- –
target_overhang_class Name of the output class to assign to overhang points. If None, no overhang reclassification is performed.
- –
target_underhang_class Name of the output class to assign to underhang points. If None, no underhang reclassification is performed.
- –
conditions Optional list of relational filters defining the geometric-computation set \(P\). Same convention as PointCloudFilter (value_name may be "classification", "prediction", or any feature name; condition_type is one of "equals", "not_equals", "less_than", "less_than_or_equal_to", "greater_than", "greater_than_or_equal_to", "in", or "not_in"; action is "preserve" or "discard"). When conditions is None, all points participate (\(P\) is the full cloud).
- –
variety_distance_tolerance Plane-distance tolerance \(\epsilon \geq 0\).
- –
eigenthreshold Eigenvalue threshold \(\tau \geq 0\). A region-growing iteration is accepted when \(\lambda_3(\widetilde{N}) \leq \tau^{2}\). Note that the initial cluster is always accepted regardless of \(\lambda_3\).
- –
init_radius Initial sphere radius \(r_0 > 0\) for the seed query.
- –
step_radius Region-growing radius \(r_{\Delta} > 0\) for each iteration.
- –
min_distance Optional min-distance decimation radius \(d_{*} \geq 0\). When > 0, the input cloud is first subsampled via MinDistanceSubsampler at this radius, and plane fitting and distance checks run on the decimated representation. Non-decimated points in the reclassification domain (i.e., source-class points in \(C\)) inherit their label and every enabled extra (signed distance, smallest eigenvalue, cluster index, cluster radius) from their closest decimated neighbor. When that closest decimated neighbor is itself untouched (e.g., it was outside \(C \cap P\) on the decimated cloud, or skipped by the small-cluster guard, or no seed grew into it), the non-decimated point inherits the sentinel value of every extra (0 for distance / eigenmin / cluster_radius, -1 for cluster_idx) even though it is in \(C \cap P\) on the full cloud. When 0 (the default), missing, or null, no decimation is applied and the algorithm runs directly on the input cloud.
- –
degree Order of the local surface model fit to the cluster. Defaults to 1. Supported values are 1 and 2. With degree = 1 (the default) the cluster’s PCA plane is the surface model and the deviation thresholded against variety_distance_tolerance is the signed plane distance \(d = (\pmb{x} - \pmb{\mu})^{\intercal} \pmb{n}\).
With degree = 2 a least-squares quadric \(h(u, v) = \theta_0 + \theta_1 u + \theta_2 v + \theta_3 u^{2} + \theta_4 u v + \theta_5 v^{2}\) is fit in the local PCA frame \((\pmb{e}_{1}, \pmb{e}_{2}, \pmb{n})\) and each cluster runs an AIC model selection between the plane fit (3 parameters) and the quadric fit (6 parameters) on its own points. Concretely, with \(\mathrm{RSS}_{1} = \sum_{i} h_{i}^{2}\), \(\mathrm{RSS}_{2} = \mathrm{RSS}_{1} - \pmb{\theta}^{\intercal} \pmb{b}\) (OLS identity), and MLE-style variance estimators \(\hat{\sigma}_{k}^{\,2} = \mathrm{RSS}_{k} / m\), the cluster picks the plane fit when \(m \, \ln (\hat{\sigma}_{1}^{\,2} / \hat{\sigma}_{2}^{\,2}) < 6\). Otherwise it picks the quadric. Small clusters with \(m \leq 12\) default to the plane (this avoids the \(m = 6\) quadric-interpolation regime).
For clusters where AIC picks the plane, the thresholded deviation is the literal signed plane distance \(d = (\pmb{x} - \pmb{\mu})^{\intercal} \pmb{n}\), byte-equivalent to degree = 1. For clusters where AIC picks the quadric, the thresholded deviation is a safeguarded geometric-distance estimate from \(\pmb{x}\) to the fitted quadric surface (rather than the algebraic vertical residual along \(\pmb{n}\), which can over- or under-estimate the true Euclidean distance whenever the local surface gradient is non-trivial). The estimate is the smaller-in-magnitude of (i) the first-order tangent-plane distance \(d_{1} = r_{0} / \lVert \pmb{N}_{0} \rVert\) with \(r_{0} = h_{P} - h(u_{P}, v_{P})\) and \(\lVert \pmb{N}_{0} \rVert = \sqrt{1 + (h_{u}^{(0)})^{2} + (h_{v}^{(0)})^{2}}\), and (ii) one Newton step of the foot-of-perpendicular problem on the quadric (see the C++ class doxygen for the Hessian and Cramer-step derivation). The safeguard guarantees \(|d_{\text{out}}| \leq |d_{1}|\) and falls back to \(d_{1}\) when the Newton step is singular, non-finite, or has larger magnitude. This removes intrinsic curvature from the deviation, so true protrusions stand out on curved walls.
degree = 2 is therefore a capability upper bound rather than a forced model choice: the user accepts up to a quadric’s expressive power, and AIC picks the cheapest-fitting model up to that bound for every cluster.
Note: when degree = 2 the variety_distance feature column carries the threshold-target value (the literal signed plane distance for plane-chosen clusters; the safeguarded geometric-distance estimate for quadric-chosen clusters), not the raw quadric residual. The variety_distance_tolerance semantics \(|d| \geq \epsilon \;\Rightarrow\; \text{label}\) are unchanged across modes.
With degree = 3 a local cubic polynomial is fit (10 coefficients in the local PCA frame) and a 3-way AICc compares plane / quadric / cubic per cluster. AICc adds the small-sample correction \(+ 2 p (p+1) / (m - p - 1)\) to AIC; it penalises the cubic’s \(p = 10\) parameters more strictly when the cluster is small. The same safeguarded geometric-distance machinery (option 1 + Newton + smaller-magnitude safeguard) applies, with cubic-specific gradient and Hessian. See the C++ class doxygen for the full math.
A value < 1 raises at instantiation with "degree less than 1 does not make sense"; a value > 5 raises with "degree > 5 is not currently supported".
–
trimmed_refit (bool, default False). When True, the reclassifier performs a one-pass trimmed-OLS refit after the initial fit at each AIC/AICc compare site: cluster points whose first-pass residual exceeds trim_factor * variety_distance_tolerance are dropped from the running OLS sums (subtraction approach), the surface is re-solved on the kept subset, and the refit coefficients drive the AIC/AICc compare and the per-point distance loop. The trim is committed only if every sub-fit that the no-trim path actually scored at this compare site can be recomputed on the trimmed set (rank gate passes, arma::solve succeeds, trimmed RSS satisfies monotonicity). Otherwise the trim is aborted and the no-trim path runs end-to-end. This targets the bump-bias failure mode, where strong bumps in a cluster pull the OLS fit toward themselves and produce wall-as-overhang false positives. See the C++ class doxygen for the full math and the path-aware semantics for degree = 3 (sub-cases 3A/3B/3C).
–
trim_factor (float, default 1.0). Strictly positive, finite multiplier on variety_distance_tolerance defining the trim threshold when trimmed_refit is True. 1.0 means “trim points whose first-pass residual exceeds the user’s overhang threshold”, which is the natural choice. Larger values keep more borderline points; smaller values are more aggressive. Non-positive, non-finite, non-numeric, or boolean values are rejected at instantiation.
- –
nthreads (int) – Number of threads for parallel C++ execution. -1 means as many threads as available cores.
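The per-cluster model selection described under degree can be sketched in a few lines. This is a minimal illustration and not the C++ implementation: the function names are hypothetical, the residual sums of squares are assumed to be strictly positive and already computed, and only the quantities stated above are used (variance estimate RSS / m, parameter counts 3 and 6, the \(m \leq 12\) guard, and the AICc correction term).

```python
import math


def aic(rss, m, p):
    """AIC with the MLE-style variance estimate RSS / m and p parameters."""
    return m * math.log(rss / m) + 2 * p


def aicc(rss, m, p):
    """AIC plus the small-sample correction 2p(p+1) / (m - p - 1)."""
    return aic(rss, m, p) + 2 * p * (p + 1) / (m - p - 1)


def pick_plane_or_quadric(rss_plane, rss_quadric, m):
    """Return "plane" or "quadric" for a cluster of m points (hypothetical helper)."""
    if m <= 12:
        # Small clusters default to the plane.
        return "plane"
    # AIC_plane < AIC_quadric  <=>  m * ln(sigma1^2 / sigma2^2) < 2 * (6 - 3) = 6
    stat = m * math.log((rss_plane / m) / (rss_quadric / m))
    return "plane" if stat < 6 else "quadric"


if __name__ == "__main__":
    print(pick_plane_or_quadric(1.0, 0.99, 100))
    print(pick_plane_or_quadric(1.0, 0.90, 100))
```

The threshold of 6 is simply twice the difference in parameter counts, so the shortcut test and a direct comparison of the two AIC values give the same decision.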
Per-reclassification output feature keys: every entry of reclassifications may include the following keys (all optional, all defaulting to False / canonical names). Each enabled column is a separate per-reclassification output, so multi-entry pipelines can produce distinct, non-overlapping columns:
- –
variety_distance_as_feature (bool, default False). When True, the signed distance to the AICc-selected local surface for this reclassification’s winning seed is exposed as an extra float32 column.
- –
variety_distance_feature_name (str, default "variety_distance"). Column name; must be a non-empty string ≤ 32 bytes (LAS extra-dim limit). Must be unique across every enabled output column of every reclassification; otherwise __init__ raises.
- –
eigenmin_as_feature / eigenmin_feature_name (default False / "eigenmin"): smallest covariance-eigenvalue \(\lambda_{3}\) of the winning seed’s cluster.
- –
reclassification_cluster_as_feature / reclassification_cluster_name (default False / "reclassification_cluster"): per-reclassification int32 cluster index, densified to a contiguous \([0, n-1]\) range within this reclassification. Untouched points carry the sentinel -1. There is no cross-reclassification stacking: each entry has its own column with its own id range.
- –
reclassification_cluster_radius_as_feature / reclassification_cluster_radius_name (default False / "reclassification_cluster_radius"): per-point \(r_0 + n_{\text{accepted}} \cdot r_{\Delta}\) of the winning seed.
- –
fit_quality_as_feature / fit_quality_name (default False / "fit_quality"): RMSE (root mean square error) of the AICc-selected polynomial fit’s residuals on the cluster. Same length units as variety_distance_tolerance.
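The naming rules for the output columns listed above can be checked before instantiation with a small helper. The sketch below is hypothetical (enabled_column_names is not part of the documented API). It also assumes that the non-empty and 32-byte constraints, which are stated explicitly only for variety_distance_feature_name, apply to every column name.

```python
# (flag key, name key, default name) triples taken from the list of output keys.
_COLUMNS = (
    ("variety_distance_as_feature", "variety_distance_feature_name",
     "variety_distance"),
    ("eigenmin_as_feature", "eigenmin_feature_name", "eigenmin"),
    ("reclassification_cluster_as_feature", "reclassification_cluster_name",
     "reclassification_cluster"),
    ("reclassification_cluster_radius_as_feature",
     "reclassification_cluster_radius_name",
     "reclassification_cluster_radius"),
    ("fit_quality_as_feature", "fit_quality_name", "fit_quality"),
)


def enabled_column_names(reclassifications):
    """Collect enabled output column names, raising on invalid or duplicate names."""
    names = []
    for entry in reclassifications:
        for flag, name_key, default in _COLUMNS:
            if not entry.get(flag, False):
                continue  # column disabled for this reclassification
            name = entry.get(name_key, default)
            if not isinstance(name, str) or not name:
                raise ValueError(f"{name_key} must be a non-empty string")
            if len(name.encode("utf-8")) > 32:
                raise ValueError(f"{name!r} exceeds 32 bytes")
            if name in names:
                raise ValueError(f"duplicate output column {name!r}")
            names.append(name)
    return names
```

Two entries that both enable variety_distance_as_feature with the default name collide under this check, so a distinct variety_distance_feature_name is needed for at least one of them.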
Untouched points carry 0 for every float column and -1 for reclassification_cluster. Each enabled column’s name must be unique across all reclassifications; cross-reclassification name collisions raise at instantiation.
Note that points in \(C\) (source-class points) but outside \(P\) (filtered out by conditions) are silently left unchanged: they cannot seed a region and never appear in any neighborhood.
- static extract_ctransf_args(spec)
Extract the arguments to initialize/instantiate a DirectionalReclassifier.
- Parameters:
spec – The key-word specification containing the arguments.
- Returns:
The arguments to initialize/instantiate a DirectionalReclassifier.
- __init__(**kwargs)
Initialize/instantiate a DirectionalReclassifier.
- Parameters:
kwargs – The attributes for the DirectionalReclassifier
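As an illustration of what kwargs might look like, the sketch below builds a specification from the keys documented for this class. All values (class names, radii, tolerances) are made-up placeholders, and check_targets is a hypothetical helper and not part of the API. It only verifies that each target class is listed in output_class_names.

```python
# Illustrative specification; every value here is a placeholder.
spec = {
    "output_class_names": ["ground", "wall", "overhang", "underhang"],
    "reclassifications": [
        {
            "source_classes": ["wall"],
            "source_is_prediction": False,
            "forward_direction": [0.0, 0.0, 1.0],
            "target_overhang_class": "overhang",
            "target_underhang_class": None,  # no underhang relabeling
            "conditions": None,              # P is the full cloud
            "variety_distance_tolerance": 0.05,
            "eigenthreshold": 0.1,
            "init_radius": 0.5,
            "step_radius": 0.25,
            "min_distance": 0,               # no decimation
            "degree": 1,
            "trimmed_refit": False,
            "trim_factor": 1.0,
            "variety_distance_as_feature": True,
            "variety_distance_feature_name": "variety_distance",
        }
    ],
    "nthreads": -1,
}


def check_targets(spec):
    """Return target class names that are missing from output_class_names."""
    known = set(spec["output_class_names"])
    missing = []
    for entry in spec["reclassifications"]:
        for key in ("target_overhang_class", "target_underhang_class"):
            target = entry.get(key)
            if target is not None and target not in known:
                missing.append(target)
    return missing
```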
- transform(y, X=None, F=None, fnames=None, yclass=None, ypred=None, out_prefix=None)
The fundamental transformation logic defining the directional reclassifier.
- Parameters:
y (np.ndarray) – The vector of classes (either reference classifications or predictions); this is the channel that the DirectionalReclassifier rewrites and that the caller writes back to the cloud.
X (np.ndarray) – The structure space matrix representing the input point cloud whose classes must be transformed.
F (np.ndarray) – The feature space matrix representing the input point cloud whose classes must be transformed.
fnames (list of str) – The list with the name for each considered feature.
yclass (np.ndarray) – Reference-classification vector. Used by condition evaluation when value_name == "classification" so the lookup hits the actual classifications even when y holds the predictions (on_predictions=True). When None, defaults to y for backward compatibility.
ypred (np.ndarray) – Prediction vector. Used by condition evaluation when value_name == "prediction" and by source_is_prediction = true reclassifications. When None, the legacy F-lookup is attempted as a fallback (raises if predictions are needed but unavailable).
out_prefix – See class_transformer.ClassTransformer.transform()
- Returns:
The transformed vector of classes.
- Return type:
np.ndarray
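The labeling rule at the core of this transformation can be illustrated for a single point and an already fitted plane. The sketch below is a simplification under stated assumptions: the helper names are hypothetical, the plane (centroid and unit normal) is given rather than grown by PCA, and flipping the normal when its dot product with forward_direction is negative is one natural reading of “forward-aligned”, not a confirmed detail of the implementation.

```python
import math


def align_normal(n, u):
    """Flip unit normal n so that it does not oppose the forward direction u."""
    norm_u = math.sqrt(sum(c * c for c in u))
    if norm_u < 1e-7:
        raise ValueError("forward_direction has (near-)zero norm")
    u_hat = [c / norm_u for c in u]  # internal normalization to unit length
    dot = sum(a * b for a, b in zip(n, u_hat))
    return [-c for c in n] if dot < 0 else list(n)


def label_point(x, mu, n, eps):
    """Return (label, signed distance) of x w.r.t. the plane (mu, n)."""
    d = sum((xi - mi) * ni for xi, mi, ni in zip(x, mu, n))
    if d >= eps:
        return "overhang", d
    if d <= -eps:
        return "underhang", d
    return "unchanged", d


if __name__ == "__main__":
    n = align_normal([0.0, 0.0, -1.0], [0.0, 0.0, 2.0])
    print(label_point([1.0, 2.0, 0.5], [0.0, 0.0, 0.0], n, 0.1))
```

With \(\epsilon = 0\) a point lying exactly on the plane satisfies both inequalities; the sketch resolves this by testing the overhang condition first, which is an arbitrary choice.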
- transform_pcloud(pcloud, out_prefix=None)
See class_transformer.ClassTransformer.transform().
- Parameters:
pcloud (PointCloud) – The point cloud whose classes/predictions must be transformed.
out_prefix (str) – Output prefix forwarded to the reports/plots.
- Returns:
The point cloud with updated classes/predictions.
- Return type:
- apply_conditions(conditions, y, ypred, X, F, flut=None, yinlut=None)
Build a per-point boolean mask of the geometric-computation set \(P\) from the relational conditions of one reclassification entry. value_name may target the "classification" vector, the "prediction" vector, or any feature.
- Parameters:
conditions (list of dict) – List of condition dicts (or None).
y (np.ndarray) – The classification vector.
ypred (np.ndarray) – The prediction vector (or None).
X (np.ndarray) – The structure space matrix.
F (np.ndarray) – The feature space matrix (or None).
flut (dict) – The feature look-up table.
yinlut (dict) – The input-class look-up table.
- Returns:
The boolean per-point mask of points in \(P\).
- Return type:
np.ndarrayof bool
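A simplified view of how such a mask could be built is sketched below. Several things here are assumptions and are not confirmed by this page: the key name "value_target" for the comparison value, the combination of multiple conditions by logical AND, and the columns dictionary that stands in for the y / ypred / F look-ups of the real method. The condition types and actions are the ones listed for conditions.

```python
import operator

# Relational operators keyed by the documented condition_type values.
_OPS = {
    "equals": operator.eq,
    "not_equals": operator.ne,
    "less_than": operator.lt,
    "less_than_or_equal_to": operator.le,
    "greater_than": operator.gt,
    "greater_than_or_equal_to": operator.ge,
    "in": lambda value, target: value in target,
    "not_in": lambda value, target: value not in target,
}


def conditions_mask(conditions, columns, num_points):
    """Return a list of bools, True for points that belong to P."""
    mask = [True] * num_points
    if conditions is None:
        return mask  # P is the full cloud
    for cond in conditions:
        values = columns[cond["value_name"]]
        op = _OPS[cond["condition_type"]]
        target = cond["value_target"]  # hypothetical key name
        hit = [op(v, target) for v in values]
        if cond["action"] == "preserve":
            mask = [m and h for m, h in zip(mask, hit)]
        elif cond["action"] == "discard":
            mask = [m and not h for m, h in zip(mask, hit)]
        else:
            raise ValueError(f"unknown action {cond['action']!r}")
    return mask
```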
- needs_classification_channel()
Whether any reclassification’s conditions reference the "classification" channel. Used by transform_pcloud to decide whether the actual classification vector must be loaded as an auxiliary channel (separate from the channel being transformed). Returns False when no condition references classifications.
- Returns:
True if any condition uses value_name == "classification".
- Return type:
bool
- needs_prediction_channel()
Whether any reclassification needs the "prediction" channel, either via a condition referencing value_name == "prediction" or via source_is_prediction = true on the entry. Used by transform_pcloud to decide whether the prediction vector must be loaded as an auxiliary channel.
- Returns:
True if any reclassification needs predictions.
- Return type:
bool
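Both channel checks follow the same pattern, which the hypothetical helper below captures for plain dictionaries. It is a sketch of the documented behavior and not the actual method bodies.

```python
def needs_channel(reclassifications, channel):
    """True if any entry needs `channel` ("classification" or "prediction")."""
    for entry in reclassifications:
        # Predictions are also needed when they are the source of the classes.
        if channel == "prediction" and entry.get("source_is_prediction", False):
            return True
        for cond in entry.get("conditions") or []:
            if cond.get("value_name") == channel:
                return True
    return False
```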
- determine_fnames()
Determine the names of the features involved in any condition across all reclassifications. Coordinates and the classification / prediction channels are excluded.
- Returns:
The names of the features needed by the conditions.
- Return type:
list of str
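A minimal sketch of this collection step is given below. The helper name, the coordinate names "x", "y", "z", and the first-seen ordering of the result are assumptions made for illustration only.

```python
def feature_names_from_conditions(
    reclassifications,
    excluded=("classification", "prediction", "x", "y", "z"),
):
    """Collect unique feature names referenced by conditions, in first-seen order."""
    names = []
    for entry in reclassifications:
        for cond in entry.get("conditions") or []:
            name = cond["value_name"]
            if name not in excluded and name not in names:
                names.append(name)
    return names
```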