src.utils.curve.simple_coverage_refiner

Classes

SimpleCoverageRefiner([merge_radius, ...])

class src.utils.curve.simple_coverage_refiner.SimpleCoverageRefiner(merge_radius=None, z_tolerance=2.0, min_voxel_size=0.1, min_curve_length=None, min_segment_length=0.5, chain_radius_factor=1.0, nthreads=-1, snap_enable=False, snap_max_shift=None, snap_min_neighbors=1, snap_smoothing_iterations=2, snap_target_radius=None, snap_pull_radius=None, snap_densify_step=0.5, snap_truncate_iterations=2, hallucination_drop_enable=False, hallucination_drop_radius=None, hallucination_drop_threshold=0.5, hallucination_drop_densify_step=0.5, hallucination_min_supported_length=None, endpoint_extension_enable=False, endpoint_extension_radius=None, endpoint_extension_step=0.5, endpoint_extension_max_length=None, coord_center=None, x_orig_f64=None, kdtree_provider=None, sce=None)
Author:

Alberto M. Esmoris Pena

Coverage-refinement and post-optimisation cleanup helper for the curves produced by SimpleCurveExtractor. Wraps the H30/H31 snap-to-input shifts, the H32 hallucination-drop splitting, the H36 endpoint-extension walks, the iterative coverage-refinement extractor, and the fixed cleanup cascade that runs at the end of the SCE global-optimisation phase.

Instances are constructed once per consuming SCE pipeline phase within the private orchestrator helpers _step_coverage_refinement and _step_global_optimization, with a third, single-use construction inside _concatenate_orphan_into_neighbour for the static densify_polyline_3d_arc helper. Each instance takes a snapshot of the relevant SCE attributes (24 fields), plus a callable hook to the SCE input-cloud KDTree cache and a back-reference to the SCE instance for cross-class invocations. These cover skeleton extraction, topological pruning, geometry refinement, chain-merge and the post-opt relabel / gap-split / orphan-merge passes, all of which remain on SCE for future refactors.

The three driver-style entry points are:

  1. refine_coverage() – iteratively detects uncovered regions of the input curve cloud and extracts supplementary curves through the SCE sub-pipeline.

  2. run_post_opt_cleanup() – executes the fixed H32..H37 cascade at the end of _step_global_optimization.

  3. snap_polylines_to_input() (and the H32 drop_hallucinated_features / H36 extend_polyline_endpoints passes invoked by the cleanup driver and the orchestrator’s snap loops).

Variables:
  • merge_radius (float or None) – Endpoint-merge radius forwarded to refine_coverage() and the post-opt cascade.

  • z_tolerance (float) – Z-tolerance threshold used by the post-opt Z-jump split (delegated to SimplePolylineSanitizer).

  • min_voxel_size (float) – Per-instance scale used to derive radii on point clouds at any resolution.

  • min_curve_length (float or None) – Lower bound for retained full-curve length, consumed by refine_coverage().

  • min_segment_length (float) – Lower bound for kept sub-curves emitted by the supplementary extractor inside refine_coverage().

  • chain_radius_factor (float) – Multiplier applied to skel_vs to derive the per-cluster chain radius inside refine_coverage().

  • nthreads (int) – Outer-OMP thread budget for the batched stabilize call (forwarded by SCE when invoking refine_coverage()).

  • snap_enable (bool) – H30 master switch for snap_polylines_to_input().

  • snap_max_shift (float) – 3D shift cap (m) per interior vertex.

  • snap_min_neighbors (int) – Minimum number of contributing missed points required to commit a per-vertex shift.

  • snap_smoothing_iterations (int) – Number of 3-point moving-average smoothing passes applied to the per-vertex shift sequence.

  • snap_target_radius (float) – Coverage radius used to flag MISSED input points (3D Euclidean).

  • snap_pull_radius (float) – Maximum pull distance from a missed input point to its candidate interior vertex (3D Euclidean).

  • snap_densify_step (float) – Densify step for the per-polyline 3D arc-length sampling driving the MISSED-input classification.

  • snap_truncate_iterations (int) – Number of snap-then-truncate cycles inside run_post_opt_cleanup().

  • hallucination_drop_enable (bool) – H32 master switch.

  • hallucination_drop_radius (float) – Support radius used by drop_hallucinated_features().

  • hallucination_drop_threshold (float) – Length-weighted per-feature hallucinated fraction above which the feature is split.

  • hallucination_drop_densify_step (float) – 3D arc-length step for the support sampling.

  • hallucination_min_supported_length (float) – Lower bound on supported sub-polyline 3D length; below this the sub-polyline is dropped.

  • endpoint_extension_enable (bool) – H36 master switch.

  • endpoint_extension_radius (float) – Support radius used by the per-step input-neighbour test.

  • endpoint_extension_step (float) – Walk step length (m).

  • endpoint_extension_max_length (float) – Hard cap on per-endpoint extension length (m).

  • _coord_center (np.ndarray or None) – Coordinate centring offset forwarded by SCE.

  • _X_orig_f64 (np.ndarray or None) – Float64 reservoir of the input curve points in the centred frame.

  • _kdtree_provider (callable or None) – Callable returning the lazily-built input-cloud KDTree (None triggers a per-call rebuild fallback).

  • _sce (SimpleCurveExtractor) – Back-reference to the SimpleCurveExtractor instance for cross-class invocations of methods that remain on SCE.

__init__(merge_radius=None, z_tolerance=2.0, min_voxel_size=0.1, min_curve_length=None, min_segment_length=0.5, chain_radius_factor=1.0, nthreads=-1, snap_enable=False, snap_max_shift=None, snap_min_neighbors=1, snap_smoothing_iterations=2, snap_target_radius=None, snap_pull_radius=None, snap_densify_step=0.5, snap_truncate_iterations=2, hallucination_drop_enable=False, hallucination_drop_radius=None, hallucination_drop_threshold=0.5, hallucination_drop_densify_step=0.5, hallucination_min_supported_length=None, endpoint_extension_enable=False, endpoint_extension_radius=None, endpoint_extension_step=0.5, endpoint_extension_max_length=None, coord_center=None, x_orig_f64=None, kdtree_provider=None, sce=None)

Snapshot the SCE configuration relevant to the coverage-refinement and post-optimisation cleanup cascade.

refine_coverage(smooth_curves, metadata, X_orig, skel_vs, spacing, zc_sigma)

Identify uncovered regions of the input curve point cloud and extract supplementary curves.

For each spatial cluster of input curve points that are far from any existing output curve, the full C++ sub-pipeline (stabilize, skeleton, prune, refine) is run to produce additional curves. This handles quarry path edges where only one side was captured by the initial extraction.

Introduces no new parameters; it reuses the existing configuration.
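A minimal sketch of the detection and clustering step, assuming a plain nearest-vertex test against already-densified curves and single-linkage clustering of the uncovered points. The helper names find_uncovered_points and cluster_uncovered, and the cover_radius and link_radius parameters, are hypothetical. This is not the SCE implementation, and the per-cluster C++ sub-pipeline (stabilize, skeleton, prune, refine) is not shown.

```python
import numpy as np
from scipy.sparse import coo_matrix
from scipy.sparse.csgraph import connected_components
from scipy.spatial import cKDTree


def find_uncovered_points(X, curves, cover_radius):
    """Mask of input points farther than cover_radius from every curve vertex.

    Hypothetical helper: curves are assumed to be densified already, so the
    nearest-vertex distance is a reasonable proxy for point-to-curve distance.
    """
    X = np.asarray(X, dtype=np.float64)
    if not curves:
        return np.ones(len(X), dtype=bool)  # nothing extracted yet
    verts = np.vstack([np.asarray(c, dtype=np.float64) for c in curves])
    dist, _ = cKDTree(verts).query(X, k=1)
    return dist > cover_radius


def cluster_uncovered(P, link_radius):
    """Single-linkage labels: points within link_radius share a cluster."""
    P = np.asarray(P, dtype=np.float64)
    n = len(P)
    pairs = cKDTree(P).query_pairs(link_radius, output_type="ndarray")
    adj = coo_matrix(
        (np.ones(len(pairs)), (pairs[:, 0], pairs[:, 1])), shape=(n, n)
    ).tocsr()
    _, labels = connected_components(adj, directed=False)
    return labels
```

Each label then identifies one group of uncovered points that would be handed to the supplementary extraction.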

Parameters:
  • smooth_curves (list of dict) – Current output curves.

  • metadata (list of dict) – Per-curve metadata.

  • X_orig (np.ndarray) – Centered input curve points.

  • skel_vs (float) – Skeleton voxel size.

  • spacing (float) – PCHIP resampling spacing.

  • zc_sigma (float) – Z-consistency sigma.

Returns:

(smooth_curves, metadata) with supplementary curves appended.

Return type:

tuple

run_post_opt_cleanup(smooth_curves, metadata, X_orig)

Run the H32..H37 post-optimization cleanup cascade, called at the end of the SCE _step_global_optimization phase. The order is fixed and reproduces the original inline block verbatim:

  1. H32 (a) – drop hallucinated features.

  2. H36 – snap/truncate iteration loop: snap_truncate_iterations cycles of snap-to-input followed by truncate.

  3. H36 – extend polyline endpoints through any remaining supported input neighbourhood.

  4. H36 – post-extension truncation iterated to a fixed point via until_stable_count() (cap from the module-level _ENDPOINT_TRUNCATE_FIXED_POINT constant).

  5. H32 (b) – relabel Z-disjoint CIDs (delegated to SimpleCurveExtractor).

  6. H36 final – split endpoint-pair gaps (delegated to SimpleCurveExtractor).

  7. H37 – merge orphan fragments (delegated to SimpleCurveExtractor).

Extracted in iter4 (Phase 6, L-16). Not intended for use outside the SCE _step_global_optimization phase.

Parameters:
  • smooth_curves (list of dict) – Polylines (mutated by the inner cleanup chain).

  • metadata (list of dict) – Per-feature metadata dicts (mutated by the inner cleanup chain).

  • X_orig (np.ndarray) – Centred curve-class input points (Nx3); reused for snap, truncate, and endpoint-extension passes.

Returns:

(smooth_curves, metadata).

Return type:

tuple

snap_polylines_to_input(smooth_curves, metadata, X_orig)

H30 – shift interior polyline vertices toward the input curve-class points that are currently MISSED by every polyline (beyond snap_target_radius from any polyline).

Each missed point pulls its nearest interior vertex (within snap_pull_radius); per-vertex shifts are the average of those pull vectors, capped at snap_max_shift, smoothed by a 3-point moving average snap_smoothing_iterations times along the polyline, and reverted whenever the SI guard detects a new intra-polyline crossing. Endpoints are never shifted – their positions are load-bearing for the same-CID merge cascade and for T-junction classification upstream of the metrics pipeline.
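The per-vertex shift rule described above (average pull, cap, 3-point smoothing, fixed endpoints) can be sketched in NumPy/SciPy as follows. This is an illustrative sketch, not the production code: the function name snap_shift_sketch is hypothetical, the MISSED points (beyond snap_target_radius from every polyline) are assumed to be precomputed, and the SI guard that reverts crossing-inducing shifts is omitted.

```python
import numpy as np
from scipy.spatial import cKDTree


def snap_shift_sketch(poly, missed, pull_radius, max_shift,
                      min_neighbors=1, smoothing_iterations=2):
    """Shift interior vertices of one polyline toward precomputed MISSED points.

    Hypothetical sketch: endpoints stay fixed and the SI guard is omitted.
    """
    poly = np.asarray(poly, dtype=np.float64)
    missed = np.asarray(missed, dtype=np.float64).reshape(-1, 3)
    if len(poly) < 3 or len(missed) == 0:
        return poly.copy()  # no interior vertices or nothing to pull toward
    interior = poly[1:-1]
    # Nearest interior vertex for each missed point, limited to pull_radius.
    dist, idx = cKDTree(interior).query(missed, k=1,
                                        distance_upper_bound=pull_radius)
    hit = np.isfinite(dist)  # misses come back as inf with idx == len(interior)
    sums = np.zeros_like(interior)
    counts = np.zeros(len(interior), dtype=np.int64)
    np.add.at(sums, idx[hit], missed[hit] - interior[idx[hit]])
    np.add.at(counts, idx[hit], 1)
    shifts = np.zeros_like(poly)
    ok = counts >= max(min_neighbors, 1)
    shifts[1:-1][ok] = sums[ok] / counts[ok, None]  # average pull vector
    # Cap each shift at max_shift (3D norm).
    norms = np.linalg.norm(shifts, axis=1)
    shifts *= np.minimum(1.0, max_shift / np.maximum(norms, 1e-12))[:, None]
    # 3-point moving average along the polyline; endpoints keep a zero shift.
    for _ in range(smoothing_iterations):
        smoothed = shifts.copy()
        smoothed[1:-1] = (shifts[:-2] + shifts[1:-1] + shifts[2:]) / 3.0
        shifts = smoothed
    return poly + shifts
```

Because a 3-point average of vectors that each respect the cap also respects it (triangle inequality), smoothing after capping never reintroduces a shift larger than max_shift.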

Motivation: ~65 % of missed curve-class points at the 4 m threshold are “INTERIOR-adjacent”: a polyline passes through their neighbourhood but sits 4-8 m away. An early iteration of this pass targeted the full-input centroid, which lowered coverage by 0.51 pp because it pushed already-covered points on the other side of the polyline outside the 4 m window while only marginally reducing the distance to missed points on the near side. Targeting only MISSED points makes the shift monotone with respect to coverage: if the cap and guards are satisfied, the shift strictly improves coverage at the moved vertex (up to the ~0.2 % of shifts the SI guard reverts).

Pass ordering: after the Stage 7j merge cascade, before the H25 sanitize. The local SI guard (per-polyline, first-hit C++ check) reverts shifts that introduce intra-polyline crossings; the sanitizer handles any residual cases. Same-CID inter-feature crossings are not guarded here: the Stage 7g/quater resolve pass already addressed those earlier, and the H25 baseline has SI = 0.

Hyperparameter sweep result on the production cloud (inicorta.laz, 3 m / 2 m split metric radii at H31): baseline cov 95.24 % at 4 m / 4 m metric -> cov 99.26 % at 3 m, dev 0.74 % at 2 m, SI 0 (preserved), gaps 1 (preserved). The current defaults at the production fixture min_voxel_size = 0.1 (snap_max_shift = 3.5, snap_passes = 30, snap_post_passes = 20, snap_pull_radius = 40.0, snap_target_radius = 3.0) sit at or just beyond the saturation knee of the H31 sweep. Phase-9 SR-1 rewired snap_max_shift, snap_pull_radius and snap_target_radius to derive from min_voxel_size (coefficients 35, 400 and 30 respectively), so the values above are reproduced automatically on the fixture and scale with the input voxel resolution on other clouds.
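A sketch of the voxel-scaled defaults, using the coefficients quoted above. The helper derive_snap_radii and the SNAP_VOXEL_COEFFS table are hypothetical, and treating None as "derive from min_voxel_size" is an assumption about how the constructor defaults are resolved.

```python
# Hypothetical table: coefficients from the Phase-9 SR-1 description.
SNAP_VOXEL_COEFFS = {
    "snap_max_shift": 35.0,
    "snap_pull_radius": 400.0,
    "snap_target_radius": 30.0,
}


def derive_snap_radii(min_voxel_size, overrides=None):
    """Resolve snap radii; None (or missing) falls back to coeff * min_voxel_size."""
    overrides = overrides or {}
    resolved = {}
    for name, coeff in SNAP_VOXEL_COEFFS.items():
        value = overrides.get(name)
        resolved[name] = coeff * min_voxel_size if value is None else float(value)
    return resolved
```

With min_voxel_size = 0.1 this reproduces 3.5, 40.0 and 3.0 (up to floating-point rounding); an explicit value overrides the derived one.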

Parameters:
  • smooth_curves (list of dict) – Current polylines.

  • metadata (list of dict) – Per-polyline metadata.

  • X_orig (np.ndarray) – Centered curve-class input points (N x 3 in the same coordinate frame as sc['points']).

Returns:

Snapped (smooth_curves, metadata).

Return type:

tuple

drop_hallucinated_features(smooth_curves, metadata, X_orig)

H32 – split features whose hallucinated polyline length exceeds hallucination_drop_threshold of their total length into supported sub-polylines.

A densified vertex (3D linear interpolation along the polyline at hallucination_drop_densify_step spacing) is hallucinated when no input curve-class point lies within hallucination_drop_radius. The per-feature score is the length-weighted hallucinated fraction, matching the validation metric: a segment with both endpoints hallucinated counts fully, a segment with one hallucinated endpoint counts for half, and a fully supported segment counts for nothing.
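A minimal sketch of the support test and the length-weighted score, assuming the densified vertices are tested with a 3D nearest-neighbour query (the text does not state whether the support radius is 2D or 3D). The function names are hypothetical. Averaging the two endpoint flags per segment yields exactly the 1 / 0.5 / 0 weights described.

```python
import numpy as np
from scipy.spatial import cKDTree


def hallucinated_mask(dense_pts, X, radius):
    """True where no input point lies within radius of a densified vertex."""
    tree = cKDTree(np.asarray(X, dtype=np.float64))
    dist, _ = tree.query(np.asarray(dense_pts, dtype=np.float64), k=1,
                         distance_upper_bound=radius)
    return ~np.isfinite(dist)  # inf distance means nothing within radius


def hallucinated_fraction(dense_pts, halluc):
    """Length-weighted hallucinated fraction of a densified polyline."""
    pts = np.asarray(dense_pts, dtype=np.float64)
    h = np.asarray(halluc, dtype=np.float64)
    seg_len = np.linalg.norm(np.diff(pts, axis=0), axis=1)
    total = seg_len.sum()
    if total == 0.0:
        return 0.0
    # Both endpoints hallucinated -> 1, one -> 0.5, none -> 0.
    weight = 0.5 * (h[:-1] + h[1:])
    return float((weight * seg_len).sum() / total)
```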

Features at or below the threshold are kept verbatim. Features above the threshold are split: each maximal contiguous run of supported densified vertices becomes a new feature, the first such run inheriting the original CURVE_ID and the rest receiving fresh CURVE_ID values so no phantom same-CID gap pair is created. Sub-polylines whose 2D length is below hallucination_min_supported_length are dropped. If every supported run on a feature is too short, the whole feature is dropped (legacy behaviour from the H32 first iteration).
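The run-splitting step can be sketched as below. The names are hypothetical; assigning the original CURVE_ID to the first kept run (rather than to the first run before the length filter) is an assumption, and the length test uses the XY length, following the paragraph above. An empty result corresponds to dropping the whole feature.

```python
import numpy as np


def supported_runs(halluc):
    """(start, stop) pairs, stop exclusive, of maximal runs of supported vertices."""
    runs, start = [], None
    for i, h in enumerate(halluc):
        if not h and start is None:
            start = i  # a supported run begins
        elif h and start is not None:
            runs.append((start, i))  # a hallucinated vertex closes the run
            start = None
    if start is not None:
        runs.append((start, len(halluc)))
    return runs


def split_feature(dense_pts, halluc, cid, next_cid, min_len=0.0):
    """Split one feature into supported pieces; returns (pieces, next_cid)."""
    dense_pts = np.asarray(dense_pts, dtype=np.float64)
    pieces = []
    for s, e in supported_runs(halluc):
        sub = dense_pts[s:e]
        xy_len = float(np.linalg.norm(np.diff(sub[:, :2], axis=0), axis=1).sum())
        if xy_len < min_len:
            continue  # too short: drop this sub-polyline
        if pieces:
            piece_cid, next_cid = next_cid, next_cid + 1  # fresh CURVE_ID
        else:
            piece_cid = cid  # first kept piece inherits the original CURVE_ID
        pieces.append((piece_cid, sub))
    return pieces, next_cid
```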

Parameters:
  • smooth_curves (list of dict) – Current polylines.

  • metadata (list of dict) – Per-polyline metadata.

  • X_orig (np.ndarray) – Centered curve-class input points (N x 3 in the same coordinate frame as sc['points']).

Returns:

(smooth_curves, metadata) with hallucinated material removed.

Return type:

tuple

extend_polyline_endpoints(smooth_curves, metadata, X_orig)

H36 – extend each polyline endpoint outward along its tangent through supported regions.

The snap pass deliberately never moves endpoints because they are load-bearing for the H3 merge cascade and the T-junction classification. After the snap-truncate loop saturates, however, a thin layer of input curve-class points remains MISSED just past the polyline endpoints. This pass walks each endpoint outward in steps of endpoint_extension_step while:

  1. the next step still has at least one input curve-class point within endpoint_extension_radius (support guard), AND

  2. the new segment does not cross any segment of any other polyline already in the output (crossing guard).

The walk stops at the first violation or after endpoint_extension_max_length metres. By construction the extension can introduce neither hallucinated material (support guard) nor new self-/inter-feature crossings (crossing guard).

Parameters:
  • smooth_curves (list of dict) – Current polylines.

  • metadata (list of dict) – Per-polyline metadata.

  • X_orig (np.ndarray) – Centered curve-class input points (N x 3 in the same coordinate frame as sc['points']).

Returns:

(smooth_curves, metadata) with extensions appended/prepended.

Return type:

tuple

static densify_polyline_3d_arc(pts, step)

Densify an N x 3 polyline with linear 3D interpolation, spacing measured along the 3D arc. This matches the validation metric’s 3D densification: a steep z-jump segment that is short in XY but long in 3D gets enough intermediate samples for the per-vertex support test to flag the runaway, while a gently-sloping segment gets the same sampling rate as a flat one.

The arithmetic runs in float64 even when the input is float32 (the SCE default when structure_space_bits = 32); float32 input densification at the segment-length boundary differs from the metric’s float64 densification by up to ~1 ULP, which is enough to flip the hallucinated/supported classification on a borderline vertex (H36 diagnostic).

Parameters:
  • pts (np.ndarray) – (N, 3) polyline.

  • step (float) – 3D arc-length spacing (m).

Returns:

Densified (M, 3) polyline.

Return type:

np.ndarray
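A minimal sketch of 3D arc-length densification, assuming each segment is split into ceil(L / step) equal sub-intervals (the exact rounding rule of the real helper is not documented here). The function name is hypothetical. Casting to float64 up front mirrors the precision note above.

```python
import math

import numpy as np


def densify_3d_arc_sketch(pts, step):
    """Linear 3D densification with spacing measured along the 3D arc."""
    p = np.asarray(pts, dtype=np.float64)  # float64 even for float32 input
    if len(p) < 2 or step <= 0:
        return p.copy()
    out = []
    for a, b in zip(p[:-1], p[1:]):
        seg = b - a
        seg_len = float(np.linalg.norm(seg))  # 3D length, z included
        k = max(1, math.ceil(seg_len / step))
        t = np.arange(k, dtype=np.float64)[:, None] / k
        out.append(a + t * seg)  # start vertex plus interior samples
    out.append(p[-1:])  # close with the final vertex
    return np.vstack(out)
```

For example, a segment from (0, 0, 0) to (0.1, 0, 3) is only 0.1 m long in XY but about 3 m long in 3D, so a 1 m step yields five vertices instead of two.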

static walk_endpoint(endpoint, neighbour, step, max_len, input_tree, radius, feat_idx, crosses_any, register_segment)

Walk outward from endpoint along the tangent away from neighbour while each step has input support within radius AND does not cross any other polyline segment already registered in crosses_any.

After each accepted step the new segment is immediately registered via register_segment so that subsequent steps on the same walk can detect crossings with the in-flight extension.

Parameters:
  • endpoint (np.ndarray) – Tip of the polyline (3D).

  • neighbour (np.ndarray) – Penultimate vertex of the polyline (3D); together with endpoint it defines the outgoing tangent.

  • step (float) – Step length (m).

  • max_len (float) – Maximum walk length (m).

  • input_tree – KDTree over input curve-class XY coordinates.

  • radius (float) – Support neighbourhood radius (m).

  • feat_idx (int) – Index of the polyline being extended; passed to crosses_any so its own original segments are excluded from the crossing test.

  • crosses_any – Callable (a, b, fi) -> bool returning True when segment a-b crosses any registered segment not owned by feature fi.

  • register_segment – Callable (fi, a, b) that adds segment a-b to the spatial index of feature fi so subsequent crossing checks see it.

Returns:

List of (x, y, z) extension points in walk order (first new vertex first).

Return type:

list of np.ndarray
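The walk can be sketched as follows, assuming input_tree is a scipy.spatial.cKDTree built over XY coordinates and that the z coordinate simply follows the 3D tangent. The function name is hypothetical, and the step and length bookkeeping is one plausible choice, not necessarily the one used in the module.

```python
import numpy as np
from scipy.spatial import cKDTree


def walk_endpoint_sketch(endpoint, neighbour, step, max_len, input_tree,
                         radius, feat_idx, crosses_any, register_segment):
    """Extend a polyline tip along its outgoing tangent (hypothetical sketch)."""
    tip = np.asarray(endpoint, dtype=np.float64)
    direction = tip - np.asarray(neighbour, dtype=np.float64)
    norm = np.linalg.norm(direction)
    if norm == 0.0 or step <= 0.0:
        return []  # undefined tangent or degenerate step
    direction /= norm
    extension, walked = [], 0.0
    while walked + step <= max_len:
        cand = tip + step * direction
        # Support guard: at least one input point within radius (XY only).
        if not input_tree.query_ball_point(cand[:2], r=radius):
            break
        # Crossing guard: reject segments crossing other registered segments.
        if crosses_any(tip, cand, feat_idx):
            break
        register_segment(feat_idx, tip, cand)  # later steps see this segment
        extension.append(cand)
        tip, walked = cand, walked + step
    return extension


if __name__ == "__main__":
    xs = np.arange(0.0, 10.01, 0.5)
    tree = cKDTree(np.column_stack([xs, np.zeros_like(xs)]))
    ext = walk_endpoint_sketch([2.0, 0.0, 0.0], [1.0, 0.0, 0.0], 0.5, 2.0,
                               tree, 0.3, 0, lambda a, b, fi: False,
                               lambda fi, a, b: None)
    print(len(ext), ext[-1])
```

Registering each accepted segment before the next step is what lets crosses_any see the in-flight extension, as the description above requires.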

static polyline_si_cell_size(pts)

Compute the SI-grid cell size for a single polyline using the same max(5 * median(seg_len), 1e-3) rule that polyline_has_si() (and the legacy per-polyline scan) uses. Lifted out of polyline_has_si() so the batched-SI fast path in snap_polylines_to_input() can compute all cell sizes up-front and feed a single pyvl3dpp.curve_self_intersection_batch_d call.

The arithmetic preserves the input dtype so the 5*median(seg_len) value stays bit-identical with the per-polyline call (a float32 input produces a cell size computed from float32 medians; a float64 input from float64 medians).

Parameters:

pts (np.ndarray) – Polyline points (N x D with D ≥ 2; only the XY columns are used).

Returns:

Cell size as a float; falls back to 1e-3 when the polyline has no segments.

Return type:

float
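A sketch of the max(5 * median(seg_len), 1e-3) rule. The function name is hypothetical. NumPy keeps float32 arithmetic for float32 input here, in line with the dtype note above, and the result is returned as a Python float.

```python
import numpy as np


def si_cell_size_sketch(pts):
    """max(5 * median(segment length), 1e-3) over the XY columns."""
    p = np.asarray(pts)
    if p.ndim != 2 or len(p) < 2:
        return 1e-3  # no segments
    seg_len = np.linalg.norm(np.diff(p[:, :2], axis=0), axis=1)
    return float(max(5.0 * np.median(seg_len), 1e-3))
```

For a batched SI call, the sizes can be computed up front with a list comprehension over the polylines before a single batched dispatch.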

static polyline_has_si(pts)

Return True when pts has at least one 2D self-crossing. Uses the same C++ first-hit scan as remove_self_ix on the SimplePolylineSanitizer collaborator, but stops at the boolean answer.

Parameters:

pts (np.ndarray) – Polyline points (N x 3).

Returns:

True if any crossing is found.

Return type:

bool
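For small polylines, the boolean answer can be cross-checked with a brute-force O(n^2) test over non-adjacent segment pairs. This is a swapped-in technique, not the C++ grid-based first-hit scan: it counts only proper crossings (touching or collinear overlaps are ignored), and the function names are hypothetical.

```python
import numpy as np


def _orient(a, b, c):
    """Twice the signed area of triangle abc (XY only)."""
    return (b[0] - a[0]) * (c[1] - a[1]) - (b[1] - a[1]) * (c[0] - a[0])


def _proper_cross(p1, p2, q1, q2):
    """True when segments p1-p2 and q1-q2 cross at a single interior point."""
    d1, d2 = _orient(q1, q2, p1), _orient(q1, q2, p2)
    d3, d4 = _orient(p1, p2, q1), _orient(p1, p2, q2)
    return bool(d1 * d2 < 0 and d3 * d4 < 0)


def has_self_crossing_bruteforce(pts):
    """O(n^2) check over non-adjacent segment pairs of an (N, >=2) polyline."""
    p = np.asarray(pts, dtype=np.float64)[:, :2]
    n_seg = len(p) - 1
    for i in range(n_seg):
        # Adjacent segments share a vertex, so start at i + 2.
        for j in range(i + 2, n_seg):
            if _proper_cross(p[i], p[i + 1], p[j], p[j + 1]):
                return True
    return False
```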

static until_stable_count(fn, smooth_curves, metadata, max_iter)

Iterate fn(smooth_curves, metadata) up to max_iter times, breaking when the curve count is unchanged across a single iteration.

Mirrors the original inline pattern verbatim:

  • The pre-iteration count is captured BEFORE calling fn (n_before = len(smooth_curves)).

  • fn is then invoked exactly once per iteration.

  • The break test compares the new len(smooth_curves) against n_before; if equal, the loop exits without running another fn call.

Extracted in iter4 (Phase 6, L-16) for use in run_post_opt_cleanup()’s post-extension truncation fixed point.

Parameters:
  • fn (callable) – Callable taking (smooth_curves, metadata) and returning the updated tuple.

  • smooth_curves (list of dict) – Curve dicts.

  • metadata (list of dict) – Per-segment metadata dicts.

  • max_iter (int) – Maximum number of iterations.

Returns:

(smooth_curves, metadata) after the loop.

Return type:

tuple
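A direct Python sketch of the loop contract described above; the function name is hypothetical.

```python
def until_stable_count_sketch(fn, smooth_curves, metadata, max_iter):
    """Call fn until the curve count stops changing or max_iter is reached."""
    for _ in range(max_iter):
        n_before = len(smooth_curves)  # captured BEFORE calling fn
        smooth_curves, metadata = fn(smooth_curves, metadata)
        if len(smooth_curves) == n_before:
            break  # count unchanged: fixed point reached, no extra fn call
    return smooth_curves, metadata
```

For instance, a pass that drops one curve per call while more than two remain is called four times on five curves: three calls that shrink the list and one that confirms the fixed point.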