Data mining

Data miners are components that receive an input point cloud and extract features characterizing it, typically in a point-wise fashion. Data miners (Miner) can be included inside pipelines to generate features that can later be used to train a machine learning model to perform a classification or regression task on the points.

Geometric features miner

The GeomFeatsMiner uses Jakteristics as backend to compute point-wise geometric features. The point-wise features are computed considering spherical neighborhoods of a given radius. The JSON below shows how to define a GeomFeatsMiner.

{
    "miner": "GeometricFeatures",
    "in_pcloud": null,
    "out_pcloud": null,
    "radius": 0.3,
    "fnames": ["linearity", "planarity", "surface_variation", "verticality", "anisotropy"],
    "frenames": ["linearity_r0_3", "planarity_r0_3", "surface_variation_r0_3", "verticality_r0_3", "anisotropy_r0_3"],
    "nthreads": -1
}

The JSON above defines a GeomFeatsMiner that computes the linearity, planarity, surface variation, verticality, and anisotropy geometric features considering a \(30\,\mathrm{cm}\) radius for the spherical neighborhood. The computed features will be named from the feature names and the neighborhood radius. Parallel regions will be computed using all available threads.
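As an illustration of what the backend computes, the sketch below derives some of these features from the eigenvalues of the neighborhood covariance matrix, assuming the standard eigenvalue-based definitions from the literature. It is illustrative only; the actual Jakteristics implementation may differ in details.

```python
import numpy as np

def geometric_features(points, center, radius):
    """Eigenvalue-based features for the spherical neighborhood of center.

    A minimal NumPy sketch assuming the standard eigenvalue-based
    definitions; not the actual Jakteristics backend.
    """
    # Select the spherical neighborhood of the given radius.
    mask = np.linalg.norm(points - center, axis=1) <= radius
    X = points[mask]
    # Covariance of the neighborhood and its eigendecomposition.
    C = np.cov(X.T)
    eigval, eigvec = np.linalg.eigh(C)   # eigenvalues in ascending order
    l3, l2, l1 = eigval                  # so that l1 >= l2 >= l3
    normal = eigvec[:, 0]                # eigenvector of the smallest eigenvalue
    return {
        'linearity': (l1 - l2) / l1,
        'planarity': (l2 - l3) / l1,
        'surface_variation': l3 / (l1 + l2 + l3),
        'anisotropy': (l1 - l3) / l1,
        'verticality': 1.0 - abs(normal[2]),   # 0 for horizontal surfaces
    }

# Example: points scattered on the horizontal plane z = 0 are highly
# planar, non-vertical, and have near-zero surface variation.
rng = np.random.default_rng(0)
pts = np.column_stack([
    rng.uniform(-1, 1, 500), rng.uniform(-1, 1, 500), np.zeros(500)
])
feats = geometric_features(pts, np.zeros(3), radius=2.0)
```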

Arguments

in_pcloud

When the data miner is used outside a pipeline, this argument can be used to specify which point cloud must be loaded to compute the geometric features on it. In pipelines, the input point cloud is considered to be the point cloud at the current pipeline’s state.

out_pcloud

When the data miner is used outside a pipeline, this argument can be used to specify where to write the output point cloud with the computed geometric features. Otherwise, it is better to use a Writer to export the point cloud after the data mining.

radius

The radius for the spherical neighborhood.

fnames

The list with the names of the features that must be computed. Supported features are: ["eigenvalue_sum", "omnivariance", "eigenentropy", "anisotropy", "planarity", "linearity", "PCA1", "PCA2", "surface_variation", "sphericity", "verticality"]

frenames

The list of names for the generated features. If it is not given, then the generated features will be automatically named.

nthreads

How many threads to use when computing parallel regions. The value -1 means as many threads as supported in parallel (typically including virtual cores).

Output

The figure below represents the planarity and verticality features mined for a spherical neighborhood with \(30\,\mathrm{cm}\) radius. The point cloud for this example corresponds to the Paris point cloud from the Paris-Lille-3D dataset.

Figure representing some point-wise geometric features.

Visualization of the planarity (left) and verticality (right) computed in the Paris point cloud from the Paris-Lille-3D dataset using spherical neighborhoods with \(30\,\mathrm{cm}\) radius.

Geometric features miner ++

The GeomFeatsMinerPP represents an improved version of the GeomFeatsMiner with better workload balancing and more geometric features. The main improvements are: a flexible neighborhood definition that includes spheres, cylinders, rectangular prisms, and k-nearest neighbors (knn); the possibility to choose between singular value decomposition (SVD) and principal component analysis (PCA) as the factorization strategy for the structure space matrix representing the neighborhood; the Taubin method for improved quadric fitting; and the inclusion of roughness and second order geometric features like the mean and Gaussian curvatures. The JSON below shows how to define a GeomFeatsMinerPP.

{
    "miner": "GeometricFeaturesPP",
    "first_order_strategy": "pca",
    "second_order_strategy": "taubin_bias",
    "idw_p": 2,
    "idw_eps": 11,
    "foc_K": 0,
    "foc_sigma": 3.14159265358979323846264338327950288419716939937510,
    "neighborhood": {
      "type": "sphere",
      "radius": 33,
      "k": 1024,
      "lower_bound": 0,
      "upper_bound": 0
    },
    "fnames": [
        "linearity", "planarity", "sphericity", "surface_variation", "omnivariance", "anisotropy",
        "eigenentropy", "normalized_eigenentropy", "eigenvalue_sum", "roughness",
        "esval3", "full_eigen_index", "gauss_curv_full", "mean_curv_full", "full_quad_dev",
        "full_abs_algdist", "full_sq_algdist", "full_laplacian", "full_mean_qdev", "full_gradient_norm",
        "full_eigensum", "full_eigenentropy", "full_normalized_eigenentropy", "full_omnivariance",
        "full_hypersphericity", "full_hyperanisotropy", "full_spectral", "full_frobenius", "full_schatten",
        "full_linear_norm", "full_cross_norm", "full_square_norm", "full_abs_bias",
        "full_maxcurv", "full_mincurv", "full_maxabscurv", "full_minabscurv",
        "full_umbilic_dev", "full_rmsc", "full_umbilicality", "full_gauss_umbilicality", "full_shape_index",
        "full_min_eigenval", "saddleness", "minpg", "maxpg", "bipgnorm",
        "normalized_minpg", "normalized_maxpg", "normalized_bipgnorm",
        "gradient_axis_curv", "normalized_gac", "full_normalized_gac"
      ],
    "frenames": [
        "linear_r33", "planar_r33", "spheric_r33", "surfvar_r33", "omnivar_r33", "anisotropy_r33",
        "eigentropy_r33", "eigsum_r33", "roughness_r33", "esval3_r33", "eigidx_r33",
        "full_gauss_r33", "full_mean_r33", "full_qdev_r33", "abs_algdist_r33", "sq_algdist_r33",
        "laplacian_r33", "full_meanqdev_r33", "full_gradnorm_r33",
        "full_eigsum_r33", "full_eigentropy_r33", "full_omnivar_r33", "hyperspher_r33", "hyperanisotropy_r33",
        "spectral_r33", "frobenius_r33", "schatten_r33",
        "linear_norm_r33", "cross_norm_r33", "square_norm_r33", "full_abs_bias_r33",
        "maxcurv_r33", "mincurv_r33", "maxabscurv_r33", "minabscurv_r33",
        "umbilic_dev_r33", "rmsc_r33", "umbilicality_r33", "gumbilicality_r33", "shape_index_r33",
        "full_min_eigval_r33", "saddleness_r33", "minpg_r33", "maxpg_r33", "bipgnorm_r33",
        "normalized_minpg_r33", "normalized_maxpg_r33", "normalized_bipgnorm_r33",
        "gac_r33", "normalized_gac_r33", "fnormed_gc_r33"
    ],
    "non_degenerate_eigenthreshold": 0.00001,
    "tikhonov_parameter": 0.0000001,
    "nthreads": -1
}

The JSON above defines a GeomFeatsMinerPP that computes the linearity, planarity, sphericity, surface variation, omnivariance, anisotropy, eigenentropy, sum of eigenvalues, roughness, Gaussian curvature, mean curvature, quadratic deviation, Laplacian, shape index, Gaussian umbilicality, and many more on a spherical neighborhood with \(33\,\mathrm{mm}\) radius. The computed features will be named from the feature names and the neighborhood radius. The computations will be carried out in parallel using all the available cores. The eigenthreshold to detect degenerate cases for quadric fitting is set to \(10^{-5}\), and the parameter for the Tikhonov regularization is set to \(10^{-7}\). The first order geometric descriptors are computed with PCA; the second order geometric descriptors use a biased quadric model fit with the Taubin method and apply Inverse Distance Weighting (IDW) with \(p=2\) and \(\epsilon=11\). There is no Fibonacci orthodromic correction of the IDW weights because "foc_K" is set to 0.

Arguments

first_order_strategy

The strategy for the linear fit. It can be either "svd", which considers the singular values and singular vectors from \(\pmb{X} = \pmb{U}\pmb{\Sigma}\pmb{V}^{\intercal} \in \mathbb{R}^{m \times 3}\) for a 3D neighborhood of \(m\) points, where the columns of \(\pmb{V} \in \mathbb{R}^{3 \times 3}\) are the singular vectors and \(\operatorname{diag}\left(\pmb{\Sigma}\right) \in \mathbb{R}^{3}\) gives the singular values; or "pca", which considers the eigenvalues and eigenvectors of \(\pmb{X}^{\intercal}\pmb{X}\). For "svd" the matrix \(\pmb{X}\) is centered, and for "pca" it is centered and divided by \(m-1\) to be a covariance matrix. The recommended approach is "pca".

second_order_strategy

The strategy for the quadric fit. It can be either "svd", "pca", "taubin" to fit an unbiased quadric \(g(x, y, z) = w_1x + w_2y + w_3z + w_4xy + w_5xz + w_6yz + w_7x^2 + w_8y^2 + w_9z^2\) or "svd_bias", "pca_bias", "taubin_bias" to fit a biased quadric \(g(x, y, z) = w_1x + w_2y + w_3z + w_4xy + w_5xz + w_6yz + w_7x^2 + w_8y^2 + w_9z^2 + w_{10}\) . The recommended approach is "taubin_bias", yet faster approaches like "pca" or "pca_bias" might be preferred if only second order geometric descriptors based on quadric eigenvalues will be considered. For the computation of second order geometric descriptors involving curvatures the Taubin method provides a more stable solution incorporating a second matrix obtained by summing the outer products of the partial derivatives of the polynomial coordinates.
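To make the idea concrete, the sketch below fits a biased quadric with a plain "svd"-style algebraic strategy: the coefficient vector is the right singular vector of the monomial matrix associated with the smallest singular value (the unit-norm constraint avoids the trivial all-zero solution). The Taubin variant additionally incorporates the second matrix built from the partial derivatives described above, which is not shown here. This is an illustrative reimplementation, not the component's actual code.

```python
import numpy as np

def fit_biased_quadric(X):
    """Algebraic fit of the biased quadric
    g = w1 x + w2 y + w3 z + w4 xy + w5 xz + w6 yz
        + w7 x^2 + w8 y^2 + w9 z^2 + w10,
    returning the coefficients up to sign and scale (illustrative sketch).
    """
    x, y, z = X[:, 0], X[:, 1], X[:, 2]
    # Monomial (design) matrix with one row per neighborhood point.
    A = np.column_stack([
        x, y, z, x*y, x*z, y*z, x**2, y**2, z**2, np.ones_like(x)
    ])
    # The best-fit coefficients minimize ||A w|| subject to ||w|| = 1,
    # i.e., the right singular vector of the smallest singular value.
    _, _, Vt = np.linalg.svd(A, full_matrices=False)
    return Vt[-1]

# Example: noiseless samples of the unit sphere x^2 + y^2 + z^2 - 1 = 0.
rng = np.random.default_rng(1)
u = rng.normal(size=(200, 3))
sphere = u / np.linalg.norm(u, axis=1, keepdims=True)
w = fit_biased_quadric(sphere)
w = w / w[6]   # normalize so the x^2 coefficient is 1
```

After normalization, the recovered coefficients match the implicit sphere equation: the three quadratic terms are 1, the bias term is -1, and the remaining terms vanish.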

idw_p

The \(p\) parameter governing the Inverse Distance Weighting (IDW) strategy to weight the contribution of each point in the neighborhood to the quadric fit. If \(p=0\) no IDW will be applied at all.

idw_eps

The \(\epsilon\) parameter governing the Inverse Distance Weighting (IDW). It represents the minimum accepted distance, so distances below it will be automatically replaced.
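Assuming the usual IDW form \(w_j = 1 / \max\{d_j, \epsilon\}^p\) (an assumption for illustration; see the component documentation for the exact expression), the two parameters interact as sketched below:

```python
import numpy as np

def idw_weights(distances, p=2, eps=11):
    """Inverse Distance Weighting, sketched under the assumed form
    w_j = 1 / max(d_j, eps)^p. Illustrative only."""
    d = np.asarray(distances, dtype=float)
    if p == 0:
        return np.ones_like(d)   # p = 0 means no IDW at all
    d = np.maximum(d, eps)       # distances below eps are replaced by eps
    return 1.0 / d**p

# Distances below eps are clipped, so very close points cannot dominate.
w = idw_weights([0.5, 2.0], p=2, eps=1.0)
```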

foc_K

The number of points in the spherical Fibonacci support. The greater the number, the better the correction, but it will also lead to higher execution times (i.e., it increases the computational cost).

foc_sigma

The hard cut threshold for the Fibonacci orthodromic correction between a point in a centered neighborhood \(\pmb{x}_{j*} \in \mathbb{R}^{3}\) and a point from the Fibonacci support \(\pmb{q}_{k*} \in \mathbb{R}^{3}\).

\[\omega(\pmb{x}_{j*}, \pmb{q}_{k*}) = \max \left\{ 0, \sigma - \arccos\left(\dfrac{ \langle\pmb{x}_{j*}, \pmb{q}_{k*}\rangle }{ \lVert\pmb{x}_{j*}\rVert } \right) \right\}\]

See the C++ documentation for further mathematical details.

non_degenerate_eigenthreshold

The threshold to detect degenerate cases for the quadric fit depending on the min eigenvalue of the linear fit. When the min linear eigenvalue is below this threshold, the case is considered degenerate. Recommended values for 32-bit floating-point arithmetic are \(10^{-5}\) for the raw case (i.e., without smoothing and inverse distance weighting) and \(10^{-6}\) for cases with smoothing and inverse distance weighting. Note that degenerate cases are treated as planes with zero curvature when computing second order geometric descriptors.

tikhonov_parameter

Parameter governing the Tikhonov regularization. It is typically necessary on real data to ensure the Cholesky decomposition (and any other similar operation) is numerically feasible. The recommended value for 32-bit floating-point arithmetic is \(10^{-7}\).
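The sketch below illustrates the role of the regularization: adding \(\lambda \pmb{I}\) to a rank-deficient normal matrix makes the Cholesky factorization feasible. It is a generic illustration, not the component's code.

```python
import numpy as np

def tikhonov_solve(A, b, lam=1e-7):
    """Solve the regularized normal equations (A^T A + lam I) w = A^T b
    via Cholesky. Without lam, a rank-deficient A makes A^T A singular
    and the factorization fails."""
    AtA = A.T @ A + lam * np.eye(A.shape[1])
    L = np.linalg.cholesky(AtA)          # requires a positive definite matrix
    y = np.linalg.solve(L, A.T @ b)      # forward substitution L y = A^T b
    return np.linalg.solve(L.T, y)       # back substitution L^T w = y

# A rank-deficient design matrix: the plain normal equations are singular,
# but the regularized system still admits a Cholesky factorization.
A = np.array([[1.0, 1.0], [1.0, 1.0], [1.0, 1.0]])
b = np.array([1.0, 1.0, 1.0])
w = tikhonov_solve(A, b, lam=1e-7)
```

With this rank-deficient example the regularized solve returns the symmetric minimum-norm solution \(w_1 = w_2 \approx 0.5\).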

neighborhood

The neighborhood definition.

type

Supported neighborhood types are "cylinder", "bounded_cylinder", "sphere", "rectangular2d", "rectangular3d", "knn2d", "knn", "boundedknn2d", and "boundedknn".

radius

For neighborhoods with a single radius (e.g., spherical neighborhoods), it gives the radius. For neighborhoods with many radii (e.g., rectangular neighborhoods) it gives the first radius (typically along \(\pmb{e_1}\), i.e., \(x\)-axis). For bounded knn neighborhoods the radius also gives the radial bound, i.e., the max distance to consider a neighbor as near.

k

The number of neighbors for knn neighborhoods.

lower_bound

The lower bound along the vertical axis of a cylindrical neighborhood or a rectangular 2D neighborhood. Also, the length along \(\pmb{e_2}\), i.e., \(y\)-axis, for 3D rectangular neighborhoods.

upper_bound

The upper bound along the vertical axis of a cylindrical neighborhood or a rectangular 3D neighborhood. Also, the length along \(\pmb{e_3}\), i.e., \(z\)-axis, for 3D rectangular neighborhoods.

fnames

The list with the names of the features that must be computed. Supported features are: ["linearity", "planarity", "sphericity", "surface_variation", "roughness", "verticality", "altverticality", "sqverticality", "horizontality", "sqhorizontality", "eigenvalue_sum", "omnivariance", "eigenentropy", "anisotropy", "PCA1", "PCA2", "number_of_neighbors", "fom", "nx", "ny", "nz", "esval1", "esval2", "esval3", "gauss_curv_full", "mean_curv_full", "full_quad_dev", "full_abs_algdist", "full_sq_algdist", "full_laplacian", "full_mean_qdev", "full_gradient_norm", "full_eigensum", "full_eigenentropy", "full_omnivariance", "full_hypersphericity", "full_hyperanisotropy", "full_spectral", "full_frobenius", "full_schatten", "full_coeff_norm", "full_linear_norm", "full_cross_norm", "full_square_norm", "full_deg2_norm", "full_nlinear_norm", "full_ncross_norm", "full_nsquare_norm", "full_ndeg2_norm", "full_bias_term", "full_abs_bias", "full_maxcurv", "full_mincurv", "full_maxabscurv", "full_minabscurv", "full_umbilic_dev", "full_rmsc", "full_umbicality", "full_gauss_umbilicality", "full_shape_index", "full_eigen_index", "full_min_eigenval", "full_max_eigenval", "minpg", "maxpg", "avg", "normalized_minpg", "normalized_maxpg", "normalized_avg", "saddleness", "absolute_hg", "hgnorm", "normalized_hgn", "bipgnorm", "normalized_bipgnorm", "gradient_axis_curv", "normalized_gac", "full_normalized_gac", "hessian_frobenius", "hessian_absdet", "full_abs_quadraticity"]

frenames

The list of names for the generated features. If it is not given, then the generated features will be automatically named.

nthreads

How many threads to use when computing parallel regions. The value -1 means as many threads as supported in parallel (typically including virtual cores).

Output

The figure below represents the references (blue for vessel and red for aneurysm) and some second order geometric descriptors like the linear norm, shape index, and Gaussian umbilicality. The point cloud for this example was generated by merging four different aneurysms from the IntrA dataset, which can be downloaded from Google Drive. Note that the point clouds were smoothed using a SimpleStructureSmootherPP (see the Simple structure smoother ++ documentation) with an IDW strategy using \(p=1\), \(\epsilon=3\), and a spherical neighborhood with a radius of \(11\;\mathrm{mm}\).

Figure representing the references together with some second order geometric descriptors.

Visualization of four aneurysms. Vessel points are represented in blue, aneurysms in red. The color maps represent the distribution of the linear norm, the shape index, and the Gaussian umbilicality.

Height features miner

The HeightFeatsMiner supports the computation of height-based features. These features assume that the \(z\) axis corresponds to the vertical axis and derive features depending on the \(z\) values of many neighborhoods. The neighborhoods are centered on support points. Finally, each point in the point cloud will take the features from the closest support point. The JSON below shows how to define a HeightFeatsMiner:

{
    "miner": "HeightFeatures",
    "support_chunk_size": 50,
    "support_subchunk_size": 10,
    "pwise_chunk_size": 1000,
    "nthreads": 12,
    "neighborhood": {
        "type": "Rectangular2D",
        "radius": 50.0,
        "separation_factor": 0.35
    },
    "outlier_filter": null,
    "fnames": ["floor_distance", "ceil_distance"]
}

The JSON above defines a HeightFeatsMiner that computes the distance to the floor (lowest point) and to the ceil (highest point). It considers a rectangular neighborhood for the support points with side length \(50 \times 2 = 100\) meters. No outlier filter is applied.
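As a sketch of the semantics, the two features in the example can be computed brute force, without support points or chunking, as below. The function is illustrative only, not the actual implementation.

```python
import numpy as np

def floor_ceil_distances(points, radius=50.0):
    """Per-point distance to the lowest/highest z in a Rectangular2D
    neighborhood; a brute-force sketch (no support points, no chunking)."""
    xy, z = points[:, :2], points[:, 2]
    floor_d = np.empty_like(z)
    ceil_d = np.empty_like(z)
    for i in range(len(points)):
        # Chebyshev distance <= radius selects the square neighborhood.
        mask = np.max(np.abs(xy - xy[i]), axis=1) <= radius
        floor_d[i] = z[i] - z[mask].min()   # floor_distance
        ceil_d[i] = z[mask].max() - z[i]    # ceil_distance
    return floor_d, ceil_d

# Three points sharing one neighborhood: distances span the z range.
pts = np.array([[0.0, 0.0, 0.0], [1.0, 1.0, 5.0], [2.0, 0.0, 10.0]])
fd, cd = floor_ceil_distances(pts, radius=50.0)
```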

Arguments

support_chunk_size

The number of support points per chunk for parallel computations.

support_subchunk_size

The number of simultaneous neighborhoods considered when computing a chunk. It can be used to prevent memory exhaustion scenarios.

pwise_chunk_size

The number of points per chunk when computing the height features for the points in the point cloud (not the support points).

nthreads

How many threads must be used for parallel computations (-1 means as many threads as available cores).

neighborhood

The neighborhood definition. The type can be either "Rectangular2D" (the radius describes half of the side) or "Cylinder" (the radius describes the disk of the cylinder). The separation factor governs the separation of the support points considering the radius. Note that if the separation factor is set to zero, then the height features will be computed in a point-wise fashion. See GridSubsamplingPreProcessor for more details.

outlier_filter

The strategy to filter outlier points (it can be None). Supported strategies are "IQR" and "stdev". The "IQR" strategy considers the interquartile range and discards any height value outside \([Q_1-3\mathrm{IQR}/2, Q_3+3\mathrm{IQR}/2]\). The "stdev" strategy discards any height value outside \([\mu - 3\sigma, \mu + 3\sigma]\) where \(\mu\) is the mean and \(\sigma\) is the standard deviation.
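The two strategies can be sketched as follows, as an illustrative reimplementation of the intervals described above:

```python
import numpy as np

def filter_outliers(z, strategy="IQR"):
    """Height outlier filtering as described: "IQR" keeps values inside
    [Q1 - 3*IQR/2, Q3 + 3*IQR/2]; "stdev" keeps values inside
    [mu - 3*sigma, mu + 3*sigma]. Illustrative sketch."""
    z = np.asarray(z, dtype=float)
    if strategy == "IQR":
        q1, q3 = np.percentile(z, [25, 75])
        iqr = q3 - q1
        lo, hi = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    else:  # "stdev"
        mu, sigma = z.mean(), z.std()
        lo, hi = mu - 3 * sigma, mu + 3 * sigma
    return z[(z >= lo) & (z <= hi)]

# A single extreme height among many inliers is discarded either way.
z = np.concatenate([np.full(100, 1.0), np.array([50.0])])
kept_iqr = filter_outliers(z, "IQR")
kept_stdev = filter_outliers(z, "stdev")
```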

fnames

The name of the height features that must be computed. Supported height features are: ["floor_coordinate", "floor_distance", "ceil_coordinate", "ceil_distance", "height_range", "mean_height", "median_height", "height_quartiles", "height_deciles", "height_variance", "height_stdev", "height_skewness", "height_kurtosis"]

Output

The figure below represents the floor distance mined for a Rectangular2D neighborhood with \(50\) meters radius. The point cloud for this example corresponds to the March2018 validation point cloud from the Hessigheim dataset.

Figure representing the floor distance height feature.

Visualization of the floor distance height feature computed for the Hessigheim March2018 validation point cloud using a Rectangular2D neighborhood with \(50\,\mathrm{m}\) radius.

Height features miner ++

The HeightFeatsMinerPP represents an improved version of the HeightFeatsMiner with better computational efficiency. Using this component inside pipelines can be done with the JSON specification described for the Height features miner. The only difference is that the "miner" argument must be set to "HeightFeaturesPP" instead of "HeightFeatures".

HSV from RGB miner

The HSVFromRGBMiner can be used when red, green, and blue color channels are available for the points in the point cloud. It will generate the corresponding hue (H), saturation (S), and value (V) components derived from the available RGB information. The JSON below shows how to define a HSVFromRGBMiner:

{
    "miner": "HSVFromRGB",
    "hue_unit": "radians",
    "frenames": ["HSV_Hrad", "HSV_S", "HSV_V"]
}

The JSON above defines a HSVFromRGBMiner that computes the HSV representation of the original RGB color components.
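The conversion itself is standard. Below is a minimal sketch using Python's colorsys, with the hue scaled to the configured unit; it is illustrative, not the component's code.

```python
import colorsys
import math

def rgb_to_hsv(r, g, b, hue_unit="radians"):
    """HSV from RGB components in [0, 1], using the standard conversion.
    The hue is returned in the configured unit (sketch)."""
    h, s, v = colorsys.rgb_to_hsv(r, g, b)   # h in [0, 1)
    h = h * 2 * math.pi if hue_unit == "radians" else h * 360.0
    return h, s, v

# Pure green: hue = 120 degrees (2*pi/3 rad), full saturation and value.
h, s, v = rgb_to_hsv(0.0, 1.0, 0.0, hue_unit="degrees")
```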

Arguments

hue_unit

The unit for the hue (H) component. It can be either "radians" or "degrees".

frenames

The name for the output features. If not given, they will be ["HSV_H", "HSV_S", "HSV_V"] by default.

Output

The figure below represents the saturation (S) computed for the March2018 validation point cloud from the Hessigheim dataset.

Figure representing the saturation (S).

Figure representing the saturation (S) in the March2018 validation point cloud of the Hessigheim dataset.

Smooth features miner

The SmoothFeatsMiner can be used to derive smooth features from already available features. The mean, weighted mean, and Gaussian Radial Basis Function (RBF) strategies can be used for this purpose. The JSON below shows how to define a SmoothFeatsMiner:

{
    "miner": "SmoothFeatures",
    "nan_policy": "propagate",
    "chunk_size": 1000000,
    "subchunk_size": 1000,
    "neighborhood": {
        "type": "sphere",
        "radius": 0.25
    },
    "input_fnames": ["Reflectance", "HSV_Hrad", "HSV_S", "HSV_V"],
    "fnames": ["mean"],
    "nthreads": 12
}

The JSON above defines a SmoothFeatsMiner that computes the smoothed reflectance and HSV components considering a spherical neighborhood with \(25\,\mathrm{cm}\) radius. The strategy consists of computing the mean value for each neighborhood. The computations are run in parallel using 12 threads.

Arguments

nan_policy

It can be "propagate" (default), so NaN features will be included in computations (potentially leading to NaN smooth features). Alternatively, it can be "replace", so NaN values are replaced with the feature-wise mean for each neighborhood. However, using "replace" leads to longer execution times. Therefore, "propagate" should be used whenever NaN handling is not necessary.

chunk_size

How many points per chunk must be considered for parallel computations.

subchunk_size

How many neighborhoods per iteration must be considered when computing a chunk. It can be useful to prevent memory exhaustion scenarios.

neighborhood

The definition of the neighborhood to be used. Supported neighborhoods are "knn" (for which a "k" value must be given), "sphere" (for which a "radius" value must be given), and "cylinder" (the "radius" refers to the disk of the cylinder).

weighted_mean_omega

The \(\omega\) parameter for the weighted mean strategy (see SmoothFeatsMiner for a description of the maths).

gaussian_rbf_omega

The \(\omega\) parameter for the Gaussian RBF strategy (see SmoothFeatsMiner for a description of the maths).
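The three smoothing strategies reduce to different neighborhood weightings. The sketch below uses plausible weight functions for the weighted mean and Gaussian RBF cases; these are assumptions for illustration only (see the SmoothFeatsMiner documentation for the exact maths).

```python
import numpy as np

def smooth_feature(d, f, strategy="mean", omega=0.25):
    """Smooth one feature over one neighborhood, given neighbor distances
    d and feature values f. The weighted-mean and Gaussian-RBF weight
    functions below are assumed for illustration only."""
    if strategy == "mean":
        w = np.ones_like(f)
    elif strategy == "weighted_mean":
        # Assumption: nearer points weigh more, omega softens the decay.
        w = 1.0 - d / (d.max() + omega)
    else:  # "gaussian_rbf"
        # Assumption: a Gaussian kernel with bandwidth omega.
        w = np.exp(-(d ** 2) / omega ** 2)
    return float(np.sum(w * f) / np.sum(w))

d = np.array([0.0, 0.1, 0.2])   # distances to the center point
f = np.array([1.0, 2.0, 3.0])   # feature values of the neighbors
smoothed = smooth_feature(d, f, "mean")
```

Since both distance-based weightings favor the nearer (lower-valued) neighbors here, they yield a smoothed value below the plain mean.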

input_fnames

The names of the features that must be smoothed.

fnames

The names of the smoothing strategies to be used. Supported strategies are "mean", "weighted_mean", and "gaussian_rbf".

frenames

The desired names for the generated output features. If not given, the names will be automatically derived.

nthreads

The number of threads to be used for parallel computations (-1 means as many threads as available cores).

Output

The figure below represents the smoothed saturation computed for two spherical neighborhoods with \(25\,\mathrm{cm}\) and \(3\,\mathrm{m}\) radius, respectively. The point cloud is the March2018 validation one from the Hessigheim dataset.

Figure representing the smoothed saturation for two different spherical neighborhoods.

Figure representing the smoothed saturation for two different spherical neighborhoods with \(25\,\mathrm{cm}\) and \(3\,\mathrm{m}\) radius, respectively.

Smooth features miner ++

The class SmoothFeatsMinerPP can be used to derive smooth features from already available features. It is similar to the SmoothFeatsMiner class documented above. The main difference is that the C++ version supports more neighborhood definitions like 2D and 3D rectangular regions, 2D k-nearest neighbors, and bounded cylinders. The JSON below shows how to define a SmoothFeatsMinerPP:

{
  "in_pcloud": [
    "/ext4/lidar_data/semantic3d/laz/domfountain_station1_xyz_intensity_RGB.laz"
  ],
  "out_pcloud": [
    "out/smooth_feats_pp/*"
  ],
  "sequential_pipeline": [
    {
        "miner": "FPSDecorated",
        "fps_decorator": {
            "num_points": "m/10",
            "fast": true,
            "num_encoding_neighbors": 1,
            "num_decoding_neighbors": 1,
            "release_encoding_neighborhoods": false,
            "threads": -1,
            "representation_report_path": "*repr.las"
        },
        "decorated_miner": {
            "miner": "SmoothFeaturesPP",
            "nan_policy": "replace",
            "neighborhood": {
                "type": "boundedcylinder",
                "radius": 0.3,
                "lower_bound": -0.45,
                "upper_bound": 0.45
            },
            "weighted_mean_omega": 0.01,
            "gaussian_rbf_omega": 0.1977,
            "input_fnames": ["intensity"],
            "fnames": ["mean", "weighted_mean", "gaussian_rbf"],
            "frenames": ["inten_meanpp", "inten_wmeanpp", "inten_grbfpp"],
            "nthreads": -1
        }
    },
    {
        "writer": "Writer",
        "out_pcloud": "*domfountain_station1_feats.las"
    }
  ]
}

The JSON above defines a SmoothFeatsMinerPP that computes the smooth intensity considering a bounded cylinder with a disk of radius \(0.3\) and discarding any point that is more than \(0.45\) below or above the center of the neighborhood. Note that before applying the C++ data miner, a receptive field is computed to speed up the computation even further.

Arguments

nan_policy

See smooth features miner documentation.

neighborhood

The definition of the neighborhood to be used. Supported neighborhoods are "knn" (for which a "k" value must be given), "knn2d" (a k-nearest neighbor considering the \(x\) and \(y\) coordinates only), "sphere" (for which a "radius" value must be given), "cylinder" (the "radius" refers to the disk of the cylinder), "boundedcylinder" (the difference in the \(z\) coordinate with respect to the center point must be inside the given interval), "rectangular3d" (the "radius" defines the axis-wise half length for each edge), and "rectangular2d" (the rectangular region is defined for the \(x\) and \(y\) coordinates only).
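As a sketch of the "boundedcylinder" semantics described above (illustrative only):

```python
import numpy as np

def bounded_cylinder_mask(points, center, radius, lower_bound, upper_bound):
    """Membership test for a bounded cylinder neighborhood: inside the
    disk of the given radius in (x, y), with the z difference to the
    center inside [lower_bound, upper_bound]. Illustrative sketch."""
    dxy = np.linalg.norm(points[:, :2] - center[:2], axis=1)
    dz = points[:, 2] - center[2]
    return (dxy <= radius) & (dz >= lower_bound) & (dz <= upper_bound)

# Points 3 and 4 are excluded: one exceeds the vertical bounds, the
# other falls outside the disk.
pts = np.array([
    [0.0, 0.0, 0.0], [0.2, 0.0, 0.0], [0.0, 0.0, 1.0], [1.0, 0.0, 0.0]
])
mask = bounded_cylinder_mask(pts, np.zeros(3), 0.3, -0.45, 0.45)
```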

weighted_mean_omega

See smooth features miner documentation.

gaussian_rbf_omega

See smooth features miner documentation.

input_fnames

See smooth features miner documentation.

fnames

See smooth features miner documentation.

frenames

See smooth features miner documentation.

nthreads

See smooth features miner documentation.

Output

The figure below represents the smoothed intensity computed in the bounded cylinder neighborhood defined in the example above. The left side of the figure represents the raw intensity, the right side the smoothed one (through a mean filter), so they can be compared. The data used in this example is the domfountain station1 point cloud from the Semantic3D dataset.

Figure representing the raw and smoothed (mean) intensity for a bounded cylindrical neighborhood.

Figure representing the raw and smoothed (mean) intensity for a bounded cylindrical neighborhood with \(0.3\) radius and a \(0.45\) distance threshold for both the lower and upper bounds.

Recount miner

The RecountMiner can be used to derive features based on counting the number of points. In doing so, many condition-based filters can be applied to filter the points. Furthermore, the recount of points can be used as a feature directly but also to derive the relative frequency, the surface density (points per area), the volume density (points per volume), and the number of vertical segments along a cylinder that contain at least one point passing the filters. The JSON below shows how to define a RecountMiner:

{
    "miner": "Recount",
    "chunk_size": 100000,
    "subchunk_size": 1000,
    "nthreads": 16,
    "neighborhood": {
        "type": "cylinder",
        "radius": 3.0
    },
    "input_fnames": ["vegetation", "tower", "PointWiseEntropy", "Prediction"],
    "filters": [
        {
            "filter_name": "pdensity",
            "ignore_nan": false,
            "absolute_frequency": true,
            "relative_frequency": false,
            "surface_density": true,
            "volume_density": true,
            "vertical_segments": 0,
            "conditions": null
        },
        {
            "filter_name": "maybe_tower",
            "ignore_nan": true,
            "absolute_frequency": true,
            "relative_frequency": true,
            "surface_density": true,
            "volume_density": true,
            "vertical_segments": 0,
            "conditions": [
                {
                    "value_name": "tower",
                    "condition_type": "greater_than_or_equal_to",
                    "value_target": 0.333333

                }
            ]
        },
        {
            "filter_name": "as_tower",
            "ignore_nan": true,
            "absolute_frequency": true,
            "relative_frequency": true,
            "surface_density": true,
            "volume_density": true,
            "vertical_segments": 8,
            "conditions": [
                {
                    "value_name": "Prediction",
                    "condition_type": "equals",
                    "value_target": 4
                }
            ]
        },
        {
            "filter_name": "unsure_veg",
            "ignore_nan": true,
            "absolute_frequency": true,
            "relative_frequency": true,
            "surface_density": false,
            "volume_density": false,
            "vertical_segments": 0,
            "conditions": [
                {
                    "value_name": "Prediction",
                    "condition_type": "equals",
                    "value_target": 2
                },
                {
                    "value_name": "PointWiseEntropy",
                    "condition_type": "greater_than_or_equal_to",
                    "value_target": 0.1
                }
            ]
        },
        {
            "filter_name": "unsure_veg2",
            "ignore_nan": true,
            "absolute_frequency": true,
            "relative_frequency": true,
            "surface_density": false,
            "volume_density": false,
            "vertical_segments": 0,
            "conditions": [
                {
                    "value_name": "Prediction",
                    "condition_type": "equals",
                    "value_target": 2
                },
                {
                    "value_name": "vegetation",
                    "condition_type": "less_than",
                    "value_target": 0.666667
                }
            ]
        }
    ]
}

The JSON above defines a RecountMiner that computes features from a previously classified point cloud. First, it computes the absolute frequency and the densities considering all points. Then, it computes the frequencies and densities for points whose likelihood of being a tower is equal to or above \(0.\overline{3}\). Afterwards, it computes the frequencies and densities for points predicted as tower, and counts how many of eight vertical segments contain at least one such point. Later, it computes the frequencies for points that have been predicted as vegetation and have a point-wise entropy greater than or equal to \(0.1\). Finally, it computes the frequencies for points predicted as vegetation with a vegetation likelihood less than \(0.\overline{6}\).
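The condition logic can be sketched as follows. The function below is an illustrative reimplementation of the recount for a single neighborhood, covering the condition types used in the example; it is not the component's actual code.

```python
import numpy as np

def recount(values, conditions):
    """Count the points in one neighborhood satisfying every condition.
    `values` maps feature names to per-point value arrays. Sketch
    covering "equals", "greater_than_or_equal_to", and "less_than"."""
    n = len(next(iter(values.values())))
    mask = np.ones(n, dtype=bool)
    for c in conditions:            # conditions are combined with AND
        v = np.asarray(values[c["value_name"]])
        if c["condition_type"] == "equals":
            mask &= v == c["value_target"]
        elif c["condition_type"] == "greater_than_or_equal_to":
            mask &= v >= c["value_target"]
        elif c["condition_type"] == "less_than":
            mask &= v < c["value_target"]
    return int(mask.sum())

# The "unsure_veg" filter from the example: predicted vegetation (class 2)
# with point-wise entropy >= 0.1.
values = {
    "Prediction": [2, 2, 4, 2],
    "PointWiseEntropy": [0.05, 0.3, 0.2, 0.15],
}
conds = [
    {"value_name": "Prediction", "condition_type": "equals",
     "value_target": 2},
    {"value_name": "PointWiseEntropy",
     "condition_type": "greater_than_or_equal_to", "value_target": 0.1},
]
n = recount(values, conds)
```

With no conditions at all, the recount is simply the number of points in the neighborhood, matching the "pdensity" filter of the example.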

Arguments

chunk_size

How many points per chunk must be considered for parallel computations.

subchunk_size

How many neighborhoods per iteration must be considered when computing a chunk. It can be useful to prevent memory exhaustion scenarios.

nthreads

The number of threads to be used for parallel computations (-1 means as many threads as available cores).

neighborhood

The definition of the neighborhood to be used. Supported neighborhoods are "knn" (for which a "k" value must be given), "sphere" (for which a "radius" value must be given), and "cylinder" (the "radius" refers to the disk of the cylinder).

input_fnames

The names of the features to be considered when filtering the points.

filters

A list with all the filters that must be computed. One set of output features will be generated for each filter. A filter can consist of zero or more conditions. Each filter is defined by the following arguments:

filter_name

The name for the filter. It will be used to name the generated features.

ignore_nan

A flag governing how to handle NaNs. When set to true, the filters will ignore points with NaN values, i.e., they will not be counted.

absolute_frequency

Whether to generate a feature with the absolute frequency or raw count (true) or not (false). The generated feature will be named by appending "_abs" to the filter name.

relative_frequency

Whether to generate a feature with the relative frequency (true) or not (false). The generated feature will be named by appending "_rel" to the filter name.

surface_density

Whether to generate a feature by dividing the number of points by the surface area. The surface density is computed assuming the area of a circle. The radius of the circle will be the given one when using spherical or cylindrical neighborhoods but it will be derived as the distance between the center point and the furthest neighbor for knn neighborhoods. The generated feature will be named by appending "_sd" to the filter name.

volume_density

Whether to generate a feature by dividing the number of points by the volume. The volume is computed assuming a sphere. The radius of the sphere will be the given one when using spherical neighborhoods but it will be derived as the distance between the center point and the furthest neighbor for knn neighborhoods. For cylindrical neighborhoods, a circle will be considered instead of the sphere, and the volume will be computed as the area of the circle multiplied by the extent between the bounds of the vertical axis. The generated feature will be named by appending "_vd" to the filter name.

vertical_segments

Whether to generate a feature by dividing the neighborhood into linearly spaced segments along the vertical axis and counting how many partitions contain at least one point satisfying the conditions. The generated feature will be named by appending "_vs" to the filter name.
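The vertical segments recount can be sketched as follows, assuming linearly spaced bins along the vertical axis; the function name is hypothetical and the neighbors are assumed to be already filtered by the conditions.

```python
import numpy as np

def vertical_segments_recount(neighbor_z, z_min, z_max, num_segments):
    """Count how many linearly spaced vertical segments contain at least
    one neighbor satisfying the filter's conditions."""
    if num_segments < 1 or len(neighbor_z) == 0:
        return 0
    # Linearly spaced bin edges along the vertical axis
    edges = np.linspace(z_min, z_max, num_segments + 1)
    # Assign each neighbor to a segment (clip so z_max falls in the last bin)
    idx = np.clip(np.searchsorted(edges, neighbor_z, side='right') - 1,
                  0, num_segments - 1)
    return len(np.unique(idx))
```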

conditions

The list of conditions contains elements defined in the same way as the conditions of the Advanced input but without the action parameter, which is always assumed to be "preserve".
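As a minimal sketch, such conditions could be evaluated as boolean masks over the neighborhood's features. The helper below is hypothetical and only handles the condition types that appear in the JSON examples on this page.

```python
import numpy as np

def apply_conditions(feats, conditions):
    """Build a boolean mask of neighbors satisfying all conditions.
    feats maps feature names to numpy arrays; conditions may be None."""
    mask = np.ones(len(next(iter(feats.values()))), dtype=bool)
    for c in conditions or []:
        v = feats[c["value_name"]]
        if c["condition_type"] == "equals":
            mask &= v == c["value_target"]
        elif c["condition_type"] == "greater_than_or_equal_to":
            mask &= v >= c["value_target"]
    return mask
```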

Output

The generated output is a point cloud that includes the recount-based features.

Recount miner ++

The class RecountMinerPP can be used to derive recount features. It is similar to the RecountMiner class documented above. The main difference is that the C++ version supports more neighborhood definitions, such as 2D and 3D rectangular regions, 2D k-nearest neighbors, and bounded cylinders. It also supports further recounts, such as 2D and 3D sectors, rings, and radial boundaries (spherical shells). The JSON below shows how to define a RecountMinerPP:

{
  "in_pcloud": [
    "/ext4/lidar_data/semantic3d/laz/domfountain_station1_xyz_intensity_rgb_smallcut.laz"
  ],
  "out_pcloud": [
    "out/vl3dpp/domfountain_station1_feats.las"
  ],
  "sequential_pipeline": [
    {
        "miner": "GeometricFeatures",
        "radius": 0.1,
        "fnames": ["linearity", "planarity"],
        "frenames": ["line_r0_1", "plan_r0_1"],
        "nthreads": 12
    },
    {
        "miner": "RecountPP",
        "nthreads": -1,
        "neighborhood": {
            "type": "cylinder",
            "radius": 0.1
        },
        "input_fnames": ["line_r0_1", "plan_r0_1", "classification"],
        "filters": [
            {
                "filter_name": "pdenspp",
                "ignore_nan": false,
                "absolute_frequency": true,
                "relative_frequency": false,
                "surface_density": false,
                "volume_density": true,
                "vertical_segments": 32,
                "conditions": null
            },
            {
                "filter_name": "cardenspp",
                "ignore_nan": true,
                "absolute_frequency": false,
                "relative_frequency": true,
                "surface_density": true,
                "volume_density": false,
                "vertical_segments": 0,
                "conditions": [
                    {
                        "value_name": "classification",
                        "condition_type": "equals",
                        "value_target": 7
                    }
                ]
            },
            {
                "filter_name": "plandenspp",
                "ignore_nan": true,
                "absolute_frequency": true,
                "relative_frequency": true,
                "surface_density": true,
                "volume_density": true,
                "vertical_segments": 0,
                "rings": 0,
                "radial_boundaries": 0,
                "sectors2D": 0,
                "sectors3D": 0,
                "conditions": [
                    {
                        "value_name": "plan_r0_1",
                        "condition_type": "greater_than_or_equal_to",
                        "value_target": 0.5
                    }
                ]
            },
            {
                "filter_name": "linedenspp",
                "ignore_nan": true,
                "absolute_frequency": true,
                "relative_frequency": true,
                "surface_density": true,
                "volume_density": true,
                "vertical_segments": 32,
                "rings": 8,
                "radial_boundaries": 8,
                "sectors2D": 16,
                "sectors3D": 32,
                "conditions": [
                    {
                        "value_name": "line_r0_1",
                        "condition_type": "greater_than_or_equal_to",
                        "value_target": 0.5
                    }
                ]
            }
        ]
    }
  ]
}

The JSON above defines a RecountMinerPP that computes a couple of geometric features (linearity and planarity) and then derives recounts based on them. The first recount ("pdenspp") considers all the points, the second ("cardenspp") considers the points classified as car, the third ("plandenspp") selects the points with a planarity of at least \(0.5\), and the last one ("linedenspp") considers the points with a linearity of at least \(0.5\).

Arguments

nthreads

See Recount miner documentation.

neighborhood

See Recount miner documentation.

input_fnames

See Recount miner documentation.

filters

See Recount miner documentation.

filter_name

See Recount miner documentation.

ignore_nan

See Recount miner documentation.

absolute_frequency

See Recount miner documentation.

relative_frequency

See Recount miner documentation.

surface_density

See Recount miner documentation.

volume_density

See Recount miner documentation.

vertical_segments

See Recount miner documentation.

conditions

See Recount miner documentation.

rings

How many concentric, linearly spaced rings (annuli) must be analyzed around the studied point. Non-empty concentric rings will be counted. The generated feature will be named by appending "rin" to the filter name.

radial_boundaries

How many concentric, linearly spaced radial boundaries (also known as spherical shells) must be analyzed around the studied point. Non-empty spherical shells will be counted. The generated feature will be named by appending "rb" to the filter name.

sectors2D

How many 2D sectors (on the horizontal plane, i.e., the one defined by the \(x\) and \(y\) coordinates) must be analyzed around the studied point. Non-empty 2D sectors will be counted. The generated feature will be named by appending "st2" to the filter name.

sectors3D

How many 3D sectors must be analyzed around the studied point. Non-empty 3D sectors will be counted. The generated feature will be named by appending "st3" to the filter name.
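The 2D sector recount can be sketched as follows. This is an illustrative reimplementation, not the actual C++ backend; the function name is hypothetical and the neighbors are given as offsets from the studied point.

```python
import numpy as np

def sectors2d_recount(dx, dy, num_sectors):
    """Count non-empty angular sectors on the horizontal plane around the
    studied point, given neighbor offsets (dx, dy)."""
    if num_sectors < 1 or len(dx) == 0:
        return 0
    # Angle of each neighbor, mapped to [0, 2*pi)
    angles = np.mod(np.arctan2(dy, dx), 2.0 * np.pi)
    # Assign each neighbor to one of the linearly spaced sectors
    idx = np.minimum((angles / (2.0 * np.pi) * num_sectors).astype(int),
                     num_sectors - 1)
    return len(np.unique(idx))
```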

Output

The generated output is a point cloud that includes the recount-based features. The figure below shows the linearity (left side) and the radial boundaries recount (right side) from the example above. The data used in this example is the domfountain station1 point cloud from the Semantic3D dataset.

Figure representing the linearity and a recount of the spherical shells based on points filtered by a linearity condition.

Figure representing the linearity computed from spherical neighborhoods of radius \(0.1\) in the left side. The right side shows the recount of spherical shells (radial boundaries) containing at least one point with a linearity greater than or equal to \(0.5\).

Take closest miner

The TakeClosestMiner can be used to derive features from another point cloud. It works by defining a pool of point clouds: for each point in the input point cloud, the closest neighbor across all the point clouds in the pool is found. The features for each point are then taken from that closest neighbor.

{
    "miner": "TakeClosestMiner",
    "fnames": [
        "HSV_Hrad", "HSV_S", "HSV_V",
        "floor_distance_r50.0_sep0.35",
        "eigenvalue_sum_r0.3", "omnivariance_r0.3", "eigenentropy_r0.3",
        "anisotropy_r0.3", "planarity_r0.3", "linearity_r0.3",
        "PCA1_r0.3", "PCA2_r0.3",
        "surface_variation_r0.3", "sphericity_r0.3", "verticality_r0.3"
    ],
    "pcloud_pool": [
        "/home/point_clouds/point_cloud_A.laz",
        "/home/point_clouds/point_cloud_B.laz",
        "/home/point_clouds/point_cloud_C.laz"
    ],
    "distance_upper_bound": 0.1,
    "nthreads": 12
}

The JSON above defines a TakeClosestMiner that finds the features of the closest point in a pool of three point clouds. Neighbors further than \(0.1\,\mathrm{m}\) will not be considered, even if they are the closest neighbor.
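The underlying logic can be sketched as follows, using a brute-force nearest neighbor search for clarity; the real implementation would rely on a spatial index, and the function name is hypothetical.

```python
import numpy as np

def take_closest(query_xyz, pool_xyz, pool_feats, upper_bound, default=np.nan):
    """For each query point, take the features of its closest neighbor in
    the pooled point cloud; neighbors beyond upper_bound are ignored."""
    # Pairwise distances between query points and the pooled points
    d = np.linalg.norm(query_xyz[:, None, :] - pool_xyz[None, :, :], axis=2)
    idx = d.argmin(axis=1)
    best = d[np.arange(len(query_xyz)), idx]
    out = np.full((len(query_xyz), pool_feats.shape[1]), default)
    ok = best <= upper_bound  # distance_upper_bound behavior
    out[ok] = pool_feats[idx[ok]]
    return out
```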

Arguments

fnames

The names of the features that must be taken from the closest neighbor in the pool.

frenames

An optional list with the name of the output features. When not given, the output features will be named as specified by fnames.

y_default

An optional value to be considered as the default label/class. If not given, it will be the max integer supported by the system.

pcloud_pool

A list with the paths to the point clouds composing the pool.

distance_upper_bound

The max distance threshold. Neighbors further than this distance will be ignored.

nthreads

The number of threads for parallel queries.

Output

The generated output is a point cloud where the features correspond to the closest neighbor in the pool, assuming there is at least one neighbor that is closer than the distance upper bound.

Decorators

Furthest point sampling decorator

The FPSDecoratorTransformer can be used to decorate a data miner such that the computations can take place in a transformed space of reduced dimensionality. Typically, the domain of a data miner is the entire point cloud, let us say \(m\) points. When using an FPSDecoratedMiner, this domain will be transformed to a subset of the original point cloud with \(R\) points, such that \(m \geq R\). Decorating a data miner with this decorator can be useful to reduce its execution time.

{
    "miner": "FPSDecorated",
    "fps_decorator": {
        "num_points": "m/3",
        "fast": true,
        "num_encoding_neighbors": 1,
        "num_decoding_neighbors": 1,
        "release_encoding_neighborhoods": false,
        "threads": 16,
        "representation_report_path": "*/fps_repr/geom_r3_representation.las"
    },
    "decorated_miner": {
        "miner": "GeometricFeatures",
        "in_pcloud": null,
        "out_pcloud": null,
        "radius": 3.0,
        "fnames": ["linearity", "planarity", "surface_variation", "verticality", "anisotropy", "PCA1", "PCA2"],
        "frenames": ["linearity_r3", "planarity_r3", "surface_variation_r3", "verticality_r3", "anisotropy_r3", "PCA1_r3", "PCA2_r3"],
        "nthreads": 16
    }
}
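The reduce/propagate mechanism can be sketched as follows, assuming one decoding neighbor as in the JSON above. Both function names are hypothetical; this is an exact greedy FPS seeded at the first point, not the framework's implementation.

```python
import numpy as np

def furthest_point_sampling(xyz, R):
    """Greedy exact FPS: iteratively pick the point furthest from the
    already selected set until R points are chosen."""
    selected = [0]
    dmin = np.linalg.norm(xyz - xyz[0], axis=1)
    for _ in range(R - 1):
        nxt = int(dmin.argmax())
        selected.append(nxt)
        # Keep, for each point, its distance to the closest selected point
        dmin = np.minimum(dmin, np.linalg.norm(xyz - xyz[nxt], axis=1))
    return np.array(selected)

def propagate(orig_xyz, sub_xyz, sub_feats):
    """Decode with one decoding neighbor: each original point takes the
    features of its closest point in the subsample."""
    d = np.linalg.norm(orig_xyz[:, None, :] - sub_xyz[None, :, :], axis=2)
    return sub_feats[d.argmin(axis=1)]
```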

Arguments

fps_decorator

The specification of the furthest point sampling (FPS) decoration carried out through the FPSDecoratorTransformer.

num_points

The target number of points \(R\) for the transformed point cloud. It can be an integer or an expression that will be evaluated with \(m\) representing the number of points of the original point cloud, e.g., "m/2" will downscale the point cloud to half the number of points.

fast

Whether to use exact furthest point sampling (false) or a faster stochastic approximation (true).

num_encoding_neighbors

How many closest neighbors in the original point cloud are considered for each point in the transformed point cloud to reduce from the original space to the transformed one.

num_decoding_neighbors

How many closest neighbors in the transformed point cloud are considered for each point in the original point cloud to propagate back from the transformed space to the original one.

release_encoding_neighborhoods

Whether the encoding neighborhoods can be released after computing the transformation (true) or not (false). Releasing these neighborhoods means the FPSDecoratorTransformer.reduce() method must not be called afterwards, otherwise errors will arise. Setting this flag to true can help save memory when needed.

threads

The number of parallel threads to consider for the parallel computations. Note that -1 means using as many threads as available cores.

representation_report_path

Where to export the transformed point cloud. In general, it should be null to prevent unnecessary operations. However, it can be enabled (by giving any valid path to write a point cloud file) to visualize the points that are seen by the data miner.

decorated_miner

A typical data mining specification. See the Geometric features miner for an example.

Minimum distance decimator decorator

The MinDistDecimatorDecorator can be used to decorate a data miner such that the computations can take place in a transformed space of reduced dimensionality. Typically, the domain of a data miner is the entire point cloud, let us say \(m\) points. When using a MinDistDecoratedMiner, this domain will be transformed to a subset of the original point cloud with \(R \leq m\) points. This representation is achieved by ensuring that no point is closer to its closest neighbor than a given minimum distance threshold. Decorating a data miner with this decorator can be useful to reduce its execution time.
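The decimation itself can be sketched as a greedy pass that keeps a point only when it lies at least the minimum distance away from every previously kept point. The helper below is hypothetical and brute-force; a spatial index would be used in practice.

```python
import numpy as np

def min_distance_decimate(xyz, min_distance):
    """Return the indices of the kept points so that no two kept points
    are closer than min_distance."""
    kept = []
    for i, p in enumerate(xyz):
        if not kept:
            kept.append(i)
            continue
        # Distance from the candidate to every already kept point
        d = np.linalg.norm(xyz[kept] - p, axis=1)
        if d.min() >= min_distance:
            kept.append(i)
    return np.array(kept)
```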

{
    "miner": "MinDistDecorated",
    "mindist_decorator": {
        "min_distance": 0.02,
        "num_encoding_neighbors": 1,
        "num_decoding_neighbors": 1,
        "release_encoding_neighborhoods": false,
        "nthreads": -1,
        "representation_report_path": "*_representation.las"
    },
    "decorated_miner": {
        "miner": "GeometricFeaturesPP",
        "radius": 0.3,
        "fnames": [
            "linearity", "planarity", "surface_variation", "verticality", "anisotropy", "PCA1", "PCA2"
        ],
        "frenames": [
            "linearity_r0_3", "planarity_r0_3", "surface_variation_r0_3", "verticality_r0_3", "anisotropy_r0_3", "PCA1_r0_3", "PCA2_r0_3"
        ],
        "nthreads": -1
    }
}

Arguments

mindist_decorator

The specification of the minimum distance decimation-based decorator carried out through MinDistDecimatorDecorator.

min_distance

The minimum distance threshold such that no pair of closest neighbors will be closer than this distance.

num_encoding_neighbors

See the num_encoding_neighbors documentation of FPSDecoratedMiner.

num_decoding_neighbors

See the num_decoding_neighbors documentation of FPSDecoratedMiner.

release_encoding_neighborhoods

See the release_encoding_neighborhoods documentation of FPSDecoratedMiner.

nthreads

See the threads documentation of FPSDecoratedMiner.

representation_report_path

See the representation_report_path documentation of FPSDecoratedMiner.

decorated_miner

A typical data mining specification. See the Geometric features miner for an example.

Simple smoother decorator

The SimpleSmootherDecoratorTransformer can be used to decorate a data miner such that the computations take place in a transformed space where the points are replaced by smoothed versions with less abrupt changes. When using a SimpleSmoothDecoratedMiner, the data mining will be computed on the smoothed representation of the point cloud. The C++ implementation, accessible through SimpleStructureSmootherPP, enables the fast computation of many smoothing strategies.
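As an illustration, the "idw" strategy over spherical neighborhoods can be sketched with a brute-force pass; the function below is hypothetical, not the actual C++ implementation.

```python
import numpy as np

def idw_smooth(xyz, radius, power=2, min_distance=1e-3):
    """Replace each point by the inverse-distance-weighted mean of its
    spherical neighborhood."""
    smoothed = np.empty_like(xyz)
    for i, p in enumerate(xyz):
        d = np.linalg.norm(xyz - p, axis=1)
        mask = d <= radius
        # Clip distances to avoid dividing by zero at the point itself
        w = 1.0 / np.maximum(d[mask], min_distance) ** power
        smoothed[i] = (w[:, None] * xyz[mask]).sum(axis=0) / w.sum()
    return smoothed
```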

{
    "miner": "SimpleSmoothDecorated",
    "simple_smoother_decorator": {
        "neighborhood": {
            "type": "sphere",
            "radius": 0.015,
            "k": 1
        },
        "strategy":{
            "type": "idw",
            "parameter": 2,
            "min_distance": 0.0015
        },
        "correction": null,
        "nthreads": -1,
        "representation_report_path": "*_smoothed.las"
    },
    "decorated_miner": {
        "miner": "GeometricFeaturesPP",
        "radius": 0.3,
        "fnames": [
            "linearity", "planarity", "surface_variation", "verticality", "anisotropy", "PCA1", "PCA2"
        ],
        "frenames": [
            "linearity_r0_3", "planarity_r0_3", "surface_variation_r0_3", "verticality_r0_3", "anisotropy_r0_3", "PCA1_r0_3", "PCA2_r0_3"
        ],
        "nthreads": -1
    }
}

Arguments

simple_smoother_decorator

The specification of the smoother decorator carried out through SimpleSmootherDecoratorTransformer.

neighborhood

The neighborhood specification. See SimpleStructureSmootherPP.__init__() for further details.

type

The type of neighborhood. Supported types are "knn", "knn2d", "sphere" and "cylinder".

radius

The radius for spherical and cylindrical neighborhoods.

k

The number of nearest neighbors in knn-based neighborhoods.

strategy

The smoothing strategy. See SimpleStructureSmootherPP.__init__() for further details.

type

Either "mean", "idw" (inverse distance weighting), or "rbf" (Gaussian radial basis function).

parameter

The parameter governing either the inverse distance weighting or the Gaussian radial basis function equation, depending on the chosen strategy.

min_distance

The min distance clip value to avoid division by zero when using inverse distance weighting.

correction

The specification governing the Fibonacci correction. See SimpleStructureSmootherPP.__init__() for further details.

K

The \(K \in \mathbb{Z}_{>0}\) parameter governing how many points we must have in the Fibonacci support.

sigma

The cut threshold for the weighting function.

nthreads

See the threads documentation of FPSDecoratedMiner.

decorated_miner

A typical data mining specification. See the Geometric features miner for an example.