mining package

Submodules

mining.decorated_miner module

class mining.decorated_miner.DecoratedMiner(**kwargs)

Bases: Miner

Author:: Alberto M. Esmoris Pena

Abstract class providing the common logic for miner decorators.

Variables:: decorated_miner_spec (dict) – The specification of the decorated miner.

static extract_miner_args(spec)

Extract the arguments to initialize/instantiate a DecoratedMiner from a key-word specification.

Parameters:: spec – The key-word specification containing the arguments.
Returns:: The arguments to initialize/instantiate a DecoratedMiner.

__init__(**kwargs): Initialization for any instance of type DecoratedMiner.

get_decorated_fnames(): Get the feature names (fnames) from the decorated miner. :return: The feature names from the decorated miner. :rtype: list of str

get_decorated_frenames()

Get the feature renames (frenames) from the decorated miner.

Returns:: The feature renames from the decorated miner, i.e., the names for the mined features.
Return type:: list of str

mining.fps_decorated_miner module

class mining.fps_decorated_miner.FPSDecoratedMiner(**kwargs)

Bases: SamplingDecoratedMiner

Author:: Alberto M. Esmoris Pena

Decorator for data miners that runs the data mining process on an FPS-based representation of the point cloud.

The FPS Decorated Miner (FPSDecoratedMiner) constructs a representation of the point cloud, then it runs the data mining process on this representation and, finally, it propagates the features back to the original point cloud.

See FPSDecoratorTransformer.
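
The three steps described above (build a representation, mine on it, propagate the features back) can be sketched as follows. This is a minimal illustration and not the library's implementation: it assumes that FPS denotes furthest point sampling and that features are propagated from the nearest representative point. The helpers `furthest_point_sampling`, `propagate_features`, and `toy_miner` are hypothetical names introduced only for this example.

```python
import math


def furthest_point_sampling(points, num_samples):
    """Greedy furthest point sampling; returns the indices of the selected points."""
    selected = [0]  # Start from the first point (arbitrary choice)
    # Distance from each point to its closest selected point
    min_dist = [math.dist(p, points[0]) for p in points]
    while len(selected) < min(num_samples, len(points)):
        # Pick the point that is furthest from the current selection
        idx = max(range(len(points)), key=lambda i: min_dist[i])
        selected.append(idx)
        for i, p in enumerate(points):
            min_dist[i] = min(min_dist[i], math.dist(p, points[idx]))
    return selected


def propagate_features(points, rep_points, rep_features):
    """Give each original point the features of its nearest representative."""
    out = []
    for p in points:
        nearest = min(
            range(len(rep_points)),
            key=lambda j: math.dist(p, rep_points[j])
        )
        out.append(rep_features[nearest])
    return out


def toy_miner(points):
    """Hypothetical miner whose only feature is the z coordinate."""
    return [[p[2]] for p in points]


cloud = [(0.0, 0.0, 0.0), (0.1, 0.0, 0.2), (5.0, 0.0, 1.0), (5.1, 0.0, 1.3)]
rep_idx = furthest_point_sampling(cloud, 2)        # 1) build the representation
rep_points = [cloud[i] for i in rep_idx]
rep_feats = toy_miner(rep_points)                  # 2) mine on the representation
features = propagate_features(cloud, rep_points, rep_feats)  # 3) propagate back
```

Mining only the representative points reduces the cost of the decorated miner, at the price of every original point inheriting the features of its representative.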

Variables:

decorated_miner_spec (dict) – See DecoratedMiner.
decorated_miner (Miner) – The decorated miner object.
fps_decorator_spec (dict) – The specification of the FPS transformation defining the decorator.
fps_decorator (FPSDecoratorTransformer) – The FPS decorator to be applied on input point clouds.

static extract_miner_args(spec)

Extract the arguments to initialize/instantiate an FPSDecoratedMiner from a key-word specification.

Parameters:: spec – The key-word specification containing the arguments.
Returns:: The arguments to initialize/instantiate an FPSDecoratedMiner.

__init__(**kwargs): Initialization for any instance of type FPSDecoratedMiner.

mine(pcloud): Decorate the main data mining logic to work on the representation. See Miner and Miner.mine().

mining.geom_feats_miner module

class mining.geom_feats_miner.GeomFeatsMiner(**kwargs)

Bases: Miner

Author:: Alberto M. Esmoris Pena

Basic geometric features miner. See Miner.

Variables:

radius (float) – The radius (often in meters) attribute. Radius is 0.3 by default.
fnames (list) – The list of feature names (fnames) attribute. [‘linearity’, ‘planarity’, ‘sphericity’] by default.
nthreads (int) – The number of threads to be used for the parallel computation of the geometric features. Note using -1 (default value) implies using as many threads as available cores.
frenames (list) – Optional attribute to specify how to rename the mined features.

static extract_miner_args(spec)

Extract the arguments to initialize/instantiate a GeomFeatsMiner from a key-word specification.

Parameters:: spec – The key-word specification containing the arguments.
Returns:: The arguments to initialize/instantiate a GeomFeatsMiner.

__init__(**kwargs)

Initialize an instance of GeomFeatsMiner.

The radius and feature names (fnames) are always assigned during initialization. Their default values are 0.3 and the list [‘linearity’, ‘planarity’, ‘sphericity’], respectively.

The number of threads (nthreads or n_jobs) is also assigned during initialization with a default value of -1 which means use as many threads as available cores.

Parameters:: kwargs – The attributes for the GeomFeatsMiner that will also be passed to the parent.

mine(pcloud)

Mine geometric features from the given point cloud. See Miner and mining.Miner.mine().

Parameters:: pcloud (PointCloud) – The point cloud to be mined.
Returns:: The point cloud extended with geometric features.
Return type:: PointCloud
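
As a rough illustration of what the three default features measure, the sketch below derives them from the eigenvalues of the covariance matrix of a local neighborhood. It assumes the definitions commonly used in the point cloud literature, which are not confirmed here to be the ones used by GeomFeatsMiner. The function `eigen_features` is a hypothetical helper.

```python
def eigen_features(l1, l2, l3):
    """Return (linearity, planarity, sphericity) for eigenvalues l1 >= l2 >= l3 >= 0.

    Assumed (commonly used) definitions:
        linearity  = (l1 - l2) / l1
        planarity  = (l2 - l3) / l1
        sphericity = l3 / l1
    """
    if l1 <= 0.0:
        # Degenerate neighborhood (e.g., a single point): no spread at all
        return 0.0, 0.0, 0.0
    return (l1 - l2) / l1, (l2 - l3) / l1, l3 / l1


# Eigenvalues of a hypothetical neighborhood, sorted in descending order
linearity, planarity, sphericity = eigen_features(1.0, 0.5, 0.25)
```

With these definitions the three values add up to one, so they can be read as the share of the neighborhood that looks like a line, a plane, or a volume.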

mining.geom_feats_minerpp module

class mining.geom_feats_minerpp.GeomFeatsMinerPP(**kwargs)

Bases: GeomFeatsMiner

Author:: Alberto M. Esmoris Pena

C++ version of the GeomFeatsMiner data miner. It supports more features than its Python counterpart. More concretely, supported features are:

– "linearity"

– "planarity"

– "sphericity"

– "surface_variation"

– "roughness"

– "verticality"

– "altverticality"

– "sqverticality"

– "horizontality"

– "sqhorizontality"

– "eigenvalue_sum"

– "omnivariance"

– "eigenentropy"

– "normalized_eigenentropy"

– "anisotropy"

– "PCA1"

– "PCA2"

– "number_of_neighbors"

– "fom"

– "nx"

– "ny"

– "nz"

– "esval1"

– "esval2"

– "esval3"

– "gauss_curv_full"

– "mean_curv_full"

– "full_quad_dev"

– "full_abs_algdist"

– "full_sq_algdist"

– "full_laplacian"

– "full_mean_qdev"

– "full_gradient_norm"

– "full_eigensum"

– "full_eigenentropy"

– "full_normalized_eigenentropy"

– "full_omnivariance"

– "full_hypersphericity"

– "full_hyperanisotropy"

– "full_spectral"

– "full_frobenius"

– "full_schatten"

– "full_coeff_norm"

– "full_linear_norm"

– "full_nlinear_norm"

– "full_cross_norm"

– "full_ncross_norm"

– "full_square_norm"

– "full_nsquare_norm"

– "full_bias_term"

– "full_abs_bias"

– "full_maxcurv"

– "full_mincurv"

– "full_maxabscurv"

– "full_minabscurv"

– "full_umbilic_dev"

– "full_rmsc"

– "full_umbilicality"

– "full_gauss_umbilicality"

– "full_shape_index"

– "full_eigen_index"

– "full_min_eigenval"

– "full_max_eigenval"

– "minpg"

– "maxpg"

– "avg"

– "nomralized_minpg"

– "normalized_maxpg"

– "normalized_avg"

– "saddleness"

– "absolute_hg"

– "hgnorm"

– "normalized_hgn"

– "bipgnorm"

– "normalized_bipgnorm"

– "gradient_axis_curv"

– "normalized_gac"

– "full_normalized_gac"

– "hessian_frobenius"

– "hessian_absdet"

– "full_abs_quadraticity"

The mathematical details of the features are given in the C++ documentation.

See Miner.

static extract_miner_args(spec)

Extract the arguments to initialize/instantiate a GeomFeatsMinerPP from a key-word specification.

Parameters:: spec – The key-word specification containing the arguments.
Returns:: The arguments to initialize/instantiate a GeomFeatsMinerPP.

__init__(**kwargs)

Initialize an instance of GeomFeatsMinerPP.

Parameters:: kwargs – The attributes for the GeomFeatsMinerPP that will also be passed to the parent.

mine(pcloud)

Mine geometric features from the given point cloud. See Miner and mining.Miner.mine().

Parameters:: pcloud (PointCloud) – The point cloud to be mined.
Returns:: The point cloud extended with geometric features.
Return type:: PointCloud

mining.height_feats_miner module

class mining.height_feats_miner.HeightFeatsMiner(**kwargs)

Bases: Miner

Author:: Alberto M. Esmoris Pena

Basic height features miner. See Miner.

Variables:

support_chunk_size (int) – How many tasks (support points) per chunk must be considered when computing the support neighborhoods (i.e., the neighborhoods centered at the support points). If it is zero, then all the points are considered at once.
support_subchunk_size (int) – How many support neighborhoods inside a given chunk must be considered when computing the features in parallel. It must be at least one, i.e., \(>0\).
pwise_chunk_size (int) – How many tasks (points) per chunk must be considered when computing the height features for each point in the point cloud. If it is zero, then all the points are considered at once.
neighborhood (dict) –
The neighborhood definition. For example:
```
{
    "type": "cylinder",
    "radius": 50,
    "separation_factor": 0.7
}
```
In this definition, the radius (often in meters) describes either the disk of a cylinder or half the side of a rectangular region along the vertical axis. Note that separation factor can be set to zero. In this case, the height features will be computed point-wise.
outlier_filter (str or None) – The outlier filter to be applied (if any).
fnames (list) – The list of height features that must be mined. [‘floor_distance’] by default.
nthreads (int) – The number of threads to be used for the parallel computation of the height features. Note using -1 (default value) implies using as many threads as available cores.
frenames (list) – Optional attribute to specify how to rename the mined features.

static extract_miner_args(spec)

Extract the arguments to initialize/instantiate a HeightFeatsMiner from a key-word specification.

Parameters:: spec – The key-word specification containing the arguments.
Returns:: The arguments to initialize/instantiate a HeightFeatsMiner.

__init__(**kwargs)

Initialize an instance of HeightFeatsMiner.

The neighborhood definition and feature names (fnames) are always assigned during initialization. The default neighborhood is a cylinder with a disk of radius 50.

Parameters:: kwargs (dict) – The attributes for the HeightFeatsMiner that will also be passed to the parent.

mine(pcloud)

Mine height features from the given point cloud. See Miner and mining.Miner.mine().

Parameters:: pcloud – The point cloud to be mined.
Returns:: The point cloud extended with height features.
Return type:: PointCloud

compute_height_features(X)

Compute the height features for the given matrix of coordinates \(\pmb{X} \in \mathbb{R}^{m \times 3}\).

Parameters:: X (np.ndarray) – The matrix of coordinates.
Returns:: The computed features.
Return type:: np.ndarray

compute_height_features_on_support(X, sup_X, kdt)

Compute the height features on each support neighborhood.

Parameters:

X – The matrix of coordinates representing the input point cloud.
sup_X – The center point for each support neighborhood.
kdt – The KDTree representing the input point cloud on (x, y) only (i.e., 2D).

Returns:

The support points for non-empty neighborhoods and the height features for each support point of a non-empty neighborhood.

Return type:

tuple (np.ndarray, np.ndarray)

compute_pwise_height_features(X, sup_F, kdt)

Compute the height features for each point in the point cloud.

Parameters:

X – The matrix of coordinates representing the input point cloud.
sup_F – The features for each support point.
kdt – The KDTree representing the support points.

Returns:

The height features for each point in the point cloud.

Return type:

np.ndarray

select_support_height_functions()

Select height functions from specified feature names (fnames). These functions will be computed on the vertical coordinates of the neighborhood for each support point.

Returns:: List of functions to extract height features from a vector of vertical coordinates. Each feature is a map from a vector of arbitrary dimensionality representing height coordinates to a single scalar.
Return type:: list

select_height_functions()

Select height functions from specified feature names (fnames). Some of these features are taken directly from the support neighborhood, others are derived as a function of the point and the corresponding support neighborhood.

Returns:: Two lists. The first one is a list of functions to extract height features from a pair of values. The first value represents the vertical coordinate of the point in the point cloud and the second value represents a given height feature corresponding to the closest support point (or a vector of features, e.g., quartiles or deciles). The second list represents the dimensionality of each feature, i.e., one for most features because they correspond to a single scalar but greater than one for vectorial features (e.g., 3 for quartiles or 9 for deciles).
Return type:: tuple of list
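
The two selection steps above can be pictured with the following sketch for the `floor_distance` feature. It is only an illustration under assumptions: it takes the floor to be the lowest height in the support neighborhood, and the helpers `sketch_support_functions` and `sketch_pointwise_functions` are hypothetical stand-ins, not the methods of HeightFeatsMiner.

```python
def sketch_support_functions(fnames):
    """One function per feature, mapping a vector of heights to a scalar."""
    available = {
        "floor_distance": min,  # Assumed: the floor is the lowest height
    }
    return [available[name] for name in fnames]


def sketch_pointwise_functions(fnames):
    """Functions of (point height, support feature) plus their dimensionality."""
    available = {
        # Distance from the point to the floor of its closest support point
        "floor_distance": (lambda z, floor: z - floor, 1),
    }
    funs = [available[name][0] for name in fnames]
    dims = [available[name][1] for name in fnames]
    return funs, dims


fnames = ["floor_distance"]
support_funs = sketch_support_functions(fnames)
point_funs, dims = sketch_pointwise_functions(fnames)

# Heights of the points inside one support neighborhood
neighborhood_z = [2.0, 2.5, 7.0]
support_feats = [fun(neighborhood_z) for fun in support_funs]

# Height of a point whose closest support point is the one above
point_z = 6.0
point_feats = [fun(point_z, sf) for fun, sf in zip(point_funs, support_feats)]
```

Splitting the work this way means the expensive neighborhood statistics are computed once per support point, while the per-point step is a cheap function of two values.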

mining.height_feats_minerpp module

class mining.height_feats_minerpp.HeightFeatsMinerPP(**kwargs)

Bases: HeightFeatsMiner

Author:: Alberto M. Esmoris Pena

C++ version of the HeightFeatsMiner data miner.

See Miner.

static extract_miner_args(spec)

Extract the arguments to initialize/instantiate a HeightFeatsMinerPP from a key-word specification.

Parameters:: spec – The key-word specification containing the arguments.
Returns:: The arguments to initialize/instantiate a HeightFeatsMinerPP.

__init__(**kwargs)

Initialize an instance of HeightFeatsMinerPP.

See HeightFeatsMiner.__init__().

compute_height_features(X)

Compute the height features using the C++ implementation.

See HeightFeatsMiner.compute_height_features().

mining.hsv_from_rgb_miner module

class mining.hsv_from_rgb_miner.HSVFromRGBMiner(**kwargs)

Bases: Miner

Author:: Alberto M. Esmoris Pena

Mine Hue, Saturation and Value (HSV) components representing color from available Red, Green, Blue (RGB) components. See Miner.

Variables:: frenames (list) – Optional attribute to specify how to rename the features representing the HSV components. The first element corresponds to Hue, the second to Saturation, and the third one to Value.

static extract_miner_args(spec)

Extract the arguments to initialize/instantiate an HSVFromRGBMiner from a key-word specification.

Parameters:: spec – The key-word specification containing the arguments.
Returns:: The arguments to initialize/instantiate an HSVFromRGBMiner.

__init__(**kwargs)

Initialize an instance of HSVFromRGBMiner.

Parameters:: kwargs – The attributes for the HSVFromRGBMiner that will also be passed to the parent.

mine(pcloud)

Mine HSV color components from the RGB components of the given point cloud.

Parameters:: pcloud (PointCloud) – The point cloud to be mined.
Returns:: The point cloud extended with HSV color components as features.
Return type:: PointCloud

static RGB_to_HSV(R, G, B, hue_unit='radians')

Transform the received RGB components in \([0, 1]\) to HSV. If RGB components are given in \([0, 255]\) they will be automatically mapped to \([0, 1]\). Also, if RGB components are given in \([0, 65535]\) they will be automatically mapped to \([0, 1]\).

Parameters:

R – The red component for each point.
G – The green component for each point.
B – The blue component for each point.

Returns:

A tuple of three arrays representing Hue (H), Saturation (S) and Value (V).

Return type:

tuple of np.ndarray
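
A minimal sketch of the range normalization and conversion described above, built on Python's standard `colorsys` module. The range detection rule (based on the largest component found) and the handling of any `hue_unit` other than `'radians'` as degrees are assumptions made for this example, and `rgb_to_hsv_sketch` is a hypothetical name, not the library's function.

```python
import colorsys
import math


def rgb_to_hsv_sketch(R, G, B, hue_unit="radians"):
    """Convert per-point RGB sequences to HSV after mapping RGB to [0, 1]."""
    peak = max(max(R), max(G), max(B))
    # Assumed detection rule: guess the input range from the largest value
    if peak > 255:
        scale = 65535.0   # 16-bit colors
    elif peak > 1:
        scale = 255.0     # 8-bit colors
    else:
        scale = 1.0       # Already in [0, 1]
    H, S, V = [], [], []
    for r, g, b in zip(R, G, B):
        h, s, v = colorsys.rgb_to_hsv(r / scale, g / scale, b / scale)
        # colorsys returns the hue in [0, 1); convert it to an angle
        if hue_unit == "radians":
            H.append(h * 2.0 * math.pi)
        else:
            H.append(h * 360.0)  # Assumption: any other unit means degrees
        S.append(s)
        V.append(v)
    return H, S, V


# Pure red and pure green given as 8-bit colors
H, S, V = rgb_to_hsv_sketch([255, 0], [0, 255], [0, 0])
```

Note that guessing the range from the data can misclassify very dark 8-bit or 16-bit clouds, so a real implementation may prefer an explicit range argument.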

mining.min_dist_decorated_miner module

class mining.min_dist_decorated_miner.MinDistDecoratedMiner(**kwargs)

Bases: SamplingDecoratedMiner

Author:: Alberto M. Esmoris Pena

Decorator for data miners that runs the data mining process on a minimum distance decimation-based representation of the point cloud.

The minimum distance decorated miner (MinDistDecoratedMiner) constructs a representation of the point cloud, then it runs the data mining process on this representation and, finally, it propagates the features back to the original point cloud.

See MinDistDecimatorDecorator.
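
One simple way to build a minimum distance representation is a greedy pass that keeps a point only when it is far enough from every point kept so far. The sketch below assumes this greedy strategy and a brute-force search. The actual decimator may use a different order or a spatial index, and `min_dist_decimation` is a hypothetical helper.

```python
import math


def min_dist_decimation(points, min_dist):
    """Return the indices of a subset where no two points are closer than min_dist."""
    kept = []
    for i, p in enumerate(points):
        # Keep the point only if it respects the distance to all kept points
        if all(math.dist(p, points[k]) >= min_dist for k in kept):
            kept.append(i)
    return kept


cloud = [
    (0.0, 0.0, 0.0), (0.05, 0.0, 0.0),   # Two points 5 cm apart
    (1.0, 0.0, 0.0), (1.02, 0.0, 0.0),   # Two points 2 cm apart
    (2.0, 0.0, 0.0),
]
kept = min_dist_decimation(cloud, 0.1)
```

The data miner would then run on the kept points only, and the mined features would be propagated back to the discarded points.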

Variables:

decorated_miner_spec (dict) – See DecoratedMiner.
decorated_miner (Miner) – The decorated miner object.
mindist_decorator_spec (dict) – The specification of the minimum distance decimation defining the decorator.
mindist_decorator (MinDistDecimatorDecorator) – The minimum distance decimator decorator to be applied on input point clouds.

static extract_miner_args(spec)

Extract the arguments to initialize/instantiate a MinDistDecoratedMiner from a key-word specification.

Parameters:: spec – The key-word specification containing the arguments.
Returns:: The arguments to initialize/instantiate a MinDistDecoratedMiner.

__init__(**kwargs): Initialization for any instance of type MinDistDecoratedMiner.

mine(pcloud): Decorate the main data mining logic to work on the representation. See Miner and Miner.mine().

mining.miner module

exception mining.miner.MinerException(message='')

Bases: VL3DException

Author:: Alberto M. Esmoris Pena

Class for exceptions related to data mining components. See VL3DException.

__init__(message='')

class mining.miner.Miner(**kwargs)

Bases: object

Author:: Alberto M. Esmoris Pena

Interface governing any miner.

__init__(**kwargs)

Initialize a Miner.

Parameters:: kwargs – The key-word arguments for the initialization of any Miner. It must contain the name of the data mining to be computed.

abstractmethod mine(pcloud)

Mine features from a given input point cloud.

Parameters:: pcloud – The input point cloud for which features must be mined.
Returns:: The point cloud extended with the mined features.
Return type:: PointCloud

static get_structure_space_matrix(pcloud)

Obtain the structure space matrix (i.e., matrix of point-wise coordinates) considering the mining config.

If the structure space must be represented with fewer than 64 bits, then it will be shifted before the conversion (the bounding box center defines the translation vector) to prevent coordinate corruption when the input coordinates are given in a CRS with high numbers.

Parameters:: pcloud (PointCloud) – The point cloud whose structure space matrix must be obtained.
Returns:: The structure space matrix representing the point cloud.
Return type:: np.ndarray
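
The reason for shifting before reducing the precision can be checked with a small experiment. The sketch below uses the standard `struct` module to round values to 32-bit floats. The coordinates are hypothetical and one-dimensional for brevity, and `to_float32` is a helper introduced for this example.

```python
import struct


def to_float32(x):
    """Round a Python float (64 bits) to the nearest 32-bit float."""
    return struct.unpack("f", struct.pack("f", x))[0]


# Hypothetical coordinates in a CRS with large numbers (e.g., meters)
coords = [4500000.123, 4500010.456, 4500020.789]

# Translation vector: the center of the bounding box
center = (min(coords) + max(coords)) / 2.0

# Direct conversion: decimals are lost because of the large magnitude
direct = [to_float32(x) for x in coords]
direct_err = max(abs(d - x) for d, x in zip(direct, coords))

# Shift first, then convert: the small values keep their decimals
shifted = [to_float32(x - center) for x in coords]
shifted_err = max(abs((s + center) - x) for s, x in zip(shifted, coords))

print(f"max error without shift: {direct_err:.6f}")
print(f"max error with shift:    {shifted_err:.6f}")
```

Around 4.5 million, consecutive 32-bit floats are half a unit apart, so a direct conversion moves points by decimeters, while the shifted coordinates stay accurate well below a millimeter.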

static get_feature_type()

Determine the data type to be used to represent the feature space.

Returns:: The type to be used to represent the features.
Return type:: np.dtype

static get_feature_space_matrix(pcloud, fnames)

Obtain the feature space matrix (i.e., matrix of point-wise features) considering the mining config.

Parameters:

pcloud (PointCloud) – The point cloud whose feature space matrix must be obtained.
fnames (list of str) – The names of the features.

Returns:

The feature space matrix representing the point cloud.

Return type:

np.ndarray

mining.recount_miner module

class mining.recount_miner.RecountMiner(**kwargs)

Bases: Miner

Author:: Alberto M. Esmoris Pena

Recount miner. See Miner.

The recount miner considers each point in the point cloud \(\pmb{x_{i*}}\) and finds either its knn or its spherical neighborhood \(\mathcal{N}\). Now, let \(j\) index the points in the neighborhood. Then, a given feature \(f\) (or reference value \(y\), e.g., classification label) can be used to filter the points, e.g., selecting the points \(\pmb{x}_{j*} \in \mathcal{N}\) whose feature value satisfies \(f_j > \tau\) for a given threshold \(\tau\). Finally, all the points in the filtered neighborhood can be counted in terms of absolute and relative frequency, and also with respect to the surface or the volume of the neighborhood (given by the radius of the spherical neighborhood or the distance with respect to the closest nearest neighbor for knn neighborhoods).
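
The filter-then-count idea can be sketched for a single neighborhood as follows. This is a simplified illustration with a single "greater than" condition. The function `recount_sketch` is hypothetical and takes the feature values of the neighbors directly instead of a point cloud.

```python
def recount_sketch(neighborhood_f, tau):
    """Count the neighbors whose feature value is greater than tau.

    Returns the absolute frequency and the relative frequency with respect
    to the number of neighbors before filtering.
    """
    filtered = [f_j for f_j in neighborhood_f if f_j > tau]
    absolute = len(filtered)
    if len(neighborhood_f) == 0:
        return 0, 0.0  # Empty neighborhood: nothing to count
    relative = absolute / len(neighborhood_f)
    return absolute, relative


# Feature values of the four neighbors of a given point
absolute, relative = recount_sketch([0.1, 0.7, 0.9, 0.4], tau=0.5)
```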

static extract_miner_args(spec)

Extract the arguments to initialize/instantiate a RecountMiner from a key-word specification.

Parameters:: spec – The key-word specification containing the arguments.
Returns:: The arguments to initialize/instantiate a RecountMiner.

__init__(**kwargs)

Initialize an instance of RecountMiner.

The neighborhood definition and feature names (fnames) are always assigned during initialization. The default neighborhood is a knn neighborhood with \(k=16\).

Parameters:: kwargs (dict) – The attributes for the RecountMiner that will also be passed to the parent.

mine(pcloud)

Mine recounts on filtered neighborhoods from the given point cloud.

Parameters:: pcloud – The point cloud to be mined.
Returns:: The point cloud extended with recounts.
Return type:: PointCloud

get_recount_names_from_filter(f)

Obtain the new feature names generated by the filter.

Parameters:: f (dict) – The filter specification.
Returns:: The names of the new features generated by the filter.
Return type:: list

fname_to_feature_index(fname)

Obtain the feature index corresponding to the given fname.

Parameters:: fname (str) – The name of the feature whose index must be found.
Returns:: The index of the feature with the given name.
Return type:: int

compute_recount(X, F, kdt, neighborhood_f, neighborhood_radius, X_chunk, F_chunk, chunk_idx)

Compute the recounts for a given chunk.

Parameters:

X – The structure space matrix (i.e., the matrix of coordinates).
F – The feature space matrix (i.e., the matrix of features).
kdt – The KDTree representing the entire point cloud.
neighborhood_f – The function to extract neighborhoods for the points in the chunk.
neighborhood_radius – The radius of the spherical neighborhood or None to be computed from the points (e.g., for knn neighborhoods).
X_chunk – The structure space matrix of the chunk.
F_chunk – The feature space matrix of the chunk.
chunk_idx – The index of the chunk.

Returns:

The recount features computed for the chunk.

Return type:

np.ndarray

compute_filter(f, X, F, X_sub, I, r)

Compute the given filter on the neighborhoods of a given chunk.

Parameters:

f (dict) – The specification of the filter to be computed.
X (np.ndarray) – The matrix of coordinates representing the input point cloud.
F (np.ndarray) – The matrix of features representing the input point cloud.
X_sub (np.ndarray) – The matrix of coordinates representing the subchunk whose recount features must be computed.
I (list of list of int) – The list of lists of indices such that the i-th list contains the indices of the points in X that belong to the neighborhood of the i-th point in X_sub.
r (float or None) – The radius of the spherical neighborhood, None if it must be computed from the points in the neighborhood.

Returns:

The recount features for the points in X_sub.

Return type:

np.ndarray

recount_absolute_frequency(F): Count the number of points.

recount_relative_frequency(F, total_pts): The number of points after filtering divided by the total number of points before filtering.

recount_surface_density(F, X2D, x, r)

The number of points after filtering divided by the area of the neighborhood.

If the neighborhood is a spherical one with radius \(r\) then the area will be given by \(\pi r^2\). If the neighborhood is a knn one then the area will be given by \(\pi \left(\dfrac{d^*}{2}\right)^2\), where \(d^*\) is the distance between the \((x, y)\) coordinates of the center point and the furthest one.

recount_volume_density(F, X, x, r)

The number of points after filtering divided by the volume of the neighborhood.

If the neighborhood is a spherical one with radius \(r\) then the volume will be given by \(\dfrac{4}{3}\pi r^3\). If the neighborhood is a knn one then the volume will be given by \(\dfrac{4}{3}\pi \left(\dfrac{d^*}{2}\right)^3\), where \(d^*\) is the distance between the \((x, y, z)\) coordinates of the center point and the furthest one. When using a cylinder, the radius will be considered to compute the area and the volume will be computed considering the vertical boundaries of the cylinder such that \(\pi r^2 (z^*-z_*)\) where \(z_*\) is the min vertical coordinate and \(z^*\) is the max vertical coordinate.

Note that, for cylindrical neighborhoods, if there is no difference between the max and the min vertical coordinate, then the maximum integer will be returned, effectively avoiding a division by zero.
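
The density recounts can be sketched as follows, taking the number of filtered points as input. The sketch uses the standard formulas for the area of a disk and the volume of a sphere and a cylinder. Using `sys.maxsize` as "the maximum integer" is an assumption of this example, and the three functions are hypothetical helpers rather than the methods of RecountMiner.

```python
import math
import sys


def surface_density(count, r):
    """Points per unit area of a disk with radius r."""
    return count / (math.pi * r ** 2)


def sphere_volume_density(count, r):
    """Points per unit volume of a sphere with radius r."""
    return count / (4.0 / 3.0 * math.pi * r ** 3)


def cylinder_volume_density(count, r, z):
    """Points per unit volume of a cylinder bounded by the min and max z."""
    height = max(z) - min(z)
    if height == 0.0:
        # Flat neighborhood: avoid the division by zero
        return sys.maxsize  # Assumed stand-in for the maximum integer
    return count / (math.pi * r ** 2 * height)


area_density = surface_density(4, 1.0)
sphere_density = sphere_volume_density(8, 2.0)
flat_density = cylinder_volume_density(3, 1.0, [5.0, 5.0, 5.0])
tall_density = cylinder_volume_density(3, 1.0, [0.0, 1.0, 3.0])
```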

recount_vertical_segments(F, z, num_segments): The number of vertical segments along a vertical cylinder that contain at least one point.
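
A possible reading of the vertical segments recount is sketched below. It assumes that the cylinder is split into `num_segments` segments of equal height between the lowest and the highest point of the neighborhood, which is not confirmed by this documentation. The function `count_vertical_segments` is hypothetical.

```python
def count_vertical_segments(z, num_segments):
    """Count the equal-height vertical segments that contain at least one point."""
    if len(z) == 0:
        return 0
    z_min, z_max = min(z), max(z)
    if z_max == z_min:
        return 1  # All the points lie at the same height
    width = (z_max - z_min) / num_segments
    occupied = set()
    for z_i in z:
        # Clamp the index so the highest point falls in the last segment
        k = min(int((z_i - z_min) / width), num_segments - 1)
        occupied.add(k)
    return len(occupied)


# Points near the ground and near the top, nothing in between
occupied = count_vertical_segments([0.0, 0.1, 0.2, 9.9, 10.0], num_segments=5)
```

A low count relative to `num_segments` suggests vertical gaps in the neighborhood, as in the example, where only the lowest and the highest segments contain points.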

static apply_conditions(I, F, conditions)

Apply the conditions to filter out all the points that do not satisfy one or more of them.

Parameters:

I (list) – The indices for the current neighborhood.
F (np.ndarray) – The matrix of features.
conditions (list) – The conditions to be applied.

Returns:

The indices of the current neighborhood that satisfy the conditions.

Return type:

list

static apply_condition(f, condition)

Check whether the condition is satisfied for each given point.

Parameters:

f – The feature vector where the condition must be checked.
condition – The specification of the condition to be checked.

Returns:

The mask with True for points that satisfy the condition, False otherwise.

Return type:

np.ndarray of bool
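
The two methods above can be pictured with the following sketch. The condition format used here (a dictionary with the keys `feature_index`, `type`, and `target`) and the condition type names are hypothetical, chosen only for illustration. They are not the specification format of RecountMiner.

```python
import operator

# Hypothetical condition types mapped to comparison operators
CONDITION_TYPES = {
    "less_than": operator.lt,
    "greater_than": operator.gt,
    "equals": operator.eq,
}


def apply_condition_sketch(f, condition):
    """Boolean mask: True where the feature value satisfies the condition."""
    compare = CONDITION_TYPES[condition["type"]]
    return [compare(value, condition["target"]) for value in f]


def apply_conditions_sketch(I, F, conditions):
    """Keep the indices in I whose features satisfy every condition."""
    kept = list(I)
    for condition in conditions:
        column = [F[i][condition["feature_index"]] for i in kept]
        mask = apply_condition_sketch(column, condition)
        kept = [i for i, ok in zip(kept, mask) if ok]
    return kept


# Rows are points, columns are features (e.g., a score and a class label)
F = [[0.2, 2], [0.8, 2], [0.9, 5]]
conditions = [
    {"feature_index": 0, "type": "greater_than", "target": 0.5},
    {"feature_index": 1, "type": "equals", "target": 2},
]
mask = apply_condition_sketch([row[0] for row in F], conditions[0])
kept = apply_conditions_sketch([0, 1, 2], F, conditions)
```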

get_decorated_fnames()

Obtain the names of the recount features.

Returns:: List with the names of the recount features.
Return type:: list of str

mining.recount_minerpp module

class mining.recount_minerpp.RecountMinerPP(**kwargs)

Bases: RecountMiner

Author:: Alberto M. Esmoris Pena

C++ version of the RecountMiner data miner.

It supports more neighborhoods like 2D k-nearest neighbors, bounded cylindrical neighborhoods, and 2D and 3D rectangular neighborhoods.

It also supports more recount-based features per filter like ring-based features, radial boundaries, 2D sectors, and 3D sectors.

See Miner and RecountMiner.

static extract_miner_args(spec)

Extract the arguments to initialize/instantiate a RecountMinerPP from a key-word specification.

Parameters:: spec – The key-word specification containing the arguments.
Returns:: The arguments to initialize/instantiate a RecountMinerPP.

__init__(**kwargs)

Initialize an instance of RecountMinerPP.

See RecountMiner.__init__().

mine(pcloud)

Mine recount features from the given point cloud.

Parameters:: pcloud – The point cloud to be mined.
Returns:: The point cloud extended with recount-based features.
Return type:: PointCloud

extract_cpp_conditions()

Obtain the conditions as arguments for the C++ recount miner. Each filter can have many conditions and each condition is represented by the index of the considered feature, the condition type, and the target value.

See the RecountMiner documentation for more information.

Returns:: Three different lists of lists. The first one for feature indices (integer), the second one for the condition types (string), and the third one for the target values (list of numbers). The first element of each list is a list whose elements define the different conditions for the corresponding filter.
Return type:: tuple of three lists of lists

get_recount_names_from_filter(f)

Override the RecountMiner.get_recount_names_from_filter() to support the extra features generated by the C++ version.

See RecountMiner and RecountMiner.get_recount_names_from_filter().

mining.sampling_decorated_miner module

class mining.sampling_decorated_miner.SamplingDecoratedMiner(**kwargs)

Bases: DecoratedMiner

Author:: Alberto M. Esmoris Pena

Abstract class providing the common logic for miner decorators that resample the point cloud.

See DecoratedMiner.

static extract_miner_args(spec)

Extract the arguments to initialize/instantiate a SamplingDecoratedMiner from a key-word specification.

Parameters:: spec – The key-word specification containing the arguments.
Returns:: The arguments to initialize/instantiate a SamplingDecoratedMiner.

__init__(**kwargs): Initialization for any instance of type SamplingDecoratedMiner.

mining.simple_smooth_decorated_miner module

class mining.simple_smooth_decorated_miner.SimpleSmoothDecoratedMiner(**kwargs)

Bases: Miner

Author:: Alberto M. Esmoris Pena

Decorator for data miners that runs the data mining process on a smoothed representation of the point cloud’s structure space.

The simple smooth decorated miner (SimpleSmoothDecoratedMiner) constructs a representation of the point cloud, then it runs the data mining process on this representation and, finally, it assigns the features to the original point cloud.

See SimpleSmootherDecoratorTransformer.

Variables:

decorated_miner_spec (dict) – The specification of the decorated miner.
decorated_miner (Miner) – The decorated miner object.
simple_smoother_decorator_spec (dict) – The specification of the simple smoother defining the decorator.
simple_smoother_decorator (SimpleSmootherDecoratorTransformer) – The simple smoother decorator to be applied on input point clouds.

static extract_miner_args(spec)

Extract the arguments to initialize/instantiate a SimpleSmoothDecoratedMiner from a key-word specification.

Parameters:: spec (dict) – The key-word specification containing the arguments.
Returns:: The arguments to initialize/instantiate a SimpleSmoothDecoratedMiner.

__init__(**kwargs): Initialization for any instance of type SimpleSmoothDecoratedMiner.

mine(pcloud): Decorate the main data mining logic to work on the representation. See Miner and Miner.mine().

mining.smooth_feats_miner module

class mining.smooth_feats_miner.SmoothFeatsMiner(**kwargs)

Bases: Miner

Author:: Alberto M. Esmoris Pena

Basic smooth features miner. See Miner.

The smooth features miner considers each point in the point cloud \(\pmb{x_{i*}}\) and finds either its knn or its spherical neighborhood \(\mathcal{N}\). Now, let \(j\) index the points in the neighborhood. Then, a given feature \(f\) can be smoothed by considering all the points in the neighborhood. In the simplest case, the smoothed feature \(\hat{f}\) can be computed as a mean:

\[\hat{f}_i = \dfrac{1}{\lvert\mathcal{N}\rvert} \sum_{j=1}^{\lvert\mathcal{N}\rvert}{f_j}\]

Alternatively, the feature can be smoothed considering a weighted mean where the closest points with respect to \(\pmb{x_{i*}}\) have a greater weight, such that:

\[\hat{f}_i = \dfrac{1}{D}\sum_{j=1}^{\lvert\mathcal{N}\rvert}{d_j f_j}\]

Where \(d^*=\max_{j} \; \left\{\lVert\pmb{x_{i*}} - \pmb{x_{j*}}\rVert : j = 1,\ldots,\lvert\mathcal{N}\rvert \right\}\), \(d_j = d^* - \lVert{\pmb{x_{i*}}-\pmb{x_{j*}}}\rVert + \omega\), and \(D = \sum_{j=1}^{\lvert\mathcal{N}\rvert}{d_j}\).
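
The mean and the weighted mean can be written for a single neighborhood as in the sketch below. It follows the two formulas above term by term, but it is a plain Python illustration rather than the library's vectorized code. The functions `mean_smooth` and `weighted_mean_smooth` are hypothetical names.

```python
import math


def mean_smooth(f_neigh):
    """Plain mean of the feature over the neighborhood."""
    return sum(f_neigh) / len(f_neigh)


def weighted_mean_smooth(x_i, X_neigh, f_neigh, omega):
    """Weighted mean where closer neighbors receive greater weights."""
    dists = [math.dist(x_i, x_j) for x_j in X_neigh]
    d_star = max(dists)                              # d^*
    weights = [d_star - d + omega for d in dists]    # d_j
    D = sum(weights)                                 # Normalization term
    return sum(w * f for w, f in zip(weights, f_neigh)) / D


x_i = (0.0, 0.0, 0.0)
X_neigh = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
f_neigh = [1.0, 2.0, 6.0]

f_mean = mean_smooth(f_neigh)
f_wmean = weighted_mean_smooth(x_i, X_neigh, f_neigh, omega=1.0)
```

In this example the weights are 3, 2, and 1, so the distant neighbor with the largest value pulls the weighted mean less than it pulls the plain mean. A strictly positive \(\omega\) also keeps \(D\) above zero when all the neighbors are at the same distance.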

Moreover, a Gaussian Radial Basis Function (RBF) can be used to smooth the features in a given neighborhood such that:

\[\hat{f}_i = \dfrac{1}{D} \sum_{j=1}^{\lvert\mathcal{N}\rvert}{ \exp\left[ - \dfrac{\lVert{\pmb{x_{i*}} - \pmb{x_{j*}}}\rVert^2}{\omega^2} \right] f_j }\]

Where \(D = \displaystyle\sum_{j=1}^{\lvert\mathcal{N}\rvert}{\exp\left[-\dfrac{\lVert\pmb{x_{i*}}-\pmb{x_{j*}}\rVert^2}{\omega^2}\right]}\) .

A useful tip for configuring a Gaussian RBF relative to the unitary case, i.e., \(\exp\left(-\dfrac{1}{\omega^2}\right)\), is to define the \(\omega\) parameter of the non-unitary case as \(\varphi = \sqrt{\omega^2 r^2}\), where \(r\) is the radius of the neighborhood. For example, with a sphere neighborhood of radius 5, using \(\varphi = \sqrt{\omega^2 5^2}\) as the new \(\omega\) for the Gaussian RBF makes a point at 5 meters from the center contribute as much as a point at one meter in the unitary case.
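
The Gaussian RBF smoothing and the rescaling tip can be checked numerically with the sketch below. It follows the formulas above for a single neighborhood. The function `gaussian_rbf_smooth` is a hypothetical helper, not the library's method.

```python
import math


def gaussian_rbf_smooth(x_i, X_neigh, f_neigh, omega):
    """Smooth a feature with Gaussian RBF weights centered at x_i."""
    weights = [
        math.exp(-math.dist(x_i, x_j) ** 2 / omega ** 2)
        for x_j in X_neigh
    ]
    D = sum(weights)  # Normalization term
    return sum(w * f for w, f in zip(weights, f_neigh)) / D


# Smoothing on a two-point neighborhood
f_hat = gaussian_rbf_smooth(
    (0.0, 0.0, 0.0),
    [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)],
    [0.0, 1.0],
    omega=1.0,
)

# Rescaling tip: radius 5 with phi as the new omega
omega, r = 1.0, 5.0
phi = math.sqrt(omega ** 2 * r ** 2)
unit_weight = math.exp(-1.0 ** 2 / omega ** 2)   # Point at 1 m, unitary case
scaled_weight = math.exp(-r ** 2 / phi ** 2)     # Point at 5 m, rescaled case
```

Both weights are equal to \(\exp(-1/\omega^2)\), which is what the tip intends: the shape of the kernel is preserved when the neighborhood radius changes.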

Variables:

chunk_size (int) – How many points per chunk must be considered when computing the data mining in parallel.
subchunk_size (int) – How many neighborhoods per iteration must be considered when computing a chunk. It is useful to prevent memory exhaustion when considering many big neighborhoods at the same time.
neighborhood (dict) –
The definition of the neighborhood to be used. It can be a KNN neighborhood:
```
{
    "type": "knn",
    "k": 16
}
```
But it can also be a spherical neighborhood:
```
{
    "type": "sphere",
    "radius": 0.25
}
```
weighted_mean_omega (float) – The \(\omega\) parameter for the weighted mean strategy.
gaussian_rbf_omega (float) – The \(\omega\) parameter for the Gaussian RBF strategy.
nan_policy (str) – The policy specifying how to handle NaN values in the feature space. It can be "propagate" to propagate NaN values or "replace" to replace NaN values by the mean of numerical values.
input_fnames (list) – The list with the name of the input features that must be smoothed.
fnames (list) – The list with the name of the smooth strategies to be computed.
frenames (list) – The name of the output features.
nthreads (int) – The number of threads for parallel execution (-1 means as many threads as available cores).
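
The two values of `nan_policy` can be illustrated on a single feature column. The sketch assumes that the replacement mean is computed per feature over its non-NaN values, which is one plausible reading of the description above. The function `apply_nan_policy` is a hypothetical helper.

```python
import math


def apply_nan_policy(column, policy):
    """Handle the NaN values of one feature column according to the policy."""
    if policy == "propagate":
        return list(column)  # Leave the NaN values untouched
    if policy == "replace":
        numeric = [v for v in column if not math.isnan(v)]
        # Assumed: the mean of the non-NaN values of the same feature
        fill = sum(numeric) / len(numeric) if numeric else math.nan
        return [fill if math.isnan(v) else v for v in column]
    raise ValueError(f"Unexpected NaN policy: {policy}")


column = [1.0, math.nan, 3.0]
replaced = apply_nan_policy(column, "replace")
propagated = apply_nan_policy(column, "propagate")
```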

static extract_miner_args(spec)

Extract the arguments to initialize/instantiate a SmoothFeatsMiner from a key-word specification.

Parameters:: spec – The key-word specification containing the arguments.
Returns:: The arguments to initialize/instantiate a SmoothFeatsMiner.

__init__(**kwargs)

Initialize an instance of SmoothFeatsMiner.

The neighborhood definition and feature names (fnames) are always assigned during initialization. The default neighborhood is a knn neighborhood with \(k=16\).

Parameters:: kwargs (dict) – The attributes for the SmoothFeatsMiner that will also be passed to the parent.

mine(pcloud)

Mine smooth features from the given point cloud.

Parameters:: pcloud – The point cloud to be mined.
Returns:: The point cloud extended with smooth features.
Return type:: PointCloud

compute_smooth_features(X, F, kdt, neighborhood_f, smooth_funs, X_chunk, F_chunk, chunk_idx)

Compute the smooth features for a given chunk.

Parameters:

X – The structure space matrix (i.e., the matrix of coordinates).
F – The feature space matrix (i.e., the matrix of features).
kdt – The KDTree representing the entire point cloud.
neighborhood_f – The function to extract neighborhoods for the points in the chunk.
smooth_funs – The functions to compute the requested smooth features.
X_chunk – The structure space matrix of the chunk.
F_chunk – The feature space matrix of the chunk.
chunk_idx – The index of the chunk.

Returns:

The smooth features computed for the chunk.

Return type:

np.ndarray

mean_f(X, F, X_sub, I)

Mine the smooth features using the mean.

Parameters:

X – The matrix of coordinates representing the input point cloud.
F – The matrix of features representing the input point cloud.
X_sub – The matrix of coordinates representing the subchunk whose smooth features must be computed.
I – The list of lists of indices such that the i-th list contains the indices of the points in X that belong to the neighborhood of the i-th point in X_sub.

Returns:

The smooth features for the points in X_sub.
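
As a rough illustration of how the list of neighborhoods I drives the computation, the sketch below averages the rows of F over each neighborhood. The function `mean_smooth_sketch` is hypothetical, it is not the package implementation, and it assumes every neighborhood is non-empty.

```python
import numpy as np

def mean_smooth_sketch(F, I):
    """Average the rows of F over each neighborhood in I (illustrative)."""
    # I[i] holds the indices (rows of F) of the neighbors of the i-th point.
    return np.array([np.mean(F[Ii], axis=0) for Ii in I])

F = np.array([[1.0, 10.0], [3.0, 30.0], [5.0, 50.0]])  # 3 points, 2 features
I = [[0, 1], [0, 1, 2]]  # neighborhoods of two query points
Fhat = mean_smooth_sketch(F, I)
print(Fhat)  # one row of smoothed features per query point
```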

weighted_mean_f(X, F, X_sub, I)

Mine the smooth features using the weighted mean.

For the parameters and the return see smooth_feats_miner.SmoothFeatsMiner.mean_f() because the parameters and the return are the same but computed with a different strategy.

gaussian_rbf(X, F, X_sub, I)

Mine the smooth features using the Gaussian Radial Basis Function.

For the parameters and the return see smooth_feats_miner.SmoothFeatsMiner.mean_f() because the parameters and the return are the same but computed with a different strategy.
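
A minimal sketch of Gaussian RBF smoothing follows. It assumes each neighbor is weighted by \(\exp\left(-d^2/\omega^2\right)\), with \(d\) the distance to the query point, and that the weights are normalized to sum to one; both are assumptions of this sketch, and `gaussian_rbf_smooth_sketch` is a hypothetical name.

```python
import numpy as np

def gaussian_rbf_smooth_sketch(X, F, X_sub, I, omega):
    """Smooth F around each point of X_sub with normalized Gaussian weights."""
    Fhat = np.empty((len(X_sub), F.shape[1]))
    for i, (x_i, Ii) in enumerate(zip(X_sub, I)):
        d2 = np.sum((X[Ii] - x_i) ** 2, axis=1)  # squared distances
        w = np.exp(-d2 / omega ** 2)             # assumed Gaussian RBF weights
        Fhat[i] = w @ F[Ii] / np.sum(w)          # normalized weighted sum
    return Fhat

X = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [2.0, 0.0, 0.0]])
F = np.array([[0.0], [1.0], [2.0]])
X_sub = X[[1]]    # smooth only the middle point
I = [[0, 1, 2]]   # its neighborhood is the whole cloud
Fhat = gaussian_rbf_smooth_sketch(X, F, X_sub, I, omega=1.0)
```

In this symmetric example the two outer neighbors receive the same weight, so the smoothed value of the middle point equals its own feature value.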

static knn_neighborhood_f(miner, kdt, X_sub)

The k nearest neighbors (KNN) neighborhood function.

Parameters:

kdt – The KDT representing the entire point cloud (X).
X_sub – The points whose neighborhoods must be found.

Returns:

The k indices of the nearest neighbors in X for each point in X_sub.
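
For orientation, a KNN query of this kind can be sketched with SciPy's KDTree. Whether the miner uses this exact class and call is an assumption; the data and the value of `k` are illustrative.

```python
import numpy as np
from scipy.spatial import KDTree

X = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0],
              [3.0, 0.0, 0.0], [10.0, 0.0, 0.0]])
kdt = KDTree(X)   # tree over the entire point cloud
X_sub = X[:2]     # points whose neighborhoods are requested
k = 2

# I[i] holds the indices in X of the k nearest neighbors of X_sub[i],
# sorted from nearest to farthest.
_, I = kdt.query(X_sub, k=k)
```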

static sphere_neighborhood_f(miner, kdt, X_sub)

The spherical neighborhood function.

Parameters:

kdt – The KDT representing the entire point cloud (X).
X_sub – The points whose neighborhoods must be found.

Returns:

The indices of the points in X that belong to the spherical neighborhood for each point in X_sub.
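
A spherical query can be sketched in the same way with SciPy's `KDTree.query_ball_point`. Unlike KNN, the number of neighbors varies per point, so the result is a collection of index lists. The use of SciPy here is an assumption of the sketch and the radius is illustrative.

```python
import numpy as np
from scipy.spatial import KDTree

X = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0],
              [3.0, 0.0, 0.0], [10.0, 0.0, 0.0]])
kdt = KDTree(X)
X_sub = X[:2]
radius = 2.5

# I[i] is the list of indices in X within `radius` of X_sub[i].
I = kdt.query_ball_point(X_sub, r=radius)
neighborhood_sizes = [len(Ii) for Ii in I]
```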

static cylinder_neighborhood_f(miner, kdt, X_sub)

The cylinder neighborhood function.

Parameters:

kdt – The KDT representing the entire point cloud (X).
X_sub – The points whose neighborhood must be found.

Returns:

The indices of the points in X that belong to the cylindrical neighborhood for each point in X_sub.
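
One common way to obtain a cylindrical neighborhood is to run a disk query on the horizontal coordinates only. The sketch below assumes an unbounded vertical cylinder and builds a 2D tree for that purpose; how the miner actually handles the vertical axis is not specified here, so this is only an illustration.

```python
import numpy as np
from scipy.spatial import KDTree

X = np.array([[0.0, 0.0, 0.0],
              [0.0, 0.0, 5.0],   # directly above the first point
              [3.0, 0.0, 0.0]])
kdt_2d = KDTree(X[:, :2])  # tree on (x, y) only, z is ignored
X_sub = X[[0, 2]]
radius = 1.0

# Points sharing a vertical column fall in the same cylinder regardless of z.
I = kdt_2d.query_ball_point(X_sub[:, :2], r=radius)
```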

static nan_policy_propagate_f(F)

Apply the NaN policy that propagates the matrix of features with no handling at all.

Parameters:: F (np.ndarray) – The matrix of features given as input.
Returns:: The transformed matrix of features.
Return type:: np.ndarray

static nan_policy_replace_f(F)

Apply the NaN policy that replaces the NaN values in the matrix of features by the mean of the numerical values for the corresponding feature (each column of the matrix is assumed to be an independent feature).

NOTE that this method will modify the F matrix inplace, apart from returning it.

Parameters:: F (np.ndarray) – The matrix of features given as input.
Returns:: The transformed matrix of features.
Return type:: np.ndarray
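
The replace policy can be sketched with NumPy as follows. The function name `nan_replace_sketch` is hypothetical, and the sketch assumes every column contains at least one numerical value. As in the note above, the input matrix is modified in place and also returned.

```python
import numpy as np

def nan_replace_sketch(F):
    """Replace NaNs in each column of F by that column's mean (in place)."""
    col_means = np.nanmean(F, axis=0)             # per-feature mean, NaNs ignored
    nan_rows, nan_cols = np.nonzero(np.isnan(F))  # positions of the NaN values
    F[nan_rows, nan_cols] = col_means[nan_cols]
    return F

F = np.array([[1.0, np.nan],
              [3.0, 4.0],
              [np.nan, 8.0]])
F_out = nan_replace_sketch(F)
```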

static prepare_mining(miner, X)

Prepare the data miner to handle the neighborhoods and build a KDTree to speed up the spatial queries.

Parameters:

miner – The miner to be prepared.
X (np.ndarray) – The structure space matrix, i.e., the matrix of coordinates representing the point cloud.

Returns:

The neighborhood radius, the function for spatial queries, and the serialized KDTree.

Return type:

float, callable, bytes
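
Serializing the tree lets it be shipped to parallel workers as plain bytes. The sketch below uses `pickle` with SciPy's KDTree; the serialization mechanism actually used by the miner is an assumption here.

```python
import pickle

import numpy as np
from scipy.spatial import KDTree

X = np.random.default_rng(0).random((100, 3))  # illustrative point cloud

kdt_bytes = pickle.dumps(KDTree(X))  # serialized tree (bytes)
kdt = pickle.loads(kdt_bytes)        # deserialized, e.g., inside a worker

# The restored tree answers queries like the original one.
_, idx = kdt.query(X[:5], k=1)
```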

static prepare_chunks(miner, X)

Prepare the chunks for the parallel computation of the data mining.

Parameters:

miner – The miner for which the chunks must be prepared.
X (np.ndarray) – The structure space matrix, i.e., the matrix of coordinates representing the point cloud.

Returns:

The number of chunks and the chunk size.

Return type:

int, int
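
The relation between the number of points, the chunk size, and the number of chunks can be illustrated as follows. The helper `chunk_layout_sketch` is hypothetical and assumes a positive chunk size; how the miner treats other values is not covered by this sketch.

```python
import math

def chunk_layout_sketch(num_points, chunk_size):
    """Return the number of chunks and the [start, end) bounds of each chunk."""
    num_chunks = math.ceil(num_points / chunk_size)
    bounds = [
        (i * chunk_size, min((i + 1) * chunk_size, num_points))
        for i in range(num_chunks)
    ]
    return num_chunks, bounds

num_chunks, bounds = chunk_layout_sketch(num_points=10_000, chunk_size=3_000)
# The last chunk is smaller when chunk_size does not divide num_points.
```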

static prepare_chunk(miner, X_chunk, kdt)

Compute the subchunk configuration and deserialize the KDTree so the chunk can be computed.

Parameters:

miner – The miner for which the chunks must be prepared.
X_chunk – The structure space matrix representing the chunk.
kdt – The serialized KDTree.

Returns:

The number of subchunks, the subchunk size, and the deserialized KDTree.

Return type:

int, int, KDTree

static prepare_subchunk(miner, subchunk_idx, subchunk_size, X_chunk, kdt, neighborhood_f)

Prepare the given subchunk, so it can be passed to the method that mines the features on a given subchunk.

Parameters:

miner – The miner for which the chunks must be prepared.
subchunk_idx – The index of the subchunk to be prepared.
subchunk_size – The size of the subchunk.
X_chunk – The structure space matrix representing the chunk.
kdt – The deserialized KDTree to speed up the spatial queries.
neighborhood_f – The neighborhood function defining the spatial queries.

Returns:

The structure space matrix representing the subchunk, and the indices of the neighborhoods.

Return type:

np.ndarray, list of list

static prepare_mined_features_chunk(Fhat_chunk, Fhat_sub)

Prepare the mined features from the chunk considering the current mined features and those for the current subchunk.

Parameters:

Fhat_chunk – The already mined features so far.
Fhat_sub – The features for the current subchunk.

Returns:

The mined features for the chunk.

Return type:

np.ndarray
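
The subchunk loop can be pictured as slicing the chunk and stacking the partial results. The sketch below is illustrative only: `mine_subchunk_sketch` stands in for any smoothing strategy, and the use of `None` for the initial state and of `np.vstack` for the accumulation are assumptions.

```python
import math

import numpy as np

def mine_subchunk_sketch(X_sub):
    """Placeholder for a smoothing strategy: one output feature per point."""
    return np.sum(X_sub, axis=1, keepdims=True)

X_chunk = np.arange(15, dtype=float).reshape(5, 3)  # chunk with 5 points
subchunk_size = 2
num_subchunks = math.ceil(len(X_chunk) / subchunk_size)

Fhat_chunk = None  # nothing mined yet
for subchunk_idx in range(num_subchunks):
    a = subchunk_idx * subchunk_size
    X_sub = X_chunk[a:a + subchunk_size]  # slicing clips the last subchunk
    Fhat_sub = mine_subchunk_sketch(X_sub)
    # Append the rows of the current subchunk to the mined features so far.
    Fhat_chunk = Fhat_sub if Fhat_chunk is None else np.vstack([Fhat_chunk, Fhat_sub])
```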

mining.smooth_feats_minerpp module

class mining.smooth_feats_minerpp.SmoothFeatsMinerPP(**kwargs)

Bases: SmoothFeatsMiner

Author:: Alberto M. Esmoris Pena

C++ version of the SmoothFeatsMiner data miner.

It also supports more neighborhoods like 2D k-nearest neighbors, bounded cylindrical neighborhoods, and 2D and 3D rectangular neighborhoods.

See Miner and SmoothFeatsMiner.

static extract_miner_args(spec)

Extract the arguments to initialize/instantiate a SmoothFeatsMinerPP from a key-word specification.

Parameters:: spec – The key-word specification containing the arguments.
Returns:: The arguments to initialize/instantiate a SmoothFeatsMinerPP.

__init__(**kwargs)

Initialize an instance of SmoothFeatsMinerPP.

See SmoothFeatsMiner.__init__().

mine(pcloud)

Mine smooth features from the given point cloud.

Parameters:: pcloud – The point cloud to be mined.
Returns:: The point cloud extended with smooth features.
Return type:: PointCloud

mining.take_closest_miner module

class mining.take_closest_miner.TakeClosestMiner(**kwargs)

Bases: Miner

Author:: Alberto M. Esmoris Pena

Take closest miner. See Miner.

The take closest miner considers a pool of point clouds and, for each point in the input point cloud, takes the requested features from the closest neighbor in the entire pool. This is useful, for example, when a set of mined point clouds is available and only some points, manually labeled in the non-mined point clouds, must be taken for training (e.g., uncertainty point clouds, see ClassificationUncertaintyEvaluator).

Variables:

fnames (list of str) – The names of the features that must be taken from the closest neighbor in the pool.
frenames (list of str) – The names of the features in the output point cloud. When not given (i.e., None), they will be the feature names.
y_default (int) – The default value for classification labels. If not given, it is the maximum integer value, i.e., np.iinfo(int).max.
pcloud_pool (list of str) – The list of paths to the point clouds composing the pool.
distance_upper_bound (float) – The maximum supported distance. It can be used to prune tree searches and thus speed up the computations.
nthreads (int) – The number of threads for the parallel closest neighbors query. Using -1 implies considering as many threads as available cores.

static extract_miner_args(spec)

Extract the arguments to initialize/instantiate a TakeClosestMiner from a key-word specification.

Parameters:: spec – The key-word specification containing the arguments.
Returns:: The arguments to initialize/instantiate a TakeClosestMiner.

__init__(**kwargs)

Initialize an instance of TakeClosestMiner.

Parameters:: kwargs (dict) – The attributes for the TakeClosestMiner that will also be passed to the parent.

mine(pcloud)

Mine features from the closest neighbor in the pool.

Parameters:: pcloud – The point cloud to be mined.
Returns:: The point cloud extended with taken features.
Return type:: PointCloud
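
The closest-neighbor lookup with a distance bound can be sketched with SciPy's KDTree, whose `query` method accepts a `distance_upper_bound` argument. The arrays below are illustrative, and filling points without a valid neighbor with NaN is an assumption of this sketch, not necessarily what the miner does.

```python
import numpy as np
from scipy.spatial import KDTree

X_pool = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0]])  # pool coordinates
F_pool = np.array([[10.0], [20.0]])                    # pool features
X_in = np.array([[0.1, 0.0, 0.0], [5.0, 0.0, 0.0]])    # input point cloud

dist, idx = KDTree(X_pool).query(X_in, k=1, distance_upper_bound=1.0)

# SciPy reports a missing neighbor as an infinite distance and an index
# equal to the number of points in the tree.
found = np.isfinite(dist)
F_in = np.full((len(X_in), F_pool.shape[1]), np.nan)
F_in[found] = F_pool[idx[found]]
```

Here the first input point takes the feature of its closest pool point, while the second one lies beyond the bound and keeps the fill value.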

Module contents

Author:: Alberto M. Esmoris Pena

The mining package contains the logic to mine features from point clouds and potential complementary data sources.