src.mining.smooth_feats_miner

Classes

SmoothFeatsMiner(**kwargs)

class src.mining.smooth_feats_miner.SmoothFeatsMiner(**kwargs)
Author:

Alberto M. Esmoris Pena

Basic smooth features miner. See Miner.

The smooth features miner considers each point \(\pmb{x_{i*}}\) in the point cloud and finds either its k-nearest neighbors (KNN) or its spherical neighborhood \(\mathcal{N}\). Let \(j\) index the points in the neighborhood. A given feature \(f\) can then be smoothed by considering all the points in the neighborhood. In the simplest case, the smoothed feature \(\hat{f}\) is computed as a mean:

\[\hat{f}_i = \dfrac{1}{\lvert\mathcal{N}\rvert} \sum_{j=1}^{\lvert\mathcal{N}\rvert}{f_j}\]
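As a rough illustration of the mean above, the following NumPy sketch averages the features over each neighborhood. The helper name `mean_smooth` and the toy data are assumptions made for this example; it is not the miner's own implementation.

```python
import numpy as np


def mean_smooth(F, I):
    """Hypothetical helper: plain mean of each feature over each neighborhood.

    F is the (m, nf) feature matrix and I[i] lists the indices of the points
    in the neighborhood of the i-th query point.
    """
    return np.array([np.mean(F[Ii], axis=0) for Ii in I])


# Toy data: three points with a single feature each.
F = np.array([[1.0], [2.0], [6.0]])
I = [[0, 1], [0, 1, 2], [2]]
Fhat = mean_smooth(F, I)
```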

Alternatively, the feature can be smoothed considering a weighted mean where the closest points with respect to \(\pmb{x_{i*}}\) have a greater weight, such that:

\[\hat{f}_i = \dfrac{1}{D}\sum_{j=1}^{\lvert\mathcal{N}\rvert}{d_j f_j}\]

Where \(d^*=\max_{j} \; \left\{\lVert\pmb{x_{i*}} - \pmb{x_{j*}}\rVert : j = 1,\ldots,\lvert\mathcal{N}\rvert \right\}\), \(d_j = d^* - \lVert{\pmb{x_{i*}}-\pmb{x_{j*}}}\rVert + \omega\), and \(D = \sum_{j=1}^{\lvert\mathcal{N}\rvert}{d_j}\).
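The weighted mean can be sketched in NumPy as follows. The helper name `weighted_mean_smooth`, its `omega` default, and the toy data are assumptions for illustration only; the sketch simply follows the definitions of \(d^*\), \(d_j\), and \(D\) given above.

```python
import numpy as np


def weighted_mean_smooth(X, F, X_sub, I, omega=0.1):
    """Hypothetical helper: distance-weighted mean over each neighborhood."""
    Fhat = []
    for x_i, Ii in zip(X_sub, I):
        dist = np.linalg.norm(X[Ii] - x_i, axis=1)  # ||x_i - x_j|| for each j
        d = np.max(dist) - dist + omega             # d_j = d* - ||x_i - x_j|| + omega
        Fhat.append(d @ F[Ii] / np.sum(d))          # (1/D) * sum_j d_j f_j
    return np.array(Fhat)


# Toy data: two points one unit apart, one feature each.
X = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0]])
F = np.array([[0.0], [10.0]])
Fhat = weighted_mean_smooth(X, F, X[:1], [[0, 1]], omega=1.0)
```

With these values the weights are \(d = (2, 1)\), so the closest point (the query point itself) counts twice as much as its neighbor.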

Moreover, a Gaussian Radial Basis Function (RBF) can be used to smooth the features in a given neighborhood such that:

\[\hat{f}_i = \dfrac{1}{D} \sum_{j=1}^{\lvert\mathcal{N}\rvert}{ \exp\left[ - \dfrac{\lVert{\pmb{x_{i*}} - \pmb{x_{j*}}}\rVert^2}{\omega^2} \right] f_j }\]

Where \(D = \displaystyle\sum_{j=1}^{\lvert\mathcal{N}\rvert}{\exp\left[-\dfrac{\lVert\pmb{x_{i*}}-\pmb{x_{j*}}\rVert^2}{\omega^2}\right]}\).
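A minimal NumPy sketch of the Gaussian RBF smoothing follows. The helper name `gaussian_rbf_smooth`, its `omega` default, and the toy data are assumptions for this example, not the miner's actual code.

```python
import numpy as np


def gaussian_rbf_smooth(X, F, X_sub, I, omega=1.0):
    """Hypothetical helper: Gaussian RBF weighted mean over each neighborhood."""
    Fhat = []
    for x_i, Ii in zip(X_sub, I):
        sq_dist = np.sum((X[Ii] - x_i) ** 2, axis=1)  # ||x_i - x_j||^2
        w = np.exp(-sq_dist / omega ** 2)             # Gaussian RBF weights
        Fhat.append(w @ F[Ii] / np.sum(w))            # normalize by D = sum of weights
    return np.array(Fhat)


# Toy data: two points one unit apart, one feature each.
X = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0]])
F = np.array([[0.0], [10.0]])
Fhat = gaussian_rbf_smooth(X, F, X[:1], [[0, 1]], omega=1.0)
```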

One useful tip to configure a Gaussian RBF with respect to the unitary case, i.e., \(\exp\left(-\dfrac{1}{\omega^2}\right)\), is to replace the \(\omega\) parameter with \(\varphi = \sqrt{\omega^2 r^2}\), where \(r\) is the radius of the neighborhood. For example, with a spherical neighborhood of radius 5, using \(\varphi = \sqrt{\omega^2 5^2}\) as the new \(\omega\) for the Gaussian RBF makes a point 5 meters from the center contribute as much as a point 1 meter from the center in the unitary case.
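The scaling can be checked numerically. In the sketch below the values of `omega` and `r` are arbitrary choices for illustration; the check only confirms that the rescaled weight at distance \(r\) equals the unitary weight at distance 1.

```python
import math

omega = 2.0  # illustrative omega of the unitary case
r = 5.0      # illustrative neighborhood radius

phi = math.sqrt(omega ** 2 * r ** 2)         # rescaled omega, equal to omega * r

unit_weight = math.exp(-1.0 / omega ** 2)    # weight at distance 1 with omega
scaled_weight = math.exp(-r ** 2 / phi ** 2) # weight at distance r with phi
```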

Variables:
  • chunk_size (int) – How many points per chunk must be considered when computing the data mining in parallel.

  • subchunk_size (int) – How many neighborhoods per iteration must be considered when computing a chunk. It is useful to prevent memory exhaustion when considering many big neighborhoods at the same time.

  • neighborhood (dict) –

    The definition of the neighborhood to be used. It can be a KNN neighborhood:

    {
        "type": "knn",
        "k": 16
    }
    

    But it can also be a spherical neighborhood:

    {
        "type": "sphere",
        "radius": 0.25
    }
    

  • weighted_mean_omega (float) – The \(\omega\) parameter for the weighted mean strategy.

  • gaussian_rbf_omega (float) – The \(\omega\) parameter for the Gaussian RBF strategy.

  • nan_policy (str) – The policy specifying how to handle NaN values in the feature space. It can be "propagate" to propagate NaN values or "replace" to replace NaN values with the mean of the numerical values of the corresponding feature.

  • input_fnames (list) – The list with the name of the input features that must be smoothed.

  • fnames (list) – The list with the name of the smooth strategies to be computed.

  • frenames (list) – The name of the output features.

  • nthreads (int) – The number of threads for parallel execution (-1 means as many threads as available cores).

static extract_miner_args(spec)

Extract the arguments to initialize/instantiate a SmoothFeatsMiner from a key-word specification.

Parameters:

spec – The key-word specification containing the arguments.

Returns:

The arguments to initialize/instantiate a SmoothFeatsMiner.

__init__(**kwargs)

Initialize an instance of SmoothFeatsMiner.

The neighborhood definition and feature names (fnames) are always assigned during initialization. The default neighborhood is a knn neighborhood with \(k=16\).

Parameters:

kwargs (dict) – The attributes for the SmoothFeatsMiner that will also be passed to the parent.

mine(pcloud)

Mine smooth features from the given point cloud.

Parameters:

pcloud – The point cloud to be mined.

Returns:

The point cloud extended with smooth features.

Return type:

PointCloud

compute_smooth_features(X, F, kdt, neighborhood_f, smooth_funs, X_chunk, F_chunk, chunk_idx)

Compute the smooth features for a given chunk.

Parameters:
  • X – The structure space matrix (i.e., the matrix of coordinates).

  • F – The feature space matrix (i.e., the matrix of features).

  • kdt – The KDTree representing the entire point cloud.

  • neighborhood_f – The function to extract neighborhoods for the points in the chunk.

  • smooth_funs – The functions to compute the requested smooth features.

  • X_chunk – The structure space matrix of the chunk.

  • F_chunk – The feature space matrix of the chunk.

  • chunk_idx – The index of the chunk.

Returns:

The smooth features computed for the chunk.

Return type:

np.ndarray

mean_f(X, F, X_sub, I)

Mine the smooth features using the mean.

Parameters:
  • X – The matrix of coordinates representing the input point cloud.

  • F – The matrix of features representing the input point cloud.

  • X_sub – The matrix of coordinates representing the subchunk whose smooth features must be computed.

  • I – The list of lists of indices such that the i-th list contains the indices of the points in X that belong to the neighborhood of the i-th point in X_sub.

Returns:

The smooth features for the points in X_sub.

weighted_mean_f(X, F, X_sub, I)

Mine the smooth features using the weighted mean.

The parameters and the return value are the same as for smooth_feats_miner.SmoothFeatsMiner.mean_f(); only the smoothing strategy differs.

gaussian_rbf(X, F, X_sub, I)

Mine the smooth features using the Gaussian Radial Basis Function.

The parameters and the return value are the same as for smooth_feats_miner.SmoothFeatsMiner.mean_f(); only the smoothing strategy differs.

static knn_neighborhood_f(miner, kdt, X_sub)

The k nearest neighbors (KNN) neighborhood function.

Parameters:
  • kdt – The KDT representing the entire point cloud (X).

  • X_sub – The points whose neighborhoods must be found.

Returns:

The k indices of the nearest neighbors in X for each point in X_sub.

static sphere_neighborhood_f(miner, kdt, X_sub)

The spherical neighborhood function.

Parameters:
  • kdt – The KDT representing the entire point cloud (X).

  • X_sub – The points whose neighborhoods must be found.

Returns:

The indices of the points in X that belong to the spherical neighborhood for each point in X_sub.
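To illustrate what the KNN and spherical neighborhood functions return, here is a small sketch built on SciPy's `KDTree`. Using `scipy.spatial.KDTree` is an assumption made for this example (the tree class used by the miner is not stated here), and the point coordinates, `k`, and radius are made up.

```python
import numpy as np
from scipy.spatial import KDTree

# Toy point cloud; the spacing avoids ties between neighbor distances.
X = np.array([
    [0.0, 0.0, 0.0],
    [1.0, 0.0, 0.0],
    [2.5, 0.0, 0.0],
    [10.0, 0.0, 0.0],
])
kdt = KDTree(X)
X_sub = X[:2]  # the points whose neighborhoods must be found

# KNN neighborhood: a fixed number of indices (k) per query point.
_, I_knn = kdt.query(X_sub, k=2)

# Spherical neighborhood: a variable number of indices per query point.
I_sphere = kdt.query_ball_point(X_sub, r=2.0)
```

Note the difference in shape: the KNN query yields a regular array with k columns, while the spherical query yields one list per point whose length depends on the local point density.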

static cylinder_neighborhood_f(miner, kdt, X_sub)

The cylinder neighborhood function.

Parameters:
  • kdt – The KDT representing the entire point cloud (X).

  • X_sub – The points whose neighborhoods must be found.

Returns:

The indices of the points in X that belong to the cylindrical neighborhood for each point in X_sub.

static nan_policy_propagate_f(F)

Apply the NaN policy that propagates NaN values, i.e., the matrix of features is returned with no handling at all.

Parameters:

F (np.ndarray) – The matrix of features given as input.

Returns:

The transformed matrix of features.

Return type:

np.ndarray

static nan_policy_replace_f(F)

Apply the NaN policy that replaces the NaN values in the matrix of features by the mean of the numerical values for the corresponding feature (each column of the matrix is assumed to be an independent feature).

NOTE that this method modifies the F matrix in place, in addition to returning it.

Parameters:

F (np.ndarray) – The matrix of features given as input.

Returns:

The transformed matrix of features.

Return type:

np.ndarray
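The "replace" policy described above can be sketched with NumPy as follows. The function name `nan_replace` is hypothetical and the sketch is only one possible way to obtain the documented behavior (column-wise mean replacement, in place).

```python
import numpy as np


def nan_replace(F):
    """Hypothetical sketch: replace NaNs with the mean of their column, in place."""
    col_mean = np.nanmean(F, axis=0)  # per-feature mean ignoring NaN values
    nan_mask = np.isnan(F)
    # Column index of each NaN entry selects the mean used to fill it.
    F[nan_mask] = np.take(col_mean, np.nonzero(nan_mask)[1])
    return F


F = np.array([[1.0, np.nan], [3.0, 4.0], [np.nan, 6.0]])
F_out = nan_replace(F)
```

Because the input is modified in place, callers that still need the original values should pass a copy.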

static prepare_mining(miner, X)

Prepare the data miner to handle the neighborhoods and build a KDTree to speed up the spatial queries.

Parameters:
  • miner – The miner to be prepared.

  • X (np.ndarray) – The structure space matrix, i.e., the matrix of coordinates representing the point cloud.

Returns:

The neighborhood radius, the function for spatial queries, and the serialized KDTree.

Return type:

float, callable, bytes

static prepare_chunks(miner, X)

Prepare the chunks for the parallel computation of the data mining.

Parameters:
  • miner – The miner for which the chunks must be prepared.

  • X (np.ndarray) – The structure space matrix, i.e., the matrix of coordinates representing the point cloud.

Returns:

The number of chunks and the chunk size.

Return type:

int, int
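As an illustration of chunk planning, the sketch below splits a number of points into chunks of a given size. The helper `plan_chunks` and the choice of ceiling division are assumptions; how the miner treats edge cases (for example a chunk size of zero) is not covered here.

```python
import math


def plan_chunks(num_points, chunk_size):
    """Hypothetical helper: number of chunks and the [start, end) bounds of each."""
    num_chunks = math.ceil(num_points / chunk_size)
    bounds = [
        (i * chunk_size, min((i + 1) * chunk_size, num_points))
        for i in range(num_chunks)
    ]
    return num_chunks, bounds


num_chunks, bounds = plan_chunks(num_points=10, chunk_size=4)
```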

static prepare_chunk(miner, X_chunk, kdt)

Compute the subchunk configuration and deserialize the KDTree so the chunk can be computed.

Parameters:
  • miner – The miner for which the chunks must be prepared.

  • X_chunk – The structure space matrix representing the chunk.

  • kdt – The serialized KDTree.

Returns:

The number of subchunks, the subchunk size, and the deserialized KDTree.

Return type:

int, int, KDTree
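The round trip between a serialized and a deserialized KDTree can be sketched as below. Both `pickle` as the serialization mechanism and `scipy.spatial.KDTree` as the tree class are assumptions for this example, as are the chunk and subchunk sizes.

```python
import math
import pickle

import numpy as np
from scipy.spatial import KDTree

X = np.random.default_rng(0).random((100, 3))  # toy structure space matrix

# Serialize once so the tree can be shipped to parallel workers as bytes.
kdt_bytes = pickle.dumps(KDTree(X))

# Inside a worker: restore the tree and derive the subchunk configuration.
kdt = pickle.loads(kdt_bytes)
X_chunk = X[:10]
subchunk_size = 4
num_subchunks = math.ceil(len(X_chunk) / subchunk_size)

# The restored tree answers spatial queries like the original one.
_, nearest = kdt.query(X_chunk[:1], k=1)
```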

static prepare_subchunk(miner, subchunk_idx, subchunk_size, X_chunk, kdt, neighborhood_f)

Prepare the given subchunk, so it can be passed to the method that mines the features on a given subchunk.

Parameters:
  • miner – The miner for which the chunks must be prepared.

  • subchunk_idx – The index of the subchunk to be prepared.

  • subchunk_size – The size of the subchunk.

  • X_chunk – The structure space matrix representing the chunk.

  • kdt – The deserialized KDTree to speed up the spatial queries.

  • neighborhood_f – The neighborhood function defining the spatial queries.

Returns:

The structure space matrix representing the subchunk, and the indices of the neighborhoods.

Return type:

np.ndarray, list of list

static prepare_mined_features_chunk(Fhat_chunk, Fhat_sub)

Prepare the mined features from the chunk considering the current mined features and those for the current subchunk.

Parameters:
  • Fhat_chunk – The already mined features so far.

  • Fhat_sub – The features for the current subchunk.

Returns:

The mined features for the chunk.

Return type:

np.ndarray
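One plausible way to accumulate subchunk results is to stack them row-wise, as sketched below. The helper name `append_subchunk` and the use of `None` to mark "nothing mined yet" are assumptions for illustration, not documented behavior.

```python
import numpy as np


def append_subchunk(Fhat_chunk, Fhat_sub):
    """Hypothetical helper: append the subchunk features to those mined so far."""
    if Fhat_chunk is None:  # assumed marker for the first subchunk
        return Fhat_sub
    return np.vstack([Fhat_chunk, Fhat_sub])


Fhat_chunk = None
for Fhat_sub in (np.ones((2, 3)), np.zeros((1, 3))):
    Fhat_chunk = append_subchunk(Fhat_chunk, Fhat_sub)
```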