clustering package

Subpackages

Submodules

clustering.bivariate_critical_clusterer module

class clustering.bivariate_critical_clusterer.BivariateCriticalClusterer(**kwargs)

Bases: Clusterer

Author:

Alberto M. Esmoris Pena

Bivariate critical clustering consists of considering a third variable as a bivariate function of the first and second variables, e.g., the Monge form of a 3D surface \(z = \hat{z}(x, y)\). Then, it is possible to analyze the cylindrical neighborhood for each point in the point cloud \(\mathcal{N}(\pmb{x}_{i*}) = \left\{\pmb{x}_{k*} \in \mathcal{B}_{r}^{2}(\pmb{x}_{i*}) : 1 \leq k \leq m \right\}\), where \(\pmb{X} \in \mathbb{R}^{m \times 3}\) would be a point cloud of \(m \in \mathbb{Z}_{>0}\) points in three variables and \(\mathcal{B}_r^{2}(\pmb{x}_{i*})\) is the ball of radius \(r\) centered at the \(i\)-th point of the point cloud in a 2D Euclidean space (considering only the first and second variables).

First, a vector \(\pmb{z}^* \in \mathbb{R}^m\) must be computed such that \(z^*_i = \max_{x_{k3}} \quad \mathcal{N}(\pmb{x}_{i*})\), i.e., \(z^*_i\) will be the max value of the third variable in the neighborhood of the \(i\)-th point. Now, each point will be considered a maximum if and only if \(x_{i3} = z_{i}\). Alternatively, it can also be computed such that \(z^*_i = \min_{x_{k3}} \quad \mathcal{N}(\pmb{x}_{i*})\), that is why it is called critical clustering, because later on one cluster will be built for each critical point.

There are two ways to cluster the points to its corresponding critical point. The first one is the nearest neighbor method and the second one is the recursive application of a Euclidean distance-based region growing algorithm.

For the nearest neighbor method, each point is associated to the cluster of its closest neighbor in the space of the first two variables.

For the recursive region growing, the cylindrical neighborhood of each critical point is considered for a given radius \(r_a \in \mathbb{R}_{>0}\). Then, for each new point added to the cluster its cylindrical neighborhood with radius \(r_b \in \mathbb{R}_{>0}\) will be considered. Any point in this neighborhood will be added to the cluster and their neighborhoods will be recursively explored until no more points can be added. Optionally, a nearest neighbor correction can be applied to avoid having points without a cluster. It consists of applying the nearest neighbor method to any point that has not been clustered yet.

Variables:
  • precluster_name – The name of the attribute to be considered as the precluster. If None, then all points will be considered.

  • precluster_domain (list) – The domain of the precluster, i.e., the precluster labels to be considered.

  • critical_type (str) – The type of critical point to be considered, either "min" or "max".

  • radius (float) – The radius \(r\) for the neighborhoods when looking for critical points.

  • filter_criticals (bool) – Whether to filter the critical points (True) or not (False). Critical points will be filtered such that in their neighborhoods there is only a single critical point, i.e., when there is more than one critical point in a neighborhood only one of them will be preserved. This scenario is will typically happen when there are at least two points at the same height in the neighborhood.

  • x (str) – The name of the first variable.

  • y (str) – The name of the second variable.

  • z (str) – The name of the third variable.

  • strategy (dict) – The specification for the clustering strategy. It must be either a specification of the nearest neighbor method or the region growing algorithm.

  • chunk_size (int) – How many points per chunk for the parallel computations. If zero is given, then all the points will belong to the same chunk.

  • nthreads (int) – How many chunks will be run in parallel at the same time. If \(-1\) is given, then all threads will be used.

  • kdt_nthreads (int) – How many threads will be used to speedup the spatial queries through the KDTrees. Note that the final number of parallel jobs will be \(\text{nthreads} \times \text{kdt_nthreads}\).

static extract_clustering_args(spec)

Extract the arguments to initialize/instantiate a BivariateCriticalClusterer from a key-word specification.

Parameters:

spec – The key-word specification containing the arguments.

Returns:

The arguments to initialize/instantiate a BivariateCriticalClusterer.

__init__(**kwargs)

Initialize an instance of BivariateCriticalClusterer.

Parameters:

kwargs – The attributes of the BivariateCriticalClusterer that will also be passed to the parent.

fit(pcloud)

The BivariateCriticalClusterer does not require any fit at all. See Clusterer and Clusterer.fit().

cluster(pcloud)

Apply bivariate critical clustering to the given point cloud.

See Clusterer and Clusterer()

extract_variables(pcloud)

Extract relevant variables (those specified through self.x, self.y, and self.z) from the given point cloud.

Parameters:

pcloud (PointCloud) – The PointCloud from where the variables must be extracted.

find_criticals(X)

Find the critical points in the given point cloud, each representing a cluster.

Parameters:

X (np.ndarray) – The 3D point cloud where the critical points must be found. The first and second column give the variables in the plane and the third column gives the variable for which critical points must be found.

compute_critical_clusters(X, crit_mask)

Compute the final clusters according to the given strategy specification.

See BivariateCriticalClusterer.compute_nearest_neighbor_method() and BivariateCriticalClusterer.compute_recursive_region_growing() .

Parameters:

X (np.ndarray) – The 3D point cloud where from which points the clusters must be computed. The first and second column give the variables in the plane and the third column gives the variable for which critical points must be found.

add_critical_labels_to_point_cloud(pcloud, crit_mask)

Add labels to the critical points. Note that zero means the point is not a critical one.

Parameters:
  • pcloud (PointCloud) – The point cloud to which the critical labels must be added.

  • crit_mask (np.ndarray) – The boolean mask where critical point are flagged as True.

Returns:

The input point cloud (updated inplace to add the critical labels).

Return type:

PointCloud

static convert_cluster_labels_type(c, crit_mask=None)

Convert the integer type of the given cluster labels to use as few bytes as possible (on an element wise fashion) to represent the clusters.

Parameters:
  • c (np.ndarray) – The point-wise cluster labels.

  • crit_mask (np.ndarray or None) – The point-wise boolean mask identifying the critical points. If given, the number of required bytes will be estimated from this mask instead of the cluster labels. If not given, the number of required bytes will be estimated from the cluster labels.

compute_nearest_neighbor_method(X, crit_mask)

Apply the nearest neighbor method to compute the clusters from the critical points.

Parameters:
  • X (np.ndarray) – The point cloud to be clustered.

  • crit_mask (np.ndarray of bool) – The boolean mask specifying whether a point is a critical point (true) or not (false).

Returns:

The point-wise cluster labels.

Return type:

np.ndarray

compute_recursive_region_growing(X, crit_mask)

Apply the recursive region growing algorithm to compute the clusters from the critical points.

Parameters:
  • X (np.ndarray) – The point cloud to be clustered.

  • crit_mask (np.ndarray of bool) – The boolean mask specifying whether a point is a critical point (true) or not (false).

Returns:

The point-wise cluster labels.

Return type:

np.ndarray

compute_region_growing_first_stage(X2D, z, c, max_iters, growing_radius, correction)

Compute the first stage of the region growing strategy. This first stage starts with finding the anti-critical points. If the searched critical points are the maximum, the anti-critical points will be the minimum, and vice versa. In the first stage, the clusters will grow by considering points in any neighborhood of a previously clustered point that is at the same height or below the center (if the critical points are maximum) or at the same height or above the center (if the critical points are minimum).

Note that the output of the first stage can be corrected by computing the 2D concave hull of each cluster to consider any point inside this concave hull as part of the cluster.

See BivariateCriticalClusterer.compute_recursive_region_growing().

Parameters:
  • X2D (np.ndarray) – The 2D structure space representing the point cloud (it includes the \(x\) and \(y\) coordinates).

  • z (np.ndarray) – The point-wise vertical coordinates.

  • c (np.ndarray) – The point-wise cluster labels.

  • max_iters (int) – The maximum number of iterations for the first stage clustering.

  • growing_radius (float) – The radius for the first stage of the growing regions. The neighborhoods in the search space will be cylinders governed by this radius.

  • correction (bool) – Boolean flag governing whether the concave hull-based correction must be applied to the first stage clustering (True) or not (False).

static apply_first_stage_correction(c, c_dom, X2D, non_clustered, non_clustered_X2D, kdt)

Method that chunk-wise applies the concave hull-based correction to the first stage of the region growing clustering.

See BivariateCriticalClusterer.compute_region_growing_first_stage().

Parameters:
  • c (np.ndarray) – The point-wise cluster labels.

  • c_dom (list) – The \([a, b]\) interval of cluster labels defining the domain for the current chunk.

  • X2D (np.ndarray) – The 2D structure space representing the point cloud (it includes the \(x\) and \(y\) coordinates).

  • non_clustered (np.ndarray) – The array with the indices of points that do not belong to a cluster yet.

  • non_clustered_X2D (np.ndarray) – The 2D structure space representing the non-clustered points.

  • kdt – The KDTree built on the non-clustered 2D structure space.

compute_region_growing_second_stage(c, X2D, growing_radius, max_iters)

Compute the second stage of the region growing strategy. This second stage consists of growing the clusters by considering the neighborhood of each point in the cluster and adding the points in this neighborhood to the cluster until it is not possible to grow further considering the radius governing the neighborhood.

See BivariateCriticalClusterer.compute_recursive_region_growing().

Parameters:
  • c (np.ndarray) – The point-wise cluster labels.

  • X2D (np.ndarray) – The 2D structure space representing the point cloud (it includes the \(x\) and \(y\) coordinates).

  • growing_radius (float) – The radius for the first stage of the growing regions. The neighborhoods in the search space will be cylinders governed by this radius.

  • max_iters (int) – The maximum number of iterations for the first stage clustering.

clustering.clusterer module

exception clustering.clusterer.ClusteringException(message='')

Bases: VL3DException

Author:

Alberto M. Esmoris Pena

Class for exceptions related to clustering components See VL3DException.

__init__(message='')
class clustering.clusterer.Clusterer(**kwargs)

Bases: object

Author:

Alberto M. Esmoris Pena.

Interface governing any clustering component.

Variables:
  • cluster_name (str) – The name for the computed clusters. It will be used to reference the cluster column in the output point cloud.

  • post_clustering (None or list of callable)

static extract_clustering_args(spec)

Extract the arguments to initialize/instantiate a Clusterer from a key-word specification.

Parameters:

spec – The key-word specification containing the arguments.

Returns:

The arguments to initialize/instantiate a Clusterer.

__init__(**kwargs)

Initialize a Clusterer.

Parameters:

kwargs – The key-word arguments for the initialization of any Clusterer. It must contain the name of the cluster to be computed.

fit(pcloud)

Fit a clustering model to a given input point cloud.

Parameters:

pcloud – The input point cloud to be used to fit the clustering model.

Returns:

The clusterer itself, for fluent programming purposes.

Return type:

Clusteror

abstractmethod cluster(pcloud)

Clustering from a given input point cloud.

Parameters:

pcloud – The input point cloud for which clusters must be found.

Returns:

The point cloud extended with the clusters.

Return type:

PointCloud

post_process(pcloud)

Run the post-processing pipeline on the given input point cloud.

Parameters:

pcloud (PointCloud) – The input point cloud for the components in the post-processing pipeline.

Returns:

The post-processed point cloud. Sometimes it will be exactly the same input point cloud because some post-processing components generate their output directly to a file.

Return type:

PointCloud

fit_cluster_and_post_process(pcloud, out_prefix=None)

Compute the fitting, clustering, and post-processing as a whole.

See Clusterer.fit(), Clusterer.cluster(), and Clusterer.post_process().

Parameters:
  • pcloud (PointCloud) – The input point cloud to be used to fit the clustering model.

  • out_prefix (str or None) – If given, it will be used to replace the default output prefix of the clusterer inside the call’s context.

Returns:

The point cloud extended with the clusters.

Return type:

PointCloud

add_cluster_labels_to_point_cloud(pcloud, c)

Add given cluster labels to the given point cloud.

Parameters:
  • pcloud (PointCloud) – The point cloud to which the cluster labels must be added.

  • c (np.ndarray) – The cluster labels.

Returns:

The input point cloud (updated inplace to add the cluster labels).

Return type:

PointCloud

clustering.dbscan_clusterer module

class clustering.dbscan_clusterer.DBScanClusterer(**kwargs)

Bases: Clusterer

Author:

Alberto M. Esmorís Pena

DBScan clustering on the structure space \(\pmb{X} \in \mathbb{R}^{m \times n_x}\). It supports filtering by discrete categorical values (e.g., classifications), i.e., one DBScan on the subspace of the Euclidean space that contains only points belonging to a given cluster (classes, and categorical predictions are clusters in this context).

More formally, let \(\pmb{x_{i*}} \in \mathbb{R}^{n_x}\) be a point in the structure space, with \(y_i \in \mathbb{Z}_{\geq 0}\) the integer that represents the cluster to which point \(i\) belongs.

This DBScan clustering component can be applied once to all points \(\pmb{X} \in \mathbb{R}^{m \times 3}\). Alternatively, it can be applied \(K \in \mathbb{Z}_{>1}\) times. In this last case, consider \(\pmb{X_1} \in \mathbb{R}^{m_1 \times n_x}, \ldots, \pmb{X_K} \in \mathbb{R}^{m_K \times n_x}\) as the \(K\) structure spaces, and compute a DBScan on each of them. The \(m_k\) points in \(\pmb{X_k}_{m_k \times n_x}\) must represent the set of points \(\biggl\{{\pmb{x_j*} : y_j = k}\biggr\}\).

Variables:
  • precluster_name (str or None) – The name of the attribute to be considered as the precluster. If None, then all points will be considered at once instead of partitioned by previous clusters.

  • precluster_domain (list or tuple of str) – The domain of the precluster, i.e., the precluster labels to be considered. If not given, then any unique precluster label will be considered.

  • min_points (int) – The minimum number of points in the neighborhood so the center point can be considered a kernel point.

  • radius – The radius of the neighborhood (typically a spherical neighborhood) for spatial queries.

static extract_clustering_args(spec)

Extract the arguments to initialize/instantiate a DBScanClusterer from a key-word specification.

Parameters:

spec – The key-word specification containing the arguments.

Returns:

The arguments to initialize/instantiate a DBScanClusterer.

__init__(**kwargs)

Initialize an instance of DBScanClusterer.

Parameters:

kwargs – The attributes of the DBScanClusterer that will also be passed to the parent.

fit(pcloud)

The DBScanClusterer does not require any fit at all. See Clusterer and Clusterer.fit().

cluster(pcloud)

Apply DBScan clustering to the given point cloud.

See Clusterer and Clusterer.cluster().

do_dbscan(X, c, cluster_idx)

Compute a density-based spatial clustering of applications with noise (DBSCAN).

Parameters:
  • X (np.ndarray) – The input structure space.

  • c (np.ndarray) – The vector of point-wise cluster labels for the points in X.

  • cluster_idx – The cluster index for the first cluster.

Returns:

The least cluster index greater than the highest cluster index assigned to any point.

Return type:

int

clustering.fps_decorated_clusterer module

class clustering.fps_decorated_clusterer.FPSDecoratedClusterer(**kwargs)

Bases: SamplingDecoratedClusterer

Author:

Alberto M. Esmoris Pena

Decorator for clusterers that makes the clustering process on an FPS-based representation of the point cloud.

The FPS Decorated Clusterer (FPSDecoratedClusterer) constructs a representation of the point cloud, then it runs the clustering process on this representation and, finally, it propagates the clusters back to the original point cloud.

See FPSDecoratorTransformer and SamplingDecoratedClusterer.

Variables:
  • decorated_clusterer_spec (dict) – See SamplingDecoratedClusterer.

  • fps_decorator_spec (dict) – The specification of the FPS transformation defining the decorator.

  • fps_decorator (FPSDecoratorTransformer) – The FPS decorator to be applied on input point clouds.

static extract_clustering_args(spec)

Extract the arguments to initialize/instantiate a FPSDecoratedClusterer from a key-word specification.

Parameters:

spec – The key-word specification containing the arguments.

Returns:

The arguments to initialize/instantiate a FPSDecoratedClusterer.

Return type:

dict

__init__(**kwargs)

Initialization for any instance of type FPSDecoratedClusterer.

transform_pcloud(pcloud)

Transform the given input point cloud to its FPS representation.

Parameters:

pcloud (PointCloud) – The input point cloud to be transformed.

Returns:

The FPS representation of the input point cloud.

Return type:

PointCloud

propagate(fps_pcloud, pcloud)

Propagate the cluster labels from the FPS version of the point cloud back to the original one.

See SamplingDecoratedClusterer._propagate().

clustering.min_dist_decorated_clusterer module

class clustering.min_dist_decorated_clusterer.MinDistDecoratedClusterer(**kwargs)

Bases: SamplingDecoratedClusterer

Author:

Alberto M. Esmoris Pena

Decorator for clusterers that makes the clustering process on a minimum distance decimation-based representation of the point cloud.

The minimum distance decorated clusterer (MinDistDecoratedClusterer) constructs a representation of the point cloud, then it runs the clustering process on this representation and, finally, it propagates the clusters back to the original point cloud.

See MinDistDecimatorDecorator and SamplingDecoratedClusterer.

Variables:
  • decorated_clusterer_spec (dict) – The specification of the decorated clusterer.

  • mindist_decorator_spec (dict) – The specification of the minimum distance decimation defining the decorator.

  • mindist_decorator (MinDistDecimatorDecorator) – The minimum distance decimator decorator to be applied on input point clouds.

static extract_clustering_args(spec)

Extract the arguments to initialize/instantiate a MinDistDecoratedClusterer from a key-word specification.

Parameters:

spec – The key-word specification containing the arguments.

Returns:

The arguments to initialize/instantiate a MinDistDecoratedClusterer.

Return type:

dict

__init__(**kwargs)

Initialization for any instance of type MinDistDecoratedClusterer.

transform_pcloud(pcloud)

Transform the given input point cloud to its minimum distance decimation-based representation.

Parameters:

pcloud (PointCloud) – The input point cloud to be transformed.

Returns:

The minimum distance decimation-based representation of the input point cloud.

Return type:

PointCloud

propagate(mdd_pcloud, pcloud)

Propagate the cluster labels from the FPS version of the point cloud back to the original one.

See SamplingDecoratedClusterer._propagate().

clustering.sampling_decorated_clusterer module

class clustering.sampling_decorated_clusterer.SamplingDecoratedClusterer(**kwargs)

Bases: Clusterer

Author:

Alberto M. Esmoris Pena

Abstract class providing the common logic for clusterer decorators that resample the point cloud.

Variables:

decorated_clusterer_spec (dict) – The specification of the decorated clusterer.

static extract_clustering_args(spec)

Extract the arguments to initialize/instantiate a FPSDecoratedClusterer from a key-word specification.

Parameters:

spec – The key-word specification containing the arguments.

Returns:

The arguments to initialize/instantiate a FPSDecoratedClusterer.

Return type:

dict

__init__(**kwargs)

Initialization for any instance of type SamplingDecoratedClusterer.

fit(pcloud)

Fit a clustering model to a given input point cloud.

In doing so, the clustering model is fit to a sampled representation of the input point cloud.

See Clusterer and Clusterer.fit().

cluster(pcloud)

Clustering from a given input point cloud.

In doing so, the clustering model is computed on a sampled representation of the input point cloud.

See Clusterer and Clusterer.cluster().

post_process(pcloud)

Run the post-processing pipeline on the given input point cloud.

In doing so, the post-processing pipeline is computed on a sampled representation of the input point cloud.

See Clusterer and Clusterer.post_process().

fit_cluster_and_post_process(pcloud, out_prefix=None)

Compute the fitting, clustering, and post-processing as a whole.

In doing so, the sampled representation of the input point cloud is computed once and used to fit, cluster, and post-process with the propagations applied after all the previous methods have been called.

See Clusterer and Clusterer.fit_cluster_and_post_process().

abstractmethod propagate(sampled_pcloud, pcloud)

Method that must be implemented by any non-abstract class extending SamplingDecoratedCluterer to provide the logic to propagate the cluster labels from the sampled point cloud back to the original one. In many cases the SamplingDecoratedClusterer._propagate() method has all the necessary logic provided that it is called with the adequate decorator.

Parameters:
  • sampled_pcloud (PointCloud) – The sampled point cloud whose cluster labels must be propagated to the original point cloud.

  • pcloud (PointCloud) – The original point cloud that will receive the cluster labels propagated from the sampled point cloud.

Returns:

The original point cloud updated with the cluster labels from the sampled point cloud.

Return type:

PointCloud.

Module contents

author:

Alberto M. Esmoris Pena

The model package contains the logic to handle clustering models and algorithms to segment point clouds. In general, clustering can also be seen as unsupervised learning for point clouds.