src.clustering.bivariate_critical_clusterer

Classes

BivariateCriticalClusterer(**kwargs)

class src.clustering.bivariate_critical_clusterer.BivariateCriticalClusterer(**kwargs)

Author:: Alberto M. Esmoris Pena

Bivariate critical clustering consists of considering a third variable as a bivariate function of the first and second variables, e.g., the Monge form of a 3D surface \(z = \hat{z}(x, y)\). Then, it is possible to analyze the cylindrical neighborhood for each point in the point cloud \(\mathcal{N}(\pmb{x}_{i*}) = \left\{\pmb{x}_{k*} \in \mathcal{B}_{r}^{2}(\pmb{x}_{i*}) : 1 \leq k \leq m \right\}\), where \(\pmb{X} \in \mathbb{R}^{m \times 3}\) would be a point cloud of \(m \in \mathbb{Z}_{>0}\) points in three variables and \(\mathcal{B}_r^{2}(\pmb{x}_{i*})\) is the ball of radius \(r\) centered at the \(i\)-th point of the point cloud in a 2D Euclidean space (considering only the first and second variables).

First, a vector \(\pmb{z}^* \in \mathbb{R}^m\) must be computed such that \(z^*_i = \max_{x_{k3}} \quad \mathcal{N}(\pmb{x}_{i*})\), i.e., \(z^*_i\) will be the max value of the third variable in the neighborhood of the \(i\)-th point. Now, each point will be considered a maximum if and only if \(x_{i3} = z_{i}\). Alternatively, it can also be computed such that \(z^*_i = \min_{x_{k3}} \quad \mathcal{N}(\pmb{x}_{i*})\), that is why it is called critical clustering, because later on one cluster will be built for each critical point.

There are two ways to cluster the points to its corresponding critical point. The first one is the nearest neighbor method and the second one is the recursive application of a Euclidean distance-based region growing algorithm.

For the nearest neighbor method, each point is associated to the cluster of its closest neighbor in the space of the first two variables.

For the recursive region growing, the cylindrical neighborhood of each critical point is considered for a given radius \(r_a \in \mathbb{R}_{>0}\). Then, for each new point added to the cluster its cylindrical neighborhood with radius \(r_b \in \mathbb{R}_{>0}\) will be considered. Any point in this neighborhood will be added to the cluster and their neighborhoods will be recursively explored until no more points can be added. Optionally, a nearest neighbor correction can be applied to avoid having points without a cluster. It consists of applying the nearest neighbor method to any point that has not been clustered yet.

Variables:

precluster_name – The name of the attribute to be considered as the precluster. If None, then all points will be considered.
precluster_domain (list) – The domain of the precluster, i.e., the precluster labels to be considered.
critical_type (str) – The type of critical point to be considered, either "min" or "max".
radius (float) – The radius \(r\) for the neighborhoods when looking for critical points.
filter_criticals (bool) – Whether to filter the critical points (True) or not (False). Critical points will be filtered such that in their neighborhoods there is only a single critical point, i.e., when there is more than one critical point in a neighborhood only one of them will be preserved. This scenario is will typically happen when there are at least two points at the same height in the neighborhood.
x (str) – The name of the first variable.
y (str) – The name of the second variable.
z (str) – The name of the third variable.
strategy (dict) – The specification for the clustering strategy. It must be either a specification of the nearest neighbor method or the region growing algorithm.
chunk_size (int) – How many points per chunk for the parallel computations. If zero is given, then all the points will belong to the same chunk.
nthreads (int) – How many chunks will be run in parallel at the same time. If \(-1\) is given, then all threads will be used.
kdt_nthreads (int) – How many threads will be used to speedup the spatial queries through the KDTrees. Note that the final number of parallel jobs will be \(\text{nthreads} \times \text{kdt_nthreads}\).

static extract_clustering_args(spec)

Extract the arguments to initialize/instantiate a BivariateCriticalClusterer from a key-word specification.

Parameters:: spec – The key-word specification containing the arguments.
Returns:: The arguments to initialize/instantiate a BivariateCriticalClusterer.

__init__(**kwargs)

Initialize an instance of BivariateCriticalClusterer.

Parameters:: kwargs – The attributes of the BivariateCriticalClusterer that will also be passed to the parent.

fit(pcloud): The BivariateCriticalClusterer does not require any fit at all. See Clusterer and Clusterer.fit().

cluster(pcloud)

Apply bivariate critical clustering to the given point cloud.

See Clusterer and Clusterer()

extract_variables(pcloud)

Extract relevant variables (those specified through self.x, self.y, and self.z) from the given point cloud.

Parameters:: pcloud (PointCloud) – The PointCloud from where the variables must be extracted.

find_criticals(X)

Find the critical points in the given point cloud, each representing a cluster.

Parameters:: X (np.ndarray) – The 3D point cloud where the critical points must be found. The first and second column give the variables in the plane and the third column gives the variable for which critical points must be found.

compute_critical_clusters(X, crit_mask)

Compute the final clusters according to the given strategy specification.

See BivariateCriticalClusterer.compute_nearest_neighbor_method() and BivariateCriticalClusterer.compute_recursive_region_growing() .

Parameters:: X (np.ndarray) – The 3D point cloud where from which points the clusters must be computed. The first and second column give the variables in the plane and the third column gives the variable for which critical points must be found.

add_critical_labels_to_point_cloud(pcloud, crit_mask)

Add labels to the critical points. Note that zero means the point is not a critical one.

Parameters:

pcloud (PointCloud) – The point cloud to which the critical labels must be added.
crit_mask (np.ndarray) – The boolean mask where critical point are flagged as True.

Returns:

The input point cloud (updated inplace to add the critical labels).

Return type:

PointCloud

static convert_cluster_labels_type(c, crit_mask=None)

Convert the integer type of the given cluster labels to use as few bytes as possible (on an element wise fashion) to represent the clusters.

Parameters:

c (np.ndarray) – The point-wise cluster labels.
crit_mask (np.ndarray or None) – The point-wise boolean mask identifying the critical points. If given, the number of required bytes will be estimated from this mask instead of the cluster labels. If not given, the number of required bytes will be estimated from the cluster labels.

compute_nearest_neighbor_method(X, crit_mask)

Apply the nearest neighbor method to compute the clusters from the critical points.

Parameters:

X (np.ndarray) – The point cloud to be clustered.
crit_mask (np.ndarray of bool) – The boolean mask specifying whether a point is a critical point (true) or not (false).

Returns:

The point-wise cluster labels.

Return type:

np.ndarray

compute_recursive_region_growing(X, crit_mask)

Apply the recursive region growing algorithm to compute the clusters from the critical points.

Parameters:

X (np.ndarray) – The point cloud to be clustered.
crit_mask (np.ndarray of bool) – The boolean mask specifying whether a point is a critical point (true) or not (false).

Returns:

The point-wise cluster labels.

Return type:

np.ndarray

compute_region_growing_first_stage(X2D, z, c, max_iters, growing_radius, correction)

Compute the first stage of the region growing strategy. This first stage starts with finding the anti-critical points. If the searched critical points are the maximum, the anti-critical points will be the minimum, and vice versa. In the first stage, the clusters will grow by considering points in any neighborhood of a previously clustered point that is at the same height or below the center (if the critical points are maximum) or at the same height or above the center (if the critical points are minimum).

Note that the output of the first stage can be corrected by computing the 2D concave hull of each cluster to consider any point inside this concave hull as part of the cluster.

See BivariateCriticalClusterer.compute_recursive_region_growing().

Parameters:

X2D (np.ndarray) – The 2D structure space representing the point cloud (it includes the \(x\) and \(y\) coordinates).
z (np.ndarray) – The point-wise vertical coordinates.
c (np.ndarray) – The point-wise cluster labels.
max_iters (int) – The maximum number of iterations for the first stage clustering.
growing_radius (float) – The radius for the first stage of the growing regions. The neighborhoods in the search space will be cylinders governed by this radius.
correction (bool) – Boolean flag governing whether the concave hull-based correction must be applied to the first stage clustering (True) or not (False).

static apply_first_stage_correction(c, c_dom, X2D, non_clustered, non_clustered_X2D, kdt)

Method that chunk-wise applies the concave hull-based correction to the first stage of the region growing clustering.

See BivariateCriticalClusterer.compute_region_growing_first_stage().

Parameters:

c (np.ndarray) – The point-wise cluster labels.
c_dom (list) – The \([a, b]\) interval of cluster labels defining the domain for the current chunk.
X2D (np.ndarray) – The 2D structure space representing the point cloud (it includes the \(x\) and \(y\) coordinates).
non_clustered (np.ndarray) – The array with the indices of points that do not belong to a cluster yet.
non_clustered_X2D (np.ndarray) – The 2D structure space representing the non-clustered points.
kdt – The KDTree built on the non-clustered 2D structure space.

compute_region_growing_second_stage(c, X2D, growing_radius, max_iters)

Compute the second stage of the region growing strategy. This second stage consists of growing the clusters by considering the neighborhood of each point in the cluster and adding the points in this neighborhood to the cluster until it is not possible to grow further considering the radius governing the neighborhood.

See BivariateCriticalClusterer.compute_recursive_region_growing().

Parameters:

c (np.ndarray) – The point-wise cluster labels.
X2D (np.ndarray) – The 2D structure space representing the point cloud (it includes the \(x\) and \(y\) coordinates).
growing_radius (float) – The radius for the first stage of the growing regions. The neighborhoods in the search space will be cylinders governed by this radius.
max_iters (int) – The maximum number of iterations for the first stage clustering.