src.clustering.dbscan_clusterer

Classes

DBScanClusterer(**kwargs)

class src.clustering.dbscan_clusterer.DBScanClusterer(**kwargs)
Author:

Alberto M. Esmorís Pena

DBScan clustering on the structure space \(\pmb{X} \in \mathbb{R}^{m \times n_x}\). It supports filtering by discrete categorical values (e.g., classifications). That is, a DBScan can be computed on the subset of the Euclidean space that contains only the points belonging to a given cluster (in this context, classes and categorical predictions are clusters).

More formally, let \(\pmb{x_{i*}} \in \mathbb{R}^{n_x}\) be a point in the structure space, with \(y_i \in \mathbb{Z}_{\geq 0}\) the integer that represents the cluster to which point \(i\) belongs.

This DBScan clustering component can be applied once to all points \(\pmb{X} \in \mathbb{R}^{m \times n_x}\). Alternatively, it can be applied \(K \in \mathbb{Z}_{>1}\) times. In the latter case, consider \(\pmb{X_1} \in \mathbb{R}^{m_1 \times n_x}, \ldots, \pmb{X_K} \in \mathbb{R}^{m_K \times n_x}\) as the \(K\) structure spaces, and compute one DBScan on each of them. The \(m_k\) rows of \(\pmb{X_k} \in \mathbb{R}^{m_k \times n_x}\) must be the set of points \(\bigl\{\pmb{x_{j*}} : y_j = k\bigr\}\).
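The partition into \(K\) structure spaces can be illustrated with a minimal, standard-library-only sketch. It is not the component's actual code: the helper name partition_by_precluster and the use of plain integer labels are assumptions made for illustration.

```python
def partition_by_precluster(y, domain=None):
    """Map each precluster label k to the indices j with y[j] == k.

    Hypothetical helper: if domain is None, every unique label in y is used.
    """
    labels = sorted(set(y)) if domain is None else list(domain)
    parts = {k: [] for k in labels}
    for j, label in enumerate(y):
        if label in parts:  # labels outside the domain are ignored
            parts[label].append(j)
    return parts


# Four points in a 3D structure space with precluster labels y.
X = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (1.0, 1.0, 0.0)]
y = [2, 0, 2, 1]

parts = partition_by_precluster(y)
# X_k[k] holds the rows of X whose label equals k, i.e., {x_j* : y_j = k}.
X_k = {k: [X[j] for j in idx] for k, idx in parts.items()}
```

Each entry of X_k would then be clustered independently, which is what restricts a DBScan to the points of a single precluster.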

Variables:
  • precluster_name (str or None) – The name of the attribute to be considered as the precluster. If None, then all points will be considered at once instead of partitioned by previous clusters.

  • precluster_domain (list or tuple of str) – The domain of the precluster, i.e., the precluster labels to be considered. If not given, then every unique precluster label will be considered.

  • min_points (int) – The minimum number of points in the neighborhood for its center point to be considered a kernel point (a core point in standard DBSCAN terminology).

  • radius – The radius of the neighborhood (typically a spherical neighborhood) for spatial queries.
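How min_points and radius interact can be shown with a short sketch of the kernel (core) point test. The function name is_core_point is hypothetical, and counting the center point as part of its own neighborhood is an assumed convention that the actual component may not share.

```python
import math


def is_core_point(X, i, radius, min_points):
    """Return True if at least min_points points lie within radius of X[i].

    Assumption: the center point X[i] counts as its own neighbor.
    """
    count = sum(1 for p in X if math.dist(X[i], p) <= radius)
    return count >= min_points


X = [(0.0, 0.0, 0.0), (0.1, 0.0, 0.0), (0.0, 0.1, 0.0), (3.0, 3.0, 3.0)]
dense = is_core_point(X, 0, radius=0.5, min_points=3)     # three points nearby
isolated = is_core_point(X, 3, radius=0.5, min_points=3)  # only itself nearby
```

A larger radius or a smaller min_points makes more points qualify, which tends to merge clusters and reduce the number of points left as noise.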

static extract_clustering_args(spec)

Extract the arguments to initialize/instantiate a DBScanClusterer from a keyword specification.

Parameters:

spec – The keyword specification containing the arguments.

Returns:

The arguments to initialize/instantiate a DBScanClusterer.

__init__(**kwargs)

Initialize an instance of DBScanClusterer.

Parameters:

kwargs – The attributes of the DBScanClusterer that will also be passed to the parent.

fit(pcloud)

The DBScanClusterer does not require any fit at all. See Clusterer and Clusterer.fit().

cluster(pcloud)

Apply DBScan clustering to the given point cloud.

See Clusterer and Clusterer.cluster().

do_dbscan(X, c, cluster_idx)

Compute a density-based spatial clustering of applications with noise (DBSCAN).

Parameters:
  • X (np.ndarray) – The input structure space.

  • c (np.ndarray) – The vector of point-wise cluster labels for the points in X.

  • cluster_idx – The cluster index for the first cluster.

Returns:

The least cluster index greater than the highest cluster index assigned to any point, i.e., the next unused cluster index.

Return type:

int
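The cluster index contract described above (start numbering at cluster_idx, return the next unused index) can be sketched with a brute-force DBSCAN written against the standard library only. This is an illustration under stated assumptions, not the method's implementation: the function name dbscan_sketch, the explicit radius and min_points parameters, the noise label -1, and the in-place update of c are all hypothetical choices.

```python
import math

NOISE = -1  # assumed noise label; the real component may use another value


def dbscan_sketch(X, c, cluster_idx, radius, min_points):
    """Label the points of X in place in c, numbering clusters from cluster_idx.

    Returns the least index greater than every cluster index assigned here.
    """
    m = len(X)
    # Brute-force neighborhoods; each point counts as its own neighbor.
    neighbors = [
        [j for j in range(m) if math.dist(X[i], X[j]) <= radius]
        for i in range(m)
    ]
    visited = [False] * m
    for i in range(m):
        c[i] = NOISE
    for i in range(m):
        if visited[i]:
            continue
        visited[i] = True
        if len(neighbors[i]) < min_points:
            continue  # not a core point; stays noise unless reached later
        c[i] = cluster_idx
        frontier = list(neighbors[i])
        while frontier:
            j = frontier.pop()
            if c[j] == NOISE:
                c[j] = cluster_idx  # border or core point joins the cluster
            if visited[j]:
                continue
            visited[j] = True
            if len(neighbors[j]) >= min_points:
                frontier.extend(neighbors[j])  # expand only through core points
        cluster_idx += 1
    return cluster_idx


# Two structure spaces clustered in sequence with globally unique indices.
A = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (5.0, 5.0), (5.1, 5.0), (5.0, 5.1)]
B = [(9.0, 9.0), (9.1, 9.0), (9.0, 9.1), (20.0, 20.0)]
c_a, c_b = [NOISE] * len(A), [NOISE] * len(B)
next_idx = dbscan_sketch(A, c_a, 0, radius=0.5, min_points=3)
next_idx = dbscan_sketch(B, c_b, next_idx, radius=0.5, min_points=3)
```

Feeding the returned index into the next call is what keeps cluster labels unique when one DBScan is computed per precluster.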