src.clustering.postproc.cluster_marker

Classes

ClusterMarker(**kwargs)

class src.clustering.postproc.cluster_marker.ClusterMarker(**kwargs)
Author:

Alberto M. Esmoris Pena

Clustering post-processor that computes a point representing each cluster. Note that if the clusters are georeferenced, the point will also be georeferenced. See ClusteringPostProcessor.

Variables:
  • strategy (str) – The strategy to compute the point representing each cluster. Supported strategies are "centroid" (see ClusterMarker.compute_cluster_centroid()), "midrange" (see ClusterMarker.compute_cluster_midrange()), "medianoid" (see ClusterMarker.compute_cluster_medianoid()), "medoid" (see ClusterMarker.compute_cluster_medoid()), and "geometric_median" (see ClusterMarker.compute_cluster_geometric_median()).

  • epsg (int or None) – The European Petroleum Survey Group (EPSG) code identifying a coordinate reference system (CRS). If given, it will be used to export a .prj file when the output format is a shapefile. This .prj file will contain the well-known text (WKT) representing the projection information to georeference the markers in the shapefile.

  • nthreads (int) – The number of threads to be used for the parallel computation of the markers. Note that -1 implies using as many threads as available cores in the system.

  • output_path (str) – The path where the marks will be exported. The output is written as a shapefile when the path has a .shp extension and as CSV for any other extension.
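
As an illustration of the two rules stated above (output format chosen from the extension, and -1 meaning one thread per available core), the following is a minimal sketch. The helper names resolve_output_format and resolve_nthreads are hypothetical and are not part of ClusterMarker; the exact, case-sensitive match on ".shp" is also an assumption.

```python
import os


def resolve_output_format(output_path):
    """Return 'shp' for a .shp extension and 'csv' for anything else.

    Hypothetical helper: the case-sensitive match is an assumption.
    """
    ext = os.path.splitext(output_path)[1]
    return "shp" if ext == ".shp" else "csv"


def resolve_nthreads(nthreads):
    """Map -1 to the number of cores reported by the system.

    Hypothetical helper: any other value is returned unchanged.
    """
    if nthreads == -1:
        # os.cpu_count() may return None, so fall back to one thread
        return os.cpu_count() or 1
    return nthreads
```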

__init__(**kwargs)

Initialize a ClusterMarker post-processor.

See ClusteringPostProcessor.__init__().

Parameters:

kwargs – The keyword arguments for the initialization of the ClusterMarker.

__call__(clusterer, pcloud, out_prefix=None)

Post-process the given point cloud with clusters to compute the cluster-wise marks.

Parameters:
  • clusterer (Clusterer) – The clusterer that generated the clusters.

  • pcloud (PointCloud) – The point cloud to be post-processed.

  • out_prefix (str or None) – The output prefix in case path expansion must be applied.

Returns:

The post-processed point cloud.

Return type:

PointCloud
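
The cluster-wise computation described above can be sketched as follows. This is not the class's implementation: the function compute_marks and its arguments are hypothetical, the cluster labels are assumed to be available as a plain integer array, and details such as noise handling, parallel execution with nthreads, and how labels are read from the PointCloud are left out.

```python
import numpy as np


def compute_marks(X, labels, mark_fn):
    """Apply mark_fn to the points of each cluster (hypothetical helper).

    X is an (m, 3) coordinate matrix, labels an (m,) array of cluster ids,
    and mark_fn maps an (m_c, 3) matrix to a single 3D point.
    Returns the sorted cluster ids and one mark per cluster, row by row.
    """
    cluster_ids = np.unique(labels)
    marks = np.vstack([mark_fn(X[labels == cid]) for cid in cluster_ids])
    return cluster_ids, marks


# Usage with the centroid as the marking function
X = np.array([[0.0, 0.0, 0.0], [2.0, 2.0, 2.0],
              [10.0, 0.0, 0.0], [12.0, 0.0, 0.0]])
labels = np.array([0, 0, 1, 1])
ids, marks = compute_marks(X, labels, lambda Xc: Xc.mean(axis=0))
```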

compute_cluster_centroid(X)

Compute the centroid as the mark representing the cluster.

\[\pmb{c} = \dfrac{1}{m_c} \sum_{i=1}^{m_c}{\pmb{x}_{i*}}\]
Parameters:

X (np.ndarray) – The structure space matrix representing the cluster \(\pmb{X} \in \mathbb{R}^{m_c \times 3}\).

Returns:

The point representing the cluster, i.e., the cluster’s mark.

Return type:

np.ndarray
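
The centroid formula above is the column-wise mean of X. A minimal NumPy sketch, written as a free function named centroid for illustration (an assumption, not the method's actual body):

```python
import numpy as np


def centroid(X):
    """Column-wise mean of the (m_c, 3) cluster matrix."""
    return np.mean(X, axis=0)


X = np.array([[0.0, 0.0, 0.0], [2.0, 0.0, 0.0], [4.0, 6.0, 3.0]])
c = centroid(X)  # array([2., 2., 1.])
```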

compute_cluster_midrange(X)

Compute the midrange as the mark representing the cluster.

\[c_j = \dfrac{ \min \; \{x_{ij}\}_{i=1}^{m_c} + \max \; \{x_{ij}\}_{i=1}^{m_c} }{ 2 }\]
Parameters:

X (np.ndarray) – The structure space matrix representing the cluster \(\pmb{X} \in \mathbb{R}^{m_c \times 3}\).

Returns:

The point representing the cluster, i.e., the cluster’s mark.

Return type:

np.ndarray
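
The midrange formula above averages the minimum and maximum of each coordinate. A minimal NumPy sketch, with the free function midrange as an illustrative assumption rather than the method's actual body:

```python
import numpy as np


def midrange(X):
    """Midpoint between the per-coordinate minimum and maximum."""
    return (X.min(axis=0) + X.max(axis=0)) / 2.0


X = np.array([[0.0, 0.0, 0.0], [2.0, 0.0, 0.0], [4.0, 6.0, 3.0]])
c = midrange(X)  # array([2. , 3. , 1.5])
```

Unlike the centroid, the midrange depends only on the extreme values of each coordinate, so it corresponds to the center of the cluster's axis-aligned bounding box.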

compute_cluster_medianoid(X)

Compute the medianoid as the mark representing the cluster.

\[c_j = \operatorname{median}(\pmb{x}_{*j})\]
Parameters:

X (np.ndarray) – The structure space matrix representing the cluster \(\pmb{X} \in \mathbb{R}^{m_c \times 3}\).

Returns:

The point representing the cluster, i.e., the cluster’s mark.

Return type:

np.ndarray
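
The medianoid formula above takes the median of each coordinate independently. A minimal NumPy sketch, with the free function medianoid as an illustrative assumption rather than the method's actual body:

```python
import numpy as np


def medianoid(X):
    """Coordinate-wise median of the (m_c, 3) cluster matrix."""
    return np.median(X, axis=0)


X = np.array([[0.0, 0.0, 0.0], [2.0, 0.0, 0.0], [4.0, 6.0, 3.0]])
c = medianoid(X)  # array([2., 0., 0.])
```

Because each coordinate is handled separately, the resulting point is not necessarily one of the cluster's points.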

compute_cluster_medoid(X)

Compute the medoid as the mark representing the cluster.

\[\pmb{c} = \operatorname*{arg min}_{\pmb{x}_{i*}} \quad \sum_{k=1}^{m_c}{ \lVert\pmb{x}_{i*} - \pmb{x}_{k*}\rVert^2 }\]
Parameters:

X (np.ndarray) – The structure space matrix representing the cluster \(\pmb{X} \in \mathbb{R}^{m_c \times 3}\).

Returns:

The point representing the cluster, i.e., the cluster’s mark.

Return type:

np.ndarray
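
The medoid formula above selects the cluster point whose summed squared distance to all other points is smallest. The following brute-force sketch builds the full pairwise matrix, so it needs memory quadratic in the number of points; the free function medoid is an illustrative assumption, and the actual method may compute the same result differently.

```python
import numpy as np


def medoid(X):
    """Return the row of X minimizing the sum of squared distances."""
    # Pairwise squared Euclidean distances, shape (m_c, m_c)
    diff = X[:, None, :] - X[None, :, :]
    D2 = np.sum(diff * diff, axis=-1)
    # Index of the point with the smallest total squared distance
    i = np.argmin(D2.sum(axis=1))
    return X[i]


X = np.array([[0.0, 0.0, 0.0], [2.0, 0.0, 0.0], [4.0, 6.0, 3.0]])
c = medoid(X)  # array([2., 0., 0.])
```

In this example the totals are 65, 53 and 110 for the three points, so the second point is selected.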

compute_cluster_geometric_median(X)

Compute an approximation of the geometric median using Weiszfeld's algorithm.

\[\pmb{c}_{k+1} = \left( \sum_{i=1}^{m_c}{ \lVert\pmb{x}_{i*} - \pmb{c}_k\rVert^{-1} }\right)^{-1} \sum_{i=1}^{m_c}{ \lVert\pmb{x}_{i*} - \pmb{c}_k\rVert^{-1} \pmb{x}_{i*} }\]
Parameters:

X (np.ndarray) – The structure space matrix representing the cluster \(\pmb{X} \in \mathbb{R}^{m_c \times 3}\).

Returns:

The point representing the cluster, i.e., the cluster’s mark.

Return type:

np.ndarray
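
The iteration above can be sketched as a fixed-point loop that reweights each point by the inverse of its distance to the current estimate. The initial estimate (the centroid), the iteration limit n_iter, the tolerance tol, and the clamp eps that avoids division by zero are all assumptions of this sketch; the documentation does not state which values or stopping rule the method uses.

```python
import numpy as np


def geometric_median(X, n_iter=100, tol=1e-9, eps=1e-12):
    """Approximate the geometric median with Weiszfeld's iteration."""
    c = X.mean(axis=0)  # assumed starting point
    for _ in range(n_iter):
        # Distances to the current estimate, clamped away from zero
        d = np.maximum(np.linalg.norm(X - c, axis=1), eps)
        w = 1.0 / d
        # Weighted average of the points with inverse-distance weights
        c_new = (w[:, None] * X).sum(axis=0) / w.sum()
        if np.linalg.norm(c_new - c) < tol:
            return c_new
        c = c_new
    return c


X = np.array([[0.0, 0.0, 0.0], [2.0, 0.0, 0.0],
              [4.0, 6.0, 3.0], [1.0, 1.0, 1.0]])
c = geometric_median(X)
```

The result minimizes the sum of (non-squared) Euclidean distances to the cluster points, which makes it less sensitive to outliers than the centroid.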

export_marks(marks, out_prefix=None)

Write the marks representing the clusters to an output file.

Parameters:
  • marks (np.ndarray) – The marks (points) to be exported.

  • out_prefix (str or None) – The output prefix in case path expansion must be applied.

Returns:

Nothing at all, but the cluster marks are written to the output file.

static export_marks_to_shapefile(marks, outpath, epsg=None, proj_str=None)

Assist ClusterMarker.export_marks() in writing shapefiles.

Parameters:
  • marks (np.ndarray) – The marks to be written.

  • outpath (str) – The path where the shapefile must be written.

  • epsg (int or None) – The EPSG code identifying the coordinate reference system.

  • proj_str (str) – The projection string identifying the coordinate reference system.

Returns:

Nothing at all, but the marks are written to the output file.

static export_marks_to_csv(marks, outpath)

Assist ClusterMarker.export_marks() in writing CSV files.

Parameters:
  • marks (np.ndarray) – The marks to be written.

  • outpath (str) – The path where the CSV file must be written.

Returns:

Nothing at all, but the marks are written to the output file.
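
For orientation, writing an array of marks to CSV can be sketched with np.savetxt. The delimiter, number format, and absence of a header row are assumptions of this sketch; the documentation does not specify the layout that export_marks_to_csv produces, and write_marks_csv is a hypothetical name.

```python
import os
import tempfile

import numpy as np


def write_marks_csv(marks, outpath):
    """Write one mark per row as comma-separated coordinates."""
    np.savetxt(outpath, marks, delimiter=",", fmt="%.6f")


marks = np.array([[1.0, 1.0, 1.0], [11.0, 0.0, 0.5]])
outpath = os.path.join(tempfile.mkdtemp(), "marks.csv")
write_marks_csv(marks, outpath)
# Read the file back to check the round trip
loaded = np.loadtxt(outpath, delimiter=",")
```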