clustering.postproc package

Submodules

clustering.postproc.cluster_enveloper module

class clustering.postproc.cluster_enveloper.ClusterEnveloper(**kwargs)

Bases: ClusteringPostProcessor

Author:: Alberto M. Esmoris Pena

Clustering post-processor that computes the requested envelopes for each cluster. See ClusteringPostProcessor.

Variables:: envelopes – List of dictionaries defining each envelope.

__init__(**kwargs)

Initialize a ClusterEnveloper post-processor.

See ClusteringPostProcessor.__init__().

Parameters:: kwargs – The key-word arguments for the initialization of the ClusterEnveloper.

__call__(clusterer, pcloud, out_prefix=None)

Post-process the given point cloud with clusters to compute the cluster-wise envelopes.

Parameters:

clusterer (Clusterer) – The clusterer that generated the clusters.
pcloud (PointCloud) – The point cloud to be post-processed.
out_prefix (str or None) – The output prefix in case path expansion must be applied.

Returns:

The post-processed point cloud.

Return type:

PointCloud

compute_aabb_envelope(spec, X, cidx)

Compute the axis-aligned bounding box for the given points.

Parameters:

spec (dict) – The specification of the axis aligned bounding box to be computed.
X (np.ndarray) – The structure space matrix representing the cluster whose axis-aligned bounding box must be found.
cidx (int) – The index of the cluster whose envelope must be computed.

Returns:

Nothing, but the generated axis-aligned bounding box is stored in the member attribute aabbs.
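
As a minimal sketch of the underlying idea (not the method's actual implementation), the axis-aligned bounding box of a cluster can be derived from the column-wise minima and maxima of X. The helper name aabb_from_points is hypothetical, and the real method also handles the spec and stores its result in aabbs.

```python
import numpy as np


def aabb_from_points(X):
    """Return the min and max corners of the axis-aligned bounding box of X."""
    X = np.asarray(X, dtype=float)
    # Column-wise extremes give the two opposite corners of the box.
    return X.min(axis=0), X.max(axis=0)


X = np.array([[0.0, 1.0, 2.0], [3.0, -1.0, 0.5], [1.0, 4.0, 1.0]])
pmin, pmax = aabb_from_points(X)
```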

compute_bbox_envelope(spec, X, cidx)

Compute the bounding box for the given points.

Parameters:

spec (dict) – The specification of the bounding box to be computed.
X (np.ndarray) – The structure space matrix representing the cluster whose bounding box must be found.
cidx (int) – The index of the cluster whose envelope must be computed.

Returns:

Nothing, but the generated bounding box is stored in the member attribute bboxes.

export_aabb_envelopes(out_prefix=None)

Export the axis-aligned bounding box envelopes stored as member attributes.

Returns:: Nothing at all, but the bounding boxes are written to files.

export_bbox_envelopes(out_prefix=None)

Export the bounding box envelopes stored as member attributes.

Returns:: Nothing at all, but the bounding boxes are written to files.

clustering.postproc.cluster_marker module

class clustering.postproc.cluster_marker.ClusterMarker(**kwargs)

Bases: ClusteringPostProcessor

Author:: Alberto M. Esmoris Pena

Clustering post-processor that computes a point representing each cluster. Note that if the clusters are georeferenced, the point will also be georeferenced. See ClusteringPostProcessor.

Variables:

strategy (str) – The strategy to compute the point representing each cluster. Supported strategies are "centroid" (see ClusterMarker.compute_cluster_centroid()), "midrange" (see ClusterMarker.compute_cluster_midrange()), "medianoid" (see ClusterMarker.compute_cluster_medianoid()), "medoid" (see ClusterMarker.compute_cluster_medoid()), and "geometric_median" (see ClusterMarker.compute_cluster_geometric_median()).
epsg (int or None) – The European Petroleum Survey Group (EPSG) code identifying a coordinate reference system (CRS). If given, it will be used to export a .prj file when using shapefile as output format. This .prj file will contain the well-known text (WKT) representing the projection information to georeference the markers in the shapefile.
nthreads (int) – The number of threads to be used for the parallel computation of the markers. Note that -1 implies using as many threads as available cores in the system.
output_path (str) – The path where the marks will be exported. The output is a shapefile if the extension is .shp and a CSV file for any other extension.

__init__(**kwargs)

Initialize a ClusterMarker post-processor.

See ClusteringPostProcessor.__init__().

Parameters:: kwargs – The key-word arguments for the initialization of the ClusterMarker.

__call__(clusterer, pcloud, out_prefix=None)

Post-process the given point cloud with clusters to compute the cluster-wise marks.

Parameters:

clusterer (Clusterer) – The clusterer that generated the clusters.
pcloud (PointCloud) – The point cloud to be post-processed.
out_prefix (str or None) – The output prefix in case path expansion must be applied.

Returns:

The post-processed point cloud.

Return type:

PointCloud

compute_cluster_centroid(X)

Compute the centroid as the mark representing the cluster.

\[\pmb{c} = \dfrac{1}{m_c} \sum_{i=1}^{m_c}{\pmb{x}_{i*}}\]

Parameters:: X (np.ndarray) – The structure space matrix representing the cluster \(\pmb{X} \in \mathbb{R}^{m_c \times 3}\).
Returns:: The point representing the cluster, i.e., the cluster’s mark.
Return type:: np.ndarray
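
The centroid formula above amounts to averaging the rows of X. The following NumPy sketch illustrates the formula only; the sample matrix is made up for the example.

```python
import numpy as np

# Hypothetical cluster with three points (one point per row).
X = np.array([[0.0, 0.0, 0.0], [2.0, 0.0, 0.0], [1.0, 3.0, 6.0]])
# Mean over the rows, i.e., one average per coordinate.
centroid = X.mean(axis=0)
```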

compute_cluster_midrange(X)

Compute the midrange as the mark representing the cluster.

\[c_j = \dfrac{ \min \; \{x_{ij}\}_{i=1}^{m_c} + \max \; \{x_{ij}\}_{i=1}^{m_c} }{ 2 }\]

Parameters:: X (np.ndarray) – The structure space matrix representing the cluster \(\pmb{X} \in \mathbb{R}^{m_c \times 3}\).
Returns:: The point representing the cluster, i.e., the cluster’s mark.
Return type:: np.ndarray
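
For illustration, the midrange can be sketched as the midpoint between the per-coordinate minimum and maximum. The sample data are an assumption made for this example.

```python
import numpy as np

# Hypothetical cluster with three points (one point per row).
X = np.array([[0.0, 0.0, 0.0], [2.0, 0.0, 0.0], [1.0, 3.0, 6.0]])
# Midpoint of the coordinate-wise extremes.
midrange = (X.min(axis=0) + X.max(axis=0)) / 2.0
```

Unlike the centroid, the midrange depends only on the extreme values along each axis, so a single outlier can shift it considerably.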

compute_cluster_medianoid(X)

Compute the medianoid as the mark representing the cluster.

\[c_j = \operatorname{median}(\pmb{x}_{*j})\]

Parameters:: X (np.ndarray) – The structure space matrix representing the cluster \(\pmb{X} \in \mathbb{R}^{m_c \times 3}\).
Returns:: The point representing the cluster, i.e., the cluster’s mark.
Return type:: np.ndarray
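
A possible NumPy rendering of the medianoid formula takes the median of each column independently. The sample data are hypothetical.

```python
import numpy as np

# Hypothetical cluster with five points (one point per row).
X = np.array([
    [0.0, 0.0, 0.0],
    [2.0, 0.0, 0.0],
    [1.0, 3.0, 6.0],
    [10.0, 1.0, 1.0],
    [4.0, 2.0, 2.0],
])
# Coordinate-wise median: one median per column.
medianoid = np.median(X, axis=0)
```

Because each coordinate is handled separately, the resulting point is not necessarily one of the cluster's points, as is the case for this sample.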

compute_cluster_medoid(X)

Compute the medoid as the mark representing the cluster.

\[\pmb{c} = \operatorname*{arg min}_{\pmb{x}_{i*}} \quad \sum_{k=1}^{m_c}{ \lVert\pmb{x}_{i*} - \pmb{x}_{k*}\rVert^2 }\]

Parameters:: X (np.ndarray) – The structure space matrix representing the cluster \(\pmb{X} \in \mathbb{R}^{m_c \times 3}\).
Returns:: The point representing the cluster, i.e., the cluster’s mark.
Return type:: np.ndarray
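
The medoid formula can be sketched with a brute-force evaluation of all pairwise squared distances. This is an illustrative sketch rather than the method's actual implementation: the helper name medoid is hypothetical, and the approach needs memory quadratic in the number of points, so it only suits small clusters.

```python
import numpy as np


def medoid(X):
    """Return the row of X minimizing the sum of squared distances to all rows."""
    X = np.asarray(X, dtype=float)
    # diff[i, k] is the vector x_i - x_k, so diff has shape (m, m, 3).
    diff = X[:, None, :] - X[None, :, :]
    # cost[i] is the sum over k of ||x_i - x_k||^2.
    cost = np.sum(diff ** 2, axis=(1, 2))
    return X[np.argmin(cost)]


X = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [2.0, 0.0, 0.0], [10.0, 0.0, 0.0]])
m = medoid(X)
```

In contrast to the centroid, the medoid is always one of the points in the cluster.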

compute_cluster_geometric_median(X)

Compute the approximate geometric median using Weiszfeld's algorithm.

\[\pmb{c}_{k+1} = \left( \sum_{i=1}^{m_c}{ \lVert\pmb{x}_{i*} - \pmb{c}_k\rVert^{-1} }\right)^{-1} \sum_{i=1}^{m_c}{ \lVert\pmb{x}_{i*} - \pmb{c}_k\rVert^{-1} \pmb{x}_{i*} }\]

Parameters:: X (np.ndarray) – The structure space matrix representing the cluster \(\pmb{X} \in \mathbb{R}^{m_c \times 3}\).
Returns:: The point representing the cluster, i.e., the cluster’s mark.
Return type:: np.ndarray
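
The iteration above (Weiszfeld's algorithm) can be sketched as a fixed-point loop. The starting point, the stopping rule, and the parameters max_iter, tol, and eps are assumptions made for this sketch; the source does not state how the actual method chooses them.

```python
import numpy as np


def geometric_median(X, max_iter=100, tol=1e-9, eps=1e-12):
    """Approximate the geometric median of the rows of X."""
    X = np.asarray(X, dtype=float)
    c = X.mean(axis=0)  # Assumed starting point: the centroid.
    for _ in range(max_iter):
        d = np.linalg.norm(X - c, axis=1)
        # Inverse distances, clamped to avoid dividing by zero when the
        # current estimate coincides with a point of the cluster.
        w = 1.0 / np.maximum(d, eps)
        c_next = (w[:, None] * X).sum(axis=0) / w.sum()
        if np.linalg.norm(c_next - c) < tol:
            return c_next
        c = c_next
    return c


X = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [10.0, 10.0, 0.0]])
gm = geometric_median(X)
```

Each step is a weighted average of the points in which points close to the current estimate receive larger weights, which makes the result less sensitive to outliers than the centroid.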

export_marks(marks, out_prefix=None)

Write the marks representing the clusters to an output file.

Parameters:

marks (np.ndarray) – The marks (points) to be exported.
out_prefix (str or None) – The output prefix in case path expansion must be applied.

Returns:

Nothing at all, but the cluster marks are written to the output file.

static export_marks_to_shapefile(marks, outpath, epsg=None, proj_str=None)

Assist ClusterMarker.export_marks() in writing shape files.

Parameters:

marks (np.ndarray) – The marks to be written.
outpath (str) – The path where the shape file must be written.
epsg (int or None) – The EPSG code identifying the coordinate reference system.
proj_str (str) – The projection string identifying the coordinate reference system.

Returns:

Nothing at all, but the marks are written to the output file.

static export_marks_to_csv(marks, outpath)

Assist ClusterMarker.export_marks() in writing CSV files.

Parameters:

marks (np.ndarray) – The marks to be written.
outpath (str) – The path where the CSV file must be written.

Returns:

Nothing at all, but the marks are written to the output file.

clustering.postproc.cluster_selector module

class clustering.postproc.cluster_selector.ClusterSelector(**kwargs)

Bases: ClusteringPostProcessor

Author:: Alberto M. Esmoris Pena

Clustering post-processor that filters the clusters, i.e., discards those that do not satisfy the specified requirements or preserves only those that satisfy them.

See ClusteringPostProcessor.

Parameters:

filters (list of dict) –

The list with the specification of the preserve and discard actions with their requirements. The structure of each dictionary in the list is as follows:

– attribute

The attribute for which the relational condition/requirement is specified. Supported attributes are:

– "number_of_points": The number of points in the cluster.
– "surface_area": The surface area in the \((x, y)\) plane of each cluster understood as the area of the convex hull that contains the cluster.
– "volume": The volume of each cluster understood as the volume of the 3D convex hull that contains the cluster.
– "surface_density": The number of points divided by the surface area.
– "volume_density": The number of points divided by the volume.
– "x_length": The difference between the max and min points of the cluster along the \(x\)-axis.
– "y_length": The difference between the max and min points of the cluster along the \(y\)-axis.
– "z_length": The difference between the max and min points of the cluster along the \(z\)-axis.

– relational

The relational governing whether the condition/requirement is satisfied or not. Supported relationals are "not_equals" \(x \neq y\), "equals" \(x = y\), "less_than" \(x < y\), "less_than_or_equal_to" \(x \leq y\), "greater_than" \(x > y\), "greater_than_or_equal_to" \(x \geq y\), "in" \(x \in S\), "not_in" \(x \notin S\), and "inside" \(x \in [a, b] \subset \mathbb{R}\).

– target

The target value for the right-hand side of the relational. It can be an integer, a float, or a list. Lists are used for the "in", "not_in", and "inside" relationals. For "inside", the list must have exactly two elements.

– action

Either "preserve" to keep those clusters that satisfy the relational condition or "discard" to discard clusters that satisfy the relational condition.
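
To make the structure concrete, the sketch below builds a filters list with the four keys described above and evaluates the relational of a single filter against an attribute value. The RELATIONALS table and the satisfies helper are hypothetical illustrations, not part of ClusterSelector, and the target values are made up.

```python
import operator

# Hypothetical mapping from relational names to Python predicates.
RELATIONALS = {
    "not_equals": operator.ne,
    "equals": operator.eq,
    "less_than": operator.lt,
    "less_than_or_equal_to": operator.le,
    "greater_than": operator.gt,
    "greater_than_or_equal_to": operator.ge,
    "in": lambda x, S: x in S,
    "not_in": lambda x, S: x not in S,
    "inside": lambda x, ab: ab[0] <= x <= ab[1],  # Closed interval [a, b].
}

filters = [
    {  # Keep clusters with at least 50 points.
        "attribute": "number_of_points",
        "relational": "greater_than_or_equal_to",
        "target": 50,
        "action": "preserve",
    },
    {  # Drop clusters that are at most half a unit tall.
        "attribute": "z_length",
        "relational": "inside",
        "target": [0.0, 0.5],
        "action": "discard",
    },
]


def satisfies(value, flt):
    """Check whether an attribute value satisfies the relational of a filter."""
    return RELATIONALS[flt["relational"]](value, flt["target"])
```

For example, satisfies(120, filters[0]) holds, so a cluster with 120 points would be preserved by the first filter, while satisfies(0.3, filters[1]) holds, so a cluster with a \(z\)-length of 0.3 would be discarded by the second one.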

__init__(**kwargs)

Initialize a ClusterSelector post-processor.

See ClusteringPostProcessor.__init__().

Parameters:: kwargs – The key-word arguments for the initialization of the ClusterSelector.

__call__(clusterer, pcloud, out_prefix=None)

Post-process the given point cloud with clusters to discard those clusters that do not pass the requested filters.

Parameters:

clusterer (Clusterer) – The clusterer that generated the clusters.
pcloud (PointCloud) – The point cloud to be post-processed.
out_prefix (str or None) – The output prefix in case path expansion must be applied.

Returns:

The post-processed point cloud.

Return type:

PointCloud

determine_attributes()

Determine the attributes that must be computed for each cluster from the filters specification.

Returns:: A dictionary-like look-up table whose keys are the names of the attributes that must be computed and whose values are the column indices of those attributes in the cluster-wise feature space matrix.
Return type:: dict

compute_attributes(alut, X, c, c_dom)

Compute the attributes for each cluster.

Parameters:

alut (dict) – The attribute’s look-up table as generated by ClusterSelector.determine_attributes().
X (np.ndarray) – The structure space matrix representing the point cloud \(\pmb{X} \in \mathbb{R}^{m \times 3}\).
c (np.ndarray) – The vector of point-wise cluster labels \(\pmb{c} \in \mathbb{R}^{m}\).
c_dom (np.ndarray) – The cluster-wise vector of cluster labels \(\pmb{c}_{\text{dom}} \in \mathbb{R}^{n_c}\).

Returns:

The cluster-wise feature space matrix \(\pmb{F} \in \mathbb{R}^{n_c \times n_f}\) for \(n_c \in \mathbb{Z}_{>0}\) clusters and \(n_f \in \mathbb{Z}_{>0}\) attributes.

Return type:

np.ndarray

compute_selection_mask(c, c_dom, alut, F)

Compute the selection mask where True means the cluster must be preserved and False means the cluster must be discarded.

Parameters:

c (np.ndarray) – The point-wise cluster labels.
c_dom (np.ndarray) – The cluster labels.
alut (dict) – The look-up table for the cluster-wise attributes/features as computed by the ClusterSelector.determine_attributes().
F (np.ndarray) – The feature space matrix of the clusters.

Returns:

The cluster-wise selection mask (True means the cluster must be kept, False means it must be discarded).

Return type:

np.ndarray

apply_selection_mask(clusterer, pcloud, c, c_dom, mask)

Apply the selection mask to discard those clusters that do not meet the given requirements. The preserved clusters are updated to have sequential indices as cluster labels (starting at zero, with \(-1\) representing non-clustered points).

Parameters:

clusterer (Clusterer) – The clusterer that generated the clusters.
pcloud (PointCloud) – The point cloud that must be updated.
c (np.ndarray) – The point-wise cluster labels.
c_dom (np.ndarray) – The cluster labels.
mask (np.ndarray of bool) – The cluster-wise boolean mask where True means the cluster must be preserved and False means it must be discarded.

Returns:

The updated point cloud and the new domain of the clusters.
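
The relabeling step can be sketched as follows. This is a simplified illustration that works on the label vector only: the helper name relabel is hypothetical, and the actual method also updates the point cloud through the clusterer.

```python
import numpy as np


def relabel(c, c_dom, mask):
    """Drop masked-out clusters and renumber the kept ones from zero."""
    c = np.asarray(c)
    kept = np.asarray(c_dom)[np.asarray(mask, dtype=bool)]
    # Points start as non-clustered (-1); kept clusters are then reassigned.
    new_c = np.full(c.shape, -1, dtype=int)
    for new_label, old_label in enumerate(kept):
        new_c[c == old_label] = new_label
    return new_c, np.arange(len(kept))


c = np.array([0, 0, 1, 2, 2, -1, 3])
c_dom = np.array([0, 1, 2, 3])
mask = np.array([True, False, True, True])
new_c, new_dom = relabel(c, c_dom, mask)
```

Here cluster 1 is discarded, so its points become non-clustered, and the former clusters 2 and 3 are renumbered to 1 and 2.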

compute_number_of_points(X, c, c_dom)

Compute the number of points in each cluster.

See ClusterSelector.compute_attributes().

compute_surface_area(X, c, c_dom)

Compute the area of the convex hull in the \((x, y)\) plane that contains each cluster.

See ClusterSelector.compute_attributes().
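
For a single cluster, the surface area described above can be sketched in pure Python with Andrew's monotone chain for the convex hull and the shoelace formula for its area. This is an illustrative stand-in; the source does not say which algorithm or library the method uses, and the helper name convex_hull_area_xy is hypothetical.

```python
def convex_hull_area_xy(points):
    """Area of the convex hull of the (x, y) coordinates of the given points."""
    # Project onto the (x, y) plane, drop duplicates, sort lexicographically.
    pts = sorted({(float(p[0]), float(p[1])) for p in points})
    if len(pts) < 3:
        return 0.0

    def cross(o, a, b):
        # z-component of the cross product (a - o) x (b - o).
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    def half_hull(seq):
        hull = []
        for p in seq:
            # Pop points that would make a clockwise or collinear turn.
            while len(hull) >= 2 and cross(hull[-2], hull[-1], p) <= 0:
                hull.pop()
            hull.append(p)
        return hull[:-1]  # The last point starts the other half.

    hull = half_hull(pts) + half_hull(reversed(pts))
    # Shoelace formula over the hull vertices.
    area = 0.0
    for i in range(len(hull)):
        x1, y1 = hull[i]
        x2, y2 = hull[(i + 1) % len(hull)]
        area += x1 * y2 - x2 * y1
    return abs(area) / 2.0


cluster = [(0, 0, 5), (1, 0, 2), (1, 1, 7), (0, 1, 1), (0.5, 0.5, 3)]
area = convex_hull_area_xy(cluster)
```

The \(z\) coordinate is ignored, and degenerate clusters (fewer than three distinct points, or collinear points) yield an area of zero, which matters when the area is later used as a divisor for the surface density.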

compute_volume(X, c, c_dom)

Compute the volume of the 3D convex hull that contains each cluster.

See ClusterSelector.compute_attributes().

compute_surface_density(X, c, c_dom)

Compute the number of points in the cluster divided by the area of the convex hull in the \((x, y)\) plane that contains each cluster.

See ClusterSelector.compute_attributes().

compute_volume_density(X, c, c_dom)

Compute the number of points in the cluster divided by the volume of the 3D convex hull that contains each cluster.

See ClusterSelector.compute_attributes().

compute_x_length(X, c, c_dom)

Compute the difference between the max and min values along the \(x\)-axis.

See ClusterSelector.compute_attributes().

compute_y_length(X, c, c_dom)

Compute the difference between the max and min values along the \(y\)-axis.

See ClusterSelector.compute_attributes().

compute_z_length(X, c, c_dom)

Compute the difference between the max and min values along the \(z\)-axis.

See ClusterSelector.compute_attributes().
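
The three length attributes can be sketched together: for each label in c_dom, select the points of that cluster and take the max minus min along each axis. The helper name axis_lengths is hypothetical, and the sketch assumes every label in c_dom has at least one point.

```python
import numpy as np


def axis_lengths(X, c, c_dom):
    """Return one row per cluster with its (x, y, z) lengths."""
    X = np.asarray(X, dtype=float)
    c = np.asarray(c)
    # np.ptp computes max - min ("peak to peak") along the given axis.
    return np.array([np.ptp(X[c == label], axis=0) for label in c_dom])


X = np.array([
    [0.0, 0.0, 0.0],
    [2.0, 1.0, 4.0],
    [5.0, 5.0, 5.0],
    [6.0, 7.0, 5.0],
    [9.0, 9.0, 9.0],  # Non-clustered point (label -1), ignored below.
])
c = np.array([0, 0, 1, 1, -1])
c_dom = np.array([0, 1])
lengths = axis_lengths(X, c, c_dom)
```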

clustering.postproc.clustering_post_processor module

class clustering.postproc.clustering_post_processor.ClusteringPostProcessor(**kwargs)

Bases: object

Author:: Alberto M. Esmoris Pena

Interface governing any component of a clustering post-processing pipeline. See Clusterer and Clusterer.post_process().

__init__(**kwargs)

Initialize a ClusteringPostProcessor.

Parameters:: kwargs – The key-word arguments for the initialization of any ClusteringPostProcessor.

abstractmethod __call__(clusterer, pcloud, out_prefix=None)

Abstract method that must be overridden by any concrete (instantiable) component of a clustering post-processing pipeline.

Parameters:

clusterer – The clusterer that called the post-processor.
pcloud – The point cloud that must be post-processed.
out_prefix – The output prefix in case path expansion must be applied.

Returns:

The post-processed point cloud.

Return type:

PointCloud

static build_post_processor(spec)

Build the post-processor from its key-words specification.

Parameters:: spec (dict) – The post-processor specification.
Returns:: Built post-processor.
Return type:: callable

get_cluster_labels(clusterer, pcloud)

Obtain the vector of cluster labels corresponding to the given clusterer.

Parameters:

clusterer (Clusterer) – The clusterer whose labels must be extracted.
pcloud (PointCloud) – The clustered point cloud.

Returns:

The vector of point-wise cluster labels.

Return type:

np.ndarray

set_cluster_labels(clusterer, pcloud, c)

Set the vector of point-wise cluster labels in the point cloud.

Parameters:

clusterer (Clusterer) – The clusterer whose labels must be set (typically after updating them).
pcloud (PointCloud) – The clustered point cloud.
c (np.ndarray) – The new vector of point-wise cluster labels.

Returns:

Nothing at all, but the point cloud is updated in place.

get_domain_from_cluster_labels(c)

Obtain the domain (all unique values but noise [default, -1]) of the cluster labels.

Parameters:: c (np.ndarray) – The vector of point-wise cluster labels.
Returns:: The domain of the cluster labels, i.e., all unique labels except the one representing noise.
Return type:: np.ndarray
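
A minimal sketch of this operation, assuming that noise is encoded with the default label -1 mentioned above. The function name domain_from_labels and its noise_label parameter are hypothetical.

```python
import numpy as np


def domain_from_labels(c, noise_label=-1):
    """Return the sorted unique cluster labels, excluding the noise label."""
    labels = np.unique(c)
    return labels[labels != noise_label]


c = np.array([2, -1, 0, 2, 0, -1, 5])
c_dom = domain_from_labels(c)
```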

Module contents

Author:: Alberto M. Esmoris Pena

This postproc package contains the logic of post-processing components to be called after computing a clustering.