clustering.postproc package
Submodules
clustering.postproc.cluster_enveloper module
- class clustering.postproc.cluster_enveloper.ClusterEnveloper(**kwargs)
Bases: ClusteringPostProcessor
- Author:
Alberto M. Esmoris Pena
Clustering post-processor that computes the requested envelopes for each cluster. See ClusteringPostProcessor.
- Variables:
envelopes – List of dictionaries defining each envelope.
- __init__(**kwargs)
Initialize a ClusterEnveloper post-processor.
See ClusteringPostProcessor.__init__().
- Parameters:
kwargs – The keyword arguments for the initialization of the ClusterEnveloper.
- __call__(clusterer, pcloud, out_prefix=None)
Post-process the given point cloud with clusters to compute the cluster-wise envelopes.
- Parameters:
clusterer (Clusterer) – The clusterer that generated the clusters.
pcloud (PointCloud) – The point cloud to be post-processed.
out_prefix (str or None) – The output prefix in case path expansion must be applied.
- Returns:
The post-processed point cloud.
- Return type:
PointCloud
- compute_aabb_envelope(spec, X, cidx)
Compute the axis-aligned bounding box for the given points.
- Parameters:
spec (dict) – The specification of the axis-aligned bounding box to be computed.
X (np.ndarray) – The structure space matrix representing the cluster whose axis-aligned bounding box must be found.
cidx (int) – The index of the cluster whose envelope must be computed.
- Returns:
Nothing, but the generated axis-aligned bounding box is stored in the member attribute aabbs.
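An axis-aligned bounding box is fully determined by the per-axis minimum and maximum coordinates of the cluster. The standalone function below is a minimal NumPy sketch of that computation; the name compute_aabb and the (min_vertex, max_vertex) return convention are assumptions for illustration, not the ClusterEnveloper internals (the actual method stores its result in aabbs and takes a spec).

```python
import numpy as np

def compute_aabb(X: np.ndarray) -> tuple:
    """Hypothetical helper: return (min_vertex, max_vertex) of the AABB of X.

    X is assumed to be a non-empty (m, 3) structure space matrix of one cluster.
    """
    if X.ndim != 2 or X.shape[0] == 0:
        raise ValueError("X must be a non-empty (m, n) matrix")
    # The per-axis minimum and maximum coordinates are two opposite box vertices.
    return X.min(axis=0), X.max(axis=0)
```

For example, the points (0, 0, 0), (1, 2, 3) and (-1, 0.5, 2) give the opposite vertices (-1, 0, 0) and (1, 2, 3).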
- compute_bbox_envelope(spec, X, cidx)
Compute the bounding box for the given points.
- Parameters:
spec (dict) – The specification of the bounding box to be computed.
X (np.ndarray) – The structure space matrix representing the cluster whose bounding box must be found.
cidx (int) – The index of the cluster whose envelope must be computed.
- Returns:
Nothing, but the generated bounding box is stored in the member attribute bboxes.
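The text does not state how the (non-axis-aligned) bounding box is oriented. The sketch below assumes one common choice, an oriented box aligned with the principal axes of the cluster (eigenvectors of its covariance matrix); it is not necessarily what ClusterEnveloper does, and it does not always yield the minimum-volume box. The function name compute_pca_obb is hypothetical, and at least two points are needed for the covariance to be defined.

```python
import numpy as np

def compute_pca_obb(X):
    """Hypothetical PCA-based oriented bounding box of an (m, 3) matrix X.

    Returns (center, axes, half_extents): the rows of `axes` are unit vectors and
    the box is center + sum_j t_j * axes[j] with |t_j| <= half_extents[j].
    """
    mean = X.mean(axis=0)
    Xc = X - mean
    # Eigenvectors of the covariance matrix give the principal directions.
    _, eigvecs = np.linalg.eigh(np.cov(Xc, rowvar=False))
    axes = eigvecs.T  # one principal axis per row
    # Coordinates of every point in the principal-axes frame.
    P = Xc @ axes.T
    pmin, pmax = P.min(axis=0), P.max(axis=0)
    half_extents = (pmax - pmin) / 2.0
    # Map the box center from the local frame back to the original frame.
    center = mean + ((pmin + pmax) / 2.0) @ axes
    return center, axes, half_extents
```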
- export_aabb_envelopes(out_prefix=None)
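Export the axis-aligned bounding box envelopes stored as member attributes.
- Parameters:
out_prefix (str or None) – The output prefix in case path expansion must be applied.
- Returns:
Nothing at all, but the axis-aligned bounding boxes are written to files.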
- export_bbox_envelopes(out_prefix=None)
Export the bounding box envelopes stored as member attributes.
- Parameters:
out_prefix (str or None) – The output prefix in case path expansion must be applied.
- Returns:
Nothing at all, but the bounding boxes are written to files.
clustering.postproc.cluster_marker module
- class clustering.postproc.cluster_marker.ClusterMarker(**kwargs)
Bases: ClusteringPostProcessor
- Author:
Alberto M. Esmoris Pena
Clustering post-processor that computes a point representing each cluster. Note that if the clusters are georeferenced, the point will also be georeferenced. See ClusteringPostProcessor.
- Variables:
strategy (str) – The strategy to compute the point representing each cluster. Supported strategies are "centroid" (see ClusterMarker.compute_cluster_centroid()), "midrange" (see ClusterMarker.compute_cluster_midrange()), "medianoid" (see ClusterMarker.compute_cluster_medianoid()), "medoid" (see ClusterMarker.compute_cluster_medoid()), and "geometric_median" (see ClusterMarker.compute_cluster_geometric_median()).
epsg (int or None) – The code in the European Petroleum Survey Group (EPSG) standard representing a coordinate reference system (CRS). If given, it will be used to export a .prj file when the output format is a shapefile. This .prj file will contain the well-known text (WKT) with the projection information needed to georeference the markers in the shapefile.
nthreads (int) – The number of threads to be used for the parallel computation of the markers. Note that -1 implies using as many threads as available cores in the system.
output_path (str) – The path where the marks will be exported. The output is written as CSV for any extension except .shp, which produces a shapefile.
- __init__(**kwargs)
Initialize a ClusterMarker post-processor.
See ClusteringPostProcessor.__init__().
- Parameters:
kwargs – The keyword arguments for the initialization of the ClusterMarker.
- __call__(clusterer, pcloud, out_prefix=None)
Post-process the given point cloud with clusters to compute the cluster-wise marks.
- Parameters:
clusterer (Clusterer) – The clusterer that generated the clusters.
pcloud (PointCloud) – The point cloud to be post-processed.
out_prefix (str or None) – The output prefix in case path expansion must be applied.
- Returns:
The post-processed point cloud.
- Return type:
PointCloud
- compute_cluster_centroid(X)
Compute the centroid as the mark representing the cluster.
\[\pmb{c} = \dfrac{1}{m_c} \sum_{i=1}^{m_c}{\pmb{x}_{i*}}\]
- Parameters:
X (np.ndarray) – The structure space matrix representing the cluster \(\pmb{X} \in \mathbb{R}^{m_c \times 3}\).
- Returns:
The point representing the cluster, i.e., the cluster’s mark.
- Return type:
np.ndarray
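The centroid formula is the column-wise mean of the cluster matrix. A minimal NumPy sketch follows; the standalone name centroid is hypothetical and the same three-point example matrix is reused in the sketches for the other strategies so the results can be compared.

```python
import numpy as np

def centroid(X):
    # c = (1 / m_c) * sum_i x_{i*}, i.e., the mean of each column of X.
    return X.mean(axis=0)

X = np.array([[0.0, 0.0, 0.0], [2.0, 0.0, 0.0], [4.0, 6.0, 3.0]])
c = centroid(X)  # (2, 2, 1)
```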
- compute_cluster_midrange(X)
Compute the midrange as the mark representing the cluster.
\[c_j = \dfrac{ \min \; \{x_{ij}\}_{i=1}^{m_c} + \max \; \{x_{ij}\}_{i=1}^{m_c} }{ 2 }\]
- Parameters:
X (np.ndarray) – The structure space matrix representing the cluster \(\pmb{X} \in \mathbb{R}^{m_c \times 3}\).
- Returns:
The point representing the cluster, i.e., the cluster’s mark.
- Return type:
np.ndarray
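The midrange takes, for each coordinate \(j\), the average of the extreme values, so it coincides with the center of the cluster's axis-aligned bounding box and depends only on the outermost points. A minimal sketch with a hypothetical standalone name:

```python
import numpy as np

def midrange(X):
    # c_j = (min_i x_ij + max_i x_ij) / 2, computed for every column j at once.
    return (X.min(axis=0) + X.max(axis=0)) / 2.0

X = np.array([[0.0, 0.0, 0.0], [2.0, 0.0, 0.0], [4.0, 6.0, 3.0]])
m = midrange(X)  # (2, 3, 1.5)
```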
- compute_cluster_medianoid(X)
Compute the medianoid as the mark representing the cluster.
\[c_j = \operatorname{median}(\pmb{x}_{*j})\]
- Parameters:
X (np.ndarray) – The structure space matrix representing the cluster \(\pmb{X} \in \mathbb{R}^{m_c \times 3}\).
- Returns:
The point representing the cluster, i.e., the cluster’s mark.
- Return type:
np.ndarray
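The medianoid applies the median independently to each column \(\pmb{x}_{*j}\), so the resulting point is not guaranteed to be one of the cluster's points. A minimal sketch (hypothetical standalone name):

```python
import numpy as np

def medianoid(X):
    # c_j = median of column j; each coordinate is treated independently.
    return np.median(X, axis=0)

X = np.array([[0.0, 0.0, 0.0], [2.0, 0.0, 0.0], [4.0, 6.0, 3.0]])
md = medianoid(X)  # (2, 0, 0)
```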
- compute_cluster_medoid(X)
Compute the medoid as the mark representing the cluster.
\[\pmb{c} = \operatorname*{arg\,min}_{\pmb{x}_{i*}} \quad \sum_{k=1}^{m_c}{ \lVert\pmb{x}_{i*} - \pmb{x}_{k*}\rVert^2 }\]
- Parameters:
X (np.ndarray) – The structure space matrix representing the cluster \(\pmb{X} \in \mathbb{R}^{m_c \times 3}\).
- Returns:
The point representing the cluster, i.e., the cluster’s mark.
- Return type:
np.ndarray
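A direct translation of the medoid formula evaluates all pairwise squared distances and picks the point with the smallest row sum. The sketch below (hypothetical standalone name) does exactly that, which costs \(O(m_c^2)\) memory.

```python
import numpy as np

def medoid(X):
    """Return the row of X minimizing the sum of squared distances to all rows."""
    diff = X[:, None, :] - X[None, :, :]      # (m, m, 3) pairwise differences
    sq_dist = (diff ** 2).sum(axis=2)         # (m, m) squared Euclidean distances
    return X[np.argmin(sq_dist.sum(axis=1))]  # point with the smallest total

X = np.array([[0.0, 0.0, 0.0], [2.0, 0.0, 0.0], [4.0, 6.0, 3.0]])
mo = medoid(X)  # (2, 0, 0)
```

Because the objective uses squared distances, \(\sum_k \lVert\pmb{x}_{i*} - \pmb{x}_{k*}\rVert^2 = m_c \lVert\pmb{x}_{i*} - \bar{\pmb{x}}\rVert^2 + \sum_k \lVert\pmb{x}_{k*} - \bar{\pmb{x}}\rVert^2\), where \(\bar{\pmb{x}}\) is the centroid. The minimizer is therefore the cluster point closest to the centroid, which allows an \(O(m_c)\) alternative.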
- compute_cluster_geometric_median(X)
Compute an approximation of the geometric median using Weiszfeld’s algorithm.
\[\pmb{c}_{k+1} = \left( \sum_{i=1}^{m_c}{ \lVert\pmb{x}_{i*} - \pmb{c}_k\rVert^{-1} }\right)^{-1} \sum_{i=1}^{m_c}{ \lVert\pmb{x}_{i*} - \pmb{c}_k\rVert^{-1} \pmb{x}_{i*} }\]
- Parameters:
X (np.ndarray) – The structure space matrix representing the cluster \(\pmb{X} \in \mathbb{R}^{m_c \times 3}\).
- Returns:
The point representing the cluster, i.e., the cluster’s mark.
- Return type:
np.ndarray
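The Weiszfeld update is a weighted mean with inverse-distance weights, iterated until it stops moving. In the sketch below, the initialization at the centroid, the stopping tolerance, the iteration cap, and the eps clamp that avoids dividing by zero when the estimate lands on a data point are all assumptions; the source does not specify them, and geometric_median is a hypothetical name.

```python
import numpy as np

def geometric_median(X, max_iters=100, tol=1e-7, eps=1e-12):
    """Approximate the geometric median of the rows of X with Weiszfeld's iteration."""
    c = X.mean(axis=0)  # assumed starting point: the centroid
    for _ in range(max_iters):
        # Clamp distances so a data point equal to c does not divide by zero.
        d = np.maximum(np.linalg.norm(X - c, axis=1), eps)
        w = 1.0 / d  # inverse-distance weights ||x_i - c_k||^{-1}
        c_next = (w[:, None] * X).sum(axis=0) / w.sum()
        if np.linalg.norm(c_next - c) < tol:
            return c_next
        c = c_next
    return c

X = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [10.0, 0.0, 0.0]])
gm = geometric_median(X)  # close to (1, 0, 0)
```

With the far point at \(x = 10\), the centroid is pulled to \(x = 11/3\), while the geometric median stays at the middle point \(x = 1\), which illustrates why this strategy is more robust to outliers.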
- export_marks(marks, out_prefix=None)
Write the marks representing the clusters to an output file.
- Parameters:
marks (np.ndarray) – The marks (points) to be exported.
out_prefix (str or None) – The output prefix in case path expansion must be applied.
- Returns:
Nothing at all, but the cluster marks are written to the output file.
- static export_marks_to_shapefile(marks, outpath, epsg=None, proj_str=None)
Assist ClusterMarker.export_marks() in writing shapefiles.
- Parameters:
marks (np.ndarray) – The marks to be written.
outpath (str) – The path where the shapefile must be written.
epsg (int or None) – The EPSG code identifying the coordinate reference system.
proj_str (str or None) – The projection string identifying the coordinate reference system.
- Returns:
Nothing at all, but the marks are written to the output file.
- static export_marks_to_csv(marks, outpath)
Assist ClusterMarker.export_marks() in writing CSV files.
- Parameters:
marks (np.ndarray) – The marks to be written.
outpath (str) – The path where the CSV file must be written.
- Returns:
Nothing at all, but the marks are written to the output file.
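Writing the marks as CSV amounts to one row per cluster with its three coordinates. The sketch below uses numpy.savetxt; the function name, the x,y,z header, and the six-decimal float format are assumptions for illustration, since the source does not describe the column layout.

```python
import numpy as np

def export_marks_to_csv_sketch(marks, outpath):
    """Hypothetical helper: write an (n_c, 3) matrix of marks as CSV with a header."""
    marks = np.asarray(marks, dtype=float)
    # comments="" prevents numpy from prefixing the header line with "# ".
    np.savetxt(outpath, marks, delimiter=",", header="x,y,z", comments="", fmt="%.6f")
```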
clustering.postproc.cluster_selector module
- class clustering.postproc.cluster_selector.ClusterSelector(**kwargs)
Bases: ClusteringPostProcessor
- Author:
Alberto M. Esmoris Pena
Clustering post-processor that filters the clusters, i.e., it discards those that do not satisfy the specified requirements or preserves only those that satisfy some given requisites.
- Parameters:
filters (list of dict) –
The list with the specification of the preserve and discard actions with their requirements. Each dictionary in the list has the following keys:
- attribute – The attribute for which the relational condition/requirement is specified. Supported attributes are:
  - "number_of_points" – The number of points in the cluster.
  - "surface_area" – The surface area of each cluster in the \((x, y)\) plane, understood as the area of the convex hull that contains the cluster.
  - "volume" – The volume of each cluster, understood as the volume of the 3D convex hull that contains the cluster.
  - "surface_density" – The number of points divided by the surface area.
  - "volume_density" – The number of points divided by the volume.
  - "x_length" – The difference between the max and min coordinates of the cluster along the \(x\)-axis.
  - "y_length" – The difference between the max and min coordinates of the cluster along the \(y\)-axis.
  - "z_length" – The difference between the max and min coordinates of the cluster along the \(z\)-axis.
- relational – The relational governing whether the condition/requirement is satisfied or not. Supported relationals are "not_equals" \(x \neq y\), "equals" \(x = y\), "less_than" \(x < y\), "less_than_or_equal_to" \(x \leq y\), "greater_than" \(x > y\), "greater_than_or_equal_to" \(x \geq y\), "in" \(x \in S\), "not_in" \(x \notin S\), and "inside" \(x \in [a, b] \subset \mathbb{R}\).
- target – The target value for the right-hand side of the relational. It can be an integer, a float, or a list. Lists are used for the "in", "not_in", and "inside" relationals; for "inside" the list must have exactly two elements.
- action – Either "preserve" to keep those clusters that satisfy the relational condition or "discard" to discard clusters that satisfy the relational condition.
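The filter structure above maps directly onto a Python list of dictionaries. The sketch below uses only the documented keys and values; the validate_filter helper and the concrete thresholds are hypothetical and only check the constraints stated in the text (supported names, list targets for "in", "not_in" and "inside", and exactly two elements for "inside").

```python
SUPPORTED_ATTRIBUTES = {
    "number_of_points", "surface_area", "volume", "surface_density",
    "volume_density", "x_length", "y_length", "z_length",
}
SUPPORTED_RELATIONALS = {
    "not_equals", "equals", "less_than", "less_than_or_equal_to",
    "greater_than", "greater_than_or_equal_to", "in", "not_in", "inside",
}

def validate_filter(f):
    """Hypothetical check of one filter dict against the documented constraints."""
    if f["attribute"] not in SUPPORTED_ATTRIBUTES:
        raise ValueError(f"Unsupported attribute: {f['attribute']!r}")
    if f["relational"] not in SUPPORTED_RELATIONALS:
        raise ValueError(f"Unsupported relational: {f['relational']!r}")
    if f["action"] not in ("preserve", "discard"):
        raise ValueError(f"Unsupported action: {f['action']!r}")
    if f["relational"] in ("in", "not_in", "inside") and not isinstance(f["target"], list):
        raise ValueError("in, not_in and inside require a list target")
    if f["relational"] == "inside" and len(f["target"]) != 2:
        raise ValueError("inside requires exactly two elements [a, b]")

# Illustrative thresholds: keep clusters with at least 50 points whose
# z_length lies inside [0.5, 30] (in the units of the point cloud).
filters = [
    {"attribute": "number_of_points", "relational": "greater_than_or_equal_to",
     "target": 50, "action": "preserve"},
    {"attribute": "z_length", "relational": "inside",
     "target": [0.5, 30.0], "action": "preserve"},
]
for f in filters:
    validate_filter(f)
```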
- __init__(**kwargs)
Initialize a ClusterSelector post-processor.
See ClusteringPostProcessor.__init__().
- Parameters:
kwargs – The keyword arguments for the initialization of the ClusterSelector.
- __call__(clusterer, pcloud, out_prefix=None)
Post-process the given point cloud with clusters to discard those clusters that do not pass the requested filters.
- Parameters:
clusterer (Clusterer) – The clusterer that generated the clusters.
pcloud (PointCloud) – The point cloud to be post-processed.
out_prefix (str or None) – The output prefix in case path expansion must be applied.
- Returns:
The post-processed point cloud.
- Return type:
PointCloud
- determine_attributes()
Determine the attributes that must be computed for each cluster from the filters specification.
- Returns:
A dictionary-like look-up table whose keys are the names of the attributes that must be computed and whose values are the indices of those values in the cluster-wise feature space matrix.
- Return type:
dict
- compute_attributes(alut, X, c, c_dom)
Compute the attributes for each cluster.
- Parameters:
alut (dict) – The attribute look-up table as generated by ClusterSelector.determine_attributes().
X (np.ndarray) – The structure space matrix representing the point cloud \(\pmb{X} \in \mathbb{R}^{m \times 3}\).
c (np.ndarray) – The vector of point-wise cluster labels \(\pmb{c} \in \mathbb{Z}^{m}\).
c_dom (np.ndarray) – The cluster-wise vector of cluster labels \(\pmb{c}_{\text{dom}} \in \mathbb{Z}^{n_c}\).
- Returns:
The cluster-wise feature space matrix \(\pmb{F} \in \mathbb{R}^{n_c \times n_f}\) for \(n_c \in \mathbb{Z}_{>0}\) clusters and \(n_f \in \mathbb{Z}_{>0}\) attributes.
- Return type:
np.ndarray
- compute_selection_mask(c, c_dom, alut, F)
Compute the selection mask where True means the cluster must be preserved and False means the cluster must be discarded.
- Parameters:
c (np.ndarray) – The point-wise cluster labels.
c_dom (np.ndarray) – The cluster labels.
alut (dict) – The look-up table for the cluster-wise attributes/features as computed by ClusterSelector.determine_attributes().
F (np.ndarray) – The feature space matrix of the clusters.
- Returns:
The cluster-wise selection mask (True means the cluster must be kept, False means it must be discarded).
- Return type:
np.ndarray
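One way to turn the filters into a cluster-wise mask is to evaluate each relational on the corresponding column of \(\pmb{F}\) (located through the look-up table) and combine the results. The sketch below assumes the filters are applied in order and combined with a logical AND ("preserve" keeps only matches, "discard" removes matches) and treats "inside" as the closed interval \([a, b]\); these are assumptions, and selection_mask is a hypothetical name with a simplified signature (no c or c_dom).

```python
import numpy as np

# Vectorized relationals; "inside" is assumed to be the closed interval [a, b].
RELATIONALS = {
    "not_equals": lambda x, t: x != t,
    "equals": lambda x, t: x == t,
    "less_than": lambda x, t: x < t,
    "less_than_or_equal_to": lambda x, t: x <= t,
    "greater_than": lambda x, t: x > t,
    "greater_than_or_equal_to": lambda x, t: x >= t,
    "in": lambda x, t: np.isin(x, t),
    "not_in": lambda x, t: ~np.isin(x, t),
    "inside": lambda x, t: (x >= t[0]) & (x <= t[1]),
}

def selection_mask(filters, alut, F):
    """Hypothetical sketch: True keeps a cluster, False discards it."""
    mask = np.ones(F.shape[0], dtype=bool)
    for f in filters:
        values = F[:, alut[f["attribute"]]]
        cond = RELATIONALS[f["relational"]](values, f["target"])
        if f["action"] == "preserve":
            mask &= cond   # keep only clusters that satisfy the condition
        else:              # "discard"
            mask &= ~cond  # drop clusters that satisfy the condition
    return mask
```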
- apply_selection_mask(clusterer, pcloud, c, c_dom, mask)
Apply the selection mask to discard those clusters that do not meet the given requirements. The preserved clusters are relabeled with sequential indices (starting at zero, with \(-1\) representing non-clustered points).
- Parameters:
clusterer (Clusterer) – The clusterer that generated the clusters.
pcloud (PointCloud) – The point cloud that must be updated.
c (np.ndarray) – The point-wise cluster labels.
c_dom (np.ndarray) – The cluster labels.
mask (np.ndarray of bool) – The cluster-wise boolean mask where True means the cluster must be preserved and False means it must be discarded.
- Returns:
The updated point cloud and the new domain of the clusters.
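The relabeling described above can be sketched on plain label arrays: points of discarded clusters become \(-1\), and the surviving clusters receive the labels \(0, 1, \ldots\) in the order they appear in c_dom. The function name and the choice to return the new labels instead of an updated point cloud are assumptions for illustration.

```python
import numpy as np

def relabel_after_selection(c, c_dom, mask):
    """Hypothetical sketch: drop clusters where mask is False, relabel the rest 0..k-1.

    Returns the new point-wise labels and the new cluster domain.
    """
    c = np.asarray(c)
    kept = np.asarray(c_dom)[np.asarray(mask, dtype=bool)]
    # Start with every point marked as non-clustered (-1).
    new_c = np.full(c.shape, -1, dtype=int)
    for new_label, old_label in enumerate(kept):
        new_c[c == old_label] = new_label
    return new_c, np.arange(len(kept))
```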
- compute_number_of_points(X, c, c_dom)
Compute the number of points in each cluster.
- compute_surface_area(X, c, c_dom)
Compute the area of the convex hull in the \((x, y)\) plane that contains each cluster.
- compute_volume(X, c, c_dom)
Compute the volume of the 3D convex hull that contains each cluster.
- compute_surface_density(X, c, c_dom)
Compute the number of points in the cluster divided by the area of the convex hull in the \((x, y)\) plane that contains each cluster.
- compute_volume_density(X, c, c_dom)
Compute the number of points in the cluster divided by the volume of the 3D convex hull that contains each cluster.
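The hull-based attributes can be sketched with scipy.spatial.ConvexHull. A detail that is easy to get wrong: for a 2D hull, SciPy's ConvexHull.volume is the enclosed area and ConvexHull.area is the perimeter, while for a 3D hull, volume is the enclosed volume. In the sketch below, the function name, the column order, returning 0 for clusters with too few points, and NaN for undefined densities are assumptions; degenerate (collinear or coplanar) clusters would still make Qhull raise an error.

```python
import numpy as np
from scipy.spatial import ConvexHull

def hull_attributes(X, c, c_dom):
    """Hypothetical per-cluster attributes as an (n_c, 5) matrix with columns:
    number_of_points, surface_area, volume, surface_density, volume_density."""
    F = np.zeros((len(c_dom), 5))
    for k, label in enumerate(c_dom):
        Xk = X[c == label]
        n = Xk.shape[0]
        # 2D hull of the (x, y) projection: .volume is the enclosed area.
        area = ConvexHull(Xk[:, :2]).volume if n >= 3 else 0.0
        # 3D hull: .volume is the enclosed volume.
        vol = ConvexHull(Xk).volume if n >= 4 else 0.0
        F[k] = [
            n,
            area,
            vol,
            n / area if area > 0 else np.nan,
            n / vol if vol > 0 else np.nan,
        ]
    return F
```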
- compute_x_length(X, c, c_dom)
Compute the difference between the max and min values along the \(x\)-axis.
- compute_y_length(X, c, c_dom)
Compute the difference between the max and min values along the \(y\)-axis.
- compute_z_length(X, c, c_dom)
Compute the difference between the max and min values along the \(z\)-axis.
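The three length attributes differ only in the column they read, so a single vectorized pass can compute all of them. A minimal sketch with a hypothetical function name:

```python
import numpy as np

def axis_lengths(X, c, c_dom):
    """Hypothetical sketch: per-cluster (max - min) along x, y and z as an (n_c, 3) matrix."""
    rows = []
    for label in c_dom:
        Xk = X[c == label]
        rows.append(Xk.max(axis=0) - Xk.min(axis=0))
    # reshape keeps a (0, 3) shape when there are no clusters.
    return np.array(rows).reshape(-1, X.shape[1])
```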
clustering.postproc.clustering_post_processor module
- class clustering.postproc.clustering_post_processor.ClusteringPostProcessor(**kwargs)
Bases: object
- Author:
Alberto M. Esmoris Pena
Interface governing any component of a clustering post-processing pipeline. See Clusterer and Clusterer.post_process().
- __init__(**kwargs)
Initialize a ClusteringPostProcessor.
- Parameters:
kwargs – The keyword arguments for the initialization of any ClusteringPostProcessor.
- abstractmethod __call__(clusterer, pcloud, out_prefix=None)
Abstract method that must be overridden by any concrete (instantiable) component of a clustering post-processing pipeline.
- Parameters:
clusterer – The clusterer that called the post-processor.
pcloud – The point cloud that must be post-processed.
out_prefix – The output prefix in case path expansion must be applied.
- Returns:
The post-processed point cloud.
- Return type:
PointCloud
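The interface contract (construct from keyword arguments, then call with the clusterer, the point cloud, and an optional output prefix, returning the post-processed point cloud) can be mirrored with Python's abc module. Everything below is a hypothetical stand-in: the class names, the dict used as a point cloud, and the run_pipeline helper that chains components by feeding each output into the next one are illustrations, not the package's API.

```python
from abc import ABC, abstractmethod

class PostProcessorSketch(ABC):
    """Hypothetical stand-in mirroring the documented ClusteringPostProcessor interface."""

    def __init__(self, **kwargs):
        self.kwargs = kwargs

    @abstractmethod
    def __call__(self, clusterer, pcloud, out_prefix=None):
        """Return the post-processed point cloud."""

class CountClusters(PostProcessorSketch):
    """Toy concrete component: records the number of clusters, returns pcloud unchanged."""

    def __call__(self, clusterer, pcloud, out_prefix=None):
        labels = pcloud["cluster_labels"]  # assumption: a dict-like point cloud
        self.num_clusters = len({label for label in labels if label != -1})
        return pcloud

def run_pipeline(post_processors, clusterer, pcloud, out_prefix=None):
    # Assumed chaining: each component receives the previous component's output.
    for pp in post_processors:
        pcloud = pp(clusterer, pcloud, out_prefix=out_prefix)
    return pcloud
```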
- static build_post_processor(spec)
Build the post-processor from its keyword specification.
- Parameters:
spec (dict) – The post-processor specification.
- Returns:
Built post-processor.
- Return type:
callable
- get_cluster_labels(clusterer, pcloud)
Obtain the vector of cluster labels corresponding to the given clusterer.
- Parameters:
clusterer (Clusterer) – The clusterer whose labels must be extracted.
pcloud (PointCloud) – The clustered point cloud.
- Returns:
The vector of point-wise cluster labels.
- Return type:
np.ndarray
- set_cluster_labels(clusterer, pcloud, c)
Set the vector of point-wise cluster labels in the point cloud.
- Parameters:
clusterer (Clusterer) – The clusterer whose labels must be set (typically after updating them).
pcloud (PointCloud) – The clustered point cloud.
c (np.ndarray) – The new vector of point-wise cluster labels.
- Returns:
Nothing at all, but the point cloud is updated in place.
- get_domain_from_cluster_labels(c)
Obtain the domain of the cluster labels, i.e., all unique values except the noise label (-1 by default).
- Parameters:
c (np.ndarray) – The vector of point-wise cluster labels.
- Returns:
The domain of the cluster labels, i.e.,
- Return type:
np.ndarray
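Extracting the domain amounts to taking the sorted unique labels and dropping the noise label. A minimal NumPy sketch; the function name and the noise_label parameter are hypothetical:

```python
import numpy as np

def domain_from_labels(c, noise_label=-1):
    """Hypothetical sketch: sorted unique cluster labels, excluding the noise label."""
    u = np.unique(c)
    return u[u != noise_label]
```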
Module contents
- author:
Alberto M. Esmoris Pena
This postproc package contains the logic of post-processing components to be called after computing a clustering.