Imputers
Imputers are the components that provide data imputation capabilities. It might be that some computation on a point cloud yields a not valid output, for instance a Not a Number (NaN) value. In those cases, we could benefit from adding some imputer to our pipeline.
For example, one typical step in many point cloud
processing pipelines consists of computing geometric features (see
GeometricFeaturesMiner). In doing so, many 3D neighborhoods are
analyzed by the Singular Value Decomposition (SVD) of a
\(\pmb{X} \in \mathbb{R}^{m \times 3}\) matrix representing the \(m\)
points of the neighborhood. The SVD yields the following factorization
\(\pmb{X} = \pmb{U}\pmb{\Sigma}\pmb{V}^\intercal\) from where the
\(\pmb{V}^\intercal\) singular vectors and the \(\pmb{\Sigma}\)
singular values can be used to derive linearity, planarity, sphericity, and
other geometric features.
Note that this factorization on a 3D space will be problematic if there are not at least three linearly independent equations. Fortunately, most points in a dense point cloud have populated enough neighborhoods. However, some points at the boundaries of the point cloud, points from scanning artifacts, or outlier points by whatever reason, might have poorly populated neighborhoods. Consequently, NaN values might appear for some features. In that case, a data imputation strategy will allow us to define what do with problematic features in our pipeline.
Removal imputer
A RemovalImputer defines a target value and removes from the point
cloud all the points that contain such value in their features.
The JSON below shows an example of RemovalImputer.
{
"imputer": "RemovalImputer",
"fnames": ["AUTO"],
"target_val": "NaN",
"impute_coordinates": false,
"impute_references": false
}
In the JSON above the target value is NaN and all the features considered
at the current state of the pipeline will be considered when searching for the
target value ("AUTO").
Arguments
- –
fnames It can be an arbitrary list of feature names specifying what features consider. Alternatively, it can be a list containing the string “AUTO”. In this case, the feature names will be automatically derived. Typically, the features that were considered by the most recent component that interacted with the features will be selected.
- –
target_val It can be the “NaN” string (it will be understood as NaN), or any integer or decimal number.
- –
impute_coordinates Boolean flag to specify whether to impute the point-wise coordinates (
true) or not (false, default).- –
impute_references Boolean flag to specify whether to impute the reference values (
true), e.g., classes, or not (false).
Univariate imputer
A UnivariateImputer defines a target value and replaces it
considering the values for that feature that do not match the target value.
It is called univariate because it operates on each feature independently.
For example, if we have a feature for five points with values
\((0.3, 0.1, 0.1, \mathrm{NaN}, 0.5)\) we could use a mean-based univariate
imputer to achieve \((0.3, 0.1, 0.1, 0.25, 0.5)\).
The JSON below shows an example of UnivariateImputer.
{
"imputer": "UnivariateImputer",
"fnames": ["AUTO"],
"target_val": "NaN",
"strategy": "mean",
"constant_val": 0,
"impute_coordinates": false,
"impute_references": false
}
In the JSON above the target value is NaN and all the features considered at
the current state of the pipeline will be considered when searching for the
target value ("AUTO"). The NaN values will be replaced by the mean of the
numerical values. The constant_val is not used because the value to replace
the NaN is automatically derived as the mean.
Arguments
- –
fnames It can be an arbitrary list of feature names specifying what features consider. Alternatively, it can be a list containing the string “AUTO”. In this case, the feature names will be automatically derived. Typically, the features that were considered by the most recent component that interacted with the features will be selected.
- –
target_val It can be the “NaN” string (it will be understood as NaN), or any integer or decimal number.
- –
strategy It can be any strategy supported by
sklearn.impute.SimpleImputeras astrategyparameter. See sklearn SimpleImputer.- –
constant_val Defines the new value when the strategy is to replace by a given constant value.
- –
impute_coordinates Boolean flag to specify whether to impute the point-wise coordinates (
true) or not (false, default).- –
impute_references Boolean flag to specify whether to impute the reference values (
true), e.g., classes, or not (false).