src.model.random_forestpp_classification_model

Classes

RandomForestPPClassificationModel(**kwargs)

class src.model.random_forestpp_classification_model.RandomForestPPClassificationModel(**kwargs)
Author:

Alberto M. Esmoris Pena

Random Forest model for classification using the C++ VL3D++ backend. It uses the optimized C++ RandomForest implementation, which combines pre-sorted indices, inline Gini, incremental entropy, a global presort, and OpenMP-parallel tree training.

This model is a drop-in replacement for RandomForestClassificationModel. Pipeline specifications that use "train": "RandomForestClassifier" can switch to the C++ backend by changing that entry to "train": "RandomForestPPClassifier". The model_args keys follow sklearn naming.
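As a minimal sketch of the switch described above, the snippet below rewrites a pipeline component held as a Python dict. Only the two "train" values come from this documentation. The layout of the component, the helper name use_cpp_backend, and the model_args values are assumptions made for illustration; the keys shown are standard sklearn RandomForestClassifier argument names, and whether the C++ backend honors each of them is not confirmed here.

```python
import copy


def use_cpp_backend(component):
    """Return a copy of a pipeline component that targets the C++ backend.

    Hypothetical helper: it only renames the "train" entry and leaves
    model_args untouched, since the keys are meant to be compatible.
    """
    switched = copy.deepcopy(component)
    if switched.get("train") == "RandomForestClassifier":
        switched["train"] = "RandomForestPPClassifier"
    return switched


# Hypothetical component; a real pipeline specification may hold more keys.
sklearn_component = {
    "train": "RandomForestClassifier",
    "model_args": {"n_estimators": 100, "max_depth": 20, "criterion": "entropy"},
}

cpp_component = use_cpp_backend(sklearn_component)
print(cpp_component["train"])
```

The original component is left unmodified, so both variants can be kept side by side when comparing the two backends.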

See ClassificationModel and Model.

Variables:
  • model_args (dict) – The arguments for the C++ Random Forest.

  • model (capsule) – The C++ RandomForest object (held via py::capsule).

  • importance_report_path (str) – Path to the file where the feature importance report is stored.

  • decision_plot_path (str) – Path to the file where the decision tree plots are stored.

CRITERION_MAP = {'entropy': 0, 'gini': 1, 'hellinger': 3, 'log_loss': 2}
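The map above pairs each criterion name with an integer code. A plausible reading, stated here as an assumption, is that the "criterion" entry of model_args is translated to this code before being handed to the C++ backend. The sketch below copies the map verbatim; the helper resolve_criterion is hypothetical and not part of the class.

```python
# Copied from the class attribute documented above.
CRITERION_MAP = {'entropy': 0, 'gini': 1, 'hellinger': 3, 'log_loss': 2}


def resolve_criterion(name):
    """Translate a criterion name into its integer code (hypothetical helper)."""
    try:
        return CRITERION_MAP[name]
    except KeyError:
        supported = ", ".join(sorted(CRITERION_MAP))
        raise ValueError(
            f"Unknown criterion {name!r}; expected one of: {supported}"
        ) from None


code = resolve_criterion("gini")
print(code)
```

Failing early with the list of supported names keeps a misspelled criterion from reaching the compiled backend.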
static extract_model_args(spec)

Extract the arguments to initialize/instantiate a RandomForestPPClassificationModel from a keyword specification.

Parameters:

spec – The keyword specification containing the arguments.

Returns:

The arguments to initialize/instantiate a RandomForestPPClassificationModel.

__init__(**kwargs)

Initialize an instance of RandomForestPPClassificationModel.

Parameters:

kwargs – The attributes for the RandomForestPPClassificationModel that will also be passed to the parent.

prepare_model()

Prepare the C++ Random Forest from model_args.

Returns:

The prepared model capsule. Note that it is also assigned to the model attribute of the instance.

training(X, y, info=True)

The core logic for training the C++ random forest classifier.

See ClassificationModel and Model. Also see model.Model.training().

on_training_finished(X, y, yhat=None)

See model.Model.on_training_finished().

get_feature_importances()

Compute MDI (Mean Decrease in Impurity) feature importances.

Returns:

Array of shape (n_features,) summing to 1.0.
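To illustrate the "summing to 1.0" property, the sketch below normalizes per-feature impurity decreases and ranks the features. It does not reproduce how the C++ backend accumulates the decreases across trees; the raw values, the feature names, and both helper functions are hypothetical.

```python
def normalize_importances(raw_decreases):
    """Scale non-negative impurity decreases so that they sum to 1.0."""
    total = sum(raw_decreases)
    if total == 0:
        # No split reduced impurity: report zero importance everywhere.
        return [0.0 for _ in raw_decreases]
    return [value / total for value in raw_decreases]


def rank_features(importances, names):
    """Pair names with importances, most important first."""
    return sorted(zip(names, importances), key=lambda pair: pair[1], reverse=True)


# Hypothetical accumulated impurity decreases, one entry per feature.
raw = [1.0, 2.0, 0.0, 1.0]
importances = normalize_importances(raw)
ranking = rank_features(
    importances, ["linearity", "planarity", "sphericity", "verticality"]
)
for name, value in ranking:
    print(f"{name}: {value:.2f}")
```

An array returned by get_feature_importances() could be passed to a ranking helper of this kind directly, since it is already normalized.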

get_permutation_importances(X, y, n_repeats=5)

Compute permutation feature importance.

Parameters:
  • X – Feature matrix (M x F).

  • y – True labels (M).

  • n_repeats – Number of shuffle repeats per feature.

Returns:

Array of shape (n_features, 2): column 0 holds the mean importance and column 1 its standard deviation.
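The general permutation importance technique can be sketched in pure Python as follows. This is not the C++ implementation: scoring by accuracy drop, the use of the population standard deviation, the seed argument, and the toy predictor are all assumptions chosen to keep the example self-contained.

```python
import random
import statistics


def permutation_importances(predict, X, y, n_repeats=5, seed=0):
    """Return one [mean, std] pair per feature.

    The importance of a feature is taken as the drop in accuracy observed
    after shuffling that feature's column (an assumed scoring choice).
    """
    rng = random.Random(seed)

    def accuracy(rows):
        hits = sum(1 for row, label in zip(rows, y) if predict(row) == label)
        return hits / len(y)

    baseline = accuracy(X)
    result = []
    for j in range(len(X[0])):
        drops = []
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)
            # Rebuild the matrix with only column j permuted.
            shuffled = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
            drops.append(baseline - accuracy(shuffled))
        result.append([statistics.mean(drops), statistics.pstdev(drops)])
    return result


def toy_predict(row):
    """Stand-in for a trained classifier: only feature 0 matters."""
    return 1 if row[0] > 0.5 else 0


X = [[i / 10, (i * 7 % 10) / 10] for i in range(10)]
y = [toy_predict(row) for row in X]
importances = permutation_importances(toy_predict, X, y, n_repeats=5)
print(importances)
```

Because the toy predictor ignores feature 1, shuffling that column never changes a prediction, so its mean and standard deviation are both zero. The result has the same (n_features, 2) layout as the documented return value.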

save_model(path)

Save the trained model to a binary file.

Parameters:

path – Output file path.

load_model(path)

Load a trained model from a binary file.

Parameters:

path – Input file path.