Rule Classifier

Classes

RuleClassifier

class iguanas.rule_classifier.RuleClassifier

Bases: pydantic.main.BaseModel, sklearn.base.BaseEstimator, sklearn.base.ClassifierMixin

Rule-based classifier that selects the single best rule.

The best rule is selected through the following steps:

1. Rule generation: candidate rules are extracted from XGBoost decision trees trained across a sweep of scale_pos_weight values.
2. Performance filtering: rules that fail any condition in metric_thresholds are discarded.
3. Ranking: the surviving rules are sorted by ranking_metric (descending) and the top-ranked rule is stored in _best_rule_.
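
The filtering and ranking steps can be pictured with a small pure-Python sketch. This is not the library's implementation: the candidate rules, their metric values, and the helper select_best_rule are hypothetical, rule generation from XGBoost trees is replaced by a hard-coded list, and returning None when no rule survives is an assumption rather than documented behaviour.

```python
from typing import Callable, Optional

# Hypothetical candidates; the real classifier extracts these from XGBoost trees.
candidates = [
    {"rule": "amount > 100", "accuracy": 0.91, "precision": 0.40},
    {"rule": "amount > 250 and age < 30", "accuracy": 0.88, "precision": 0.75},
    {"rule": "age < 21", "accuracy": 0.95, "precision": 0.10},
]


def select_best_rule(
    rules: list[dict],
    keep: Callable[[dict], bool],
    ranking_metric: str = "accuracy",
) -> Optional[dict]:
    """Filter rules with `keep`, then return the top rule by `ranking_metric`."""
    # Performance filtering: drop every rule that fails the predicate.
    survivors = [rule for rule in rules if keep(rule)]
    if not survivors:
        return None  # assumption: the real behaviour in this case is not documented here
    # Ranking: sort descending and keep only the first rule.
    ranked = sorted(survivors, key=lambda rule: rule[ranking_metric], reverse=True)
    return ranked[0]


best = select_best_rule(candidates, keep=lambda rule: rule["precision"] >= 0.3)
print(best["rule"])
```

Note that the rule with the highest accuracy ("age < 21") is not selected, because filtering happens before ranking.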

Parameters:

estimator (XGBClassifier) – XGBoost classifier used for rule generation.
scale_pos_weights (list[float] | np.ndarray, default=np.array([1.0])) – Array of scale_pos_weight values swept during rule generation.
sample_weights_df (pl.DataFrame | None, default=None) – DataFrame of sample weights used for rule generation.
ranking_metric (str, default="accuracy") – Metric used to rank candidate rules. The single highest-scoring rule is kept. Must be a column produced by compute_metrics.
metric_thresholds (list[dict[str, Any]] | None, default=None) – List of threshold dicts used to filter candidate rules. Each dict must have keys "name" (metric column), "operator" (one of ">=", ">", "<=", "<", "==", "!="), and "value" (numeric threshold). All conditions are combined with AND. If None, the default threshold of apply_and_filter_by_performance is used.
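
To make the metric_thresholds format concrete, here is a minimal sketch of how such a list could be evaluated against one rule's metrics. The helper passes_thresholds and the metric names "precision" and "recall" are assumptions made for illustration; the actual column names depend on what compute_metrics produces, and the library's own filtering code may differ.

```python
import operator
from typing import Any

# Map each documented operator string to the matching comparison function.
OPERATORS = {
    ">=": operator.ge,
    ">": operator.gt,
    "<=": operator.le,
    "<": operator.lt,
    "==": operator.eq,
    "!=": operator.ne,
}


def passes_thresholds(metrics: dict[str, float], thresholds: list[dict[str, Any]]) -> bool:
    """Return True only if every threshold condition holds (logical AND)."""
    return all(
        OPERATORS[t["operator"]](metrics[t["name"]], t["value"])
        for t in thresholds
    )


# Hypothetical thresholds: metric names are assumed, not taken from the library.
metric_thresholds = [
    {"name": "precision", "operator": ">=", "value": 0.5},
    {"name": "recall", "operator": ">", "value": 0.1},
]

print(passes_thresholds({"precision": 0.6, "recall": 0.2}, metric_thresholds))
print(passes_thresholds({"precision": 0.6, "recall": 0.1}, metric_thresholds))
```

The second call fails because recall must be strictly greater than 0.1, and a single failed condition is enough to discard a rule.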

fit(X: polars.DataFrame, y: polars.Series) → iguanas.rule_classifier.RuleClassifier

Generate, filter, and select the single best rule from training data.

Parameters:

X (pl.DataFrame) – Feature DataFrame. Only numeric columns are used for rule generation.
y (pl.Series) – Binary target series.

Returns:

Fitted classifier instance (self).

Return type:

RuleClassifier

predict(X: polars.DataFrame) → polars.Series

Predict binary labels using the single best rule.

Parameters:

X (pl.DataFrame) – Feature DataFrame with the same columns seen during fit.

Returns:

Boolean series named "prediction".

Return type:

pl.Series

predict_proba(X: polars.DataFrame) → polars.Series

Predict probability using the single best rule.

Rule fires → 1.0
Rule does not fire → 0.0

Parameters:

X (pl.DataFrame) – Feature DataFrame with the same columns seen during fit.

Returns:

Float64 series named "proba" with values in {0.0, 1.0}.

Return type:

pl.Series
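
The relationship between predict and predict_proba can be sketched with plain Python lists standing in for polars Series. The rule, the rows, and the function rule_fires are hypothetical; the sketch only illustrates that the probability is a direct cast of the boolean prediction, so no intermediate values occur.

```python
# Hypothetical single best rule, written as a plain predicate over one row.
def rule_fires(row: dict) -> bool:
    return row["amount"] > 100


rows = [
    {"amount": 250},
    {"amount": 40},
    {"amount": 101},
]

# Analogue of predict: one boolean per row.
prediction = [rule_fires(row) for row in rows]

# Analogue of predict_proba: rule fires -> 1.0, rule does not fire -> 0.0.
proba = [1.0 if fired else 0.0 for fired in prediction]

print(prediction)
print(proba)
```

Because the output is always exactly 0.0 or 1.0, threshold-based metrics computed from these values behave the same at any cutoff strictly between 0 and 1.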

fit_predict(X: polars.DataFrame, y: polars.Series) → polars.Series

Fit classifier and return binary predictions on the same data.

set_score_request(*, sample_weight: bool | None | str = '$UNCHANGED$') → iguanas.rule_classifier.RuleClassifier

Configure whether metadata should be requested to be passed to the score method.

Note that this method is only relevant when this estimator is used as a sub-estimator within a meta-estimator and metadata routing is enabled with enable_metadata_routing=True (see sklearn.set_config()). Please check the User Guide on how the routing mechanism works.

The options for each parameter are:

True: metadata is requested, and passed to score if provided. The request is ignored if metadata is not provided.
False: metadata is not requested and the meta-estimator will not pass it to score.
None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

Added in scikit-learn version 1.3.

Parameters:

sample_weight (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for sample_weight parameter in score.

Returns:

self – The updated object.

Return type:

object