Ruleset Classifier
Classes
RulesetClassifier
- class iguanas.ruleset_classifier.RulesetClassifier
Bases: pydantic.main.BaseModel, sklearn.base.BaseEstimator, sklearn.base.ClassifierMixin
End-to-end rule-based classification pipeline.
The best ruleset is selected through the following steps:
Rule generation: candidate rules are extracted from XGBoost decision trees trained across a sweep of scale_pos_weight values.
Performance filtering: rules that fail any condition in metric_thresholds are discarded.
Correlation filtering: among rules that are correlated above max_corr, only the one with the highest ranking_metric score is kept.
Greedy combination: starting from the single best rule, rules are added one at a time. Each iteration picks the candidate that yields the largest improvement in ranking_metric when combined (via combine_operator) with the already-selected rules. Addition stops when no candidate improves the metric by at least min_improvement or when max_rules rules have been selected.
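The greedy combination step can be illustrated with a minimal, pure-Python sketch. It is not the library's implementation: greedy_combine, f1, and the toy rule masks are hypothetical names and data made up for illustration, and each rule is assumed to be already evaluated into one boolean flag per sample.

```python
from typing import Callable, Sequence


def f1(pred: Sequence[bool], y: Sequence[bool]) -> float:
    """F1 score of boolean predictions against boolean labels."""
    tp = sum(p and t for p, t in zip(pred, y))
    fp = sum(p and not t for p, t in zip(pred, y))
    fn = sum(t and not p for p, t in zip(pred, y))
    return 0.0 if tp == 0 else 2 * tp / (2 * tp + fp + fn)


def greedy_combine(
    rule_masks: dict[str, list[bool]],
    y: list[bool],
    metric: Callable[[Sequence[bool], Sequence[bool]], float] = f1,
    combine_operator: str = "or",
    min_improvement: float = 0.01,
    max_rules: int = 10,
) -> list[str]:
    """Greedy forward selection over per-rule boolean masks (hypothetical helper)."""
    if combine_operator == "or":
        join = lambda a, b: a or b
    else:
        join = lambda a, b: a and b

    # Start from the single best rule.
    best = max(rule_masks, key=lambda name: metric(rule_masks[name], y))
    selected = [best]
    combined = list(rule_masks[best])
    score = metric(combined, y)

    while len(selected) < max_rules:
        # Combine every remaining candidate with the current selection.
        trials = {
            name: [join(c, m) for c, m in zip(combined, mask)]
            for name, mask in rule_masks.items()
            if name not in selected
        }
        if not trials:
            break
        name = max(trials, key=lambda n: metric(trials[n], y))
        new_score = metric(trials[name], y)
        if new_score - score < min_improvement:
            break  # no candidate improves the metric enough
        selected.append(name)
        combined, score = trials[name], new_score
    return selected


# Toy data: six samples, three candidate rules.
y = [True, True, True, False, False, False]
rules = {
    "rule_A": [True, True, False, False, False, False],
    "rule_B": [False, False, True, False, False, False],
    "rule_C": [True, False, False, True, True, False],
}
selected = greedy_combine(rules, y)
# Rendering for the "or" operator only.
expression = " | ".join(f"({name})" for name in selected)
print(expression)
```

On this toy data, rule_A is picked first, rule_B raises the F1 score and is added, and rule_C is rejected because combining it would lower the score.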
The resulting combined rule expression is stored in _best_ruleset_ as a string (e.g. "(rule_A) | (rule_B) | (rule_C)").
- Parameters:
estimator (XGBClassifier) – XGBoost classifier used for rule generation.
scale_pos_weights (np.ndarray | list[float], default=np.array([1.0])) – Array of scale_pos_weight values swept during rule generation.
ranking_metric (str, default="accuracy") – Metric used to rank and select candidate rules. Must be a column produced by compute_metrics (e.g. “f1”, “precision”, “recall”).
max_rules (int, default=10) – Maximum number of rules the greedy search may select. Must be > 0.
metric_thresholds (list[dict[str, Any]] | None, default=None) – List of threshold dicts used to filter candidate rules. Each dict must have keys "name" (metric column), "operator" (one of ">=", ">", "<=", "<", "==", "!="), and "value" (numeric threshold). All conditions are combined with AND. If None, the default threshold of apply_and_filter_by_performance is used.
max_corr (float, default=0.8) – Maximum pairwise correlation allowed between rules; correlated pairs are pruned to keep only the highest-ranked one. Must be in [0, 1].
combine_operator (str, default="or") – Boolean operator used to combine selected rules: “or” or “and”.
min_improvement (float, default=0.01) – Minimum improvement in ranking_metric required to add a new rule to the combined ruleset during greedy selection.
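As an illustration of the metric_thresholds format, the sketch below evaluates a list of threshold dicts with AND semantics using only the standard library. It is an assumption about how such a filter could behave, not the library's code: passes_thresholds, OPERATORS, and the per-rule metric values are hypothetical.

```python
import operator
from typing import Any

# Map each documented operator string to a stdlib comparison function.
OPERATORS = {
    ">=": operator.ge,
    ">": operator.gt,
    "<=": operator.le,
    "<": operator.lt,
    "==": operator.eq,
    "!=": operator.ne,
}


def passes_thresholds(metrics: dict[str, float], thresholds: list[dict[str, Any]]) -> bool:
    """Return True only if every threshold condition holds (AND semantics)."""
    return all(
        OPERATORS[t["operator"]](metrics[t["name"]], t["value"]) for t in thresholds
    )


metric_thresholds = [
    {"name": "precision", "operator": ">=", "value": 0.7},
    {"name": "recall", "operator": ">", "value": 0.1},
]

# Hypothetical per-rule metrics, made up for illustration.
candidate_metrics = {
    "rule_A": {"precision": 0.90, "recall": 0.25},
    "rule_B": {"precision": 0.60, "recall": 0.40},  # fails the precision condition
    "rule_C": {"precision": 0.75, "recall": 0.05},  # fails the recall condition
}

kept = [
    name
    for name, metrics in candidate_metrics.items()
    if passes_thresholds(metrics, metric_thresholds)
]
print(kept)
```

Only rule_A satisfies both conditions here; a rule that fails any single condition is dropped.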
- fit(X: polars.DataFrame, y: polars.Series) → iguanas.ruleset_classifier.RulesetClassifier
Generate, filter, and select rules from training data.
- Parameters:
X (pl.DataFrame) – Feature DataFrame. Only numeric columns are used for rule generation.
y (pl.Series) – Binary target series.
- Returns:
Fitted pipeline instance (self).
- Return type:
RulesetClassifier
- predict(X: polars.DataFrame) → polars.Series
Predict binary labels for each sample.
A sample is positive if any (OR) or all (AND) selected rules fire, depending on combine_operator.
- Parameters:
X (pl.DataFrame) – Feature DataFrame with the same columns seen during fit.
- Returns:
Boolean series named “prediction”.
- Return type:
pl.Series
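The any/all behaviour of predict can be sketched without the library. The helper below is hypothetical and assumes each selected rule has already been evaluated into one boolean flag per sample; the real method works on a polars DataFrame and returns a polars Series.

```python
def combine_rule_flags(
    flags_per_rule: list[list[bool]], combine_operator: str = "or"
) -> list[bool]:
    """Combine per-rule boolean flags into one prediction per sample."""
    if combine_operator not in ("or", "and"):
        raise ValueError("combine_operator must be 'or' or 'and'")
    reduce_fn = any if combine_operator == "or" else all
    # zip(*...) regroups the flags so each tuple holds all rule outcomes for one sample.
    return [reduce_fn(sample_flags) for sample_flags in zip(*flags_per_rule)]


# Two rules evaluated on three samples (toy data).
flags = [
    [True, False, False],  # first rule
    [True, True, False],   # second rule
]
pred_or = combine_rule_flags(flags, "or")
pred_and = combine_rule_flags(flags, "and")
print(pred_or, pred_and)
```

With "or", a sample is positive as soon as one rule fires; with "and", every selected rule must fire.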
- predict_proba(X: polars.DataFrame) → polars.Series
Predict rule-coverage probability for each sample.
Probability is a piecewise-linear function of the number of selected rules that fire for each sample:
0 rules fired → 0.0
1 rule fired → 0.5
all rules fired → 1.0
between 1 and all: linearly interpolated in [0.5, 1.0]
- Parameters:
X (pl.DataFrame) – Feature DataFrame with the same columns seen during fit.
- Returns:
Float64 series named “proba” with values in [0.0, 1.0].
- Return type:
pl.Series
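The piecewise-linear mapping described for predict_proba can be written as a small function of the number of fired rules. This is a sketch under stated assumptions, not the library's implementation: coverage_proba is a hypothetical name, and the case of a single selected rule (where "1 rule fired" and "all rules fired" coincide) is not specified above, so the sketch assumes 1.0 for it.

```python
def coverage_proba(n_fired: int, n_rules: int) -> float:
    """Map the number of fired rules to a probability in [0.0, 1.0]."""
    if not 0 <= n_fired <= n_rules:
        raise ValueError("n_fired must be between 0 and n_rules")
    if n_fired == 0:
        return 0.0
    if n_rules == 1:
        # Assumption: with one selected rule, a firing rule counts as "all rules fired".
        return 1.0
    # Linear interpolation from 0.5 (one rule fired) to 1.0 (all rules fired).
    return 0.5 + 0.5 * (n_fired - 1) / (n_rules - 1)


# Probabilities for 0..5 fired rules out of 5 selected rules.
probas = [coverage_proba(k, 5) for k in range(6)]
print(probas)
```

With five selected rules, each additional fired rule beyond the first adds 0.125 to the probability.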
- fit_predict(X: polars.DataFrame, y: polars.Series) → polars.Series
Fit pipeline and return binary predictions on the same data.
- set_score_request(*, sample_weight: bool | None | str = '$UNCHANGED$') → iguanas.ruleset_classifier.RulesetClassifier
Configure whether metadata should be requested to be passed to the score method.
Note that this method is only relevant when this estimator is used as a sub-estimator within a meta-estimator and metadata routing is enabled with enable_metadata_routing=True (see sklearn.set_config()). Please check the User Guide on how the routing mechanism works.
The options for each parameter are:
True: metadata is requested, and passed to score if provided. The request is ignored if metadata is not provided.
False: metadata is not requested and the meta-estimator will not pass it to score.
None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
str: metadata should be passed to the meta-estimator with this given alias instead of the original name.
The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.
Added in version 1.3.
- Parameters:
sample_weight (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for sample_weight parameter in score.
- Returns:
self – The updated object.
- Return type: