Rule Generation
Functions
extract_rule_by_max_gain
- iguanas.rule_generation.extract_rule_by_max_gain(tree_X: pandas.DataFrame) → str
Extract the rule path to the leaf with maximum gain using a bottom-to-top approach.
Finds the leaf node with the highest gain value and traces back to the root node, building the rule by reconstructing conditions from child to parent.
- Parameters:
tree_X (pd.DataFrame) – Output from estimator._Booster.trees_to_dataframe() filtered for a single tree. Required columns: Tree, Node, ID, Feature, Split, Yes, No, Missing, Gain, Cover.
- Returns:
Rule string in the format (X["feat1"] >= Split1) & (X["feat2"] < Split2). Returns an empty string if the tree is empty or has no valid leaves.
- Return type:
str
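The bottom-to-top trace described above can be sketched in plain Python. This is a minimal illustration, not the iguanas implementation: `sketch_max_gain_rule` is a hypothetical name, the rows stand in for what `tree_X.to_dict("records")` might produce, and it assumes leaves are marked with Feature equal to "Leaf" and that conditions are emitted root-first.

```python
# Hypothetical sketch, not the iguanas implementation.
# Rows mimic tree_X.to_dict("records") for one tree (assumed layout);
# leaf nodes are assumed to carry Feature == "Leaf".
def sketch_max_gain_rule(rows):
    """Trace the path from the highest-Gain leaf back to the root."""
    leaves = [r for r in rows if r["Feature"] == "Leaf"]
    if not leaves:
        return ""
    # Map each child ID to its parent row and the branch leading to the child.
    parent_of = {}
    for r in rows:
        if r["Feature"] != "Leaf":
            parent_of[r["Yes"]] = (r, "Yes")
            parent_of[r["No"]] = (r, "No")
    node = max(leaves, key=lambda r: r["Gain"])
    conditions = []
    # Walk child -> parent until the root (which has no parent) is reached.
    while node["ID"] in parent_of:
        parent, branch = parent_of[node["ID"]]
        feature, split = parent["Feature"], parent["Split"]
        # "Yes" branch means feature < split; "No" branch means feature >= split.
        op = "<" if branch == "Yes" else ">="
        conditions.append(f'(X["{feature}"] {op} {split})')
        node = parent
    # Conditions were collected leaf-first; reverse to read root-first.
    return " & ".join(reversed(conditions))


# Made-up single tree: root splits on "amount", its "No" child splits on "age".
rows = [
    {"ID": "0-0", "Feature": "amount", "Split": 100.0, "Yes": "0-1", "No": "0-2", "Gain": 5.0},
    {"ID": "0-1", "Feature": "Leaf", "Split": None, "Yes": None, "No": None, "Gain": -0.2},
    {"ID": "0-2", "Feature": "age", "Split": 30.0, "Yes": "0-3", "No": "0-4", "Gain": 2.0},
    {"ID": "0-3", "Feature": "Leaf", "Split": None, "Yes": None, "No": None, "Gain": 0.7},
    {"ID": "0-4", "Feature": "Leaf", "Split": None, "Yes": None, "No": None, "Gain": 0.1},
]
rule = sketch_max_gain_rule(rows)
print(rule)  # (X["amount"] >= 100.0) & (X["age"] < 30.0)
```

Building a child-to-parent map first keeps the trace linear in the number of nodes; a tree whose root is itself a leaf yields an empty string because there is no parent to walk to.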
extract_rule_with_monotone_constraints
- iguanas.rule_generation.extract_rule_with_monotone_constraints(tree_X: pandas.DataFrame, monotone_constraints: dict[str, int]) → str
Extract the rule path that follows the monotone constraints, using a top-to-bottom approach.
Starts from the root and, at each split, follows the branch dictated by that feature's monotone constraint. NOTE: Only applicable if ALL features have a monotone constraint of -1 or +1. A feature with constraint 0 raises a ValueError.
- Parameters:
tree_X (pd.DataFrame) – Output from estimator._Booster.trees_to_dataframe() filtered for a single tree. Required columns: Tree, Node, ID, Feature, Split, Yes, No, Missing.
monotone_constraints (dict[str, int]) –
Dictionary mapping feature names to constraint values:
+1 (positive): follow “No” branch (feature >= threshold)
-1 (negative): follow “Yes” branch (feature < threshold)
0 (none): raises ValueError - not supported
- Returns:
Rule string in the format (X["feat1"] >= Split1) & (X["feat2"] < Split2). Returns an empty string if the tree is empty or starts with a leaf.
- Return type:
str
- Raises:
ValueError – If a feature has no constraint defined or has constraint 0.
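The top-to-bottom walk can be sketched as follows. This is an illustration under stated assumptions rather than the library's code: `sketch_monotone_rule` is a hypothetical name, the rows stand in for `tree_X.to_dict("records")`, the root is assumed to be the row with Node equal to 0, and leaves are assumed to carry Feature equal to "Leaf".

```python
# Hypothetical sketch, not the iguanas implementation.
def sketch_monotone_rule(rows, monotone_constraints):
    """Walk from the root, choosing the branch dictated by each constraint."""
    if not rows:
        return ""
    by_id = {r["ID"]: r for r in rows}
    # Assumption: the root is the row with Node == 0.
    node = next(r for r in rows if r["Node"] == 0)
    conditions = []
    while node["Feature"] != "Leaf":
        feature, split = node["Feature"], node["Split"]
        constraint = monotone_constraints.get(feature, 0)
        if constraint == 1:
            # Positive constraint: follow the "No" branch (feature >= threshold).
            op, next_id = ">=", node["No"]
        elif constraint == -1:
            # Negative constraint: follow the "Yes" branch (feature < threshold).
            op, next_id = "<", node["Yes"]
        else:
            raise ValueError(f"Feature {feature!r} needs a constraint of -1 or +1")
        conditions.append(f'(X["{feature}"] {op} {split})')
        node = by_id[next_id]
    return " & ".join(conditions)


# Made-up single tree: root splits on "amount", its "No" child splits on "age".
rows = [
    {"Node": 0, "ID": "0-0", "Feature": "amount", "Split": 100.0, "Yes": "0-1", "No": "0-2"},
    {"Node": 1, "ID": "0-1", "Feature": "Leaf", "Split": None, "Yes": None, "No": None},
    {"Node": 2, "ID": "0-2", "Feature": "age", "Split": 30.0, "Yes": "0-3", "No": "0-4"},
    {"Node": 3, "ID": "0-3", "Feature": "Leaf", "Split": None, "Yes": None, "No": None},
    {"Node": 4, "ID": "0-4", "Feature": "Leaf", "Split": None, "Yes": None, "No": None},
]
rule = sketch_monotone_rule(rows, {"amount": 1, "age": -1})
print(rule)  # (X["amount"] >= 100.0) & (X["age"] < 30.0)
```

In this sketch a feature missing from the dictionary is treated the same as a constraint of 0, so both cases end in a ValueError, matching the Raises entry above.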
extract_rules
- iguanas.rule_generation.extract_rules(estimator: xgboost.sklearn.XGBClassifier, all_features_constrained: bool, **kwargs) → pandas.DataFrame
Generate metrics for rules extracted from XGBoost trees.
- Parameters:
estimator (XGBClassifier) – Fitted XGBoost classifier from which to extract rules.
all_features_constrained (bool) – If True, uses monotone constraint-based extraction (top-to-bottom). If False, uses max gain-based extraction (bottom-to-top).
**kwargs (dict) – Additional parameters for rule extraction and metric calculation (e.g., transformation name, scale_pos_weight value).
- Returns:
DataFrame containing:
rule: Extracted rule as a string
tree: Tree number from which the rule was extracted
scale_pos_weight: scale_pos_weight value used for this tree
- Return type:
pd.DataFrame
rule_grid_search
- iguanas.rule_generation.rule_grid_search(estimator: xgboost.sklearn.XGBClassifier, X_train: polars.DataFrame | pandas.DataFrame, y_train: polars.Series | pandas.Series, scale_pos_weights: list[float] | numpy.ndarray, sample_weights_df: polars.DataFrame | pandas.DataFrame | None = None, n_jobs: int = -1, verbose: int = 0) → polars.DataFrame
Perform a grid search, parallelised over scale_pos_weight values or sample weight transformations, to find optimal rules.
This function systematically trains XGBoost models with different combinations of:
sample weights
scale_pos_weight values
For each combination, it extracts rules from the fitted models and returns them as a Polars DataFrame. The loop over the grid is parallelized using joblib for improved performance.
- Parameters:
estimator (XGBClassifier) – Base XGBoost classifier to use as a template for rule extraction.
X_train (pl.DataFrame | pd.DataFrame) – Training feature matrix.
y_train (pl.Series | pd.Series) – Training target values.
scale_pos_weights (list | np.ndarray) – Array of scale_pos_weight values to try. Parallelised across workers.
sample_weights_df (pl.DataFrame | pd.DataFrame | None, default=None) – DataFrame mapping transformation names to sample weight arrays. If None, uses baseline weights of 1.0 for all samples.
n_jobs (int, default=-1) – Number of parallel jobs to run. -1 means using all processors.
verbose (int, default=0) –
Controls the verbosity level:
0: silent (no output)
1: progress information (start/end summary)
>=2: detailed progress with live updates from joblib Parallel backend
- Returns:
Polars DataFrame with columns rule, tree, scale_pos_weight, transformation.
- Return type:
pl.DataFrame
rule_grid_search_sequential
- iguanas.rule_generation.rule_grid_search_sequential(estimator: xgboost.sklearn.XGBClassifier, X_train: polars.DataFrame | pandas.DataFrame, y_train: polars.Series | pandas.Series, scale_pos_weights: list[float] | numpy.ndarray, sample_weights_df: polars.DataFrame | pandas.DataFrame | None = None, verbose: int = 0) → polars.DataFrame
Sequential (single-process) variant of rule_grid_search.
Identical behaviour to rule_grid_search() but runs in a single process without joblib parallelism. Useful for debugging, environments where multiprocessing is unavailable, or small workloads where process-spawn overhead outweighs the benefit of parallelism.
- Parameters:
estimator (XGBClassifier) – Base XGBoost classifier to use as a template for rule extraction.
X_train (pl.DataFrame | pd.DataFrame) – Training feature matrix.
y_train (pl.Series | pd.Series) – Training target values.
scale_pos_weights (list | np.ndarray) – Array of scale_pos_weight values to try.
sample_weights_df (pl.DataFrame | pd.DataFrame | None, default=None) – DataFrame mapping transformation names to sample weight arrays. If None, uses baseline weights of 1.0 for all samples.
verbose (int, default=0) – Controls verbosity. 0 = silent, 1 = summary.
- Returns:
Same schema as rule_grid_search(): columns rule, tree, scale_pos_weight, transformation.
- Return type:
pl.DataFrame
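The grid that the sequential variant walks can be pictured as a nested loop over transformations and scale_pos_weight values. The sketch below shows only that loop shape. `sketch_grid_sequential` and `fit_and_extract` are hypothetical names: the callable stands in for "fit a copy of the estimator with these weights and this scale_pos_weight, then extract one rule per tree", and the "baseline" label for the default weights is an assumption.

```python
from itertools import product


# Hypothetical sketch of the loop shape only; no XGBoost model is trained here.
def sketch_grid_sequential(fit_and_extract, scale_pos_weights, sample_weights=None):
    """Visit every (transformation, scale_pos_weight) pair in one process."""
    if sample_weights is None:
        # Assumed label for the default case; None stands for uniform weights of 1.0.
        sample_weights = {"baseline": None}
    records = []
    for (name, weights), spw in product(sample_weights.items(), scale_pos_weights):
        # fit_and_extract is a stand-in that returns one rule string per tree.
        for tree, rule in enumerate(fit_and_extract(weights, spw)):
            records.append(
                {"rule": rule, "tree": tree, "scale_pos_weight": spw, "transformation": name}
            )
    return records


# Stand-in that pretends each fitted model has two trees.
def fake_fit_and_extract(weights, spw):
    return [f'(X["amount"] >= {spw})', f'(X["age"] < {spw})']


results = sketch_grid_sequential(
    fake_fit_and_extract,
    scale_pos_weights=[1.0, 10.0],
    sample_weights={"log": [0.5, 1.5], "sqrt": [0.8, 1.2]},
)
print(len(results))  # 8
```

Each record carries the four columns named in the Returns entry, so a list like this could be handed to a DataFrame constructor in one step.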
rule_grid_search_parallel_weights
- iguanas.rule_generation.rule_grid_search_parallel_weights(estimator: xgboost.sklearn.XGBClassifier, X_train: polars.DataFrame | pandas.DataFrame, y_train: polars.Series | pandas.Series, scale_pos_weights: list[float] | numpy.ndarray, sample_weights_df: polars.DataFrame | pandas.DataFrame | None = None, n_jobs: int = -1, verbose: int = 0) → polars.DataFrame
Perform grid search over sample weight transformations and scale_pos_weight values to find optimal rules.
This function systematically trains XGBoost models with different combinations of:
sample weights
scale_pos_weight values
For each combination, it extracts rules from the fitted models and returns them as a Polars DataFrame. The weight transformations loop is parallelized using joblib for improved performance.
- Parameters:
estimator (XGBClassifier) – Base XGBoost classifier to use as a template for rule extraction.
X_train (pl.DataFrame | pd.DataFrame) – Training feature matrix.
y_train (pl.Series | pd.Series) – Training target values.
scale_pos_weights (list | np.ndarray) – Array of scale_pos_weight values to try.
sample_weights_df (pl.DataFrame | pd.DataFrame | None, default=None) – DataFrame mapping transformation names to sample weight arrays. If None, uses baseline weights of 1.0 for all samples.
n_jobs (int, default=-1) – Number of parallel jobs to run. -1 means using all processors.
verbose (int, default=0) –
Controls the verbosity level:
0: silent (no output)
1: progress information (start/end summary)
>=2: detailed progress with live updates from joblib Parallel backend
- Returns:
Same schema as rule_grid_search(): columns rule, tree, scale_pos_weight, transformation.
- Return type:
pl.DataFrame
Examples
>>> import numpy as np
>>> weights_train = generate_sample_weight_transformations(X_train["amount"])
>>> scale_pos_weights = np.logspace(0, np.log10(imbalance_ratio*2), 20)
>>> results = rule_grid_search(
...     estimator, X_train, y_train,
...     scale_pos_weights, weights_train, n_jobs=-1, verbose=1
... )
rule_grid_search_parallel_scales
- iguanas.rule_generation.rule_grid_search_parallel_scales(estimator: xgboost.sklearn.XGBClassifier, X_train: polars.DataFrame | pandas.DataFrame, y_train: polars.Series | pandas.Series, scale_pos_weights: list[float] | numpy.ndarray, sample_weights_df: polars.DataFrame | pandas.DataFrame | None = None, n_jobs: int = -1, verbose: int = 0) → polars.DataFrame
Perform grid search parallelised over scale_pos_weight values.
This function systematically trains XGBoost models with different combinations of:
sample weights
scale_pos_weight values
For each combination, it extracts rules from the fitted models and returns them as a Polars DataFrame. The loop over scale_pos_weight values is parallelized using joblib for improved performance.
- Parameters:
estimator (XGBClassifier) – Base XGBoost classifier to use as a template for rule extraction.
X_train (pl.DataFrame | pd.DataFrame) – Training feature matrix.
y_train (pl.Series | pd.Series) – Training target values.
scale_pos_weights (list | np.ndarray) – Array of scale_pos_weight values to try. Parallelised across workers.
sample_weights_df (pl.DataFrame | pd.DataFrame | None, default=None) – DataFrame mapping transformation names to sample weight arrays. If None, uses baseline weights of 1.0 for all samples.
n_jobs (int, default=-1) – Number of parallel jobs to run. -1 means using all processors.
verbose (int, default=0) –
Controls the verbosity level:
0: silent (no output)
1: progress information (start/end summary)
>=2: detailed progress with live updates from joblib Parallel backend
- Returns:
Same schema as rule_grid_search(): columns rule, tree, scale_pos_weight, transformation.
- Return type:
pl.DataFrame
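Parallelising over scale_pos_weight values means each worker takes one value and loops over all sample weight transformations itself. The sketch below shows that split of work only. It swaps in the standard library's ThreadPoolExecutor for joblib so that it runs without extra dependencies; `sketch_grid_parallel_scales`, `fit_and_extract`, and `max_workers` are hypothetical names, not part of iguanas.

```python
from concurrent.futures import ThreadPoolExecutor


# Hypothetical sketch: ThreadPoolExecutor is a stand-in for joblib here,
# and no XGBoost model is trained.
def sketch_grid_parallel_scales(fit_and_extract, scale_pos_weights, sample_weights, max_workers=2):
    """Hand each scale_pos_weight value to a worker; loop over weights inside it."""

    def run_one_scale(spw):
        rows = []
        for name, weights in sample_weights.items():
            # fit_and_extract is a stand-in that returns one rule string per tree.
            for tree, rule in enumerate(fit_and_extract(weights, spw)):
                rows.append(
                    {"rule": rule, "tree": tree, "scale_pos_weight": spw, "transformation": name}
                )
        return rows

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map returns results in the order of the inputs.
        chunks = list(pool.map(run_one_scale, scale_pos_weights))
    # Flatten the per-worker lists into one list of records.
    return [row for chunk in chunks for row in chunk]


# Stand-in that pretends each fitted model has a single tree.
def fake_fit_and_extract(weights, spw):
    return [f'(X["amount"] >= {spw})']


results = sketch_grid_parallel_scales(
    fake_fit_and_extract,
    scale_pos_weights=[1.0, 5.0],
    sample_weights={"baseline": None, "log": [0.5, 1.5]},
)
print([r["scale_pos_weight"] for r in results])  # [1.0, 1.0, 5.0, 5.0]
```

Swapping the two loops, so that workers receive one transformation each and iterate over scale_pos_weight values, gives the shape that the parallel_weights variant's name suggests; which axis is better to parallelise likely depends on which list is longer.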