Quick Start
This guide will get you started with Iguanas in minutes.
Basic Example
Here’s a simple example showing the core Iguanas workflow:
import polars as pl
import numpy as np
from xgboost import XGBClassifier

# Import Iguanas modules
from iguanas.rule_generation import rule_grid_search_parallel_scales
from iguanas.rule_evaluation import apply_rules
from iguanas.rule_selection import filter_correlated_rules
from iguanas.metrics import compute_metrics

# 1. Load your data (example with synthetic data)
X_train = pl.DataFrame({
    'age': [25, 45, 35, 50, 30, 55, 40, 28],
    'income': [30000, 80000, 50000, 90000, 40000, 95000, 70000, 35000],
    'credit_score': [650, 720, 680, 750, 660, 780, 710, 640]
})
y_train = pl.Series([0, 1, 0, 1, 0, 1, 1, 0])

# 2. Configure XGBoost estimator for rule extraction
estimator = XGBClassifier(
    max_depth=1,  # Decision stumps for simple rules
    n_estimators=10,
    random_state=42
)

# 3. Generate rules using grid search
scale_pos_weights = np.logspace(0, 1, 5)  # Five class-balance weights from 1 to 10
rules_df = rule_grid_search_parallel_scales(
    estimator=estimator,
    X_train=X_train,
    y_train=y_train,
    scale_pos_weights=scale_pos_weights,
    n_jobs=-1,
    verbose=1
)

# 4. Apply rules to your data
rules = rules_df['rule'].unique().to_list()
R_train = apply_rules(X_train, rules)

# 5. Compute performance metrics
metrics = compute_metrics(R_train, y_train)
print(metrics.select(['rule', 'precision', 'recall', 'f1']).head(10))

# 6. Filter correlated rules to keep only diverse, high-performing rules
importance = dict(metrics.select(['rule', 'f1']).rows())  # Map each rule to its F1 score
uncorrelated_rules = filter_correlated_rules(R_train, importance, max_corr=0.8)
print(f"Original rules: {len(rules)}")
print(f"Filtered rules: {len(uncorrelated_rules)}")
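To make the rule-application and metrics steps above more concrete, here is a minimal pure-Python sketch of what evaluating a threshold rule and scoring it involves. It does not use Iguanas and is not how the library is implemented. The two rules, the `apply_rule` and `score` helpers, and the "feature > threshold" rule representation are all assumptions made for illustration. Only the data values come from the example above.

```python
# Same synthetic data as the example above, held in plain Python lists.
data = {
    "age": [25, 45, 35, 50, 30, 55, 40, 28],
    "income": [30000, 80000, 50000, 90000, 40000, 95000, 70000, 35000],
    "credit_score": [650, 720, 680, 750, 660, 780, 710, 640],
}
labels = [0, 1, 0, 1, 0, 1, 1, 0]

# Hypothetical rules, each stored as (feature, threshold) meaning "feature > threshold".
rule_specs = {
    "income > 60000": ("income", 60000),
    "age > 40": ("age", 40),
}


def apply_rule(feature, threshold):
    """Return one 0/1 flag per row: 1 where the rule fires."""
    return [int(value > threshold) for value in data[feature]]


def score(flags, y):
    """Compute precision, recall and F1 for a single rule's flags."""
    tp = sum(1 for f, t in zip(flags, y) if f == 1 and t == 1)
    fp = sum(1 for f, t in zip(flags, y) if f == 1 and t == 0)
    fn = sum(1 for f, t in zip(flags, y) if f == 0 and t == 1)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


results = {
    name: score(apply_rule(feature, threshold), labels)
    for name, (feature, threshold) in rule_specs.items()
}

for name, metric in results.items():
    print(name, metric)
```

On this toy data the income rule flags exactly the four positive rows, while the age rule misses one of them, so its recall is lower. Per-rule differences like this are what the metrics table lets you compare.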
Understanding the API
Iguanas is organized into modular components that work together in a typical workflow:
- 1. Rule Generation (Rule Generation)
Generate rules from your data using XGBoost decision trees:
rule_grid_search(): Parallelized grid search over weight transformations
extract_rules(): Extract rules from fitted XGBoost models
extract_rule_by_max_gain(): Extract single rule by maximum gain
- 2. Rule Evaluation (Rule Evaluation)
Apply rules to data and evaluate their performance:
apply_rules(): Evaluate rule expressions on DataFrames
apply_and_filter_by_performance(): Filter rules by precision/recall thresholds
select_diverse_top_rules(): Select top performing non-correlated rules
- 3. Metrics (Metrics)
Compute comprehensive performance metrics:
compute_metrics(): Calculate precision, recall, F-scores, TPVE metrics
Supports both count-based and weighted metrics
- 4. Rule Selection (Rule Selection)
Filter and select rules based on similarity and correlation:
filter_correlated_rules(): Remove highly correlated rules
filter_rules_by_feature_overlap(): Filter rules with similar feature usage
extract_feature_names_from_rule(): Extract features used in rules
- 5. Rule Combination (Rule Combination)
Combine rules to create more powerful composite rules:
combine_rules_full_search(): Generate all combinations
combine_rules_greedy(): Greedy search for best combinations
combine_rules_beam_search(): Beam search algorithm
combine_rules_a_star(): A* search algorithm
- 6. Rule Analysis (Rule Analysis)
Analyze rules at hierarchical levels:
generate_rule_performance_report(): Metrics at rule, component, and condition levels
- 7. Rule Formatting (Rule Formatting)
Transform and simplify rules:
simplify_rule(): Remove redundant conditions
format_floats_as_integers(): Convert float thresholds to integers
add_missing_value_conditions(): Handle missing values
Decoder functions for encoded features
- 8. Utilities
Supporting utilities for rule generation:
Weight Transformations: Generate sample weight transformations
Monotone Constraints: Infer monotone constraints for XGBoost
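The idea behind removing highly correlated rules (the Rule Selection item above) can be sketched in a few lines of plain Python. This is one common greedy approach, assumed here for illustration. The source does not say which algorithm `filter_correlated_rules()` actually uses. The `pearson` and `greedy_filter` helpers, the rule names, and the importance values are all hypothetical.

```python
import math


def pearson(a, b):
    """Pearson correlation of two equal-length numeric lists (0.0 if either is constant)."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0
    return cov / math.sqrt(var_a * var_b)


def greedy_filter(flags_by_rule, importance, max_corr=0.8):
    """Visit rules from most to least important; keep a rule only if its
    absolute correlation with every already-kept rule is at most max_corr."""
    kept = []
    ranked = sorted(flags_by_rule, key=lambda name: importance[name], reverse=True)
    for name in ranked:
        if all(
            abs(pearson(flags_by_rule[name], flags_by_rule[other])) <= max_corr
            for other in kept
        ):
            kept.append(name)
    return kept


# Illustrative 0/1 rule outputs: rule_b fires on exactly the same rows as rule_a.
flags_by_rule = {
    "rule_a": [0, 1, 0, 1, 0, 1, 1, 0],
    "rule_b": [0, 1, 0, 1, 0, 1, 1, 0],
    "rule_c": [0, 1, 0, 1, 0, 1, 0, 0],
}
# Illustrative importance scores (for example, F1), not real results.
importance = {"rule_a": 1.0, "rule_b": 0.9, "rule_c": 0.86}

kept = greedy_filter(flags_by_rule, importance, max_corr=0.8)
print(kept)
```

Here `rule_b` is dropped because it duplicates the more important `rule_a`, while `rule_c` is kept because its correlation with `rule_a` falls just under the 0.8 threshold.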
Next Steps
Explore Rule Generation to generate rules from your data
Check out Rule Evaluation for applying rules to data
Check out Metrics for evaluating the performance of rules
Learn about Rule Combination for combining rules into ensembles
Dive into Rule Selection for selecting the best rules based on performance metrics
Browse the API Reference for complete API documentation