Iguanas is a lightning-fast rule generation library built on top of Polars, designed to streamline the entire rule-based system development workflow, from raw data to production-ready rules, by leveraging Polars' multi-core processing.
Note
For data preprocessing and feature engineering prior to rule generation, we recommend using Gators, a complementary library built on top of Polars by the same team at PayPal, providing 70+ transformers for cleaning, encoding, imputation, scaling, and more.
Built by the PSP Data Team at PayPal, Iguanas makes rule generation, evaluation, and selection both faster and simpler.
Key Features
Lightning Fast: Built on Polars for multi-core parallel processing
End-to-End: Generate, evaluate, combine, and select rules in one library
Production Ready: Lightweight rule strings that deploy anywhere
Flexible: Sequential and parallel grid search strategies
Composable: Chain generation → evaluation → selection with a few function calls
Easy to Learn: Simple functional API with clear, consistent signatures
Quick Start
import polars as pl
import numpy as np
from xgboost import XGBClassifier
from iguanas.weight_transformations import generate_weights
from iguanas.rule_generation import rule_grid_search_parallel_weights
from iguanas.rule_evaluation import apply_filter_and_deduplicate_rules

# 1. Load your data
X_train = pl.DataFrame({
    "age": [25, 45, 35, 50, 30, 55, 40, 28],
    "income": [30000, 80000, 50000, 90000, 40000, 95000, 70000, 35000],
})
y_train = pl.Series([0, 1, 0, 1, 0, 1, 1, 0])

# 2. Generate sample weight transformations
weights = generate_weights(X_train["income"])

# 3. Run a parallel grid search to extract rules
estimator = XGBClassifier(max_depth=2, n_estimators=5, random_state=42)
scale_pos_weights = np.logspace(0, 1, 5)  # 5 values spaced logarithmically from 1 to 10
rules_df = rule_grid_search_parallel_weights(
    estimator, X_train, y_train,
    scale_pos_weights=scale_pos_weights,
    weights_train_vec=weights,
    n_jobs=-1,
)

# 4. Evaluate, filter, and deduplicate rules
R, metrics, selected_rules = apply_filter_and_deduplicate_rules(
    X_train, y_train, rules_df,
    metrics_threshold=[
        {"name": "precision", "operator": ">=", "value": 0.6},
        {"name": "recall", "operator": ">=", "value": 0.5},
    ],
    max_corr=0.8,
)
print(selected_rules)
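To make the precision and recall thresholds in step 4 concrete, the sketch below scores three hand-written threshold rules on the Quick Start data using plain Python. It does not call Iguanas and is not how the library computes its metrics: the rules, the `precision_recall` helper, and the `passes` helper are all hypothetical names introduced only for illustration.

```python
# Illustrative only: plain-Python precision/recall for hand-written rules.
# The rules and helper names below are hypothetical, not part of Iguanas.
ages = [25, 45, 35, 50, 30, 55, 40, 28]
labels = [0, 1, 0, 1, 0, 1, 1, 0]


def precision_recall(fired, labels):
    """Return (precision, recall) for a boolean rule output against 0/1 labels."""
    true_pos = sum(1 for f, y in zip(fired, labels) if f and y == 1)
    n_fired = sum(1 for f in fired if f)
    n_pos = sum(labels)
    precision = true_pos / n_fired if n_fired else 0.0
    recall = true_pos / n_pos if n_pos else 0.0
    return precision, recall


def passes(precision, recall, min_precision=0.6, min_recall=0.5):
    """Mirror the Quick Start thresholds: precision >= 0.6 and recall >= 0.5."""
    return precision >= min_precision and recall >= min_recall


results = {}
for threshold in [45, 30, 28]:
    fired = [age >= threshold for age in ages]
    p, r = precision_recall(fired, labels)
    results[f"age >= {threshold}"] = (p, r, passes(p, r))

for rule, (p, r, keep) in results.items():
    print(f"{rule}: precision={p:.2f}, recall={r:.2f}, keep={keep}")
```

On this toy data, the two stricter rules clear both thresholds, while `age >= 28` fires on seven rows of which only four are positive, so its precision falls below 0.6 and it would be filtered out.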
What Can Iguanas Do?
Rule Generation - Extract rules from XGBoost models with grid search
Metrics - Precision, recall, F-beta, and weighted variants
Rule Evaluation - Evaluate, filter, and deduplicate rule sets
Rule Combination - Combine rules with greedy, beam, and A* search
Rule Selection - Prune by feature overlap and correlation
Rule Analysis - Inspect and report on rule structure
Rule Formatting - Simplify and clean rule expressions
Monotone Constraints - Infer feature directionality
Weight Transformations - Generate sample weight schedules
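For readers unfamiliar with the F-beta metric listed above, here is a minimal sketch of its standard textbook definition in plain Python. The `f_beta` function is a hypothetical helper written for this example; the source does not say how Iguanas implements the metric or its weighted variants, so treat this as background rather than the library's API.

```python
def f_beta(precision, recall, beta=1.0):
    """Standard F-beta: weighted harmonic mean of precision and recall.

    beta < 1 favours precision, beta > 1 favours recall, beta = 1 gives F1.
    """
    if precision == 0.0 and recall == 0.0:
        return 0.0  # avoid division by zero when the rule has no true positives
    b2 = beta ** 2
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Example: a rule with perfect precision but only 75% recall.
p, r = 1.0, 0.75
f1 = f_beta(p, r)
f_half = f_beta(p, r, beta=0.5)
f_two = f_beta(p, r, beta=2.0)
print(f"F1={f1:.3f}, F0.5={f_half:.3f}, F2={f_two:.3f}")
```

Because this rule is stronger on precision than on recall, its score rises as beta shrinks and falls as beta grows, which is why the choice of beta matters when ranking rules for a precision-sensitive task such as fraud flagging.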
Use Cases
Iguanas is well suited to:
Fraud Detection - Generate high-precision rules to flag suspicious transactions
Risk Scoring - Build interpretable rule sets for credit or operational risk
Compliance & Policy - Encode business policies as auditable rule expressions
Anomaly Detection - Surface rare but meaningful patterns in labelled data
Model Explainability - Extract human-readable rules from gradient boosted models
Credits
Developed by the PSP Data Team at PayPal.
Built by data scientists, for data scientists