Iguanas is a fast rule generation library built on top of Polars. It is designed to streamline the entire rule-based system development workflow, from raw data to production-ready rules, by leveraging Polars' multi-core processing.

Note

For data preprocessing and feature engineering prior to rule generation, we recommend using Gators — a complementary library built on top of Polars by the same team at PayPal, providing 70+ transformers for cleaning, encoding, imputation, scaling, and more.

Built by the PSP Data Team at PayPal, Iguanas makes rule generation, evaluation, and selection both faster and simpler.

Key Features

🚀 Lightning Fast: Built on Polars for multi-core parallel processing
🎯 End-to-End: Generate, evaluate, combine, and select rules in one library
📦 Production Ready: Lightweight rule strings that deploy anywhere
🔧 Flexible: Sequential and parallel grid search strategies
🔗 Composable: Chain generation → evaluation → selection with a few function calls
🎓 Easy to Learn: Simple functional API with clear, consistent signatures

Quick Start

import polars as pl
import numpy as np
from xgboost import XGBClassifier

from iguanas.weight_transformations import generate_weights
from iguanas.rule_generation import rule_grid_search_parallel_weights
from iguanas.rule_evaluation import apply_filter_and_deduplicate_rules

# 1. Load your data
X_train = pl.DataFrame({
    "age":    [25, 45, 35, 50, 30, 55, 40, 28],
    "income": [30000, 80000, 50000, 90000, 40000, 95000, 70000, 35000],
})
y_train = pl.Series([0, 1, 0, 1, 0, 1, 1, 0])

# 2. Generate sample weight transformations
weights = generate_weights(X_train["income"])

# 3. Run a parallel grid search to extract rules
estimator = XGBClassifier(max_depth=2, n_estimators=5, random_state=42)
scale_pos_weights = np.logspace(0, 1, 5)

rules_df = rule_grid_search_parallel_weights(
    estimator, X_train, y_train,
    scale_pos_weights=scale_pos_weights,
    weights_train_vec=weights,
    n_jobs=-1,
)

# 4. Evaluate, filter, and deduplicate rules
R, metrics, selected_rules = apply_filter_and_deduplicate_rules(
    X_train, y_train, rules_df,
    metrics_threshold=[
        {"name": "precision", "operator": ">=", "value": 0.6},
        {"name": "recall",    "operator": ">=", "value": 0.5},
    ],
    max_corr=0.8,
)

print(selected_rules)
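
To make the `metrics_threshold` and `max_corr` arguments in step 4 more concrete, here is a minimal pure-Python sketch of what a filter-and-deduplicate pass of this kind could look like. It is an illustration only, not Iguanas' implementation: the helper names (`passes_thresholds`, `pearson`, `filter_and_deduplicate`), the rule names, the metric values, and the greedy keep-first strategy are all assumptions. Only the layout of the threshold dictionaries is taken from the call above.

```python
import operator
from math import sqrt

# Map the operator strings used in the threshold dicts to comparison functions.
OPS = {">=": operator.ge, ">": operator.gt, "<=": operator.le, "<": operator.lt}


def passes_thresholds(rule_metrics, thresholds):
    """Return True if a rule's metrics satisfy every threshold dict."""
    return all(
        OPS[t["operator"]](rule_metrics[t["name"]], t["value"]) for t in thresholds
    )


def pearson(a, b):
    """Pearson correlation of two equal-length 0/1 flag vectors."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0  # constant vector: correlation is undefined, treat as 0
    return cov / sqrt(var_a * var_b)


def filter_and_deduplicate(flags, metrics, thresholds, max_corr):
    """Keep rules that pass all thresholds, then greedily drop any rule whose
    flags correlate above max_corr with a rule that was already kept."""
    kept = []
    for name in flags:
        if not passes_thresholds(metrics[name], thresholds):
            continue
        if all(abs(pearson(flags[name], flags[k])) <= max_corr for k in kept):
            kept.append(name)
    return kept


# Hypothetical rule outputs (1 = the rule fires on that row) and their metrics.
flags = {
    "rule_a": [0, 1, 0, 1, 0, 1, 1, 0],
    "rule_b": [0, 1, 0, 1, 0, 1, 1, 0],  # fires on exactly the same rows as rule_a
    "rule_c": [0, 0, 0, 1, 0, 1, 0, 0],
    "rule_d": [1, 1, 1, 1, 1, 1, 1, 0],
}
metrics = {
    "rule_a": {"precision": 1.0, "recall": 1.0},
    "rule_b": {"precision": 1.0, "recall": 1.0},
    "rule_c": {"precision": 1.0, "recall": 0.5},
    "rule_d": {"precision": 0.57, "recall": 1.0},
}
thresholds = [
    {"name": "precision", "operator": ">=", "value": 0.6},
    {"name": "recall", "operator": ">=", "value": 0.5},
]

kept = filter_and_deduplicate(flags, metrics, thresholds, max_corr=0.8)
print(kept)  # ['rule_a', 'rule_c']
```

In this toy example `rule_d` is removed by the precision threshold and `rule_b` is removed as a duplicate of `rule_a`. The library may order, score, or break ties between correlated rules differently.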

What Can Iguanas Do?

⚙️ Rule Generation - Extract rules from XGBoost models with grid search
📊 Metrics - Precision, recall, F-beta, and weighted variants
🔍 Rule Evaluation - Evaluate, filter, and deduplicate rule sets
🔀 Rule Combination - Combine rules with greedy, beam, and A* search
✂️ Rule Selection - Prune by feature overlap and correlation
🔬 Rule Analysis - Inspect and report on rule structure
🖊️ Rule Formatting - Simplify and clean rule expressions
📐 Monotone Constraints - Infer feature directionality
⚖️ Weight Transformations - Generate sample weight schedules
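
As a reference point for the metrics named in the list above, the following sketch computes precision, recall, and F-beta for a single rule, with optional sample weights. It uses the standard textbook definitions in plain Python. The function name `rule_metrics` and its signature are illustrative assumptions and are not part of the Iguanas API.

```python
def rule_metrics(flags, y_true, beta=1.0, weights=None):
    """Precision, recall, and F-beta for one rule.

    flags   -- 0/1 per row, 1 where the rule fires
    y_true  -- 0/1 per row, 1 for the positive class
    weights -- optional per-row sample weights (defaults to 1.0 for every row)
    """
    if weights is None:
        weights = [1.0] * len(y_true)

    rows = list(zip(flags, y_true, weights))
    tp = sum(w for f, y, w in rows if f and y)      # weighted true positives
    fp = sum(w for f, y, w in rows if f and not y)  # weighted false positives
    fn = sum(w for f, y, w in rows if not f and y)  # weighted false negatives

    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0

    b2 = beta ** 2
    denom = b2 * precision + recall
    fbeta = (1 + b2) * precision * recall / denom if denom else 0.0
    return {"precision": precision, "recall": recall, "fbeta": fbeta}


# Hypothetical rule output evaluated against the Quick Start labels.
y = [0, 1, 0, 1, 0, 1, 1, 0]
rule_flags = [0, 1, 0, 1, 0, 1, 0, 1]

unweighted = rule_metrics(rule_flags, y)
# Up-weighting the missed positive (row index 6) lowers the weighted recall.
weighted = rule_metrics(rule_flags, y, weights=[1, 1, 1, 1, 1, 1, 3, 1])
```

With `beta` above 1 the score leans towards recall, and with `beta` below 1 it leans towards precision, which is often the relevant direction for high-precision rule sets.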

Use Cases

Iguanas is well suited to:

Fraud Detection — Generate high-precision rules to flag suspicious transactions
Risk Scoring — Build interpretable rule sets for credit or operational risk
Compliance & Policy — Encode business policies as auditable rule expressions
Anomaly Detection — Surface rare but meaningful patterns in labelled data
Model Explainability — Extract human-readable rules from gradient boosted models

Credits

Developed by the PSP Data Team at PayPal.

⚡ Built by data scientists, for data scientists