Rule Combination

Functions

combine_rules_cumulative

iguanas.rule_combination.combine_rules_cumulative(R: polars.DataFrame, output_names: list[str] | None = None, operator: str = 'or') → polars.DataFrame

Compute horizontal cumulative boolean operations across all columns.

Parameters:
  • R (pl.DataFrame) – Input DataFrame. All columns will be used in the cumulative operation.

  • output_names (list[str] | None, default=None) – List of names for the output columns. If None, generates names based on operator. Must have the same length as R.columns.

  • operator (str, default='or') –

    Boolean operator to apply:

    • 'or': cumulative OR (any True)

    • 'and': cumulative AND (all True)

Returns:

DataFrame with boolean values:

  • If operator='or': True if at least one condition is True up to that position

  • If operator='and': True if all conditions are True up to that position

Return type:

pl.DataFrame

Raises:

ValueError – If operator is not 'or' or 'and', or if the length of output_names does not match the number of columns in R.

Examples

>>> import polars as pl
>>> from iguanas.rule_combination import combine_rules_cumulative
>>> R = pl.DataFrame({
...     "rule_A": [True, False, True],
...     "rule_B": [False, True, True],
...     "rule_C": [True, True, False],
... })
>>> # Cumulative OR: column 1 is rule_A, column 2 is rule_A | rule_B,
>>> # column 3 is rule_A | rule_B | rule_C.
>>> cum_or = combine_rules_cumulative(R, operator="or")
>>> # Cumulative AND with explicit names: each column is True only if
>>> # all rules up to that position are True.
>>> cum_and = combine_rules_cumulative(R, operator="and", output_names=["step1", "step2", "step3"])
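The cumulative semantics described above can be illustrated without polars. The following is a minimal sketch in plain Python, assuming a dict of boolean lists as a stand-in for R; the helper name cumulative_combine and the step_N output names are hypothetical and are not taken from iguanas, whose generated column names may differ.

```python
from itertools import accumulate
from operator import and_, or_


def cumulative_combine(rules, operator="or"):
    """Running OR/AND across an ordered mapping of rule columns.

    `rules` maps column name -> list of bools (a plain-Python stand-in
    for the polars DataFrame R). Hypothetical helper, not part of iguanas.
    """
    if operator not in ("or", "and"):
        raise ValueError("operator must be 'or' or 'and'")
    op = or_ if operator == "or" else and_
    names = list(rules)
    # Row-wise view: one tuple of rule outcomes per record.
    rows = zip(*(rules[name] for name in names))
    # accumulate() yields the running OR/AND from left to right per row.
    cumulative_rows = [list(accumulate(row, op)) for row in rows]
    # Transpose back: output column i combines input columns 0..i.
    # The "step_N" names are an assumption made for this sketch.
    return {
        f"step_{i + 1}": [row[i] for row in cumulative_rows]
        for i in range(len(names))
    }


R = {
    "rule_A": [True, False, True],
    "rule_B": [False, True, True],
    "rule_C": [True, True, False],
}
cum_or = cumulative_combine(R, "or")
cum_and = cumulative_combine(R, "and")
```

In this sketch the first output column always equals the first input column, and each later column can only gain True values under OR or lose them under AND.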

combine_rules_greedy

iguanas.rule_combination.combine_rules_greedy(R: polars.DataFrame, y: polars.Series, metric: str = 'f1', max_rules: int = 5, operator: str = 'or', weights: polars.Series | None = None, min_improvement: float = 0.0) → polars.DataFrame

Greedily select rules that maximize a performance metric.

Starts with the best single rule, then iteratively adds rules that provide the largest metric improvement. Stops when no rule improves the metric by at least min_improvement or when max_rules is reached.
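The selection loop described above can be sketched in plain Python. This is an illustration under stated assumptions, not the iguanas implementation: it supports only the OR operator, hard-codes an unweighted F1 score, uses a dict of boolean lists in place of R, and the names f1_score and greedy_or_selection are hypothetical. The label format mimics the column name shown in the example further down.

```python
def f1_score(pred, y):
    """F1 = 2*TP / (2*TP + FP + FN) for two equal-length lists of bools."""
    tp = sum(p and t for p, t in zip(pred, y))
    fp = sum(p and not t for p, t in zip(pred, y))
    fn = sum(t and not p for p, t in zip(pred, y))
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def greedy_or_selection(rules, y, max_rules=5, min_improvement=0.0):
    """Greedy forward selection of rules combined with OR (sketch only)."""
    selected = []
    combined = [False] * len(y)  # neutral element for OR
    best_score = None
    remaining = dict(rules)
    while remaining and len(selected) < max_rules:
        # Score every candidate when OR-ed into the current combination.
        trial = {
            name: [c or r for c, r in zip(combined, col)]
            for name, col in remaining.items()
        }
        scores = {name: f1_score(pred, y) for name, pred in trial.items()}
        winner = max(scores, key=scores.get)
        # The first rule is always taken; later rules must improve the
        # score by at least min_improvement, otherwise the loop stops.
        if best_score is not None and scores[winner] - best_score < min_improvement:
            break
        selected.append(winner)
        combined = trial[winner]
        best_score = scores[winner]
        del remaining[winner]
    label = " | ".join(f"({name})" for name in selected)
    return selected, label, best_score


# Toy data chosen for this sketch (not from the iguanas documentation).
toy_rules = {
    "rule_A": [True, True, False, False],
    "rule_B": [False, False, True, False],
    "rule_C": [True, False, False, True],
}
toy_y = [True, True, True, False]
selected, label, score = greedy_or_selection(toy_rules, toy_y)
```

On this toy data, rule_A is picked first, rule_B raises the score further, and rule_C is rejected because adding it would lower F1. How the library treats a candidate whose improvement is exactly min_improvement is not stated in the text above; the sketch accepts it.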

Parameters:
  • R (pl.DataFrame) – DataFrame containing boolean rule columns. All columns will be used as candidate rules.

  • y (pl.Series) – Boolean target series indicating true labels.

  • metric (str, default="f1") – Performance metric to optimize. Must be a column name produced by compute_metrics (e.g., "f1", "accuracy", "precision", "recall").

  • max_rules (int, default=5) – Maximum number of rules to select.

  • operator (str, default="or") – Boolean operator for combining rules: 'or' or 'and'.

  • weights (pl.Series | None, default=None) – Optional sample weights for weighted metric computation.

  • min_improvement (float, default=0.0) – Minimum metric improvement required to add a new rule.

Returns:

DataFrame with a single column containing the combined rule. The column name lists the selected rules joined by the operator.

Return type:

pl.DataFrame

Examples

>>> import polars as pl
>>> from iguanas.rule_combination import combine_rules_greedy
>>> R = pl.DataFrame({"rule_A": [True, False, True],
...                   "rule_B": [False, True, True],
...                   "rule_C": [True, True, False]})
>>> y = pl.Series([True, True, False])
>>> result_R = combine_rules_greedy(
...     R, y, metric="f1", max_rules=2
... )
>>> print(result_R.columns)  # e.g., ['(rule_B) | (rule_A)']

Raises:

ValueError – If operator is not 'or' or 'and', or if the metric column is not found.

combine_rules_a_star

iguanas.rule_combination.combine_rules_a_star(R: polars.DataFrame, y: polars.Series, metric: str = 'f1', max_rules: int = 5, operator: str = 'or', weights: polars.Series | None = None, min_improvement: float = 0.0, return_top_k: int = 10) → polars.DataFrame

Find top rule combinations using the A* search algorithm.

Uses A* to efficiently explore the space of rule combinations, finding optimal or near-optimal combinations by balancing actual performance (g) with estimated potential (h). It is more thorough than greedy or beam search, which makes it the better choice when finding the globally best combination is important.

A* Cost Function:
  • g(n): Negative metric value (better metrics = lower cost)

  • h(n): Optimistic estimate of best possible improvement from remaining rules

  • f(n): g(n) + h(n) (total estimated cost)
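The cost function above can be made concrete with a small best-first search in plain Python. This is a sketch under explicit assumptions and not the iguanas implementation: it handles only the OR operator and unweighted F1, returns a single best combination rather than the top k, ignores min_improvement, and uses one possible optimistic heuristic (the remaining rules recover every missed positive they cover without adding false positives). The names f1_from_counts and a_star_or_search are hypothetical.

```python
import heapq
from itertools import count


def f1_from_counts(tp, fp, n_pos):
    """F1 = 2*TP / (2*TP + FP + FN), using FN = n_pos - TP."""
    denom = tp + fp + n_pos
    return 2 * tp / denom if denom else 0.0


def a_star_or_search(rules, y, max_rules=5):
    """Best-first search over OR-combinations of rules (sketch only)."""
    names = list(rules)
    cols = [rules[name] for name in names]
    n_pos = sum(y)
    tie = count()  # unique tie-breaker so heapq never compares payloads

    def score_and_bound(pred, next_idx, depth):
        tp = sum(p and t for p, t in zip(pred, y))
        fp = sum(p and not t for p, t in zip(pred, y))
        score = f1_from_counts(tp, fp, n_pos)
        # Rules still available on this branch (none once max_rules is hit).
        rest = cols[next_idx:] if depth < max_rules else []
        # Optimistic assumption: the remaining rules recover every missed
        # positive they cover and add no false positives.
        recoverable = sum(
            t and not p and any(col[i] for col in rest)
            for i, (p, t) in enumerate(zip(pred, y))
        )
        return score, f1_from_counts(tp + recoverable, fp, n_pos)

    best_score, best_combo = -1.0, ()
    root = [False] * len(y)
    _, root_bound = score_and_bound(root, 0, 0)
    # Heap entries are (f, tie, combo, pred).
    heap = [(-root_bound, next(tie), (), root)]
    while heap:
        f, _, combo, pred = heapq.heappop(heap)
        if -f <= best_score:
            break  # no open node can beat the best combination found
        if len(combo) == max_rules:
            continue
        # Only add rules with a larger index so each subset appears once.
        start = combo[-1] + 1 if combo else 0
        for i in range(start, len(cols)):
            child = combo + (i,)
            child_pred = [p or r for p, r in zip(pred, cols[i])]
            score, bound = score_and_bound(child_pred, i + 1, len(child))
            if score > best_score:
                best_score, best_combo = score, child
            # g = -score and h = -(bound - score), so f = g + h = -bound.
            heapq.heappush(heap, (-bound, next(tie), child, child_pred))
    return [names[i] for i in best_combo], best_score


# Toy data chosen for this sketch (not from the iguanas documentation).
toy_rules = {
    "rule_A": [True, True, False, False],
    "rule_B": [False, False, True, False],
    "rule_C": [True, False, False, True],
}
toy_y = [True, True, True, False]
best_rules, best_f1 = a_star_or_search(toy_rules, toy_y)
```

Because F1 rises with true positives and falls with false positives, the bound used here never falls below the score of any combination reachable from a node, which is what allows the search to stop as soon as the most promising open node cannot beat the best score already found.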

Parameters:
  • R (pl.DataFrame) – DataFrame containing boolean rule columns. All columns will be used as candidate rules.

  • y (pl.Series) – Boolean target series indicating true labels.

  • metric (str, default="f1") – Performance metric to optimize. Must be a column name produced by compute_metrics (e.g., "f1", "accuracy", "precision", "recall").

  • max_rules (int, default=5) – Maximum number of rules in a combination.

  • operator (str, default="or") – Boolean operator for combining rules: 'or' or 'and'.

  • weights (pl.Series | None, default=None) – Optional sample weights for weighted metric computation.

  • min_improvement (float, default=0.0) – Minimum metric improvement required over parent combination to expand a node. Acts as a pruning criterion.

  • return_top_k (int, default=10) – Number of top combinations to return. Set to 1 for single best.

Returns:

DataFrame containing columns for the top rule combinations found. Each column represents one combination, with the column name showing the combined rule expression. Ordered by metric value (best first).

Return type:

pl.DataFrame

Examples

>>> import polars as pl
>>> from iguanas.rule_combination import combine_rules_a_star
>>> R = pl.DataFrame({"rule_A": [True, False, True],
...                   "rule_B": [False, True, True],
...                   "rule_C": [True, True, False]})
>>> y = pl.Series([True, True, False])
>>> # Find single best combination
>>> best = combine_rules_a_star(R, y, metric="f1", return_top_k=1)
>>> # Find top 5 combinations
>>> top_5 = combine_rules_a_star(R, y, metric="f1", return_top_k=5)

Raises:

ValueError – If operator is not 'or' or 'and', or if the metric column is not found.

Notes

A* is guaranteed to find the optimal solution if the heuristic is admissible (never overestimates the true cost). The heuristic used here estimates the best possible improvement from remaining rules, which is optimistic and thus admissible.