Weight Transformations

Functions

generate_increasing_weights

iguanas.weight_transformations.generate_increasing_weights(X: polars.Series | polars.DataFrame, powers: numpy.ndarray | None = None) -> polars.DataFrame

Generate weight transformations where larger input values receive larger weights.

Parameters:
  • X (pl.Series | pl.DataFrame) – Numerical Polars Series or DataFrame to transform. If a DataFrame, transformations are applied to each column and concatenated horizontally.

  • powers (np.ndarray | None, default=[0.25, 0.5, 1.0, 2.0, 4.0]) – Power values for polynomial transformations.

Returns:

Each column is a different weight transformation (baseline, powers, log).

Return type:

pl.DataFrame

Examples

>>> import polars as pl
>>> from iguanas.weight_transformations import generate_increasing_weights
>>> s = pl.Series("amount", [0.0, 10.0, 50.0, 100.0])
>>> df = generate_increasing_weights(s)
>>> df.columns  # 'Baseline', '(1+x)^0.25__amount', ..., 'log(1+x)__amount'
>>> # DataFrame input: each column processed independently
>>> X = pl.DataFrame({"a": [1.0, 2.0, 3.0], "b": [4.0, 5.0, 6.0]})
>>> generate_increasing_weights(X).shape
(3, ...)  # 1 Baseline + 5 power cols + 1 log col per feature, minus duplicate Baselines
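
The column labels above suggest that the increasing family consists of (1+x)^p for each power p plus log(1+x). The following is a minimal, library-independent sketch of those formulas in plain Python. It is an assumption based on the labels, not the library's implementation; `increasing_weights_sketch` is a hypothetical name, and the Baseline column is omitted because its definition is not given here.

```python
import math


def increasing_weights_sketch(values, powers=(0.25, 0.5, 1.0, 2.0, 4.0)):
    """Hypothetical helper: map a transformation label to a list of weights."""
    out = {}
    for p in powers:
        # Power family: larger x gives a larger weight for any p > 0.
        out[f"(1+x)^{p}"] = [(1.0 + x) ** p for x in values]
    # Log family: log1p(x) == log(1 + x), so x = 0 maps to a weight of 0.
    out["log(1+x)"] = [math.log1p(x) for x in values]
    return out


weights = increasing_weights_sketch([0.0, 10.0, 50.0, 100.0])
```

Each list in `weights` is non-decreasing in the input, which is the property the function name refers to.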

generate_decreasing_weights

iguanas.weight_transformations.generate_decreasing_weights(X: polars.Series | polars.DataFrame, powers: numpy.ndarray | None = None) -> polars.DataFrame

Generate weight transformations where smaller input values receive larger weights.

Parameters:
  • X (pl.Series | pl.DataFrame) – Numerical Polars Series or DataFrame to transform. If a DataFrame, transformations are applied to each column and concatenated horizontally.

  • powers (np.ndarray | None, default=[0.25, 0.5, 1.0, 2.0, 4.0]) – Power values for reciprocal transformations (1/(1+x)^power).

Returns:

Each column is a different inverse weight transformation.

Return type:

pl.DataFrame

Examples

>>> import polars as pl
>>> from iguanas.weight_transformations import generate_decreasing_weights
>>> s = pl.Series("amount", [0.0, 10.0, 50.0, 100.0])
>>> df = generate_decreasing_weights(s)
>>> df.columns  # '1/(1+x)__amount', '1/(1+x)^0.25__amount', ..., '1/log(1+x)__amount'
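
As a rough illustration of the reciprocal family 1/(1+x)^power described in the parameters, here is a plain-Python sketch. `decreasing_weights_sketch` is a hypothetical name, and this is not the library's code. Note that 1/log(1+x) is undefined at x = 0; how the library treats that case is not stated here, so the sketch substitutes infinity purely as a placeholder assumption.

```python
import math


def decreasing_weights_sketch(values, powers=(0.25, 0.5, 1.0, 2.0, 4.0)):
    """Hypothetical helper: map a transformation label to a list of weights."""
    out = {}
    for p in powers:
        # Reciprocal power family: larger x gives a smaller weight.
        out[f"1/(1+x)^{p}"] = [1.0 / (1.0 + x) ** p for x in values]
    # Reciprocal log family. log(1 + 0) == 0, so x = 0 would divide by zero;
    # math.inf is an assumed placeholder, not documented library behaviour.
    out["1/log(1+x)"] = [
        1.0 / math.log1p(x) if x > 0 else math.inf for x in values
    ]
    return out


inverse_weights = decreasing_weights_sketch([0.0, 10.0, 50.0, 100.0])
```

If the input can contain zeros, it is worth checking the real output of the log column before using it as a sample weight.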

generate_weights

iguanas.weight_transformations.generate_weights(X: polars.Series | polars.DataFrame, powers: numpy.ndarray | None = None) -> polars.DataFrame

Generate all weight transformations (increasing and decreasing).

Parameters:
  • X (pl.Series | pl.DataFrame) – Numerical Polars Series or DataFrame to transform. If a DataFrame, transformations are applied to each column and concatenated horizontally.

  • powers (np.ndarray | None, default=[0.25, 0.5, 1.0, 2.0, 4.0]) – Power values used for both increasing and decreasing transformations.

Returns:

Combined increasing and decreasing weight transformations in one DataFrame.

Return type:

pl.DataFrame

Examples

>>> import polars as pl
>>> from iguanas.weight_transformations import generate_weights
>>> s = pl.Series("amount", [0.0, 10.0, 50.0, 100.0])
>>> df = generate_weights(s)
>>> # Columns include both (1+x)^p and 1/(1+x)^p families plus log variants

select_uncorrelated_weights

iguanas.weight_transformations.select_uncorrelated_weights(sample_weights_df: polars.DataFrame, importance: dict[str, float], target_len: int, min_corr: float = 0.01, max_corr: float = 0.99, step: float = 0.01, use_abs: bool = False) -> tuple[list[str], float]

Return a filtered set of weight columns whose size is as close as possible to a target length.

The function searches the correlation threshold range [min_corr, max_corr] (discretised by step) using binary search. For each candidate threshold it calls iguanas.rule_selection.filter_correlated_rules() with max_corr set to the candidate value and returns the first filtered list whose length is >= target_len.

Parameters:
  • sample_weights_df (pl.DataFrame) – DataFrame containing candidate weight series (columns are weight names).

  • importance (dict[str, float]) – Mapping from rule/weight name to importance score used by the filter.

  • target_len (int) – Desired number of selected weight columns (must be non-negative).

  • min_corr (float, default=0.01) – Minimum correlation threshold to consider (lower bound of search).

  • max_corr (float, default=0.99) – Maximum correlation threshold to consider (upper bound of search).

  • step (float, default=0.01) – Step size used to discretise the correlation thresholds in the search.

  • use_abs (bool, default=False) – If True, use absolute correlation values when filtering.

Returns:

A tuple (filtered_names, corr_value) where filtered_names is the list of selected weight names at the chosen correlation threshold and corr_value is the correlation threshold that produced that list.

Return type:

tuple[list[str], float]

Notes

If target_len is below the minimum achievable length at min_corr, the minimum result is returned. If it is above the maximum achievable length at max_corr, the maximum result is returned. The search discretises thresholds as i * step where i ranges between round(min_corr/step) and round(max_corr/step).
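
The search described above can be sketched as a binary search over the integer grid i * step. This is a minimal illustration, not the library's implementation: `select_by_threshold_search`, `filter_fn` and `toy_filter` are hypothetical names, `filter_fn` stands in for the call to iguanas.rule_selection.filter_correlated_rules(), and the sketch assumes the filtered list never shrinks as the threshold rises.

```python
def select_by_threshold_search(filter_fn, target_len,
                               min_corr=0.01, max_corr=0.99, step=0.01):
    """Hypothetical sketch: lowest threshold whose result has >= target_len names."""
    lo = round(min_corr / step)
    hi = round(max_corr / step)

    # Clamp: target at or below what the lowest threshold already yields.
    lowest = filter_fn(lo * step)
    if target_len <= len(lowest):
        return lowest, lo * step
    # Clamp: target above what even the highest threshold yields.
    highest = filter_fn(hi * step)
    if target_len > len(highest):
        return highest, hi * step

    # Invariant: len(filter_fn(lo*step)) < target_len <= len(filter_fn(hi*step)).
    best = highest
    while hi - lo > 1:
        mid = (lo + hi) // 2
        candidate = filter_fn(mid * step)
        if len(candidate) >= target_len:
            hi, best = mid, candidate
        else:
            lo = mid
    return best, hi * step


# Toy stand-in filter: each name is kept once the threshold reaches its value.
entry_threshold = {"w1": 0.0, "w2": 0.30, "w3": 0.60, "w4": 0.90}


def toy_filter(max_corr):
    # Small tolerance guards against float rounding in i * step.
    return [n for n, t in entry_threshold.items() if t <= max_corr + 1e-9]


selected, corr = select_by_threshold_search(toy_filter, target_len=3)
```

With the toy filter, the search settles on the grid point where the third name first survives, and out-of-range targets fall back to the results at min_corr or max_corr as the note describes.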

Examples

>>> import polars as pl
>>> from iguanas.weight_transformations import select_uncorrelated_weights
>>> df = pl.DataFrame({"w1": [0.1, 0.2], "w2": [0.0, 0.3]})
>>> selected, corr = select_uncorrelated_weights(df, {"w1": 1.0, "w2": 0.5}, 1)