gators.scalers package

Module contents

class gators.scalers.ArcSinSquareRootScaler

Bases: gators.transformer._base_transformer._BaseTransformer

Applies arcsine square root transformation for proportion data.

The transformation arcsin(sqrt(X)) is a variance-stabilizing transformation commonly used for proportion or percentage data. It’s particularly useful when data are bounded between 0 and 1, making the variance more homogeneous across the range.

This transformation is often used in:

Analysis of proportions, percentages, or rates
Binomial proportion data
Data from beta distributions

Note: Input values should be in the range [0, 1]. Values outside this range have no real-valued result and will typically produce NaN.

Parameters:

subset (list[str], default=None) – List of numeric column names to transform. If None, all numeric columns (Float64, Int64, Float32, Int32) are automatically selected.
drop_columns (bool, default=True) – If True, drop the original columns after transformation. If False, keep both original and transformed columns.

Examples

Create an instance of the ArcSinSquareRootScaler class:

>>> import polars as pl
>>> from gators.scalers import ArcSinSquareRootScaler
>>> scaler = ArcSinSquareRootScaler(subset=["success_rate", "conversion_rate"])

Fit the transformer:

>>> X = pl.DataFrame({
...     "success_rate": [0.1, 0.25, 0.5, 0.75, 0.9],
...     "conversion_rate": [0.05, 0.15, 0.30, 0.60, 0.85]
... })
>>> scaler.fit(X)

Transform the DataFrame:

>>> transformed_X = scaler.transform(X)
>>> print(transformed_X)
shape: (5, 2)
┌─────────────────────────┬────────────────────────────┐
│ success_rate__arcsin    ┆ conversion_rate__arcsin    │
│ ---                     ┆ ---                        │
│ f64                     ┆ f64                        │
├─────────────────────────┼────────────────────────────┤
│ 0.322                   ┆ 0.226                      │
│ 0.524                   ┆ 0.398                      │
│ 0.785                   ┆ 0.580                      │
│ 1.047                   ┆ 0.886                      │
│ 1.249                   ┆ 1.173                      │
└─────────────────────────┴────────────────────────────┘

Notes

The transformation maps [0, 1] to [0, π/2] (0 to ~1.571).
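
The mapping stated above can be checked with a few lines of standard-library Python. This is a minimal sketch of the formula arcsin(sqrt(p)) only, not the scaler's implementation; the helper name `arcsin_sqrt` is hypothetical and not part of gators.

```python
import math


def arcsin_sqrt(p: float) -> float:
    """Return arcsin(sqrt(p)) for a proportion p in [0, 1] (hypothetical helper)."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must lie in [0, 1]")
    return math.asin(math.sqrt(p))


proportions = [0.0, 0.25, 0.5, 0.75, 1.0]
transformed = [arcsin_sqrt(p) for p in proportions]

# The endpoints 0 and 1 map to 0 and pi/2; the midpoint 0.5 maps to pi/4.
lower_bound, midpoint, upper_bound = transformed[0], transformed[2], transformed[-1]
```

The helper raises on out-of-range input for clarity; that choice belongs to this sketch and says nothing about how the scaler treats such values.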

fit(X: polars.DataFrame, y: polars.Series | None = None) → gators.scalers.arcsin_squareroot_scaler.ArcSinSquareRootScaler

Fit the transformer by storing column names.

Parameters:

X (pl.DataFrame) – Input DataFrame to fit.
y (pl.Series, default=None) – Target series (not used, present for sklearn compatibility).

Returns:

The fitted transformer instance.

Return type:

ArcSinSquareRootScaler

transform(X: polars.DataFrame) → polars.DataFrame

Transform the input DataFrame by applying arcsin(sqrt(X)).

Parameters: X (pl.DataFrame) – Input DataFrame to transform. Values should be in [0, 1].
Returns: Transformed DataFrame with arcsin-transformed columns.
Return type: pl.DataFrame

class gators.scalers.ArcSinhScaler

Bases: gators.transformer._base_transformer._BaseTransformer

Applies inverse hyperbolic sine (arcsinh) transformation.

The transformation asinh(X) = log(X + sqrt(X^2 + 1)) is similar to log transformation but can handle zero and negative values. It’s useful for stabilizing variance in data with both positive and negative values or wide dynamic range.

Properties:

Defined for all real numbers (unlike log)
Approximately linear near zero
Approximately logarithmic for large abs(X)
Odd (antisymmetric about zero): asinh(-X) = -asinh(X)

This transformation is commonly used in:

Financial data with positive and negative returns
Data with extreme outliers
Variables spanning multiple orders of magnitude with zeros

Parameters:

subset (list[str], default=None) – List of numeric column names to transform. If None, all numeric columns (Float64, Int64, Float32, Int32) are automatically selected.
drop_columns (bool, default=True) – If True, drop the original columns after transformation. If False, keep both original and transformed columns.

Examples

Create an instance of the ArcSinhScaler class:

>>> import polars as pl
>>> from gators.scalers import ArcSinhScaler
>>> scaler = ArcSinhScaler(subset=["returns", "profit"])

Fit the transformer:

>>> X = pl.DataFrame({
...     "returns": [-100, -10, 0, 10, 100],
...     "profit": [-50, -5, 0, 5, 50]
... })
>>> scaler.fit(X)

Transform the DataFrame:

>>> transformed_X = scaler.transform(X)
>>> print(transformed_X)
shape: (5, 2)
┌───────────────────┬─────────────────┐
│ returns__arcsinh  ┆ profit__arcsinh │
│ ---               ┆ ---             │
│ f64               ┆ f64             │
├───────────────────┼─────────────────┤
│ -5.298            ┆ -4.605          │
│ -2.998            ┆ -2.312          │
│ 0.0               ┆ 0.0             │
│ 2.998             ┆ 2.312           │
│ 5.298             ┆ 4.605           │
└───────────────────┴─────────────────┘

Notes

The transformation behaves as follows:

asinh(X) ≈ log(2X) for large positive X
asinh(X) ≈ X for X near 0
asinh(-X) = -asinh(X) (odd symmetry)
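
These properties can be verified numerically with `math.asinh` from the Python standard library. The sketch below checks the formulas only and makes no claim about how the scaler computes its result; the variable names are illustrative.

```python
import math

x = 1000.0

# Closed form from the class description: log(X + sqrt(X^2 + 1)).
closed_form = math.log(x + math.sqrt(x * x + 1.0))
large = math.asinh(x)

# For large positive X, asinh(X) approaches log(2X).
log_gap = abs(large - math.log(2.0 * x))

# Near zero, asinh(X) is approximately X itself.
small_gap = abs(math.asinh(1e-4) - 1e-4)

# asinh is an odd function, so these two terms cancel.
odd_gap = math.asinh(-3.0) + math.asinh(3.0)
```

All three gaps are tiny for these inputs, which is why arcsinh behaves like a log transform on large magnitudes while staying well defined at zero and for negative values.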

fit(X: polars.DataFrame, y: polars.Series | None = None) → gators.scalers.arcsinh_scaler.ArcSinhScaler

Fit the transformer by storing column names.

Parameters:

X (pl.DataFrame) – Input DataFrame to fit.
y (pl.Series, default=None) – Target series (not used, present for sklearn compatibility).

Returns:

The fitted transformer instance.

Return type:

ArcSinhScaler

transform(X: polars.DataFrame) → polars.DataFrame

Transform the input DataFrame by applying asinh(X).

Parameters: X (pl.DataFrame) – Input DataFrame to transform.
Returns: Transformed DataFrame with arcsinh-transformed columns.
Return type: pl.DataFrame

class gators.scalers.BoxCox

Bases: gators.transformer._base_transformer._BaseTransformer

Applies the Box-Cox power transformation to numeric features.

The Box-Cox transformation is a family of power transformations that can help normalize skewed data and stabilize variance. Unlike Yeo-Johnson, Box-Cox requires all values to be strictly positive (x > 0).

For each feature x with parameter lambda:

If lambda != 0: (x^lambda - 1) / lambda
If lambda == 0: log(x)
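
The two cases above can be sketched in plain Python as follows. This illustrates the formula under the stated definitions and is not the library's code; the function name `box_cox` is hypothetical.

```python
import math


def box_cox(x: float, lam: float) -> float:
    """Box-Cox transform of one strictly positive value (hypothetical helper)."""
    if x <= 0:
        raise ValueError("Box-Cox requires x > 0")
    if lam == 0:
        return math.log(x)
    return (x ** lam - 1.0) / lam


# lambda = 0.5 applies a shifted, rescaled square root.
sales = [box_cox(v, 0.5) for v in [10, 20, 30, 40]]

# lambda = 0 reduces to the natural logarithm.
price = [box_cox(v, 0.0) for v in [5, 15, 25, 35]]

# The lambda != 0 branch approaches log(x) as lambda tends to 0.
near_zero_lambda = box_cox(3.0, 1e-8)
```

The last line shows why log(x) is the natural choice for lambda == 0: it is the limit of the power branch, so the family is continuous in lambda.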

Parameters:

lambdas (dict[str, int | float]) – Dictionary mapping column names to their lambda (power) parameters. Lambda values typically range from -2 to 2.
drop_columns (bool, default=True) – If True, drop the original columns after transformation. If False, keep both original and transformed columns.

Examples

Create an instance of the BoxCox class:

>>> import polars as pl
>>> from gators.scalers import BoxCox
>>> transformer = BoxCox(lambdas={"sales": 0.5, "price": 0.0})

Fit the transformer:

>>> X = pl.DataFrame({"sales": [10, 20, 30, 40],
...                    "price": [5, 15, 25, 35]})
>>> transformer.fit(X)

Transform the DataFrame:

>>> transformed_X = transformer.transform(X)
>>> print(transformed_X)
shape: (4, 2)
┌─────────────────┬─────────────────┐
│ sales__boxcox   ┆ price__boxcox   │
│ ---             ┆ ---             │
│ f64             ┆ f64             │
├─────────────────┼─────────────────┤
│ ...             ┆ ...             │
└─────────────────┴─────────────────┘

Notes

All input values must be strictly positive (> 0). Negative or zero values will produce invalid results. Use Yeo-Johnson transformation if you need to handle zero or negative values.

fit(X: polars.DataFrame, y: polars.Series | None = None) → gators.scalers.box_cox.BoxCox

Fit the transformer by storing column names.

Parameters:

X (pl.DataFrame) – Input DataFrame to fit. All values in specified columns must be positive.
y (pl.Series, default=None) – Target series (not used, present for sklearn compatibility).

Returns:

The fitted transformer instance.

Return type:

BoxCox

transform(X: polars.DataFrame) → polars.DataFrame

Transform the input DataFrame by applying Box-Cox transformation.

Parameters: X (pl.DataFrame) – Input DataFrame to transform. All values in specified columns must be strictly positive (> 0).
Returns: Transformed DataFrame with power-transformed columns.
Return type: pl.DataFrame

class gators.scalers.LogScaler

Bases: gators.transformer._base_transformer._BaseTransformer

Applies logarithm transformation with choice of base.

Log transformation is useful for:

Reducing right skewness in data
Stabilizing variance
Converting multiplicative relationships to additive
Compressing large value ranges

Supports three bases:

‘e’: Natural logarithm, ln(X) = log_e(X)
‘10’: Base-10 logarithm
‘2’: Base-2 logarithm
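
The three bases differ only by a constant factor, as the change-of-base identity log_b(x) = ln(x) / ln(b) shows. The snippet below is a small standard-library illustration of that relationship, not the scaler's implementation.

```python
import math

value = 1000.0

natural = math.log(value)    # base 'e'
base10 = math.log10(value)   # base '10'
base2 = math.log2(value)     # base '2'

# Change of base: dividing ln(x) by ln(b) gives log_b(x).
base10_via_ln = natural / math.log(10.0)
base2_via_ln = natural / math.log(2.0)
```

Because the results are proportional, the choice of base affects the scale and interpretability of the transformed column, not the shape of its distribution.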

Note: Only positive values can be transformed. Zero and negative values will result in null or -inf values.

Parameters:

subset (list[str], default=None) – List of numeric column names to transform. If None, all numeric columns (Float64, Int64, Float32, Int32) are automatically selected.
base (Literal['e', '10', '2'], default='e') – The logarithm base to use: ‘e’ for ln(X), ‘10’ for log10(X), or ‘2’ for log2(X).
drop_columns (bool, default=True) – If True, drop the original columns after transformation. If False, keep both original and transformed columns.

Examples

Create an instance of the LogScaler class with natural log:

>>> import polars as pl
>>> from gators.scalers import LogScaler
>>> scaler = LogScaler(subset=["sales", "revenue"], base="e")

Fit the transformer:

>>> X = pl.DataFrame({
...     "sales": [1, 10, 100, 1000],
...     "revenue": [10, 100, 1000, 10000]
... })
>>> scaler.fit(X)

Transform the DataFrame:

>>> transformed_X = scaler.transform(X)
>>> print(transformed_X)
shape: (4, 2)
┌───────────────┬──────────────────┐
│ sales__log_ln ┆ revenue__log_ln  │
│ ---           ┆ ---              │
│ f64           ┆ f64              │
├───────────────┼──────────────────┤
│ 0.0           ┆ 2.303            │
│ 2.303         ┆ 4.605            │
│ 4.605         ┆ 6.908            │
│ 6.908         ┆ 9.210            │
└───────────────┴──────────────────┘

>>> # Using log10
>>> scaler10 = LogScaler(subset=["count"], base="10")
>>> X2 = pl.DataFrame({"count": [1, 10, 100, 1000]})
>>> scaler10.fit(X2)
>>> scaler10.transform(X2)
shape: (4, 1)
┌─────────────────┐
│ count__log_10   │
│ ---             │
│ f64             │
├─────────────────┤
│ 0.0             │
│ 1.0             │
│ 2.0             │
│ 3.0             │
└─────────────────┘

>>> # Using log2
>>> scaler2 = LogScaler(subset=["size"], base="2")
>>> X3 = pl.DataFrame({"size": [1, 2, 4, 8, 16]})
>>> scaler2.fit(X3)
>>> scaler2.transform(X3)
shape: (5, 1)
┌──────────────┐
│ size__log_2  │
│ ---          │
│ f64          │
├──────────────┤
│ 0.0          │
│ 1.0          │
│ 2.0          │
│ 3.0          │
│ 4.0          │
└──────────────┘

fit(X: polars.DataFrame, y: polars.Series | None = None) → gators.scalers.log_scaler.LogScaler

Fit the transformer by storing column names.

Parameters:

X (pl.DataFrame) – Input DataFrame to fit.
y (pl.Series, default=None) – Target series (not used, present for sklearn compatibility).

Returns:

The fitted transformer instance.

Return type:

LogScaler

transform(X: polars.DataFrame) → polars.DataFrame

Transform the input DataFrame by applying logarithm.

Parameters: X (pl.DataFrame) – Input DataFrame to transform. Values should be positive.
Returns: Transformed DataFrame with log-transformed columns.
Return type: pl.DataFrame

Notes

Zero and negative values will result in null or -inf values.

class gators.scalers.MinmaxScaler

Bases: gators.transformer._base_transformer._BaseTransformer

Scales numeric features to a [0, 1] range using min-max normalization.

Transforms features by scaling each feature to the range [0, 1] based on the minimum and maximum values observed during fitting. The transformation is given by: X_scaled = (X - X_min) / (X_max - X_min).
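
To make the fit/transform split concrete, here is a minimal plain-Python sketch of the formula above. The helper names `fit_min_max` and `apply_min_max` are hypothetical, and raising on a constant column is a choice made for this sketch, not documented library behavior.

```python
def fit_min_max(values: list[float]) -> tuple[float, float]:
    """Record the minimum and maximum seen during fitting."""
    return min(values), max(values)


def apply_min_max(values: list[float], lo: float, hi: float) -> list[float]:
    """Apply X_scaled = (X - X_min) / (X_max - X_min) with fitted bounds."""
    span = hi - lo
    if span == 0:
        raise ValueError("constant column: X_max - X_min is zero")
    return [(v - lo) / span for v in values]


lo, hi = fit_min_max([20, 30, 40, 50])
train_scaled = apply_min_max([20, 30, 40, 50], lo, hi)

# Values outside the fitted range fall outside [0, 1] under this formula.
unseen_scaled = apply_min_max([10, 60], lo, hi)
```

The [0, 1] guarantee therefore holds only for data within the range observed during fitting; this sketch applies no clipping.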

Parameters:

subset (list[str], default=None) – List of numeric column names to scale. If None, all numeric columns (Float64, Int64, Float32, Int32) are automatically selected.
drop_columns (bool, default=True) – If True, drop the original columns after scaling. If False, keep both original and scaled columns.

Examples

Create an instance of the MinmaxScaler class:

>>> import polars as pl
>>> from gators.scalers import MinmaxScaler
>>> scaler = MinmaxScaler(subset=["age", "income"])

Fit the transformer:

>>> X = pl.DataFrame({"age": [20, 30, 40, 50],
...                    "income": [20000, 40000, 60000, 80000]})
>>> scaler.fit(X)

Transform the DataFrame:

>>> transformed_X = scaler.transform(X)
>>> print(transformed_X)
shape: (4, 2)
┌───────────────────┬─────────────────────┐
│ age__minmax_scale ┆ income__minmax_scale│
│ ---               ┆ ---                 │
│ f64               ┆ f64                 │
├───────────────────┼─────────────────────┤
│ 0.0               ┆ 0.0                 │
│ 0.333             ┆ 0.333               │
│ 0.667             ┆ 0.667               │
│ 1.0               ┆ 1.0                 │
└───────────────────┴─────────────────────┘

fit(X: polars.DataFrame, y: polars.Series | None = None) → gators.scalers.minmax_scaler.MinmaxScaler

Fit the transformer by computing min and max values.

Parameters:

X (pl.DataFrame) – Input DataFrame to fit.
y (pl.Series, default=None) – Target series (not used, present for sklearn compatibility).

Returns:

The fitted transformer instance.

Return type:

MinmaxScaler

transform(X: polars.DataFrame) → polars.DataFrame

Transform the input DataFrame by applying min-max scaling.

Parameters: X (pl.DataFrame) – Input DataFrame to transform.
Returns: Transformed DataFrame with scaled columns.
Return type: pl.DataFrame

class gators.scalers.PowerScaler

Bases: gators.transformer._base_transformer._BaseTransformer

Scales numeric features using power transformation X^power.

Applies a power transformation to selected features. Useful for reducing skewness or adjusting the scale of features. Common power values:

0 < power < 1: compress large values (e.g., 0.5 for square root)
power > 1: expand large values (e.g., 2 for squaring)

Parameters:

subset (list[str], default=None) – List of numeric column names to transform. If None, all numeric columns (Float64, Int64, Float32, Int32) are automatically selected.
power (float, default=0.5) – The power exponent to apply. Default is 0.5 (square root).
drop_columns (bool, default=True) – If True, drop the original columns after transformation. If False, keep both original and transformed columns.

Examples

Create an instance of the PowerScaler class:

>>> import polars as pl
>>> from gators.scalers import PowerScaler
>>> scaler = PowerScaler(subset=["sales", "revenue"], power=0.5)

Fit the transformer:

>>> X = pl.DataFrame({"sales": [100, 400, 900, 1600],
...                    "revenue": [1000, 4000, 9000, 16000]})
>>> scaler.fit(X)

Transform the DataFrame:

>>> transformed_X = scaler.transform(X)
>>> print(transformed_X)
shape: (4, 2)
┌──────────────────┬─────────────────────┐
│ sales__power_0.5 ┆ revenue__power_0.5  │
│ ---              ┆ ---                 │
│ f64              ┆ f64                 │
├──────────────────┼─────────────────────┤
│ 10.0             ┆ 31.62               │
│ 20.0             ┆ 63.25               │
│ 30.0             ┆ 94.87               │
│ 40.0             ┆ 126.49              │
└──────────────────┴─────────────────────┘

>>> # Square transformation
>>> scaler2 = PowerScaler(subset=["count"], power=2.0)
>>> X2 = pl.DataFrame({"count": [1, 2, 3, 4]})
>>> scaler2.fit(X2)
>>> scaler2.transform(X2)
shape: (4, 1)
┌────────────────┐
│ count__power_2 │
│ ---            │
│ f64            │
├────────────────┤
│ 1.0            │
│ 4.0            │
│ 9.0            │
│ 16.0           │
└────────────────┘

fit(X: polars.DataFrame, y: polars.Series | None = None) → gators.scalers.power_scaler.PowerScaler

Fit the transformer by storing column names.

Parameters:

X (pl.DataFrame) – Input DataFrame to fit.
y (pl.Series, default=None) – Target series (not used, present for sklearn compatibility).

Returns:

The fitted transformer instance.

Return type:

PowerScaler

transform(X: polars.DataFrame) → polars.DataFrame

Transform the input DataFrame by applying power transformation.

Parameters: X (pl.DataFrame) – Input DataFrame to transform.
Returns: Transformed DataFrame with power-transformed columns.
Return type: pl.DataFrame

class gators.scalers.RobustScaler

Bases: gators.transformer._base_transformer._BaseTransformer

Scale numeric features using median and a configurable quantile range.

Applies the transformation:

X_scaled = (X - median) / (Q(q_high) - Q(q_low))

Because centering and scaling are driven by robust statistics (median and inter-quantile range) rather than mean and standard deviation, the scaler is largely unaffected by outliers.
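
The effect of an outlier on this formula can be seen in a short standard-library sketch. The helpers `quantile_linear` and `robust_scale` are hypothetical, and linear interpolation between order statistics is an assumption made here; the library may use a different quantile convention.

```python
import statistics


def quantile_linear(ordered: list[float], q: float) -> float:
    """Quantile by linear interpolation between order statistics (assumed convention)."""
    pos = q * (len(ordered) - 1)
    lower = int(pos)
    upper = min(lower + 1, len(ordered) - 1)
    frac = pos - lower
    return ordered[lower] + frac * (ordered[upper] - ordered[lower])


def robust_scale(values: list[float], q_low: float = 0.25, q_high: float = 0.75) -> list[float]:
    """Apply (X - median) / (Q(q_high) - Q(q_low)) to a list of values."""
    ordered = sorted(values)
    median = statistics.median(ordered)
    spread = quantile_linear(ordered, q_high) - quantile_linear(ordered, q_low)
    # A zero quantile range gets a scale of 0.0, so the output is all zeros.
    scale = 1.0 / spread if spread != 0 else 0.0
    return [(v - median) * scale for v in values]


# The outlier 200.0 moves neither the median (40.0) nor the quantile range (20.0).
age_scaled = robust_scale([20.0, 30.0, 40.0, 50.0, 200.0])
```

For this input the four typical values land at roughly -1.0, -0.5, 0.0, and 0.5, while the outlier stays clearly separated at about 8.0 instead of compressing the rest of the column.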

Parameters:

quantile_range (tuple[float, float], default=(0.25, 0.75)) – The (q_low, q_high) pair of quantile levels used to compute the scale. Both values must be in [0.0, 1.0], with q_low < q_high. The default (0.25, 0.75) is equivalent to the classic IQR-based robust scaler.
subset (list[str] or None, default=None) – Numeric columns to scale. When None, all Float64, Float32, Int64, and Int32 columns are selected automatically.
drop_columns (bool, default=True) – If True, the original columns are dropped and only the scaled columns are kept. If False, both original and scaled columns are present in the output.

_median

Fitted median per column.

Type: dict[str, float]

_scale

Fitted IQR-based scale (1 / (Q_high - Q_low)) per column. Columns where the quantile range is zero get a scale of 0.0 (i.e. scaled output will be all zeros).

Type: dict[str, float]

_column_mapping

Mapping from original column name to scaled column name.

Type: dict[str, str]

Examples

>>> import polars as pl
>>> from gators.scalers import RobustScaler

>>> X = pl.DataFrame({
...     "age":    [20.0, 30.0, 40.0, 50.0, 200.0],
...     "income": [1000.0, 2000.0, 3000.0, 4000.0, 5000.0],
... })
>>> scaler = RobustScaler(quantile_range=(0.25, 0.75))
>>> scaler.fit(X)
>>> X_scaled = scaler.transform(X)

fit(X: polars.DataFrame, y: polars.Series | None = None) → gators.scalers.robust_scaler.RobustScaler

Compute median and quantile-range scale for each column.

Parameters:

X (pl.DataFrame) – Input DataFrame to fit.
y (pl.Series, default=None) – Not used; present for sklearn compatibility.

Returns:

The fitted transformer instance.

Return type:

RobustScaler

transform(X: polars.DataFrame) → polars.DataFrame

Scale the numeric columns using the fitted median and quantile range.

Parameters: X (pl.DataFrame) – Input DataFrame to transform.
Returns: DataFrame with robust-scaled columns.
Return type: pl.DataFrame

class gators.scalers.StandardScaler

Bases: gators.transformer._base_transformer._BaseTransformer

Standardizes numeric features by removing the mean and scaling to unit variance.

Transforms features by centering them around zero and scaling by the standard deviation. The transformation is given by: X_scaled = (X - mean) / std. This is also known as z-score normalization.
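
A minimal standard-library sketch of the z-score formula follows. The function name `standardize` is hypothetical, and the use of the sample standard deviation (ddof=1) is an assumption of this sketch; the text above does not say which estimator the library uses.

```python
import statistics


def standardize(values: list[float]) -> list[float]:
    """Apply (X - mean) / std using the sample standard deviation (assumed, ddof=1)."""
    mean = statistics.fmean(values)
    std = statistics.stdev(values)  # sample standard deviation, divides by n - 1
    return [(v - mean) / std for v in values]


age_z = standardize([20, 30, 40, 50])

# After standardization the values are centered on zero with unit sample spread.
centered_sum = sum(age_z)
unit_spread = statistics.stdev(age_z)
```

With the population standard deviation (dividing by n instead of n - 1) the same input would scale slightly differently, so the choice of estimator matters most for small samples.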

Parameters:

subset (list[str], default=None) – List of numeric column names to standardize. If None, all numeric columns (Float64, Int64, Float32, Int32) are automatically selected.
drop_columns (bool, default=True) – If True, drop the original columns after scaling. If False, keep both original and scaled columns.

Examples

Create an instance of the StandardScaler class:

>>> import polars as pl
>>> from gators.scalers import StandardScaler
>>> scaler = StandardScaler(subset=["age", "income"])

Fit the transformer:

>>> X = pl.DataFrame({"age": [20, 30, 40, 50],
...                    "income": [20000, 40000, 60000, 80000]})
>>> scaler.fit(X)

Transform the DataFrame:

>>> transformed_X = scaler.transform(X)
>>> print(transformed_X)
shape: (4, 2)
┌─────────────────────┬────────────────────────┐
│ age__standard_scale ┆ income__standard_scale │
│ ---                 ┆ ---                    │
│ f64                 ┆ f64                    │
├─────────────────────┼────────────────────────┤
│ -1.162              ┆ -1.162                 │
│ -0.387              ┆ -0.387                 │
│ 0.387               ┆ 0.387                  │
│ 1.162               ┆ 1.162                  │
└─────────────────────┴────────────────────────┘

fit(X: polars.DataFrame, y: polars.Series | None = None) → gators.scalers.standard_scaler.StandardScaler

Fit the transformer by computing mean and standard deviation.

Parameters:

X (pl.DataFrame) – Input DataFrame to fit.
y (pl.Series, default=None) – Target series (not used, present for sklearn compatibility).

Returns:

The fitted transformer instance.

Return type:

StandardScaler

transform(X: polars.DataFrame) → polars.DataFrame

Transform the input DataFrame by applying standard scaling.

Parameters: X (pl.DataFrame) – Input DataFrame to transform.
Returns: Transformed DataFrame with standardized columns.
Return type: pl.DataFrame

class gators.scalers.YeoJohnson

Bases: gators.transformer._base_transformer._BaseTransformer

Applies the Yeo-Johnson power transformation to numeric features.

The Yeo-Johnson transformation is a family of power transformations that can be applied to both positive and negative values (unlike Box-Cox which requires positive values). It can help normalize skewed data and stabilize variance.

For each feature x with parameter lambda:

If x >= 0 and lambda != 0: ((x + 1)^lambda - 1) / lambda
If x >= 0 and lambda == 0: log(x + 1)
If x < 0 and lambda != 2: -((-x + 1)^(2-lambda) - 1) / (2 - lambda)
If x < 0 and lambda == 2: -log(-x + 1)
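
The four cases above translate directly into a small plain-Python function. This is a sketch of the formula as listed, not the library's implementation; the name `yeo_johnson` is hypothetical.

```python
import math


def yeo_johnson(x: float, lam: float) -> float:
    """Yeo-Johnson transform of a single value (hypothetical helper)."""
    if x >= 0:
        if lam == 0:
            return math.log(x + 1.0)
        return ((x + 1.0) ** lam - 1.0) / lam
    if lam == 2:
        return -math.log(-x + 1.0)
    return -((-x + 1.0) ** (2.0 - lam) - 1.0) / (2.0 - lam)


# lambda = 0: non-negative values use log(x + 1); negative values use the
# mirrored power branch with exponent 2 - lambda = 2.
profit = [yeo_johnson(v, 0.0) for v in [-5, 5, 15, 25]]

# lambda = 1 leaves values unchanged on both sides of zero.
identity_pos = yeo_johnson(3.0, 1.0)
identity_neg = yeo_johnson(-3.0, 1.0)
```

Note how the negative branch uses the exponent 2 - lambda, so a single lambda treats the two sides of zero asymmetrically except when lambda equals 1.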

Parameters:

lambdas (dict[str, int | float]) – Dictionary mapping column names to their lambda (power) parameters. Lambda values typically range from -2 to 2.
drop_columns (bool, default=True) – If True, drop the original columns after transformation. If False, keep both original and transformed columns.

Examples

Create an instance of the YeoJohnson class:

>>> import polars as pl
>>> from gators.scalers import YeoJohnson
>>> transformer = YeoJohnson(lambdas={"sales": 0.5, "profit": 0.0})

Fit the transformer:

>>> X = pl.DataFrame({"sales": [10, 20, 30, 40],
...                    "profit": [-5, 5, 15, 25]})
>>> transformer.fit(X)

Transform the DataFrame:

>>> transformed_X = transformer.transform(X)
>>> print(transformed_X)
shape: (4, 2)
┌───────────────────┬────────────────────┐
│ sales__yeojonhson ┆ profit__yeojonhson │
│ ---               ┆ ---                │
│ f64               ┆ f64                │
├───────────────────┼────────────────────┤
│ ...               ┆ ...                │
└───────────────────┴────────────────────┘

fit(X: polars.DataFrame, y: polars.Series | None = None) → gators.scalers.yeo_johnson.YeoJohnson

Fit the transformer by storing column names.

Parameters:

X (pl.DataFrame) – Input DataFrame to fit.
y (pl.Series, default=None) – Target series (not used, present for sklearn compatibility).

Returns:

The fitted transformer instance.

Return type:

YeoJohnson

transform(X: polars.DataFrame) → polars.DataFrame

Transform the input DataFrame by applying Yeo-Johnson transformation.

Parameters: X (pl.DataFrame) – Input DataFrame to transform.
Returns: Transformed DataFrame with power-transformed columns.
Return type: pl.DataFrame