gators.scalers package

Module contents

class gators.scalers.ArcSinSquareRootScaler

Bases: BaseModel, BaseEstimator, TransformerMixin

Applies arcsine square root transformation for proportion data.

The transformation arcsin(sqrt(X)) is a variance-stabilizing transformation commonly used for proportion or percentage data. It is particularly useful when data are bounded between 0 and 1, because it makes the variance more homogeneous across that range.

This transformation is often used in:

  • Analysis of proportions, percentages, or rates

  • Binomial proportion data

  • Data from beta distributions

Note: Input values should be in the range [0, 1]. Values outside this range have no real-valued result and will produce NaN.

Parameters:
  • subset (Optional[List[str]], default=None) – List of numeric column names to transform. If None, all numeric columns (Float64, Int64, Float32, Int32) are automatically selected.

  • drop_columns (bool, default=True) – If True, drop the original columns after transformation. If False, keep both original and transformed columns.

Examples

Create an instance of the ArcSinSquareRootScaler class:

>>> import polars as pl
>>> from gators.scalers import ArcSinSquareRootScaler
>>> scaler = ArcSinSquareRootScaler(subset=["success_rate", "conversion_rate"])

Fit the transformer:

>>> X = pl.DataFrame({
...     "success_rate": [0.1, 0.25, 0.5, 0.75, 0.9],
...     "conversion_rate": [0.05, 0.15, 0.30, 0.60, 0.85]
... })
>>> scaler.fit(X)

Transform the DataFrame:

>>> transformed_X = scaler.transform(X)
>>> print(transformed_X)
shape: (5, 2)
┌─────────────────────────┬────────────────────────────┐
│ success_rate__arcsin    ┆ conversion_rate__arcsin    │
│ ---                     ┆ ---                        │
│ f64                     ┆ f64                        │
├─────────────────────────┼────────────────────────────┤
│ 0.322                   ┆ 0.226                      │
│ 0.524                   ┆ 0.398                      │
│ 0.785                   ┆ 0.580                      │
│ 1.047                   ┆ 0.886                      │
│ 1.249                   ┆ 1.173                      │
└─────────────────────────┴────────────────────────────┘

Notes

The transformation maps [0, 1] to [0, π/2] (0 to ~1.571).
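The mapping and the domain restriction can be illustrated with a plain-Python sketch. The helper name arcsin_sqrt is hypothetical, and returning NaN for out-of-range input is an assumption made for this illustration, not a statement about how ArcSinSquareRootScaler is implemented.

```python
import math


def arcsin_sqrt(p: float) -> float:
    """Return arcsin(sqrt(p)) for a proportion p in [0, 1]."""
    if not 0.0 <= p <= 1.0:
        # Outside [0, 1] there is no real-valued result; NaN is used
        # here as a stand-in (assumption for this sketch).
        return math.nan
    return math.asin(math.sqrt(p))


values = [0.0, 0.25, 0.5, 1.0]
transformed = [round(arcsin_sqrt(p), 3) for p in values]
print(transformed)  # [0.0, 0.524, 0.785, 1.571]
```

The endpoints 0 and 1 map to 0 and π/2, matching the range stated above.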

fit(X, y=None)

Fit the transformer by storing column names.

Parameters:
  • X (DataFrame) – Input DataFrame to fit.

  • y (Series | None) – Target series (not used, present for sklearn compatibility).

Returns:

The fitted transformer instance.

Return type:

ArcSinSquareRootScaler

transform(X)

Transform the input DataFrame by applying arcsin(sqrt(X)).

Parameters:

X (DataFrame) – Input DataFrame to transform. Values should be in [0, 1].

Returns:

Transformed DataFrame with arcsin-transformed columns.

Return type:

DataFrame

class gators.scalers.ArcSinhScaler

Bases: BaseModel, BaseEstimator, TransformerMixin

Applies inverse hyperbolic sine (arcsinh) transformation.

The transformation asinh(X) = log(X + sqrt(X^2 + 1)) behaves like a log transformation for large |X| but is also defined for zero and negative values. It is useful for stabilizing variance in data with both positive and negative values or a wide dynamic range.

Properties:

  • Defined for all real numbers (unlike log)

  • Approximately linear near zero

  • Approximately logarithmic for large |X|

  • Odd (antisymmetric about zero): asinh(-X) = -asinh(X)

This transformation is commonly used in:

  • Financial data with positive and negative returns

  • Data with extreme outliers

  • Variables spanning multiple orders of magnitude with zeros

Parameters:
  • subset (Optional[List[str]], default=None) – List of numeric column names to transform. If None, all numeric columns (Float64, Int64, Float32, Int32) are automatically selected.

  • drop_columns (bool, default=True) – If True, drop the original columns after transformation. If False, keep both original and transformed columns.

Examples

Create an instance of the ArcSinhScaler class:

>>> import polars as pl
>>> from gators.scalers import ArcSinhScaler
>>> scaler = ArcSinhScaler(subset=["returns", "profit"])

Fit the transformer:

>>> X = pl.DataFrame({
...     "returns": [-100, -10, 0, 10, 100],
...     "profit": [-50, -5, 0, 5, 50]
... })
>>> scaler.fit(X)

Transform the DataFrame:

>>> transformed_X = scaler.transform(X)
>>> print(transformed_X)
shape: (5, 2)
┌───────────────────┬─────────────────┐
│ returns__arcsinh  ┆ profit__arcsinh │
│ ---               ┆ ---             │
│ f64               ┆ f64             │
├───────────────────┼─────────────────┤
│ -5.298            ┆ -4.605          │
│ -2.998            ┆ -2.312          │
│ 0.0               ┆ 0.0             │
│ 2.998             ┆ 2.312           │
│ 5.298             ┆ 4.605           │
└───────────────────┴─────────────────┘

Notes

Key identities of the transformation:

  • asinh(X) ≈ log(2X) for large positive X

  • asinh(X) ≈ X for X near 0

  • asinh(-X) = -asinh(X) (the function is odd)
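The identities in this note can be checked numerically with the standard library. This is only a sketch of the mathematics; the helper asinh_via_log is a hypothetical name, and nothing here is taken from the ArcSinhScaler source.

```python
import math


def asinh_via_log(x: float) -> float:
    """asinh(x) from the closed form log(x + sqrt(x^2 + 1)).

    Used only for x >= 0 here: for large negative x the sum
    x + sqrt(x^2 + 1) suffers from cancellation.
    """
    return math.log(x + math.sqrt(x * x + 1.0))


closed_form = asinh_via_log(10.0)       # closed form
builtin = math.asinh(10.0)              # library function
large_x_gap = abs(math.asinh(1e6) - math.log(2e6))   # asinh(X) vs log(2X)
small_x_gap = abs(math.asinh(1e-3) - 1e-3)           # asinh(X) vs X
odd_gap = abs(math.asinh(-10.0) + math.asinh(10.0))  # asinh(-X) + asinh(X)

print(closed_form, builtin)
print(large_x_gap, small_x_gap, odd_gap)
```

All three gaps come out very small, which is what the approximations and the odd symmetry predict.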

fit(X, y=None)

Fit the transformer by storing column names.

Parameters:
  • X (DataFrame) – Input DataFrame to fit.

  • y (Series | None) – Target series (not used, present for sklearn compatibility).

Returns:

The fitted transformer instance.

Return type:

ArcSinhScaler

transform(X)

Transform the input DataFrame by applying asinh(X).

Parameters:

X (DataFrame) – Input DataFrame to transform.

Returns:

Transformed DataFrame with arcsinh-transformed columns.

Return type:

DataFrame

class gators.scalers.BoxCox

Bases: BaseModel, BaseEstimator, TransformerMixin

Applies the Box-Cox power transformation to numeric features.

The Box-Cox transformation is a family of power transformations that can help normalize skewed data and stabilize variance. Unlike Yeo-Johnson, Box-Cox requires all values to be strictly positive (x > 0).

For each feature x with parameter lambda:

  • If lambda != 0: (x^lambda - 1) / lambda

  • If lambda == 0: log(x)

Parameters:
  • lambdas (Dict[str, Union[int, float]]) – Dictionary mapping column names to their lambda (power) parameters. Lambda values typically range from -2 to 2.

  • drop_columns (bool, default=True) – If True, drop the original columns after transformation. If False, keep both original and transformed columns.

Examples

Create an instance of the BoxCox class:

>>> import polars as pl
>>> from gators.scalers import BoxCox
>>> transformer = BoxCox(lambdas={"sales": 0.5, "price": 0.0})

Fit the transformer:

>>> X = pl.DataFrame({"sales": [10, 20, 30, 40],
...                    "price": [5, 15, 25, 35]})
>>> transformer.fit(X)

Transform the DataFrame:

>>> transformed_X = transformer.transform(X)
>>> print(transformed_X)
shape: (4, 2)
┌─────────────────┬─────────────────┐
│ sales__boxcox   ┆ price__boxcox   │
│ ---             ┆ ---             │
│ f64             ┆ f64             │
├─────────────────┼─────────────────┤
│ ...             ┆ ...             │
└─────────────────┴─────────────────┘

Notes

All input values must be strictly positive (> 0). Negative or zero values will produce invalid results. Use Yeo-Johnson transformation if you need to handle zero or negative values.
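The two-branch formula and the positivity requirement can be written out as a scalar function. This is a minimal sketch of the formula only; the name box_cox and the choice to raise ValueError on non-positive input are assumptions for illustration, not the behaviour of the BoxCox class.

```python
import math


def box_cox(x: float, lam: float) -> float:
    """Box-Cox transform of a single strictly positive value."""
    if x <= 0:
        # Assumption for this sketch: reject invalid input explicitly.
        raise ValueError("Box-Cox requires x > 0")
    if lam == 0:
        return math.log(x)
    return (x ** lam - 1.0) / lam


print(box_cox(4.0, 0.5))      # (sqrt(4) - 1) / 0.5 = 2.0
print(box_cox(math.e, 0.0))   # log(e) = 1.0
print(box_cox(7.0, 1.0))      # lambda = 1 only shifts the value: 6.0
```

With lambda = 1 the transform reduces to x - 1, so the shape of the distribution is unchanged; values of lambda below 1 compress the right tail.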

fit(X, y=None)

Fit the transformer by storing column names.

Parameters:
  • X (DataFrame) – Input DataFrame to fit. All values in specified columns must be positive.

  • y (Series | None) – Target series (not used, present for sklearn compatibility).

Returns:

The fitted transformer instance.

Return type:

BoxCox

transform(X)

Transform the input DataFrame by applying Box-Cox transformation.

Parameters:

X (DataFrame) – Input DataFrame to transform. All values in specified columns must be strictly positive (> 0).

Returns:

Transformed DataFrame with power-transformed columns.

Return type:

DataFrame

class gators.scalers.LogScaler

Bases: BaseModel, BaseEstimator, TransformerMixin

Applies logarithm transformation with choice of base.

Log transformation is useful for:

  • Reducing right skewness in data

  • Stabilizing variance

  • Converting multiplicative relationships to additive

  • Compressing large value ranges

Supports three bases:

  • ‘e’: Natural logarithm, ln(X)

  • ‘10’: Base-10 logarithm

  • ‘2’: Base-2 logarithm

Note: Only positive values can be transformed. Zero and negative values will result in null or -inf values.

Parameters:
  • subset (Optional[List[str]], default=None) – List of numeric column names to transform. If None, all numeric columns (Float64, Int64, Float32, Int32) are automatically selected.

  • base (Literal['e', '10', '2'], default='e') – The logarithm base to use: ‘e’ for ln(X), ‘10’ for log10(X), or ‘2’ for log2(X).

  • drop_columns (bool, default=True) – If True, drop the original columns after transformation. If False, keep both original and transformed columns.

Examples

Create an instance of the LogScaler class with natural log:

>>> import polars as pl
>>> from gators.scalers import LogScaler
>>> scaler = LogScaler(subset=["sales", "revenue"], base="e")

Fit the transformer:

>>> X = pl.DataFrame({
...     "sales": [1, 10, 100, 1000],
...     "revenue": [10, 100, 1000, 10000]
... })
>>> scaler.fit(X)

Transform the DataFrame:

>>> transformed_X = scaler.transform(X)
>>> print(transformed_X)
shape: (4, 2)
┌───────────────┬──────────────────┐
│ sales__log_ln ┆ revenue__log_ln  │
│ ---           ┆ ---              │
│ f64           ┆ f64              │
├───────────────┼──────────────────┤
│ 0.0           ┆ 2.303            │
│ 2.303         ┆ 4.605            │
│ 4.605         ┆ 6.908            │
│ 6.908         ┆ 9.210            │
└───────────────┴──────────────────┘
>>> # Using log10
>>> scaler10 = LogScaler(subset=["count"], base="10")
>>> X2 = pl.DataFrame({"count": [1, 10, 100, 1000]})
>>> scaler10.fit(X2)
>>> scaler10.transform(X2)
shape: (4, 1)
┌─────────────────┐
│ count__log_10   │
│ ---             │
│ f64             │
├─────────────────┤
│ 0.0             │
│ 1.0             │
│ 2.0             │
│ 3.0             │
└─────────────────┘
>>> # Using log2
>>> scaler2 = LogScaler(subset=["size"], base="2")
>>> X3 = pl.DataFrame({"size": [1, 2, 4, 8, 16]})
>>> scaler2.fit(X3)
>>> scaler2.transform(X3)
shape: (5, 1)
┌──────────────┐
│ size__log_2  │
│ ---          │
│ f64          │
├──────────────┤
│ 0.0          │
│ 1.0          │
│ 2.0          │
│ 3.0          │
│ 4.0          │
└──────────────┘
fit(X, y=None)

Fit the transformer by storing column names.

Parameters:
  • X (DataFrame) – Input DataFrame to fit.

  • y (Series | None) – Target series (not used, present for sklearn compatibility).

Returns:

The fitted transformer instance.

Return type:

LogScaler

transform(X)

Transform the input DataFrame by applying logarithm.

Parameters:

X (DataFrame) – Input DataFrame to transform. Values should be positive.

Returns:

Transformed DataFrame with log-transformed columns.

Return type:

DataFrame

Notes

Zero and negative values will result in null or -inf values.
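The base selection and the non-positive edge cases can be sketched with the standard library. The helper log_value is a hypothetical name, and mapping zero to -inf and negative input to NaN follows the usual IEEE floating-point convention; that mapping is an assumption here and is not confirmed for LogScaler.

```python
import math

# Map each supported base label to the matching stdlib function.
LOG_FUNCS = {"e": math.log, "10": math.log10, "2": math.log2}


def log_value(x: float, base: str = "e") -> float:
    """Logarithm of x in the chosen base, with explicit edge cases."""
    if x > 0:
        return LOG_FUNCS[base](x)
    # math.log raises on non-positive input, so the edge cases are
    # handled by hand (assumed convention, see lead-in).
    return -math.inf if x == 0 else math.nan


print([round(log_value(v, "e"), 3) for v in [1, 10, 100]])
print([log_value(v, "2") for v in [1, 2, 4, 8, 16]])
print(log_value(0), log_value(-1))
```

Checking for non-positive values before fitting a pipeline avoids carrying non-finite results into later steps.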

class gators.scalers.MinmaxScaler

Bases: BaseModel, BaseEstimator, TransformerMixin

Scales numeric features to a [0, 1] range using min-max normalization.

Transforms features by scaling each feature to the range [0, 1] based on the minimum and maximum values observed during fitting. The transformation is given by: X_scaled = (X - X_min) / (X_max - X_min).
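Because the minimum and maximum come from the fitting data, values seen later can fall outside [0, 1]. The sketch below applies the formula in two steps to make that visible; fit_min_max and apply_min_max are hypothetical helper names, and whether MinmaxScaler clips or rejects such values is not covered here.

```python
def fit_min_max(train):
    """Return the (min, max) pair observed in the fitting data."""
    return min(train), max(train)


def apply_min_max(values, lo, hi):
    """Apply (x - min) / (max - min) using previously fitted bounds."""
    if hi == lo:
        # A constant column would divide by zero; this sketch raises.
        raise ValueError("cannot scale a constant column")
    return [(v - lo) / (hi - lo) for v in values]


lo, hi = fit_min_max([20, 30, 40, 50])
in_range = [round(v, 3) for v in apply_min_max([20, 30, 40, 50], lo, hi)]
unseen = [round(v, 3) for v in apply_min_max([60], lo, hi)]
print(in_range)  # [0.0, 0.333, 0.667, 1.0]
print(unseen)    # [1.333]
```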

Parameters:
  • subset (Optional[List[str]], default=None) – List of numeric column names to scale. If None, all numeric columns (Float64, Int64, Float32, Int32) are automatically selected.

  • drop_columns (bool, default=True) – If True, drop the original columns after scaling. If False, keep both original and scaled columns.

Examples

Create an instance of the MinmaxScaler class:

>>> import polars as pl
>>> from gators.scalers import MinmaxScaler
>>> scaler = MinmaxScaler(subset=["age", "income"])

Fit the transformer:

>>> X = pl.DataFrame({"age": [20, 30, 40, 50],
...                    "income": [20000, 40000, 60000, 80000]})
>>> scaler.fit(X)

Transform the DataFrame:

>>> transformed_X = scaler.transform(X)
>>> print(transformed_X)
shape: (4, 2)
┌───────────────────┬─────────────────────┐
│ age__minmax_scale ┆ income__minmax_scale│
│ ---               ┆ ---                 │
│ f64               ┆ f64                 │
├───────────────────┼─────────────────────┤
│ 0.0               ┆ 0.0                 │
│ 0.333             ┆ 0.333               │
│ 0.667             ┆ 0.667               │
│ 1.0               ┆ 1.0                 │
└───────────────────┴─────────────────────┘
fit(X, y=None)

Fit the transformer by computing min and max values.

Parameters:
  • X (DataFrame) – Input DataFrame to fit.

  • y (Series | None) – Target series (not used, present for sklearn compatibility).

Returns:

The fitted transformer instance.

Return type:

MinmaxScaler

transform(X)

Transform the input DataFrame by applying min-max scaling.

Parameters:

X (DataFrame) – Input DataFrame to transform.

Returns:

Transformed DataFrame with scaled columns.

Return type:

DataFrame

class gators.scalers.PowerScaler

Bases: BaseModel, BaseEstimator, TransformerMixin

Scales numeric features using power transformation X^power.

Applies a power transformation to selected features. Useful for reducing skewness or adjusting the scale of features. Common power values:

  • power < 1: compress large values (e.g., 0.5 for square root)

  • power > 1: expand large values (e.g., 2 for squaring)
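The effect of the two regimes can be seen on a few numbers. This is a plain-Python sketch of X^power, with power_transform as a hypothetical helper name; it assumes non-negative inputs when the exponent is fractional, since a negative base has no real-valued fractional power.

```python
def power_transform(values, power=0.5):
    """Raise each value to `power` (non-negative inputs assumed
    for fractional exponents)."""
    return [float(v) ** power for v in values]


compressed = power_transform([100, 400, 900, 1600], power=0.5)
expanded = power_transform([1, 2, 3, 4], power=2.0)

# Ratio of largest to smallest value before and after each transform.
print(1600 / 100, compressed[-1] / compressed[0])
print(4 / 1, expanded[-1] / expanded[0])
```

With power=0.5 the spread between the largest and smallest value shrinks from a factor of 16 to a factor of 4, while power=2.0 widens a factor of 4 to a factor of 16.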

Parameters:
  • subset (Optional[List[str]], default=None) – List of numeric column names to transform. If None, all numeric columns (Float64, Int64, Float32, Int32) are automatically selected.

  • power (float, default=0.5) – The power exponent to apply. Default is 0.5 (square root).

  • drop_columns (bool, default=True) – If True, drop the original columns after transformation. If False, keep both original and transformed columns.

Examples

Create an instance of the PowerScaler class:

>>> import polars as pl
>>> from gators.scalers import PowerScaler
>>> scaler = PowerScaler(subset=["sales", "revenue"], power=0.5)

Fit the transformer:

>>> X = pl.DataFrame({"sales": [100, 400, 900, 1600],
...                    "revenue": [1000, 4000, 9000, 16000]})
>>> scaler.fit(X)

Transform the DataFrame:

>>> transformed_X = scaler.transform(X)
>>> print(transformed_X)
shape: (4, 2)
┌──────────────────┬─────────────────────┐
│ sales__power_0.5 ┆ revenue__power_0.5  │
│ ---              ┆ ---                 │
│ f64              ┆ f64                 │
├──────────────────┼─────────────────────┤
│ 10.0             ┆ 31.62               │
│ 20.0             ┆ 63.25               │
│ 30.0             ┆ 94.87               │
│ 40.0             ┆ 126.49              │
└──────────────────┴─────────────────────┘
>>> # Square transformation
>>> scaler2 = PowerScaler(subset=["count"], power=2.0)
>>> X2 = pl.DataFrame({"count": [1, 2, 3, 4]})
>>> scaler2.fit(X2)
>>> scaler2.transform(X2)
shape: (4, 1)
┌────────────────┐
│ count__power_2 │
│ ---            │
│ f64            │
├────────────────┤
│ 1.0            │
│ 4.0            │
│ 9.0            │
│ 16.0           │
└────────────────┘
fit(X, y=None)

Fit the transformer by storing column names.

Parameters:
  • X (DataFrame) – Input DataFrame to fit.

  • y (Series | None) – Target series (not used, present for sklearn compatibility).

Returns:

The fitted transformer instance.

Return type:

PowerScaler

transform(X)

Transform the input DataFrame by applying power transformation.

Parameters:

X (DataFrame) – Input DataFrame to transform.

Returns:

Transformed DataFrame with power-transformed columns.

Return type:

DataFrame

class gators.scalers.StandardScaler

Bases: BaseModel, BaseEstimator, TransformerMixin

Standardizes numeric features by removing the mean and scaling to unit variance.

Transforms features by centering them around zero and scaling by the standard deviation. The transformation is given by: X_scaled = (X - mean) / std. This is also known as z-score normalization.
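The formula does not say whether std is the sample (n - 1) or population (n) standard deviation, and the two give noticeably different z-scores on small data. The sketch below computes both with the standard library; standardize is a hypothetical helper, and which convention StandardScaler uses is not confirmed here.

```python
import statistics


def standardize(values, sample=True):
    """Z-score each value using the sample or population std."""
    mean = statistics.mean(values)
    std = statistics.stdev(values) if sample else statistics.pstdev(values)
    return [(v - mean) / std for v in values]


ages = [20, 30, 40, 50]
z_sample = [round(z, 3) for z in standardize(ages, sample=True)]
z_population = [round(z, 3) for z in standardize(ages, sample=False)]
print(z_sample)      # [-1.162, -0.387, 0.387, 1.162]
print(z_population)  # [-1.342, -0.447, 0.447, 1.342]
```

Comparing a scaler's output on a small column against these two lists is a quick way to tell which convention is in effect.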

Parameters:
  • subset (Optional[List[str]], default=None) – List of numeric column names to standardize. If None, all numeric columns (Float64, Int64, Float32, Int32) are automatically selected.

  • drop_columns (bool, default=True) – If True, drop the original columns after scaling. If False, keep both original and scaled columns.

Examples

Create an instance of the StandardScaler class:

>>> import polars as pl
>>> from gators.scalers import StandardScaler
>>> scaler = StandardScaler(subset=["age", "income"])

Fit the transformer:

>>> X = pl.DataFrame({"age": [20, 30, 40, 50],
...                    "income": [20000, 40000, 60000, 80000]})
>>> scaler.fit(X)

Transform the DataFrame:

>>> transformed_X = scaler.transform(X)
>>> print(transformed_X)
shape: (4, 2)
┌─────────────────────┬────────────────────────┐
│ age__standard_scale ┆ income__standard_scale │
│ ---                 ┆ ---                    │
│ f64                 ┆ f64                    │
├─────────────────────┼────────────────────────┤
│ -1.162              ┆ -1.162                 │
│ -0.387              ┆ -0.387                 │
│ 0.387               ┆ 0.387                  │
│ 1.162               ┆ 1.162                  │
└─────────────────────┴────────────────────────┘
fit(X, y=None)

Fit the transformer by computing mean and standard deviation.

Parameters:
  • X (DataFrame) – Input DataFrame to fit.

  • y (Series | None) – Target series (not used, present for sklearn compatibility).

Returns:

The fitted transformer instance.

Return type:

StandardScaler

transform(X)

Transform the input DataFrame by applying standard scaling.

Parameters:

X (DataFrame) – Input DataFrame to transform.

Returns:

Transformed DataFrame with standardized columns.

Return type:

DataFrame

class gators.scalers.YeoJonhson

Bases: BaseModel, BaseEstimator, TransformerMixin

Applies the Yeo-Johnson power transformation to numeric features.

The Yeo-Johnson transformation is a family of power transformations that can be applied to both positive and negative values (unlike Box-Cox, which requires strictly positive values). It can help normalize skewed data and stabilize variance.

For each feature x with parameter lambda:

  • If x >= 0 and lambda != 0: ((x + 1)^lambda - 1) / lambda

  • If x >= 0 and lambda == 0: log(x + 1)

  • If x < 0 and lambda != 2: -((-x + 1)^(2-lambda) - 1) / (2 - lambda)

  • If x < 0 and lambda == 2: -log(-x + 1)
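The four cases above translate directly into a scalar function. This is a sketch of the formula only, with yeo_johnson as a hypothetical helper name; it is not taken from the YeoJonhson class source.

```python
import math


def yeo_johnson(x: float, lam: float) -> float:
    """Yeo-Johnson transform of a single value."""
    if x >= 0:
        if lam == 0:
            return math.log1p(x)                   # log(x + 1)
        return ((x + 1.0) ** lam - 1.0) / lam
    if lam == 2:
        return -math.log1p(-x)                     # -log(-x + 1)
    return -((-x + 1.0) ** (2.0 - lam) - 1.0) / (2.0 - lam)


print(yeo_johnson(3.0, 0.5))    # (sqrt(4) - 1) / 0.5 = 2.0
print(yeo_johnson(-3.0, 1.5))   # -(sqrt(4) - 1) / 0.5 = -2.0
print(yeo_johnson(2.5, 1.0), yeo_johnson(-2.5, 1.0))  # identity at lambda = 1
print(yeo_johnson(0.0, 0.7))    # zero always maps to zero
```

At lambda = 1 both branches reduce to the identity, which makes it a convenient sanity check for an implementation.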

Parameters:
  • lambdas (Dict[str, Union[int, float]]) – Dictionary mapping column names to their lambda (power) parameters. Lambda values typically range from -2 to 2.

  • drop_columns (bool, default=True) – If True, drop the original columns after transformation. If False, keep both original and transformed columns.

Examples

Create an instance of the YeoJonhson class:

>>> import polars as pl
>>> from gators.scalers import YeoJonhson
>>> transformer = YeoJonhson(lambdas={"sales": 0.5, "profit": 0.0})

Fit the transformer:

>>> X = pl.DataFrame({"sales": [10, 20, 30, 40],
...                    "profit": [-5, 5, 15, 25]})
>>> transformer.fit(X)

Transform the DataFrame:

>>> transformed_X = transformer.transform(X)
>>> print(transformed_X)
shape: (4, 2)
┌───────────────────┬────────────────────┐
│ sales__yeojonhson ┆ profit__yeojonhson │
│ ---               ┆ ---                │
│ f64               ┆ f64                │
├───────────────────┼────────────────────┤
│ ...               ┆ ...                │
└───────────────────┴────────────────────┘
fit(X, y=None)

Fit the transformer by storing column names.

Parameters:
  • X (DataFrame) – Input DataFrame to fit.

  • y (Series | None) – Target series (not used, present for sklearn compatibility).

Returns:

The fitted transformer instance.

Return type:

YeoJonhson

transform(X)

Transform the input DataFrame by applying Yeo-Johnson transformation.

Parameters:

X (DataFrame) – Input DataFrame to transform.

Returns:

Transformed DataFrame with power-transformed columns.

Return type:

DataFrame