gators.scalers package
Module contents
- class gators.scalers.ArcSinSquareRootScaler
Bases: BaseModel, BaseEstimator, TransformerMixin
Applies arcsine square root transformation for proportion data.
The transformation arcsin(sqrt(X)) is a variance-stabilizing transformation commonly used for proportion or percentage data. It’s particularly useful when data are bounded between 0 and 1, making the variance more homogeneous across the range.
This transformation is often used in:
Analysis of proportions, percentages, or rates
Binomial proportion data
Data from beta distributions
Note: Input values should be in the range [0, 1]. Values outside this range will produce NaN or complex numbers.
- Parameters:
subset (Optional[List[str]], default=None) – List of numeric column names to transform. If None, all numeric columns (Float64, Int64, Float32, Int32) are automatically selected.
drop_columns (bool, default=True) – If True, drop the original columns after transformation. If False, keep both original and transformed columns.
Examples
Create an instance of the ArcSinSquareRootScaler class:
>>> import polars as pl
>>> from gators.scalers import ArcSinSquareRootScaler
>>> scaler = ArcSinSquareRootScaler(subset=["success_rate", "conversion_rate"])
Fit the transformer:
>>> X = pl.DataFrame({
...     "success_rate": [0.1, 0.25, 0.5, 0.75, 0.9],
...     "conversion_rate": [0.05, 0.15, 0.30, 0.60, 0.85]
... })
>>> scaler.fit(X)
Transform the DataFrame:
>>> transformed_X = scaler.transform(X)
>>> print(transformed_X)
shape: (5, 2)
┌──────────────────────┬─────────────────────────┐
│ success_rate__arcsin ┆ conversion_rate__arcsin │
│ ---                  ┆ ---                     │
│ f64                  ┆ f64                     │
├──────────────────────┼─────────────────────────┤
│ 0.322                ┆ 0.226                   │
│ 0.524                ┆ 0.398                   │
│ 0.785                ┆ 0.580                   │
│ 1.047                ┆ 0.886                   │
│ 1.249                ┆ 1.173                   │
└──────────────────────┴─────────────────────────┘
Notes
The transformation maps [0, 1] to [0, π/2] (0 to ~1.571).
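As a rough illustration of the mapping noted above, the following sketch applies arcsin(sqrt(p)) with Python's standard math module rather than the gators API. The helper name arcsin_sqrt and the choice to raise ValueError on out-of-range input are assumptions made for this example, not documented library behavior.

```python
import math


def arcsin_sqrt(p: float) -> float:
    """Return arcsin(sqrt(p)) for a proportion p in [0, 1]."""
    # Assumption for this sketch: reject out-of-range input explicitly
    # instead of letting it propagate as NaN.
    if not 0.0 <= p <= 1.0:
        raise ValueError(f"expected a proportion in [0, 1], got {p}")
    return math.asin(math.sqrt(p))


proportions = [0.0, 0.25, 0.5, 1.0]
transformed = [arcsin_sqrt(p) for p in proportions]

# The endpoints 0 and 1 map to 0 and pi/2, matching the note above.
for p, t in zip(proportions, transformed):
    print(f"p={p:.2f} -> {t:.3f}")
```

The midpoint p = 0.5 lands on π/4, which shows that the transform stretches values near 0 and 1 relative to values in the middle of the range.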
- class gators.scalers.ArcSinhScaler
Bases: BaseModel, BaseEstimator, TransformerMixin
Applies inverse hyperbolic sine (arcsinh) transformation.
The transformation asinh(X) = log(X + sqrt(X^2 + 1)) is similar to log transformation but can handle zero and negative values. It’s useful for stabilizing variance in data with both positive and negative values or wide dynamic range.
Properties:
Defined for all real numbers (unlike log)
Approximately linear near zero
Approximately logarithmic for large |X|
Odd function (antisymmetric about zero): asinh(-X) = -asinh(X)
This transformation is commonly used in:
Financial data with positive and negative returns
Data with extreme outliers
Variables spanning multiple orders of magnitude with zeros
- Parameters:
subset (Optional[List[str]], default=None) – List of numeric column names to transform. If None, all numeric columns (Float64, Int64, Float32, Int32) are automatically selected.
drop_columns (bool, default=True) – If True, drop the original columns after transformation. If False, keep both original and transformed columns.
Examples
Create an instance of the ArcSinhScaler class:
>>> import polars as pl
>>> from gators.scalers import ArcSinhScaler
>>> scaler = ArcSinhScaler(subset=["returns", "profit"])
Fit the transformer:
>>> X = pl.DataFrame({
...     "returns": [-100, -10, 0, 10, 100],
...     "profit": [-50, -5, 0, 5, 50]
... })
>>> scaler.fit(X)
Transform the DataFrame:
>>> transformed_X = scaler.transform(X)
>>> print(transformed_X)
shape: (5, 2)
┌──────────────────┬─────────────────┐
│ returns__arcsinh ┆ profit__arcsinh │
│ ---              ┆ ---             │
│ f64              ┆ f64             │
├──────────────────┼─────────────────┤
│ -5.298           ┆ -4.605          │
│ -2.998           ┆ -2.312          │
│ 0.0              ┆ 0.0             │
│ 2.998            ┆ 2.312           │
│ 5.298            ┆ 4.605           │
└──────────────────┴─────────────────┘
Notes
Useful approximations and identities:
asinh(X) ≈ log(2X) for large positive X
asinh(X) ≈ X for X near 0
asinh(-X) = -asinh(X) (odd symmetry)
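The closed form and the limiting behavior listed above can be checked numerically. This is a minimal sketch using only the standard math module; asinh_closed_form is a hypothetical helper written for this example and is not part of gators.

```python
import math


def asinh_closed_form(x: float) -> float:
    """Evaluate asinh through the closed form log(x + sqrt(x^2 + 1))."""
    return math.log(x + math.sqrt(x * x + 1.0))


# Compare the closed form with the built-in math.asinh.
for x in (-100.0, -10.0, 0.0, 10.0, 100.0):
    print(f"{x:7.1f}  closed={asinh_closed_form(x):7.3f}  builtin={math.asinh(x):7.3f}")

small = math.asinh(1e-4)   # close to 1e-4: approximately linear near zero
large = math.asinh(1e6)    # close to log(2e6): approximately logarithmic
print(small, large, math.log(2e6))
```

Note that the closed form loses some floating-point precision for large negative inputs because of cancellation, which is one reason to prefer a built-in asinh when one is available.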
- class gators.scalers.BoxCox
Bases: BaseModel, BaseEstimator, TransformerMixin
Applies the Box-Cox power transformation to numeric features.
The Box-Cox transformation is a family of power transformations that can help normalize skewed data and stabilize variance. Unlike Yeo-Johnson, Box-Cox requires all values to be strictly positive (x > 0).
For each feature x with parameter lambda:
If lambda != 0: (x^lambda - 1) / lambda
If lambda == 0: log(x)
- Parameters:
Examples
Create an instance of the BoxCox class:
>>> import polars as pl
>>> from gators.scalers import BoxCox
>>> transformer = BoxCox(lambdas={"sales": 0.5, "price": 0.0})
Fit the transformer:
>>> X = pl.DataFrame({"sales": [10, 20, 30, 40],
...                   "price": [5, 15, 25, 35]})
>>> transformer.fit(X)
Transform the DataFrame:
>>> transformed_X = transformer.transform(X)
>>> print(transformed_X)
shape: (4, 2)
┌───────────────┬───────────────┐
│ sales__boxcox ┆ price__boxcox │
│ ---           ┆ ---           │
│ f64           ┆ f64           │
├───────────────┼───────────────┤
│ ...           ┆ ...           │
└───────────────┴───────────────┘
Notes
All input values must be strictly positive (> 0). Negative or zero values will produce invalid results. Use Yeo-Johnson transformation if you need to handle zero or negative values.
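To make the two branches of the formula concrete, here is a small standalone sketch of the Box-Cox definition given above. It uses only the standard library; the function name box_cox and the decision to raise ValueError for non-positive input are assumptions of this example, not the behavior of the gators class.

```python
import math


def box_cox(x: float, lam: float) -> float:
    """Box-Cox transform of a single strictly positive value."""
    # Assumption for this sketch: fail loudly on invalid input.
    if x <= 0:
        raise ValueError("Box-Cox requires strictly positive input")
    if lam == 0:
        return math.log(x)            # lambda == 0 branch
    return (x ** lam - 1.0) / lam     # lambda != 0 branch


print(box_cox(4.0, 0.5))     # (sqrt(4) - 1) / 0.5 = 2.0
print(box_cox(math.e, 0.0))  # log(e), approximately 1.0
print(box_cox(1.0, 0.3))     # x = 1 maps to 0 for any lambda
```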
- class gators.scalers.LogScaler
Bases: BaseModel, BaseEstimator, TransformerMixin
Applies logarithm transformation with choice of base.
Log transformation is useful for:
Reducing right skewness in data
Stabilizing variance
Converting multiplicative relationships to additive
Compressing large value ranges
Supports three bases:
‘e’: Natural logarithm ln(X) / log_e(X)
‘10’: Base-10 logarithm
‘2’: Base-2 logarithm
Note: Only positive values can be transformed. Zero and negative values will result in null/inf values.
- Parameters:
subset (Optional[List[str]], default=None) – List of numeric column names to transform. If None, all numeric columns (Float64, Int64, Float32, Int32) are automatically selected.
base (Literal['e', '10', '2'], default='e') – The logarithm base to use:
‘e’: ln(X)
‘10’: log10(X)
‘2’: log2(X)
drop_columns (bool, default=True) – If True, drop the original columns after transformation. If False, keep both original and transformed columns.
Examples
Create an instance of the LogScaler class with natural log:
>>> import polars as pl
>>> from gators.scalers import LogScaler
>>> scaler = LogScaler(subset=["sales", "revenue"], base="e")
Fit the transformer:
>>> X = pl.DataFrame({
...     "sales": [1, 10, 100, 1000],
...     "revenue": [10, 100, 1000, 10000]
... })
>>> scaler.fit(X)
Transform the DataFrame:
>>> transformed_X = scaler.transform(X)
>>> print(transformed_X)
shape: (4, 2)
┌───────────────┬─────────────────┐
│ sales__log_ln ┆ revenue__log_ln │
│ ---           ┆ ---             │
│ f64           ┆ f64             │
├───────────────┼─────────────────┤
│ 0.0           ┆ 2.303           │
│ 2.303         ┆ 4.605           │
│ 4.605         ┆ 6.908           │
│ 6.908         ┆ 9.210           │
└───────────────┴─────────────────┘
>>> # Using log10
>>> scaler10 = LogScaler(subset=["count"], base="10")
>>> X2 = pl.DataFrame({"count": [1, 10, 100, 1000]})
>>> scaler10.fit(X2)
>>> scaler10.transform(X2)
shape: (4, 1)
┌───────────────┐
│ count__log_10 │
│ ---           │
│ f64           │
├───────────────┤
│ 0.0           │
│ 1.0           │
│ 2.0           │
│ 3.0           │
└───────────────┘
>>> # Using log2
>>> scaler2 = LogScaler(subset=["size"], base="2")
>>> X3 = pl.DataFrame({"size": [1, 2, 4, 8, 16]})
>>> scaler2.fit(X3)
>>> scaler2.transform(X3)
shape: (5, 1)
┌─────────────┐
│ size__log_2 │
│ ---         │
│ f64         │
├─────────────┤
│ 0.0         │
│ 1.0         │
│ 2.0         │
│ 3.0         │
│ 4.0         │
└─────────────┘
- transform(X)
Transform the input DataFrame by applying logarithm.
- Parameters:
X (DataFrame) – Input DataFrame to transform. Values should be positive.
- Returns:
Transformed DataFrame with log-transformed columns.
- Return type:
DataFrame
Notes
Zero and negative values will result in null or -inf values.
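The base selection can be pictured as a lookup from the documented base labels to a logarithm function. The sketch below uses the standard math module only; LOG_FUNCTIONS and log_transform are hypothetical names, and mapping non-positive inputs to None is an assumption chosen for readability here, whereas the class itself is documented to yield null or -inf values.

```python
import math

# Hypothetical lookup from the documented base labels to stdlib functions.
LOG_FUNCTIONS = {"e": math.log, "10": math.log10, "2": math.log2}


def log_transform(values, base="e"):
    """Apply the chosen logarithm; non-positive inputs become None here."""
    log = LOG_FUNCTIONS[base]
    return [log(v) if v > 0 else None for v in values]


natural = log_transform([1, 10, 100], base="e")
decimal = log_transform([1, 10, 100, 1000], base="10")
binary = log_transform([1, 2, 4, 8, 16], base="2")
with_invalid = log_transform([0, -5, 8], base="2")  # first two entries are None

print(natural, decimal, binary, with_invalid, sep="\n")
```

All three bases differ only by a constant factor, since log_b(x) = ln(x) / ln(b), so the choice of base affects scale but not the shape of the transformed distribution.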
- class gators.scalers.MinmaxScaler
Bases: BaseModel, BaseEstimator, TransformerMixin
Scales numeric features to a [0, 1] range using min-max normalization.
Transforms features by scaling each feature to the range [0, 1] based on the minimum and maximum values observed during fitting. The transformation is given by: X_scaled = (X - X_min) / (X_max - X_min).
- Parameters:
subset (Optional[List[str]], default=None) – List of numeric column names to scale. If None, all numeric columns (Float64, Int64, Float32, Int32) are automatically selected.
drop_columns (bool, default=True) – If True, drop the original columns after scaling. If False, keep both original and scaled columns.
Examples
Create an instance of the MinmaxScaler class:
>>> import polars as pl
>>> from gators.scalers import MinmaxScaler
>>> scaler = MinmaxScaler(subset=["age", "income"])
Fit the transformer:
>>> X = pl.DataFrame({"age": [20, 30, 40, 50],
...                   "income": [20000, 40000, 60000, 80000]})
>>> scaler.fit(X)
Transform the DataFrame:
>>> transformed_X = scaler.transform(X)
>>> print(transformed_X)
shape: (4, 2)
┌───────────────────┬──────────────────────┐
│ age__minmax_scale ┆ income__minmax_scale │
│ ---               ┆ ---                  │
│ f64               ┆ f64                  │
├───────────────────┼──────────────────────┤
│ 0.0               ┆ 0.0                  │
│ 0.333             ┆ 0.333                │
│ 0.667             ┆ 0.667                │
│ 1.0               ┆ 1.0                  │
└───────────────────┴──────────────────────┘
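Because the minimum and maximum are taken from the data seen during fitting, values outside that range map outside [0, 1] at transform time. The following plain-Python sketch of the formula illustrates this; fit_min_max and min_max_scale are hypothetical helpers, and raising on a constant column is an assumption of the example rather than documented library behavior.

```python
def fit_min_max(values):
    """Record the minimum and maximum observed during fitting."""
    return min(values), max(values)


def min_max_scale(values, lo, hi):
    """Apply (x - min) / (max - min) using previously fitted statistics."""
    span = hi - lo
    if span == 0:
        # Assumption for this sketch: a constant column cannot be scaled.
        raise ValueError("cannot scale a constant column")
    return [(v - lo) / span for v in values]


train_age = [20, 30, 40, 50]
lo, hi = fit_min_max(train_age)

scaled_train = min_max_scale(train_age, lo, hi)
scaled_new = min_max_scale([60], lo, hi)  # 60 was not seen during fitting

print(scaled_train)
print(scaled_new)  # greater than 1 because 60 exceeds the fitted maximum
```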
- class gators.scalers.PowerScaler
Bases: BaseModel, BaseEstimator, TransformerMixin
Scales numeric features using power transformation X^power.
Applies a power transformation to selected features. Useful for reducing skewness or adjusting the scale of features. Common power values:
power < 1: compress large values (e.g., 0.5 for square root)
power > 1: expand large values (e.g., 2 for squaring)
- Parameters:
subset (Optional[List[str]], default=None) – List of numeric column names to transform. If None, all numeric columns (Float64, Int64, Float32, Int32) are automatically selected.
power (float, default=0.5) – The power exponent to apply. Default is 0.5 (square root).
drop_columns (bool, default=True) – If True, drop the original columns after transformation. If False, keep both original and transformed columns.
Examples
Create an instance of the PowerScaler class:
>>> import polars as pl
>>> from gators.scalers import PowerScaler
>>> scaler = PowerScaler(subset=["sales", "revenue"], power=0.5)
Fit the transformer:
>>> X = pl.DataFrame({"sales": [100, 400, 900, 1600],
...                   "revenue": [1000, 4000, 9000, 16000]})
>>> scaler.fit(X)
Transform the DataFrame:
>>> transformed_X = scaler.transform(X)
>>> print(transformed_X)
shape: (4, 2)
┌──────────────────┬────────────────────┐
│ sales__power_0.5 ┆ revenue__power_0.5 │
│ ---              ┆ ---                │
│ f64              ┆ f64                │
├──────────────────┼────────────────────┤
│ 10.0             ┆ 31.62              │
│ 20.0             ┆ 63.25              │
│ 30.0             ┆ 94.87              │
│ 40.0             ┆ 126.49             │
└──────────────────┴────────────────────┘
>>> # Square transformation
>>> scaler2 = PowerScaler(subset=["count"], power=2.0)
>>> X2 = pl.DataFrame({"count": [1, 2, 3, 4]})
>>> scaler2.fit(X2)
>>> scaler2.transform(X2)
shape: (4, 1)
┌────────────────┐
│ count__power_2 │
│ ---            │
│ f64            │
├────────────────┤
│ 1.0            │
│ 4.0            │
│ 9.0            │
│ 16.0           │
└────────────────┘
- class gators.scalers.StandardScaler
Bases: BaseModel, BaseEstimator, TransformerMixin
Standardizes numeric features by removing the mean and scaling to unit variance.
Transforms features by centering them around zero and scaling by the standard deviation. The transformation is given by: X_scaled = (X - mean) / std. This is also known as z-score normalization.
- Parameters:
subset (Optional[List[str]], default=None) – List of numeric column names to standardize. If None, all numeric columns (Float64, Int64, Float32, Int32) are automatically selected.
drop_columns (bool, default=True) – If True, drop the original columns after scaling. If False, keep both original and scaled columns.
Examples
Create an instance of the StandardScaler class:
>>> import polars as pl
>>> from gators.scalers import StandardScaler
>>> scaler = StandardScaler(subset=["age", "income"])
Fit the transformer:
>>> X = pl.DataFrame({"age": [20, 30, 40, 50],
...                   "income": [20000, 40000, 60000, 80000]})
>>> scaler.fit(X)
Transform the DataFrame:
>>> transformed_X = scaler.transform(X)
>>> print(transformed_X)
shape: (4, 2)
┌─────────────────────┬────────────────────────┐
│ age__standard_scale ┆ income__standard_scale │
│ ---                 ┆ ---                    │
│ f64                 ┆ f64                    │
├─────────────────────┼────────────────────────┤
│ -1.161              ┆ -1.161                 │
│ -0.387              ┆ -0.387                 │
│ 0.387               ┆ 0.387                  │
│ 1.161               ┆ 1.161                  │
└─────────────────────┴────────────────────────┘
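The z-score formula can be reproduced by hand with the standard statistics module. The values shown in the example output (about -1.16 and -0.39) are consistent with dividing by the sample standard deviation (ddof=1), so this sketch uses statistics.stdev; that the library computes the standard deviation this way is an inference from the example, not something the documentation states.

```python
import statistics

age = [20, 30, 40, 50]

mean = statistics.mean(age)    # 35
std = statistics.stdev(age)    # sample standard deviation (ddof=1), about 12.91

# Apply X_scaled = (X - mean) / std to each value.
z_scores = [(v - mean) / std for v in age]

print([round(z, 3) for z in z_scores])
print(statistics.pstdev(age))  # population std (ddof=0), about 11.18, for comparison
```

With the population standard deviation the first value would be about -1.342 instead, which is worth keeping in mind when comparing results against other libraries that default to ddof=0.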
- class gators.scalers.YeoJonhson
Bases: BaseModel, BaseEstimator, TransformerMixin
Applies the Yeo-Johnson power transformation to numeric features.
The Yeo-Johnson transformation is a family of power transformations that can be applied to both positive and negative values (unlike Box-Cox which requires positive values). It can help normalize skewed data and stabilize variance.
For each feature x with parameter lambda:
If x >= 0 and lambda != 0: ((x + 1)^lambda - 1) / lambda
If x >= 0 and lambda == 0: log(x + 1)
If x < 0 and lambda != 2: -((-x + 1)^(2-lambda) - 1) / (2 - lambda)
If x < 0 and lambda == 2: -log(-x + 1)
- Parameters:
Examples
Create an instance of the YeoJonhson class:
>>> import polars as pl
>>> from gators.scalers import YeoJonhson
>>> transformer = YeoJonhson(lambdas={"sales": 0.5, "profit": 0.0})
Fit the transformer:
>>> X = pl.DataFrame({"sales": [10, 20, 30, 40],
...                   "profit": [-5, 5, 15, 25]})
>>> transformer.fit(X)
Transform the DataFrame:
>>> transformed_X = transformer.transform(X)
>>> print(transformed_X)
shape: (4, 2)
┌───────────────────┬────────────────────┐
│ sales__yeojonhson ┆ profit__yeojonhson │
│ ---               ┆ ---                │
│ f64               ┆ f64                │
├───────────────────┼────────────────────┤
│ ...               ┆ ...                │
└───────────────────┴────────────────────┘
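The four cases of the Yeo-Johnson definition listed above translate directly into a piecewise function. This is a minimal scalar sketch using only the standard library; yeo_johnson is a hypothetical helper for illustration and does not reflect how the gators class is implemented.

```python
import math


def yeo_johnson(x: float, lam: float) -> float:
    """Yeo-Johnson transform of a single value, following the four cases."""
    if x >= 0:
        if lam == 0:
            return math.log(x + 1.0)                 # x >= 0, lambda == 0
        return ((x + 1.0) ** lam - 1.0) / lam        # x >= 0, lambda != 0
    if lam == 2:
        return -math.log(-x + 1.0)                   # x < 0, lambda == 2
    return -((-x + 1.0) ** (2.0 - lam) - 1.0) / (2.0 - lam)  # x < 0, lambda != 2


print(yeo_johnson(3.0, 0.5))    # (sqrt(4) - 1) / 0.5 = 2.0
print(yeo_johnson(-3.0, 0.0))   # -((4^2) - 1) / 2 = -7.5
print(yeo_johnson(-3.0, 2.0))   # -log(4)
print(yeo_johnson(5.0, 1.0), yeo_johnson(-5.0, 1.0))  # lambda = 1 leaves values unchanged
```

The lambda = 1 case is a handy sanity check: both branches reduce to the identity, so any implementation should return its input unchanged there.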