gators.scalers package
Module contents
- class gators.scalers.ArcSinSquareRootScaler
Bases: gators.transformer._base_transformer._BaseTransformer
Applies arcsine square root transformation for proportion data.
The transformation arcsin(sqrt(X)) is a variance-stabilizing transformation commonly used for proportion or percentage data. It is particularly useful for data bounded between 0 and 1, where it makes the variance more homogeneous across the range.
This transformation is often used in:
Analysis of proportions, percentages, or rates
Binomial proportion data
Data from beta distributions
Note: Input values should be in the range [0, 1]. Values outside this range produce NaN.
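To check the transformer's output by hand, the per-value formula can be written with the standard library alone. This is a minimal sketch, not the scaler's implementation: `arcsin_sqrt` is a hypothetical helper, and mapping out-of-range inputs to NaN is a choice made here because Python's `math.sqrt` and `math.asin` raise `ValueError` outside their domains.

```python
import math


def arcsin_sqrt(p: float) -> float:
    """Variance-stabilizing transform arcsin(sqrt(p)) for one proportion p."""
    # math.sqrt / math.asin would raise ValueError outside their domains,
    # so out-of-range (or NaN) proportions are mapped to NaN explicitly.
    if not 0.0 <= p <= 1.0:
        return float("nan")
    return math.asin(math.sqrt(p))


conversion_rate = [0.05, 0.15, 0.30, 0.60, 0.85]
print([round(arcsin_sqrt(p), 3) for p in conversion_rate])
# [0.226, 0.398, 0.58, 0.886, 1.173]
print(arcsin_sqrt(0.0), arcsin_sqrt(1.0))  # endpoints map to 0 and pi/2
```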
- Parameters:
subset (list[str], default=None) – List of numeric column names to transform. If None, all numeric columns (Float64, Int64, Float32, Int32) are automatically selected.
drop_columns (bool, default=True) – If True, drop the original columns after transformation. If False, keep both original and transformed columns.
Examples
Create an instance of the ArcSinSquareRootScaler class:
>>> import polars as pl
>>> from gators.scalers import ArcSinSquareRootScaler
>>> scaler = ArcSinSquareRootScaler(subset=["success_rate", "conversion_rate"])
Fit the transformer:
>>> X = pl.DataFrame({
...     "success_rate": [0.1, 0.25, 0.5, 0.75, 0.9],
...     "conversion_rate": [0.05, 0.15, 0.30, 0.60, 0.85]
... })
>>> scaler.fit(X)
Transform the DataFrame:
>>> transformed_X = scaler.transform(X)
>>> print(transformed_X)
shape: (5, 2)
┌──────────────────────┬─────────────────────────┐
│ success_rate__arcsin ┆ conversion_rate__arcsin │
│ ---                  ┆ ---                     │
│ f64                  ┆ f64                     │
├──────────────────────┼─────────────────────────┤
│ 0.322                ┆ 0.226                   │
│ 0.524                ┆ 0.398                   │
│ 0.785                ┆ 0.580                   │
│ 1.047                ┆ 0.886                   │
│ 1.249                ┆ 1.173                   │
└──────────────────────┴─────────────────────────┘
Notes
The transformation maps [0, 1] to [0, π/2] (0 to ~1.571).
- fit(X: polars.DataFrame, y: polars.Series | None = None) → gators.scalers.arcsin_squareroot_scaler.ArcSinSquareRootScaler
Fit the transformer by storing column names.
- Parameters:
X (pl.DataFrame) – Input DataFrame to fit.
y (pl.Series, default=None) – Target series (not used, present for sklearn compatibility).
- Returns:
The fitted transformer instance.
- Return type:
ArcSinSquareRootScaler
- class gators.scalers.ArcSinhScaler
Bases: gators.transformer._base_transformer._BaseTransformer
Applies inverse hyperbolic sine (arcsinh) transformation.
The transformation asinh(X) = log(X + sqrt(X^2 + 1)) is similar to log transformation but can handle zero and negative values. It’s useful for stabilizing variance in data with both positive and negative values or wide dynamic range.
Properties:
Defined for all real numbers (unlike log)
Approximately linear near zero
Approximately logarithmic for large abs(X)
Symmetric around zero: asinh(-X) = -asinh(X)
This transformation is commonly used in:
Financial data with positive and negative returns
Data with extreme outliers
Variables spanning multiple orders of magnitude with zeros
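The defining formula can be checked against the standard library's `math.asinh`. In this minimal sketch (`asinh_via_log` is a hypothetical helper, not part of gators) negative inputs are routed through the odd symmetry asinh(-x) = -asinh(x): for large negative x the sum x + sqrt(x² + 1) cancels to 0.0 in floating point and the logarithm fails.

```python
import math


def asinh_via_log(x: float) -> float:
    """asinh(x) computed from the defining formula log(x + sqrt(x**2 + 1))."""
    if x < 0:
        # Odd symmetry avoids catastrophic cancellation for large negative x.
        return -asinh_via_log(-x)
    return math.log(x + math.sqrt(x * x + 1.0))


for x in [-100, -10, 0, 10, 100]:
    # The hand-rolled formula agrees with math.asinh to the printed precision.
    print(x, round(asinh_via_log(x), 3), round(math.asinh(x), 3))
```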
- Parameters:
subset (list[str], default=None) – List of numeric column names to transform. If None, all numeric columns (Float64, Int64, Float32, Int32) are automatically selected.
drop_columns (bool, default=True) – If True, drop the original columns after transformation. If False, keep both original and transformed columns.
Examples
Create an instance of the ArcSinhScaler class:
>>> import polars as pl
>>> from gators.scalers import ArcSinhScaler
>>> scaler = ArcSinhScaler(subset=["returns", "profit"])
Fit the transformer:
>>> X = pl.DataFrame({
...     "returns": [-100, -10, 0, 10, 100],
...     "profit": [-50, -5, 0, 5, 50]
... })
>>> scaler.fit(X)
Transform the DataFrame:
>>> transformed_X = scaler.transform(X)
>>> print(transformed_X)
shape: (5, 2)
┌──────────────────┬─────────────────┐
│ returns__arcsinh ┆ profit__arcsinh │
│ ---              ┆ ---             │
│ f64              ┆ f64             │
├──────────────────┼─────────────────┤
│ -5.298           ┆ -4.605          │
│ -2.998           ┆ -2.312          │
│ 0.0              ┆ 0.0             │
│ 2.998            ┆ 2.312           │
│ 5.298            ┆ 4.605           │
└──────────────────┴─────────────────┘
Notes
The transformation behaves as follows:
- asinh(X) ≈ log(2X) for large positive X
- asinh(X) ≈ X for X near 0
- asinh(-X) = -asinh(X) (odd symmetry)
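The two approximations can be checked numerically with the standard library's `math.asinh`. From the series expansions, the error against X shrinks like X³/6 near zero, and the error against log(2X) shrinks like 1/(4X²) for large X; the loop below simply prints both errors side by side.

```python
import math

# Compare asinh(x) with its two regimes: ~x near zero, ~log(2x) for large x.
for x in [0.001, 0.1, 1.0, 10.0, 100.0, 1000.0]:
    exact = math.asinh(x)
    print(
        f"x={x:>7}: asinh={exact:.6f}  "
        f"|asinh-x|={abs(exact - x):.2e}  "
        f"|asinh-log(2x)|={abs(exact - math.log(2 * x)):.2e}"
    )
```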
- fit(X: polars.DataFrame, y: polars.Series | None = None) → gators.scalers.arcsinh_scaler.ArcSinhScaler
Fit the transformer by storing column names.
- Parameters:
X (pl.DataFrame) – Input DataFrame to fit.
y (pl.Series, default=None) – Target series (not used, present for sklearn compatibility).
- Returns:
The fitted transformer instance.
- Return type:
ArcSinhScaler
- class gators.scalers.BoxCox
Bases: gators.transformer._base_transformer._BaseTransformer
Applies the Box-Cox power transformation to numeric features.
The Box-Cox transformation is a family of power transformations that can help normalize skewed data and stabilize variance. Unlike Yeo-Johnson, Box-Cox requires all values to be strictly positive (x > 0).
For each feature x with parameter lambda:
If lambda != 0: (x^lambda - 1) / lambda
If lambda == 0: log(x)
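The two cases above can be written as a small standard-library function for checking values by hand. This is a sketch, not the BoxCox implementation: `box_cox` is a hypothetical helper, and raising `ValueError` for x <= 0 is a choice made here (the Notes say the transformer itself produces invalid results instead). The lambda = 0 case is the limit of the general formula, which the last line illustrates.

```python
import math


def box_cox(x: float, lmbda: float) -> float:
    """Box-Cox transform of one strictly positive value."""
    if x <= 0:
        # Box-Cox is undefined here; this sketch raises rather than
        # returning an invalid number.
        raise ValueError("Box-Cox requires x > 0")
    if lmbda == 0:
        return math.log(x)
    return (x ** lmbda - 1.0) / lmbda


# The formula applied to the example's data: sales with 0.5, price with 0.0.
sales = [box_cox(x, 0.5) for x in [10, 20, 30, 40]]
price = [box_cox(x, 0.0) for x in [5, 15, 25, 35]]
print([round(v, 3) for v in sales])  # [4.325, 6.944, 8.954, 10.649]
print([round(v, 3) for v in price])  # [1.609, 2.708, 3.219, 3.555]
# As lambda -> 0 the general branch approaches log(x).
print(box_cox(7.0, 1e-9), math.log(7.0))
```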
- Parameters:
lambdas (dict[str, int | float]) – Dictionary mapping column names to their lambda (power) parameters. Lambda values typically range from -2 to 2.
drop_columns (bool, default=True) – If True, drop the original columns after transformation. If False, keep both original and transformed columns.
Examples
Create an instance of the BoxCox class:
>>> import polars as pl
>>> from gators.scalers import BoxCox
>>> transformer = BoxCox(lambdas={"sales": 0.5, "price": 0.0})
Fit the transformer:
>>> X = pl.DataFrame({"sales": [10, 20, 30, 40],
...                   "price": [5, 15, 25, 35]})
>>> transformer.fit(X)
Transform the DataFrame:
>>> transformed_X = transformer.transform(X)
>>> print(transformed_X)
shape: (4, 2)
┌───────────────┬───────────────┐
│ sales__boxcox ┆ price__boxcox │
│ ---           ┆ ---           │
│ f64           ┆ f64           │
├───────────────┼───────────────┤
│ ...           ┆ ...           │
└───────────────┴───────────────┘
Notes
All input values must be strictly positive (> 0). Negative or zero values will produce invalid results. Use Yeo-Johnson transformation if you need to handle zero or negative values.
- fit(X: polars.DataFrame, y: polars.Series | None = None) → gators.scalers.box_cox.BoxCox
Fit the transformer by storing column names.
- Parameters:
X (pl.DataFrame) – Input DataFrame to fit. All values in specified columns must be strictly positive (> 0).
y (pl.Series, default=None) – Target series (not used, present for sklearn compatibility).
- Returns:
The fitted transformer instance.
- Return type:
BoxCox
- transform(X: polars.DataFrame) → polars.DataFrame
Transform the input DataFrame by applying Box-Cox transformation.
- Parameters:
X (pl.DataFrame) – Input DataFrame to transform. All values in specified columns must be strictly positive (> 0).
- Returns:
Transformed DataFrame with power-transformed columns.
- Return type:
pl.DataFrame
- class gators.scalers.LogScaler
Bases: gators.transformer._base_transformer._BaseTransformer
Applies logarithm transformation with choice of base.
Log transformation is useful for:
Reducing right skewness in data
Stabilizing variance
Converting multiplicative relationships to additive
Compressing large value ranges
Supports three bases:
‘e’: Natural logarithm ln(X), i.e. log_e(X)
‘10’: Base-10 logarithm
‘2’: Base-2 logarithm
Note: Only strictly positive values can be transformed. Zero and negative values produce -inf, NaN, or null instead of a finite result.
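The three bases can be mirrored with the standard library's `math.log`, `math.log10`, and `math.log2`. This is a sketch under stated assumptions: `LOG_FUNCS` and `log_values` are hypothetical names, and mapping zero to -inf and negatives to NaN follows IEEE float conventions chosen for the sketch (the stdlib functions would raise `ValueError` instead); it is not a confirmed description of how LogScaler handles those values.

```python
import math
from typing import Callable, Dict, List

# Hypothetical mapping from the `base` literal to a stdlib log function.
LOG_FUNCS: Dict[str, Callable[[float], float]] = {
    "e": math.log,
    "10": math.log10,
    "2": math.log2,
}


def log_values(values: List[float], base: str = "e") -> List[float]:
    fn = LOG_FUNCS[base]
    out = []
    for v in values:
        if v > 0:
            out.append(fn(v))
        elif v == 0:
            out.append(float("-inf"))  # limit of log(v) as v -> 0+
        else:
            out.append(float("nan"))  # undefined for negative input
    return out


print(log_values([1, 10, 100, 1000], base="10"))  # [0.0, 1.0, 2.0, 3.0]
print(log_values([1, 2, 4, 8, 16], base="2"))     # [0.0, 1.0, 2.0, 3.0, 4.0]
print(log_values([0, -1], base="e"))              # [-inf, nan]
```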
- Parameters:
subset (list[str], default=None) – List of numeric column names to transform. If None, all numeric columns (Float64, Int64, Float32, Int32) are automatically selected.
base (Literal['e', '10', '2'], default='e') – The logarithm base to use:
- ‘e’: ln(X)
- ‘10’: log10(X)
- ‘2’: log2(X)
drop_columns (bool, default=True) – If True, drop the original columns after transformation. If False, keep both original and transformed columns.
Examples
Create an instance of the LogScaler class with natural log:
>>> import polars as pl
>>> from gators.scalers import LogScaler
>>> scaler = LogScaler(subset=["sales", "revenue"], base="e")
Fit the transformer:
>>> X = pl.DataFrame({
...     "sales": [1, 10, 100, 1000],
...     "revenue": [10, 100, 1000, 10000]
... })
>>> scaler.fit(X)
Transform the DataFrame:
>>> transformed_X = scaler.transform(X)
>>> print(transformed_X)
shape: (4, 2)
┌───────────────┬─────────────────┐
│ sales__log_ln ┆ revenue__log_ln │
│ ---           ┆ ---             │
│ f64           ┆ f64             │
├───────────────┼─────────────────┤
│ 0.0           ┆ 2.303           │
│ 2.303         ┆ 4.605           │
│ 4.605         ┆ 6.908           │
│ 6.908         ┆ 9.210           │
└───────────────┴─────────────────┘
>>> # Using log10
>>> scaler10 = LogScaler(subset=["count"], base="10")
>>> X2 = pl.DataFrame({"count": [1, 10, 100, 1000]})
>>> scaler10.fit(X2)
>>> scaler10.transform(X2)
shape: (4, 1)
┌───────────────┐
│ count__log_10 │
│ ---           │
│ f64           │
├───────────────┤
│ 0.0           │
│ 1.0           │
│ 2.0           │
│ 3.0           │
└───────────────┘
>>> # Using log2
>>> scaler2 = LogScaler(subset=["size"], base="2")
>>> X3 = pl.DataFrame({"size": [1, 2, 4, 8, 16]})
>>> scaler2.fit(X3)
>>> scaler2.transform(X3)
shape: (5, 1)
┌─────────────┐
│ size__log_2 │
│ ---         │
│ f64         │
├─────────────┤
│ 0.0         │
│ 1.0         │
│ 2.0         │
│ 3.0         │
│ 4.0         │
└─────────────┘
- fit(X: polars.DataFrame, y: polars.Series | None = None) → gators.scalers.log_scaler.LogScaler
Fit the transformer by storing column names.
- Parameters:
X (pl.DataFrame) – Input DataFrame to fit.
y (pl.Series, default=None) – Target series (not used, present for sklearn compatibility).
- Returns:
The fitted transformer instance.
- Return type:
LogScaler
- transform(X: polars.DataFrame) → polars.DataFrame
Transform the input DataFrame by applying logarithm.
- Parameters:
X (pl.DataFrame) – Input DataFrame to transform. Values should be positive.
- Returns:
Transformed DataFrame with log-transformed columns.
- Return type:
pl.DataFrame
Notes
Zero and negative values result in -inf, NaN, or null values.
- class gators.scalers.MinmaxScaler
Bases: gators.transformer._base_transformer._BaseTransformer
Scales numeric features to a [0, 1] range using min-max normalization.
Transforms features by scaling each feature to the range [0, 1] based on the minimum and maximum values observed during fitting. The transformation is given by: X_scaled = (X - X_min) / (X_max - X_min).
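The fit/transform split can be sketched in plain Python: fitting stores the training minimum and maximum, and transforming reuses them. The helper names below are hypothetical, and mapping a constant column (max == min) to 0.0 is an assumption of this sketch, since the behaviour for that case is not documented here. The last line shows a property of the formula itself: values outside the fitted range land outside [0, 1].

```python
from typing import List, Tuple


def minmax_fit(values: List[float]) -> Tuple[float, float]:
    """'Fit' step: remember the training minimum and maximum."""
    return min(values), max(values)


def minmax_transform(values: List[float], lo: float, hi: float) -> List[float]:
    """'Transform' step: (x - min) / (max - min) using the fitted statistics."""
    span = hi - lo
    # Assumption for this sketch: a constant column (span == 0) maps to 0.0.
    return [0.0 if span == 0 else (v - lo) / span for v in values]


lo, hi = minmax_fit([20, 30, 40, 50])
print([round(v, 3) for v in minmax_transform([20, 30, 40, 50], lo, hi)])
# [0.0, 0.333, 0.667, 1.0]
print(minmax_transform([10, 60], lo, hi))  # unseen values fall outside [0, 1]
```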
- Parameters:
subset (list[str], default=None) – List of numeric column names to scale. If None, all numeric columns (Float64, Int64, Float32, Int32) are automatically selected.
drop_columns (bool, default=True) – If True, drop the original columns after scaling. If False, keep both original and scaled columns.
Examples
Create an instance of the MinmaxScaler class:
>>> import polars as pl
>>> from gators.scalers import MinmaxScaler
>>> scaler = MinmaxScaler(subset=["age", "income"])
Fit the transformer:
>>> X = pl.DataFrame({"age": [20, 30, 40, 50],
...                   "income": [20000, 40000, 60000, 80000]})
>>> scaler.fit(X)
Transform the DataFrame:
>>> transformed_X = scaler.transform(X)
>>> print(transformed_X)
shape: (4, 2)
┌───────────────────┬──────────────────────┐
│ age__minmax_scale ┆ income__minmax_scale │
│ ---               ┆ ---                  │
│ f64               ┆ f64                  │
├───────────────────┼──────────────────────┤
│ 0.0               ┆ 0.0                  │
│ 0.333             ┆ 0.333                │
│ 0.667             ┆ 0.667                │
│ 1.0               ┆ 1.0                  │
└───────────────────┴──────────────────────┘
- fit(X: polars.DataFrame, y: polars.Series | None = None) → gators.scalers.minmax_scaler.MinmaxScaler
Fit the transformer by computing min and max values.
- Parameters:
X (pl.DataFrame) – Input DataFrame to fit.
y (pl.Series, default=None) – Target series (not used, present for sklearn compatibility).
- Returns:
The fitted transformer instance.
- Return type:
MinmaxScaler
- class gators.scalers.PowerScaler
Bases: gators.transformer._base_transformer._BaseTransformer
Scales numeric features using power transformation X^power.
Applies a power transformation to selected features. Useful for reducing skewness or adjusting the scale of features. Common power values:
0 < power < 1: compress large values (e.g., 0.5 for square root)
power > 1: expand large values (e.g., 2 for squaring)
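The compressing and expanding effect can be seen by comparing the max/min ratio before and after the transform. In this standard-library sketch, `power_values` is a hypothetical helper; it uses `math.pow`, which raises `ValueError` for a negative base with a non-integer exponent (Python's `**` operator would return a complex number instead). How PowerScaler itself treats negative inputs is not documented here.

```python
import math
from typing import List


def power_values(values: List[float], power: float = 0.5) -> List[float]:
    """Elementwise x ** power computed with math.pow."""
    return [math.pow(v, power) for v in values]


sales = [100, 400, 900, 1600]
roots = power_values(sales, 0.5)
print(roots)
# A power below 1 compresses the spread: max/min drops from 16 to 4.
print(max(sales) / min(sales), max(roots) / min(roots))
print(power_values([1, 2, 3, 4], 2.0))  # [1.0, 4.0, 9.0, 16.0]
```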
- Parameters:
subset (list[str], default=None) – List of numeric column names to transform. If None, all numeric columns (Float64, Int64, Float32, Int32) are automatically selected.
power (float, default=0.5) – The power exponent to apply. Default is 0.5 (square root).
drop_columns (bool, default=True) – If True, drop the original columns after transformation. If False, keep both original and transformed columns.
Examples
Create an instance of the PowerScaler class:
>>> import polars as pl
>>> from gators.scalers import PowerScaler
>>> scaler = PowerScaler(subset=["sales", "revenue"], power=0.5)
Fit the transformer:
>>> X = pl.DataFrame({"sales": [100, 400, 900, 1600],
...                   "revenue": [1000, 4000, 9000, 16000]})
>>> scaler.fit(X)
Transform the DataFrame:
>>> transformed_X = scaler.transform(X)
>>> print(transformed_X)
shape: (4, 2)
┌──────────────────┬────────────────────┐
│ sales__power_0.5 ┆ revenue__power_0.5 │
│ ---              ┆ ---                │
│ f64              ┆ f64                │
├──────────────────┼────────────────────┤
│ 10.0             ┆ 31.62              │
│ 20.0             ┆ 63.25              │
│ 30.0             ┆ 94.87              │
│ 40.0             ┆ 126.49             │
└──────────────────┴────────────────────┘
>>> # Square transformation
>>> scaler2 = PowerScaler(subset=["count"], power=2.0)
>>> X2 = pl.DataFrame({"count": [1, 2, 3, 4]})
>>> scaler2.fit(X2)
>>> scaler2.transform(X2)
shape: (4, 1)
┌────────────────┐
│ count__power_2 │
│ ---            │
│ f64            │
├────────────────┤
│ 1.0            │
│ 4.0            │
│ 9.0            │
│ 16.0           │
└────────────────┘
- fit(X: polars.DataFrame, y: polars.Series | None = None) → gators.scalers.power_scaler.PowerScaler
Fit the transformer by storing column names.
- Parameters:
X (pl.DataFrame) – Input DataFrame to fit.
y (pl.Series, default=None) – Target series (not used, present for sklearn compatibility).
- Returns:
The fitted transformer instance.
- Return type:
PowerScaler
- class gators.scalers.RobustScaler
Bases: gators.transformer._base_transformer._BaseTransformer
Scale numeric features using median and a configurable quantile range.
Applies the transformation:
X_scaled = (X - median) / (Q(q_high) - Q(q_low))
Because centering and scaling are driven by robust statistics (median and inter-quantile range) rather than mean and standard deviation, the scaler is largely unaffected by outliers.
- Parameters:
quantile_range (tuple[float, float], default=(0.25, 0.75)) – (q_low, q_high): the two quantile levels used to compute the scale. Both values must be in [0.0, 1.0] and q_low < q_high. The default (0.25, 0.75) is equivalent to the classic IQR-based robust scaler.
subset (list[str] or None, default=None) – Numeric columns to scale. When None, all Float64, Float32, Int64, and Int32 columns are selected automatically.
drop_columns (bool, default=True) – If True, the original columns are dropped and only the scaled columns are kept. If False, both original and scaled columns are present in the output.
- _median
Fitted median per column.
- Type:
dict[str, float]
- _scale
Fitted quantile-range scale (1 / (Q(q_high) - Q(q_low))) per column. Columns where the quantile range is zero get a scale of 0.0 (i.e. their scaled output is all zeros).
- Type:
dict[str, float]
- _column_mapping
Mapping from original column name to scaled column name.
- Type:
dict[str, str]
Examples
>>> import polars as pl
>>> from gators.scalers import RobustScaler
>>> X = pl.DataFrame({
...     "age": [20.0, 30.0, 40.0, 50.0, 200.0],
...     "income": [1000.0, 2000.0, 3000.0, 4000.0, 5000.0],
... })
>>> scaler = RobustScaler(quantile_range=(0.25, 0.75))
>>> scaler.fit(X)
>>> X_scaled = scaler.transform(X)
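The example above does not print its result. As a hand check, here is a minimal standard-library sketch of the documented arithmetic: a median, a reciprocal quantile-range `_scale` that falls back to 0.0 for a zero range, and (x - median) * scale. The helper names are hypothetical, and linear interpolation between order statistics is an assumption of the sketch; with five rows the 0.25, 0.5, and 0.75 positions fall exactly on sample points, so any standard interpolation rule gives the same numbers here. Note how the outlier 200 gets a large score without stretching the scale applied to the other rows.

```python
from typing import List, Tuple


def quantile_linear(sorted_vals: List[float], q: float) -> float:
    """Quantile by linear interpolation between neighbouring order statistics."""
    pos = q * (len(sorted_vals) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(sorted_vals) - 1)
    frac = pos - lo
    return sorted_vals[lo] + frac * (sorted_vals[hi] - sorted_vals[lo])


def robust_fit(values: List[float],
               quantile_range: Tuple[float, float] = (0.25, 0.75)) -> Tuple[float, float]:
    s = sorted(values)
    median = quantile_linear(s, 0.5)
    spread = quantile_linear(s, quantile_range[1]) - quantile_linear(s, quantile_range[0])
    # Reciprocal of the quantile range, or 0.0 when the range is zero.
    scale = 1.0 / spread if spread != 0 else 0.0
    return median, scale


def robust_transform(values: List[float], median: float, scale: float) -> List[float]:
    return [(v - median) * scale for v in values]


age = [20.0, 30.0, 40.0, 50.0, 200.0]
median, scale = robust_fit(age)
print(median, scale)                         # 40.0 0.05
print(robust_transform(age, median, scale))  # [-1.0, -0.5, 0.0, 0.5, 8.0]
```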
- fit(X: polars.DataFrame, y: polars.Series | None = None) → gators.scalers.robust_scaler.RobustScaler
Compute median and quantile-range scale for each column.
- Parameters:
X (pl.DataFrame) – Input DataFrame to fit.
y (pl.Series, default=None) – Not used; present for sklearn compatibility.
- Returns:
The fitted transformer instance.
- Return type:
RobustScaler
- class gators.scalers.StandardScaler
Bases: gators.transformer._base_transformer._BaseTransformer
Standardizes numeric features by removing the mean and scaling to unit variance.
Transforms features by centering them around zero and scaling by the standard deviation. The transformation is given by: X_scaled = (X - mean) / std. This is also known as z-score normalization.
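The z-score formula depends on which standard deviation is used. This standard-library sketch (`standardize` is a hypothetical helper) computes both variants; the example values below (about ±1.16 and ±0.387) match the sample standard deviation, which divides by n - 1. That is an inference from the example output, not a documented property of StandardScaler. The sketch does not guard against constant columns.

```python
import statistics
from typing import List


def standardize(values: List[float], sample: bool = True) -> List[float]:
    """z-scores (x - mean) / std for one column."""
    mean = statistics.fmean(values)
    # sample=True divides by n - 1 (statistics.stdev); False divides by n (pstdev).
    std = statistics.stdev(values) if sample else statistics.pstdev(values)
    # A constant column gives std == 0 and raises ZeroDivisionError here.
    return [(v - mean) / std for v in values]


age = [20, 30, 40, 50]
print([round(z, 3) for z in standardize(age)])                # [-1.162, -0.387, 0.387, 1.162]
print([round(z, 3) for z in standardize(age, sample=False)])  # [-1.342, -0.447, 0.447, 1.342]
```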
- Parameters:
subset (list[str], default=None) – List of numeric column names to standardize. If None, all numeric columns (Float64, Int64, Float32, Int32) are automatically selected.
drop_columns (bool, default=True) – If True, drop the original columns after scaling. If False, keep both original and scaled columns.
Examples
Create an instance of the StandardScaler class:
>>> import polars as pl
>>> from gators.scalers import StandardScaler
>>> scaler = StandardScaler(subset=["age", "income"])
Fit the transformer:
>>> X = pl.DataFrame({"age": [20, 30, 40, 50],
...                   "income": [20000, 40000, 60000, 80000]})
>>> scaler.fit(X)
Transform the DataFrame:
>>> transformed_X = scaler.transform(X)
>>> print(transformed_X)
shape: (4, 2)
┌─────────────────────┬────────────────────────┐
│ age__standard_scale ┆ income__standard_scale │
│ ---                 ┆ ---                    │
│ f64                 ┆ f64                    │
├─────────────────────┼────────────────────────┤
│ -1.162              ┆ -1.162                 │
│ -0.387              ┆ -0.387                 │
│ 0.387               ┆ 0.387                  │
│ 1.162               ┆ 1.162                  │
└─────────────────────┴────────────────────────┘
- fit(X: polars.DataFrame, y: polars.Series | None = None) → gators.scalers.standard_scaler.StandardScaler
Fit the transformer by computing mean and standard deviation.
- Parameters:
X (pl.DataFrame) – Input DataFrame to fit.
y (pl.Series, default=None) – Target series (not used, present for sklearn compatibility).
- Returns:
The fitted transformer instance.
- Return type:
StandardScaler
- class gators.scalers.YeoJohnson
Bases: gators.transformer._base_transformer._BaseTransformer
Applies the Yeo-Johnson power transformation to numeric features.
The Yeo-Johnson transformation is a family of power transformations that can be applied to both positive and negative values (unlike Box-Cox, which requires strictly positive values). It can help normalize skewed data and stabilize variance.
For each feature x with parameter lambda:
If x >= 0 and lambda != 0: ((x + 1)^lambda - 1) / lambda
If x >= 0 and lambda == 0: log(x + 1)
If x < 0 and lambda != 2: -((-x + 1)^(2-lambda) - 1) / (2 - lambda)
If x < 0 and lambda == 2: -log(-x + 1)
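The four cases above translate directly into a standard-library function, handy for checking values by hand. `yeo_johnson` is a hypothetical helper, not the YeoJohnson implementation. Two sanity properties follow from the formulas: for x >= 0 the result equals Box-Cox applied to x + 1, and lambda = 1 leaves every value unchanged.

```python
import math


def yeo_johnson(x: float, lmbda: float) -> float:
    """Yeo-Johnson transform of one value, following the four cases."""
    if x >= 0:
        if lmbda != 0:
            return ((x + 1) ** lmbda - 1) / lmbda
        return math.log(x + 1)
    if lmbda != 2:
        return -((-x + 1) ** (2 - lmbda) - 1) / (2 - lmbda)
    return -math.log(-x + 1)


profit = [-5, 5, 15, 25]
print([round(yeo_johnson(x, 0.0), 3) for x in profit])  # [-17.5, 1.792, 2.773, 3.258]
print([yeo_johnson(x, 1.0) for x in [-3.0, 0.0, 2.5]])  # [-3.0, 0.0, 2.5]
```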
- Parameters:
lambdas (dict[str, int | float]) – Dictionary mapping column names to their lambda (power) parameters. Lambda values typically range from -2 to 2.
drop_columns (bool, default=True) – If True, drop the original columns after transformation. If False, keep both original and transformed columns.
Examples
Create an instance of the YeoJohnson class:
>>> import polars as pl
>>> from gators.scalers import YeoJohnson
>>> transformer = YeoJohnson(lambdas={"sales": 0.5, "profit": 0.0})
Fit the transformer:
>>> X = pl.DataFrame({"sales": [10, 20, 30, 40],
...                   "profit": [-5, 5, 15, 25]})
>>> transformer.fit(X)
Transform the DataFrame:
>>> transformed_X = transformer.transform(X)
>>> print(transformed_X)
shape: (4, 2)
┌───────────────────┬────────────────────┐
│ sales__yeojonhson ┆ profit__yeojonhson │
│ ---               ┆ ---                │
│ f64               ┆ f64                │
├───────────────────┼────────────────────┤
│ ...               ┆ ...                │
└───────────────────┴────────────────────┘
- fit(X: polars.DataFrame, y: polars.Series | None = None) → gators.scalers.yeo_johnson.YeoJohnson
Fit the transformer by storing column names.
- Parameters:
X (pl.DataFrame) – Input DataFrame to fit.
y (pl.Series, default=None) – Target series (not used, present for sklearn compatibility).
- Returns:
The fitted transformer instance.
- Return type:
YeoJohnson