gators.clippers package

Module contents

Clipping transformers for outlier handling.

class gators.clippers.CustomClipper

Bases: BaseModel, BaseEstimator, TransformerMixin

Clip column values using custom lower and upper bounds.

This transformer allows you to specify custom clipping bounds for each column independently. You can specify only lower bounds, only upper bounds, or both for different columns. Columns not specified in either dictionary are left unchanged.

Parameters:
  • lower_bounds (dict of str to float, optional) – Dictionary mapping column names to their lower bounds. Values below the lower bound will be clipped to the lower bound. Default is None (no lower bounds).

  • upper_bounds (dict of str to float, optional) – Dictionary mapping column names to their upper bounds. Values above the upper bound will be clipped to the upper bound. Default is None (no upper bounds).

  • inplace (bool, default=True) – If True, clip values in the original columns. If False, create new columns with the suffix ‘__clip_custom’.

  • drop_columns (bool, default=True) – If True and inplace=False, drop the original columns after clipping. If False and inplace=False, keep both original and clipped columns. Ignored if inplace=True.

_columns

List of columns that will be clipped (union of lower_bounds and upper_bounds keys).

Type:

list of str

_bounds_map

Mapping of column names to (lower_bound, upper_bound) tuples. None values indicate no bound on that side.

Type:

dict of str to tuple
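
As a rough illustration of how these two attributes relate to the constructor arguments, here is a minimal pure-Python sketch. It is not the library's implementation: the helper names build_bounds_map and clip_value are hypothetical, and the key order of the union is an assumption.

```python
from typing import Dict, Optional, Tuple

Bounds = Tuple[Optional[float], Optional[float]]


def build_bounds_map(
    lower_bounds: Optional[Dict[str, float]] = None,
    upper_bounds: Optional[Dict[str, float]] = None,
) -> Dict[str, Bounds]:
    """Hypothetical helper: map each column to a (lower, upper) tuple."""
    lower = lower_bounds or {}
    upper = upper_bounds or {}
    # Union of the keys of both dictionaries, keeping first-seen order.
    columns = list(dict.fromkeys([*lower, *upper]))
    # A missing bound on either side is represented by None.
    return {col: (lower.get(col), upper.get(col)) for col in columns}


def clip_value(value: float, bounds: Bounds) -> float:
    """Hypothetical helper: clip one value, skipping any bound that is None."""
    low, high = bounds
    if low is not None and value < low:
        return low
    if high is not None and value > high:
        return high
    return value


bounds_map = build_bounds_map({"age": 0, "salary": 0}, {"age": 120})
columns = list(bounds_map)
clipped_age = [clip_value(v, bounds_map["age"]) for v in [-5, 25, 150]]
```

Here "salary" has only a lower bound, so its tuple carries None on the upper side.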

Examples

>>> import polars as pl
>>> from gators.clippers import CustomClipper

Clip with both lower and upper bounds:

>>> X = pl.DataFrame({
...     "age": [-5, 25, 150],
...     "salary": [-1000, 50000, 2000000]
... })
>>> clipper = CustomClipper(
...     lower_bounds={"age": 0, "salary": 0},
...     upper_bounds={"age": 120, "salary": 1000000}
... )
>>> clipper.fit_transform(X)
shape: (3, 2)
┌───────┬───────────┐
│ age   ┆ salary    │
│ ---   ┆ ---       │
│ f64   ┆ f64       │
╞═══════╪═══════════╡
│ 0.0   ┆ 0.0       │
│ 25.0  ┆ 50000.0   │
│ 120.0 ┆ 1000000.0 │
└───────┴───────────┘

Clip with only lower bounds:

>>> clipper = CustomClipper(lower_bounds={"age": 0})
>>> clipper.fit_transform(X)
shape: (3, 2)
┌───────┬───────────┐
│ age   ┆ salary    │
│ ---   ┆ ---       │
│ f64   ┆ f64       │
╞═══════╪═══════════╡
│ 0.0   ┆ -1000.0   │
│ 25.0  ┆ 50000.0   │
│ 150.0 ┆ 2000000.0 │
└───────┴───────────┘

Create new columns instead of modifying in place:

>>> clipper = CustomClipper(
...     lower_bounds={"age": 0},
...     upper_bounds={"age": 120},
...     inplace=False
... )
>>> clipper.fit_transform(X)
shape: (3, 2)
┌──────────────────┬───────────┐
│ age__clip_custom ┆ salary    │
│ ---              ┆ ---       │
│ f64              ┆ f64       │
╞══════════════════╪═══════════╡
│ 0.0              ┆ -1000.0   │
│ 25.0             ┆ 50000.0   │
│ 120.0            ┆ 2000000.0 │
└──────────────────┴───────────┘

Notes

  • Non-numeric columns are automatically ignored.

  • Columns not specified in either bounds dictionary are left unchanged.

  • You can specify bounds for only some columns while leaving others untouched.

  • If a column appears in both dictionaries, both bounds are applied.

See also

GaussianClipper

Clip values based on mean and standard deviation.

QuantileClipper

Clip values based on quantiles.

MADClipper

Clip values based on median absolute deviation.

IQRClipper

Clip values based on interquartile range.

fit(X, y=None)

Fit the clipper by identifying columns to clip.

Parameters:
  • X (DataFrame) – Input DataFrame.

  • y (Series | None) – Target values (ignored, present for sklearn compatibility).

Returns:

self – Fitted clipper.

Return type:

CustomClipper

Raises:

ValueError – If no bounds are specified or if specified columns don’t exist in X.

transform(X)

Clip values using the custom bounds.

Parameters:

X (DataFrame) – Input DataFrame to clip.

Returns:

DataFrame with clipped values.

Return type:

DataFrame

fit_transform(X, y=None)

Fit the clipper and transform the data.

Parameters:
  • X (DataFrame) – Input DataFrame.

  • y (Series | None) – Target values (ignored, present for sklearn compatibility).

Returns:

DataFrame with clipped values.

Return type:

DataFrame

class gators.clippers.GaussianClipper

Bases: BaseModel, BaseEstimator, TransformerMixin

Clip numeric values to mean ± n standard deviations.

This transformer caps values that are smaller than mean - n*std or larger than mean + n*std, where n is the number of standard deviations (n_sigmas). Values outside this range are clipped to the boundary values.
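
The bound computation described above can be sketched with the standard library alone. This is an illustrative sketch, not the library's code: gaussian_bounds and clip are hypothetical helpers, and the use of the sample standard deviation (ddof=1) is an assumption.

```python
import statistics
from typing import List, Tuple


def gaussian_bounds(values: List[float], n_sigmas: int = 3) -> Tuple[float, float]:
    """Hypothetical helper: return (mean - n*std, mean + n*std)."""
    mean = statistics.mean(values)
    # Assumption: sample standard deviation (ddof=1).
    std = statistics.stdev(values)
    return mean - n_sigmas * std, mean + n_sigmas * std


def clip(values: List[float], lower: float, upper: float) -> List[float]:
    """Cap every value to the closed interval [lower, upper]."""
    return [min(max(v, lower), upper) for v in values]


values = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
lower, upper = gaussian_bounds(values, n_sigmas=1)
clipped = clip(values, lower, upper)
```

With n_sigmas=1 only the smallest and largest values fall outside the interval, so only those two are replaced by the bounds.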

Parameters:
  • n_sigmas (int, default=3) – Number of standard deviations to use for clipping bounds. Must be a positive integer.

  • subset (Optional[List[str]], default=None) – List of numeric columns to clip. If None, all numeric columns are selected.

  • inplace (bool, default=True) – If True, clip values in the original columns. If False, create new columns with suffix ‘__clip_gaussian’.

  • drop_columns (bool, default=True) – If inplace=False, whether to drop the original columns after clipping. Ignored when inplace=True.

Examples

>>> import polars as pl
>>> from gators.clippers import GaussianClipper
>>> # Sample DataFrame with outliers
>>> X = pl.DataFrame({
...     'A': [1.0, 2.0, 3.0, 4.0, 100.0],  # 100.0 is an outlier
...     'B': [-50.0, 5.0, 6.0, 7.0, 8.0],  # -50.0 is an outlier
...     'C': [10.0, 20.0, 30.0, 40.0, 50.0]
... })
>>> # Clip using 3 standard deviations (default)
>>> clipper = GaussianClipper(inplace=False)
>>> clipper.fit(X)
GaussianClipper(n_sigmas=3, subset=['A', 'B', 'C'], drop_columns=True, inplace=False)
>>> transformed_X = clipper.transform(X)
>>> print(transformed_X)
shape: (5, 3)
┌───────────────────┬───────────────────┬───────────────────┐
│ A__clip_gaussian  ┆ B__clip_gaussian  ┆ C__clip_gaussian  │
│ ---               ┆ ---               ┆ ---               │
│ f64               ┆ f64               ┆ f64               │
╞═══════════════════╪═══════════════════╪═══════════════════╡
│ 1.0               ┆ -24.8             ┆ 10.0              │
│ 2.0               ┆ 5.0               ┆ 20.0              │
│ 3.0               ┆ 6.0               ┆ 30.0              │
│ 4.0               ┆ 7.0               ┆ 40.0              │
│ 42.8              ┆ 8.0               ┆ 50.0              │
└───────────────────┴───────────────────┴───────────────────┘
>>> # Clip using 2 standard deviations (more aggressive)
>>> clipper_2sigma = GaussianClipper(n_sigmas=2, inplace=False)
>>> clipper_2sigma.fit(X)
GaussianClipper(n_sigmas=2, subset=['A', 'B', 'C'], drop_columns=True, inplace=False)
>>> transformed_X_2sigma = clipper_2sigma.transform(X)
>>> print(transformed_X_2sigma)
shape: (5, 3)
┌───────────────────┬───────────────────┬───────────────────┐
│ A__clip_gaussian  ┆ B__clip_gaussian  ┆ C__clip_gaussian  │
│ ---               ┆ ---               ┆ ---               │
│ f64               ┆ f64               ┆ f64               │
╞═══════════════════╪═══════════════════╪═══════════════════╡
│ 1.0               ┆ -16.5             ┆ 10.0              │
│ 2.0               ┆ 5.0               ┆ 20.0              │
│ 3.0               ┆ 6.0               ┆ 30.0              │
│ 4.0               ┆ 7.0               ┆ 40.0              │
│ 28.5              ┆ 8.0               ┆ 50.0              │
└───────────────────┴───────────────────┴───────────────────┘
>>> # Clip with drop_columns=False to keep original columns
>>> clipper_no_drop = GaussianClipper(n_sigmas=3, drop_columns=False, inplace=False)
>>> clipper_no_drop.fit(X)
GaussianClipper(n_sigmas=3, subset=['A', 'B', 'C'], drop_columns=False, inplace=False)
>>> transformed_X_no_drop = clipper_no_drop.transform(X)
>>> print(transformed_X_no_drop)
shape: (5, 6)
┌───────┬────────┬──────┬───────────────────┬───────────────────┬───────────────────┐
│ A     ┆ B      ┆ C    ┆ A__clip_gaussian  ┆ B__clip_gaussian  ┆ C__clip_gaussian  │
│ ---   ┆ ---    ┆ ---  ┆ ---               ┆ ---               ┆ ---               │
│ f64   ┆ f64    ┆ f64  ┆ f64               ┆ f64               ┆ f64               │
╞═══════╪════════╪══════╪═══════════════════╪═══════════════════╪═══════════════════╡
│ 1.0   ┆ -50.0  ┆ 10.0 ┆ 1.0               ┆ -24.8             ┆ 10.0              │
│ 2.0   ┆ 5.0    ┆ 20.0 ┆ 2.0               ┆ 5.0               ┆ 20.0              │
│ 3.0   ┆ 6.0    ┆ 30.0 ┆ 3.0               ┆ 6.0               ┆ 30.0              │
│ 4.0   ┆ 7.0    ┆ 40.0 ┆ 4.0               ┆ 7.0               ┆ 40.0              │
│ 100.0 ┆ 8.0    ┆ 50.0 ┆ 42.8              ┆ 8.0               ┆ 50.0              │
└───────┴────────┴──────┴───────────────────┴───────────────────┴───────────────────┘
>>> # Clip only a subset of columns
>>> clipper_subset = GaussianClipper(n_sigmas=3, subset=['A'], inplace=False)
>>> clipper_subset.fit(X)
GaussianClipper(n_sigmas=3, subset=['A'], drop_columns=True, inplace=False)
>>> transformed_X_subset = clipper_subset.transform(X)
>>> print(transformed_X_subset)
shape: (5, 3)
┌────────┬──────┬───────────────────┐
│ B      ┆ C    ┆ A__clip_gaussian  │
│ ---    ┆ ---  ┆ ---               │
│ f64    ┆ f64  ┆ f64               │
╞════════╪══════╪═══════════════════╡
│ -50.0  ┆ 10.0 ┆ 1.0               │
│ 5.0    ┆ 20.0 ┆ 2.0               │
│ 6.0    ┆ 30.0 ┆ 3.0               │
│ 7.0    ┆ 40.0 ┆ 4.0               │
│ 8.0    ┆ 50.0 ┆ 42.8              │
└────────┴──────┴───────────────────┘
>>> # Clip inplace (modifies original columns)
>>> clipper_inplace = GaussianClipper(n_sigmas=3, inplace=True)
>>> clipper_inplace.fit(X)
GaussianClipper(n_sigmas=3, subset=['A', 'B', 'C'], drop_columns=True, inplace=True)
>>> transformed_X_inplace = clipper_inplace.transform(X)
>>> print(transformed_X_inplace)
shape: (5, 3)
┌───────┬────────┬──────┐
│ A     ┆ B      ┆ C    │
│ ---   ┆ ---    ┆ ---  │
│ f64   ┆ f64    ┆ f64  │
╞═══════╪════════╪══════╡
│ 1.0   ┆ -24.8  ┆ 10.0 │
│ 2.0   ┆ 5.0    ┆ 20.0 │
│ 3.0   ┆ 6.0    ┆ 30.0 │
│ 4.0   ┆ 7.0    ┆ 40.0 │
│ 42.8  ┆ 8.0    ┆ 50.0 │
└───────┴────────┴──────┘

fit(X, y=None)

Fit the transformer by computing clipping bounds for each column.

Parameters:
  • X (DataFrame) – Input DataFrame with numeric columns.

  • y (Series | None) – Target series (not used, present for sklearn compatibility).

Returns:

The fitted transformer instance.

Return type:

GaussianClipper

transform(X)

Transform the input DataFrame by clipping values to mean ± n*std.

Parameters:

X (DataFrame) – Input DataFrame with numeric columns.

Returns:

DataFrame with clipped numeric columns.

Return type:

DataFrame

class gators.clippers.IQRClipper

Bases: BaseModel, BaseEstimator, TransformerMixin

Clip numeric values based on Interquartile Range (IQR).

This transformer caps values that fall outside the range [Q1 - n_iqrs*IQR, Q3 + n_iqrs*IQR], where Q1 and Q3 are the first and third quartiles, and IQR = Q3 - Q1. This is a robust method commonly used for outlier detection (n_iqrs=1.5 is the standard for box plots).
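
The IQR rule above can be worked through on a small list with the standard library. This is a sketch under stated assumptions rather than the library's implementation: iqr_bounds is a hypothetical helper, and quartiles are computed with linear interpolation (statistics.quantiles with method="inclusive"), which may differ from the interpolation the transformer uses.

```python
import statistics
from typing import List, Tuple


def iqr_bounds(values: List[float], n_iqrs: float = 1.5) -> Tuple[float, float]:
    """Hypothetical helper: return (Q1 - n_iqrs*IQR, Q3 + n_iqrs*IQR)."""
    # Assumption: quartiles use linear interpolation over the sorted data.
    q1, _median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - n_iqrs * iqr, q3 + n_iqrs * iqr


values = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 100.0]
lower, upper = iqr_bounds(values)  # Q1 = 3.0, Q3 = 7.0, IQR = 4.0
clipped = [min(max(v, lower), upper) for v in values]
```

For this list the bounds are [-3.0, 13.0], so only the outlier 100.0 is capped, to 13.0.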

Parameters:
  • n_iqrs (float, default=1.5) – Number of IQRs beyond Q1/Q3 to use for clipping bounds. Must be a positive number. Common values:

      - 1.5: Standard box plot outlier threshold
      - 3.0: Extreme outlier threshold

  • subset (Optional[List[str]], default=None) – List of numeric columns to clip. If None, all numeric columns are selected.

  • inplace (bool, default=True) – If True, clip values in the original columns. If False, create new columns with suffix ‘__clip_iqr’.

  • drop_columns (bool, default=True) – If inplace=False, whether to drop the original columns after clipping. Ignored when inplace=True.

Examples

>>> import polars as pl
>>> from gators.clippers import IQRClipper
>>> # Sample DataFrame with outliers
>>> X = pl.DataFrame({
...     'A': [10.0, 11.0, 12.0, 13.0, 14.0, 15.0, 16.0, 17.0, 18.0, 19.0, 20.0, 100.0],
...     'B': [-100.0, 10.0, 11.0, 12.0, 13.0, 14.0, 15.0, 16.0, 17.0, 18.0, 19.0, 20.0],
... })
>>> # Clip using 1.5 IQRs (default, standard box plot threshold)
>>> clipper = IQRClipper(inplace=False)
>>> clipper.fit(X)
IQRClipper(n_iqrs=1.5, subset=['A', 'B'], drop_columns=True, inplace=False)
>>> transformed_X = clipper.transform(X)
>>> print(transformed_X)
shape: (12, 2)
┌──────────────┬──────────────┐
│ A__clip_iqr  ┆ B__clip_iqr  │
│ ---          ┆ ---          │
│ f64          ┆ f64          │
╞══════════════╪══════════════╡
│ 10.0         ┆ 1.25         │
│ 11.0         ┆ 10.0         │
│ 12.0         ┆ 11.0         │
│ 13.0         ┆ 12.0         │
│ 14.0         ┆ 13.0         │
│ 15.0         ┆ 14.0         │
│ 16.0         ┆ 15.0         │
│ 17.0         ┆ 16.0         │
│ 18.0         ┆ 17.0         │
│ 19.0         ┆ 18.0         │
│ 20.0         ┆ 19.0         │
│ 28.75        ┆ 20.0         │
└──────────────┴──────────────┘
>>> # More conservative clipping with 3 IQRs
>>> clipper_3iqr = IQRClipper(n_iqrs=3.0, inplace=False)
>>> clipper_3iqr.fit(X)
IQRClipper(n_iqrs=3.0, subset=['A', 'B'], drop_columns=True, inplace=False)
>>> transformed_X_3iqr = clipper_3iqr.transform(X)
>>> print(transformed_X_3iqr)
shape: (12, 2)
┌──────────────┬──────────────┐
│ A__clip_iqr  ┆ B__clip_iqr  │
│ ---          ┆ ---          │
│ f64          ┆ f64          │
╞══════════════╪══════════════╡
│ 10.0         ┆ -15.0        │
│ 11.0         ┆ 10.0         │
│ 12.0         ┆ 11.0         │
│ 13.0         ┆ 12.0         │
│ 14.0         ┆ 13.0         │
│ 15.0         ┆ 14.0         │
│ 16.0         ┆ 15.0         │
│ 17.0         ┆ 16.0         │
│ 18.0         ┆ 17.0         │
│ 19.0         ┆ 18.0         │
│ 20.0         ┆ 19.0         │
│ 43.0         ┆ 20.0         │
└──────────────┴──────────────┘

fit(X, y=None)

Fit the transformer by computing IQR-based clipping bounds.

Parameters:
  • X (DataFrame) – Input DataFrame with numeric columns.

  • y (Series | None) – Target series (not used, present for sklearn compatibility).

Returns:

The fitted transformer instance.

Return type:

IQRClipper

transform(X)

Transform the input DataFrame by clipping values based on IQR.

Parameters:

X (DataFrame) – Input DataFrame with numeric columns.

Returns:

DataFrame with clipped numeric columns.

Return type:

DataFrame

class gators.clippers.MADClipper

Bases: BaseModel, BaseEstimator, TransformerMixin

Clip numeric values based on Median Absolute Deviation (MAD).

This transformer caps values that are more than n_mads times the MAD away from the median. MAD is a robust measure of variability that is less sensitive to outliers than standard deviation.

MAD = median(abs(X - median(X)))

Clipping bounds: [median - n_mads*MAD, median + n_mads*MAD]
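
The MAD definition and bounds above translate directly into a few lines of standard-library Python. This is a hedged sketch for illustration only: mad_bounds is a hypothetical helper, and no scaling constant is applied to the MAD, following the formula as written.

```python
import statistics
from typing import List, Tuple


def mad_bounds(values: List[float], n_mads: float = 3.0) -> Tuple[float, float]:
    """Hypothetical helper: return (median - n_mads*MAD, median + n_mads*MAD)."""
    median = statistics.median(values)
    # MAD = median of the absolute deviations from the median (unscaled).
    mad = statistics.median(abs(v - median) for v in values)
    return median - n_mads * mad, median + n_mads * mad


values = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 100.0]
lower, upper = mad_bounds(values)  # median = 5.0, MAD = 2.0
clipped = [min(max(v, lower), upper) for v in values]
```

The outlier 100.0 barely moves the median or the MAD, which is why the resulting bounds [-1.0, 11.0] stay close to the bulk of the data.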

Parameters:
  • n_mads (float, default=3.0) – Number of MADs from the median to use for clipping bounds. Must be a positive number.

  • subset (Optional[List[str]], default=None) – List of numeric columns to clip. If None, all numeric columns are selected.

  • inplace (bool, default=True) – If True, clip values in the original columns. If False, create new columns with suffix ‘__clip_mad’.

  • drop_columns (bool, default=True) – If inplace=False, whether to drop the original columns after clipping. Ignored when inplace=True.

Examples

>>> import polars as pl
>>> from gators.clippers import MADClipper
>>> # Sample DataFrame with outliers
>>> X = pl.DataFrame({
...     'A': [10.0, 11.0, 12.0, 13.0, 14.0, 15.0, 16.0, 17.0, 18.0, 19.0, 20.0, 100.0],
...     'B': [-100.0, 10.0, 11.0, 12.0, 13.0, 14.0, 15.0, 16.0, 17.0, 18.0, 19.0, 20.0],
... })
>>> # Clip using 3 MADs (default)
>>> clipper = MADClipper(inplace=False)
>>> clipper.fit(X)
MADClipper(n_mads=3.0, subset=['A', 'B'], drop_columns=True, inplace=False)
>>> transformed_X = clipper.transform(X)
>>> print(transformed_X)
shape: (12, 2)
┌──────────────┬──────────────┐
│ A__clip_mad  ┆ B__clip_mad  │
│ ---          ┆ ---          │
│ f64          ┆ f64          │
╞══════════════╪══════════════╡
│ 10.0         ┆ -12.5        │
│ 11.0         ┆ 10.0         │
│ 12.0         ┆ 11.0         │
│ 13.0         ┆ 12.0         │
│ 14.0         ┆ 13.0         │
│ 15.0         ┆ 14.0         │
│ 16.0         ┆ 15.0         │
│ 17.0         ┆ 16.0         │
│ 18.0         ┆ 17.0         │
│ 19.0         ┆ 18.0         │
│ 20.0         ┆ 19.0         │
│ 27.5         ┆ 20.0         │
└──────────────┴──────────────┘
>>> # More aggressive clipping with 2 MADs
>>> clipper_2mad = MADClipper(n_mads=2.0, inplace=False)
>>> clipper_2mad.fit(X)
MADClipper(n_mads=2.0, subset=['A', 'B'], drop_columns=True, inplace=False)
>>> transformed_X_2mad = clipper_2mad.transform(X)
>>> print(transformed_X_2mad)
shape: (12, 2)
┌──────────────┬──────────────┐
│ A__clip_mad  ┆ B__clip_mad  │
│ ---          ┆ ---          │
│ f64          ┆ f64          │
╞══════════════╪══════════════╡
│ 10.0         ┆ -5.0         │
│ 11.0         ┆ 10.0         │
│ 12.0         ┆ 11.0         │
│ 13.0         ┆ 12.0         │
│ 14.0         ┆ 13.0         │
│ 15.0         ┆ 14.0         │
│ 16.0         ┆ 15.0         │
│ 17.0         ┆ 16.0         │
│ 18.0         ┆ 17.0         │
│ 19.0         ┆ 18.0         │
│ 20.0         ┆ 19.0         │
│ 22.5         ┆ 20.0         │
└──────────────┴──────────────┘

fit(X, y=None)

Fit the transformer by computing MAD-based clipping bounds.

Parameters:
  • X (DataFrame) – Input DataFrame with numeric columns.

  • y (Series | None) – Target series (not used, present for sklearn compatibility).

Returns:

The fitted transformer instance.

Return type:

MADClipper

transform(X)

Transform the input DataFrame by clipping values based on MAD.

Parameters:

X (DataFrame) – Input DataFrame with numeric columns.

Returns:

DataFrame with clipped numeric columns.

Return type:

DataFrame

class gators.clippers.QuantileClipper

Bases: BaseModel, BaseEstimator, TransformerMixin

Clip numeric values based on quantile thresholds.

This transformer caps values below the lower quantile and above the upper quantile. This is useful for limiting the influence of extreme outliers while preserving the bulk of the data distribution.
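
To make the quantile thresholds concrete, the following sketch computes them by hand and clips a small list. It is not the library's implementation: linear_quantile is a hypothetical helper, and linear interpolation between neighbouring sorted values is an assumption that may not match the interpolation the transformer uses.

```python
import math
from typing import List


def linear_quantile(values: List[float], q: float) -> float:
    """Hypothetical helper: quantile q (0 to 1) with linear interpolation."""
    ordered = sorted(values)
    # Fractional position of the quantile within the sorted data.
    position = q * (len(ordered) - 1)
    low = math.floor(position)
    high = min(low + 1, len(ordered) - 1)
    return ordered[low] + (position - low) * (ordered[high] - ordered[low])


values = [float(i) for i in range(11)]  # 0.0, 1.0, ..., 10.0
lower = linear_quantile(values, 0.1)
upper = linear_quantile(values, 0.9)
clipped = [min(max(v, lower), upper) for v in values]
```

Only the values below the lower threshold or above the upper threshold change; everything in between passes through untouched.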

Parameters:
  • lower_quantile (float, default=0.01) – Lower quantile threshold (0 to 1). Values below this quantile are clipped.

  • upper_quantile (float, default=0.99) – Upper quantile threshold (0 to 1). Values above this quantile are clipped.

  • subset (Optional[List[str]], default=None) – List of numeric columns to clip. If None, all numeric columns are selected.

  • inplace (bool, default=True) – If True, clip values in the original columns. If False, create new columns with suffix ‘__clip_quantile’.

  • drop_columns (bool, default=True) – If inplace=False, whether to drop the original columns after clipping. Ignored when inplace=True.

Examples

>>> import polars as pl
>>> from gators.clippers import QuantileClipper
>>> # Sample DataFrame with outliers
>>> X = pl.DataFrame({
...     'A': [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 100.0],
...     'B': [-50.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0],
... })
>>> # Clip using 1st and 99th percentiles (default)
>>> clipper = QuantileClipper(inplace=False)
>>> clipper.fit(X)
QuantileClipper(lower_quantile=0.01, upper_quantile=0.99, subset=['A', 'B'], drop_columns=True, inplace=False)
>>> transformed_X = clipper.transform(X)
>>> print(transformed_X)
shape: (10, 2)
┌─────────────────────┬─────────────────────┐
│ A__clip_quantile    ┆ B__clip_quantile    │
│ ---                 ┆ ---                 │
│ f64                 ┆ f64                 │
╞═════════════════════╪═════════════════════╡
│ 1.09                ┆ -45.1               │
│ 2.0                 ┆ 2.0                 │
│ 3.0                 ┆ 3.0                 │
│ 4.0                 ┆ 4.0                 │
│ 5.0                 ┆ 5.0                 │
│ 6.0                 ┆ 6.0                 │
│ 7.0                 ┆ 7.0                 │
│ 8.0                 ┆ 8.0                 │
│ 9.0                 ┆ 9.0                 │
│ 9.91                ┆ 9.91                │
└─────────────────────┴─────────────────────┘
>>> # More aggressive clipping with 5th and 95th percentiles
>>> clipper_5_95 = QuantileClipper(lower_quantile=0.05, upper_quantile=0.95, inplace=False)
>>> clipper_5_95.fit(X)
QuantileClipper(lower_quantile=0.05, upper_quantile=0.95, subset=['A', 'B'], drop_columns=True, inplace=False)
>>> transformed_X_5_95 = clipper_5_95.transform(X)
>>> print(transformed_X_5_95)
shape: (10, 2)
┌─────────────────────┬─────────────────────┐
│ A__clip_quantile    ┆ B__clip_quantile    │
│ ---                 ┆ ---                 │
│ f64                 ┆ f64                 │
╞═════════════════════╪═════════════════════╡
│ 1.45                ┆ -21.5               │
│ 2.0                 ┆ 2.0                 │
│ 3.0                 ┆ 3.0                 │
│ 4.0                 ┆ 4.0                 │
│ 5.0                 ┆ 5.0                 │
│ 6.0                 ┆ 6.0                 │
│ 7.0                 ┆ 7.0                 │
│ 8.0                 ┆ 8.0                 │
│ 9.0                 ┆ 9.0                 │
│ 8.55                ┆ 9.55                │
└─────────────────────┴─────────────────────┘
>>> # Clip only specific columns
>>> clipper_subset = QuantileClipper(subset=['A'], inplace=False)
>>> clipper_subset.fit(X)
QuantileClipper(lower_quantile=0.01, upper_quantile=0.99, subset=['A'], drop_columns=True, inplace=False)
>>> transformed_X_subset = clipper_subset.transform(X)
>>> print(transformed_X_subset)
shape: (10, 2)
┌────────┬─────────────────────┐
│ B      ┆ A__clip_quantile    │
│ ---    ┆ ---                 │
│ f64    ┆ f64                 │
╞════════╪═════════════════════╡
│ -50.0  ┆ 1.09                │
│ 2.0    ┆ 2.0                 │
│ 3.0    ┆ 3.0                 │
│ 4.0    ┆ 4.0                 │
│ 5.0    ┆ 5.0                 │
│ 6.0    ┆ 6.0                 │
│ 7.0    ┆ 7.0                 │
│ 8.0    ┆ 8.0                 │
│ 9.0    ┆ 9.0                 │
│ 10.0   ┆ 9.91                │
└────────┴─────────────────────┘

fit(X, y=None)

Fit the transformer by computing quantile-based clipping bounds.

Parameters:
  • X (DataFrame) – Input DataFrame with numeric columns.

  • y (Series | None) – Target series (not used, present for sklearn compatibility).

Returns:

The fitted transformer instance.

Return type:

QuantileClipper

transform(X)

Transform the input DataFrame by clipping values to quantile thresholds.

Parameters:

X (DataFrame) – Input DataFrame with numeric columns.

Returns:

DataFrame with clipped numeric columns.

Return type:

DataFrame