gators.pipeline package

Module contents

class gators.pipeline.Pipeline

Bases: BaseModel, BaseEstimator, TransformerMixin

Pipeline of transformers for Polars DataFrames.

Applies a list of transformers in sequence. This is a lightweight alternative to sklearn.pipeline.Pipeline, designed specifically for Gators transformers that operate on Polars DataFrames.

Parameters:
  • steps (list of tuple) – List of (name, transform) tuples that are chained in the order they are specified. Each transform must implement fit and transform methods.

  • verbose (bool, default=False) – If True, prints the name of each step as it is executed.

Examples

>>> from gators.pipeline import Pipeline
>>> from gators.imputers import NumericImputer, StringImputer
>>> from gators.encoders import WOEEncoder
>>>
>>> steps = [
...     ('numeric_imputer', NumericImputer(strategy='median')),
...     ('string_imputer', StringImputer(strategy='constant', value='MISSING')),
...     ('woe_encoder', WOEEncoder(subset=['cat_col']))
... ]
>>> pipe = Pipeline(steps=steps)
>>> pipe.fit(X_train, y=y_train)
>>> X_transformed = pipe.transform(X_train)

fit(X, y=None)

Fit all transformers in the pipeline.

Fits the transformers in the order given in steps. Once a transformer is fitted, it transforms the data, so the next transformer is fitted on the output of the previous step rather than on the raw input.

Parameters:
  • X (DataFrame) – Input DataFrame to fit.

  • y (Series | None) – Target series for supervised transformers (e.g., WOEEncoder).

Returns:

The fitted pipeline instance.

Return type:

Pipeline
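The fit-then-transform loop described above can be sketched in plain Python. This is only an illustration of the idea, not the Gators implementation: AddConstant, CenterOnMean, and fit_steps are hypothetical names, and plain lists stand in for Polars DataFrames.

```python
class AddConstant:
    """Hypothetical stateless step: adds a fixed value to every element."""

    def __init__(self, value):
        self.value = value

    def fit(self, X, y=None):
        return self

    def transform(self, X):
        return [x + self.value for x in X]


class CenterOnMean:
    """Hypothetical stateful step: learns the mean in fit, subtracts it in transform."""

    def fit(self, X, y=None):
        self.mean_ = sum(X) / len(X)
        return self

    def transform(self, X):
        return [x - self.mean_ for x in X]


def fit_steps(steps, X, y=None):
    """Fit each step on the output of the previous one (assumed behavior)."""
    for _name, step in steps:
        step.fit(X, y)
        # Transform before moving on, so the next step is fitted on this output.
        X = step.transform(X)
    return steps


steps = [("shift", AddConstant(10)), ("center", CenterOnMean())]
fit_steps(steps, [1.0, 2.0, 3.0])

# The second step was fitted on the shifted data [11.0, 12.0, 13.0],
# not on the raw input, so the mean it learned is 12.0.
learned_mean = steps[1][1].mean_
```

Had the second step been fitted on the raw input, it would have learned a mean of 2.0 and would center the shifted data incorrectly at transform time.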

transform(X)

Transform data by applying all transformers in sequence.

Parameters:

X (DataFrame) – Input DataFrame to transform.

Returns:

Transformed DataFrame.

Return type:

DataFrame

fit_transform(X, y=None)

Fit all transformers and transform the data.

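Fits each transformer and applies it to the data, step by step, in a single pass. This is more efficient than calling fit() followed by transform(), because fit() already transforms the data between steps and a separate transform() call would repeat that work.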

Parameters:
  • X (DataFrame) – Input DataFrame to fit and transform.

  • y (Series | None) – Target series for supervised transformers.

Returns:

Transformed DataFrame.

Return type:

DataFrame
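The efficiency difference can be illustrated by counting transform calls. The sketch below is a toy model under stated assumptions, not the Gators code: CountingScaler and the three helper functions are hypothetical, lists stand in for DataFrames, and fit is assumed to transform the data after every step.

```python
class CountingScaler:
    """Hypothetical step that multiplies by a factor and counts transform calls."""

    def __init__(self, factor):
        self.factor = factor
        self.transform_calls = 0

    def fit(self, X, y=None):
        return self

    def transform(self, X):
        self.transform_calls += 1
        return [x * self.factor for x in X]


def fit(steps, X, y=None):
    # Assumed behavior: fit each step, then transform for the next step.
    for _name, step in steps:
        step.fit(X, y)
        X = step.transform(X)


def transform(steps, X):
    for _name, step in steps:
        X = step.transform(X)
    return X


def fit_transform(steps, X, y=None):
    # Same loop as fit, but the final transformed data is returned.
    for _name, step in steps:
        step.fit(X, y)
        X = step.transform(X)
    return X


two_pass = [("double", CountingScaler(2)), ("triple", CountingScaler(3))]
fit(two_pass, [1, 2])
out_two_pass = transform(two_pass, [1, 2])
calls_two_pass = [step.transform_calls for _name, step in two_pass]

one_pass = [("double", CountingScaler(2)), ("triple", CountingScaler(3))]
out_one_pass = fit_transform(one_pass, [1, 2])
calls_one_pass = [step.transform_calls for _name, step in one_pass]
```

Both routes produce the same output, but in this model the separate fit and transform calls run every step's transform twice, while fit_transform runs each once.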

get_params(deep=True)

Get parameters for this estimator.

Parameters:

deep (bool, default=True) – If True, also returns the parameters of all sub-estimators. If False, returns only the pipeline-level parameters.

Returns:

Parameter names mapped to their values. Nested parameters use double underscore notation (e.g., ‘step_name__param’).

Return type:

dict
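As a rough illustration of the double underscore notation, the following sketch flattens step parameters into a single dict. ToyStep and pipeline_get_params are hypothetical stand-ins; the exact set of keys returned by the real method is not specified here and may differ.

```python
class ToyStep:
    """Hypothetical transformer that only stores its constructor parameters."""

    def __init__(self, **params):
        self._params = dict(params)

    def get_params(self, deep=True):
        return dict(self._params)


def pipeline_get_params(steps, verbose=False, deep=True):
    """Sketch of a get_params that prefixes nested keys with 'step_name__'."""
    params = {"steps": steps, "verbose": verbose}
    if deep:
        for name, step in steps:
            for key, value in step.get_params().items():
                params[f"{name}__{key}"] = value
    return params


steps = [
    ("numeric_imputer", ToyStep(strategy="median")),
    ("string_imputer", ToyStep(strategy="constant", value="MISSING")),
]

shallow = pipeline_get_params(steps, deep=False)
deep = pipeline_get_params(steps, deep=True)
nested_keys = sorted(key for key in deep if "__" in key)
```

In this sketch, nested_keys contains numeric_imputer__strategy, string_imputer__strategy, and string_imputer__value, while shallow holds only steps and verbose.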

set_params(**params)

Set parameters for this estimator.

Parameters:

**params (dict) – Estimator parameters. Use double underscore notation for nested parameters (e.g., step_name__param_name=value).

Returns:

The pipeline instance.

Return type:

Pipeline

Raises:

ValueError – If an invalid parameter name is provided.
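One way such parameter routing could work is sketched below. It is an assumption-laden toy, not the library's code: ToyImputer and set_nested_params are hypothetical, and the error message wording is invented for the example.

```python
class ToyImputer:
    """Hypothetical transformer with a single tunable parameter."""

    def __init__(self, strategy="mean"):
        self.strategy = strategy


def set_nested_params(pipeline_params, steps, **params):
    """Route 'step_name__param' keys to steps and plain keys to the pipeline."""
    step_map = dict(steps)
    for key, value in params.items():
        step_name, sep, param_name = key.partition("__")
        if sep:
            # Nested parameter: the step and the attribute must both exist.
            step = step_map.get(step_name)
            if step is None or not hasattr(step, param_name):
                raise ValueError(f"Invalid parameter: {key!r}")
            setattr(step, param_name, value)
        elif key in pipeline_params:
            # Pipeline-level parameter such as verbose.
            pipeline_params[key] = value
        else:
            raise ValueError(f"Invalid parameter: {key!r}")
    return pipeline_params


pipeline_params = {"verbose": False}
steps = [("numeric_imputer", ToyImputer())]
set_nested_params(
    pipeline_params, steps, verbose=True, numeric_imputer__strategy="median"
)

# An unknown step name is rejected instead of being silently ignored.
try:
    set_nested_params(pipeline_params, steps, missing_step__strategy="median")
    error_message = None
except ValueError as exc:
    error_message = str(exc)
```

Rejecting unknown names up front means a typo in a step or parameter name surfaces immediately rather than leaving the pipeline silently unchanged.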