gators.pipeline package

Module contents

class gators.pipeline.Pipeline

Bases: _BaseTransformer

Pipeline of transformers for Polars DataFrames.

Applies a list of transformers in sequence. This is a lightweight alternative to sklearn.pipeline.Pipeline, designed for Gators transformers that operate on Polars DataFrames.

Parameters:

steps (List[Tuple[str, Any]]) – List of (name, transformer) tuples, applied in the order given. Each transformer must implement fit and transform methods.
verbose (bool, default=False) – If True, prints the name of each step as it is executed.

Examples

>>> from gators.pipeline import Pipeline
>>> from gators.imputers import NumericImputer, StringImputer
>>> from gators.encoders import WOEEncoder
>>>
>>> steps = [
...     ('numeric_imputer', NumericImputer(strategy='median')),
...     ('string_imputer', StringImputer(strategy='constant', value='MISSING')),
...     ('woe_encoder', WOEEncoder(subset=['cat_col']))
... ]
>>> pipe = Pipeline(steps=steps)
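>>> # X_train (a DataFrame) and y_train (a Series) are assumed to be defined already.
>>> pipe = pipe.fit(X_train, y=y_train)  # fit returns the fitted pipeline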
>>> X_transformed = pipe.transform(X_train)

fit(X, y=None)

Fit all transformers in the pipeline.

Fits each transformer sequentially, transforming the data before fitting the next transformer. This ensures each transformer sees the output of the previous transformer.

Parameters:

X (DataFrame) – Input DataFrame to fit.
y (Series | None) – Target series for supervised transformers (e.g., WOEEncoder).

Returns:

The fitted pipeline instance.

Return type:

Pipeline
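The sequential fitting described above can be illustrated with a minimal sketch. It is not the Gators implementation: plain Python lists stand in for Polars DataFrames, and AddConstant, CenterOnMean, and fit_sequentially are hypothetical names introduced only for this example.

```python
class AddConstant:
    """Hypothetical stateless transformer: adds a fixed value to every element."""

    def __init__(self, value):
        self.value = value

    def fit(self, X, y=None):
        return self

    def transform(self, X):
        return [x + self.value for x in X]


class CenterOnMean:
    """Hypothetical stateful transformer: learns the mean of its input in fit()."""

    def fit(self, X, y=None):
        self.mean_ = sum(X) / len(X)
        return self

    def transform(self, X):
        return [x - self.mean_ for x in X]


def fit_sequentially(steps, X, y=None):
    """Fit each step on the output of the steps that precede it."""
    for _name, transformer in steps:
        transformer.fit(X, y)
        # Pass the transformed data on, so the next step is fitted on it.
        X = transformer.transform(X)
    return steps


steps = [("shift", AddConstant(10)), ("center", CenterOnMean())]
fit_sequentially(steps, [1, 2, 3])

# The second step was fitted on [11, 12, 13], not on the raw input [1, 2, 3].
learned_mean = steps[1][1].mean_
```

In this sketch the second step learns a mean of 12.0 rather than 2.0, which is what "sees the output of the previous transformer" means in practice.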

transform(X)

Transform data by applying all transformers in sequence.

Parameters:

X (DataFrame) – Input DataFrame to transform.

Returns:

Transformed DataFrame.

Return type:

DataFrame

fit_transform(X, y=None)

Fit all transformers and transform the data.

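Fits each transformer and applies it to the data, one step at a time. This is more efficient than calling fit() followed by transform(): fit() already transforms the data between steps, so a separate transform() call repeats that work.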

Parameters:

X (DataFrame) – Input DataFrame to fit and transform.
y (Series | None) – Target series for supervised transformers.

Returns:

Transformed DataFrame.

Return type:

DataFrame
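The efficiency difference can be made concrete by counting transform() calls. The sketch below is a simplified model, not the Gators code: lists stand in for DataFrames, CountingScaler and the three free functions are hypothetical, and it assumes that fit() transforms the data after every step, including the last.

```python
class CountingScaler:
    """Hypothetical transformer that records how often transform() runs."""

    def __init__(self, factor):
        self.factor = factor
        self.transform_calls = 0

    def fit(self, X, y=None):
        return self

    def transform(self, X):
        self.transform_calls += 1
        return [x * self.factor for x in X]


def fit(steps, X, y=None):
    # Each step is fitted, then applied so the next step sees its output.
    for _name, t in steps:
        X = t.fit(X, y).transform(X)


def transform(steps, X):
    for _name, t in steps:
        X = t.transform(X)
    return X


def fit_transform(steps, X, y=None):
    # Same loop as fit(), but the final transformed data is returned.
    for _name, t in steps:
        X = t.fit(X, y).transform(X)
    return X


two_pass = [("a", CountingScaler(2)), ("b", CountingScaler(3))]
fit(two_pass, [1, 2])
out_two_pass = transform(two_pass, [1, 2])

one_pass = [("a", CountingScaler(2)), ("b", CountingScaler(3))]
out_one_pass = fit_transform(one_pass, [1, 2])

calls_two_pass = sum(t.transform_calls for _name, t in two_pass)
calls_one_pass = sum(t.transform_calls for _name, t in one_pass)
```

Both paths produce the same output here, but the two-pass path calls transform() twice as often. The exact saving in the real pipeline depends on how its fit() treats the final step.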

get_params(deep=True)

Get parameters for this estimator.

Parameters:

deep (bool) – If True, returns parameters of all sub-estimators. If False, only returns pipeline-level parameters.

Returns:

Parameter names mapped to their values. Nested parameters use double underscore notation (e.g., ‘step_name__param’).

Return type:

dict
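As a rough illustration of the double underscore convention, the sketch below builds such a dictionary by hand. Scaler and pipeline_get_params are hypothetical, and the exact set of keys the real get_params returns (for example, whether the step objects themselves are included) is not specified here and may differ.

```python
class Scaler:
    """Hypothetical transformer exposing its own get_params()."""

    def __init__(self, factor=1.0):
        self.factor = factor

    def get_params(self, deep=True):
        return {"factor": self.factor}


def pipeline_get_params(steps, verbose=False, deep=True):
    """Collect pipeline-level parameters and, if deep, nested step parameters."""
    params = {"steps": steps, "verbose": verbose}
    if deep:
        for name, transformer in steps:
            for key, value in transformer.get_params().items():
                # Nested keys are prefixed with the step name and "__".
                params[f"{name}__{key}"] = value
    return params


steps = [("scale", Scaler(factor=2.0))]
deep_params = pipeline_get_params(steps, deep=True)
shallow_params = pipeline_get_params(steps, deep=False)
```

With deep=True the result contains the key "scale__factor"; with deep=False only the pipeline-level keys remain.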

set_params(**params)

Set parameters for this estimator.

Parameters:

**params (dict) – Estimator parameters. Use double underscore notation for nested parameters (e.g., step_name__param_name=value).

Returns:

The pipeline instance.

Return type:

Pipeline

Raises:

ValueError – If an invalid parameter name is provided.
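One way to resolve nested parameter names is to split each key at the first double underscore and look up the step by name. The sketch below shows that idea under stated assumptions: Scaler and pipeline_set_params are hypothetical, only nested keys are handled (pipeline-level parameters such as verbose are left out), and the real validation rules may differ.

```python
class Scaler:
    """Hypothetical transformer with a single tunable parameter."""

    def __init__(self, factor=1.0):
        self.factor = factor


def pipeline_set_params(steps, **params):
    """Set nested parameters given as step_name__param_name=value."""
    by_name = dict(steps)
    for key, value in params.items():
        step_name, sep, param_name = key.partition("__")
        # Reject keys without "__", unknown step names, and unknown parameters.
        if not sep or step_name not in by_name:
            raise ValueError(f"Invalid parameter name: {key!r}")
        target = by_name[step_name]
        if not hasattr(target, param_name):
            raise ValueError(f"Invalid parameter name: {key!r}")
        setattr(target, param_name, value)
    return steps


steps = [("scale", Scaler(factor=2.0))]
pipeline_set_params(steps, scale__factor=5.0)
new_factor = steps[0][1].factor

try:
    pipeline_set_params(steps, missing__factor=1.0)
    raised = False
except ValueError:
    raised = True
```

Splitting at the first "__" keeps step names unambiguous as long as they do not themselves contain a double underscore.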