gators.pipeline package
Module contents
- class gators.pipeline.Pipeline
Bases: BaseModel, BaseEstimator, TransformerMixin
Pipeline of transformers for Polars DataFrames.
Applies a list of transformers in sequence. This is a lightweight alternative to sklearn.pipeline.Pipeline, designed specifically for Gators transformers that work with Polars DataFrames.
- Parameters:
Examples
>>> from gators.pipeline import Pipeline
>>> from gators.imputers import NumericImputer, StringImputer
>>> from gators.encoders import WOEEncoder
>>>
>>> steps = [
...     ('numeric_imputer', NumericImputer(strategy='median')),
...     ('string_imputer', StringImputer(strategy='constant', value='MISSING')),
...     ('woe_encoder', WOEEncoder(subset=['cat_col']))
... ]
>>> pipe = Pipeline(steps=steps)
>>> pipe.fit(X_train, y=y_train)
>>> X_transformed = pipe.transform(X_train)
- fit(X, y=None)
Fit all transformers in the pipeline.
Fits each transformer sequentially, transforming the data before fitting the next transformer. This ensures each transformer sees the output of the previous transformer.
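The fit-then-transform chaining described above can be illustrated with a minimal sketch. It uses plain Python lists and two toy transformers (AddConstant and Center, both hypothetical stand-ins rather than Gators classes), and the helper fit_sequentially is an assumption about the general pattern, not the library's actual implementation.

```python
class AddConstant:
    """Toy stand-in for a transformer (hypothetical, not part of Gators)."""

    def __init__(self, value):
        self.value = value

    def fit(self, X, y=None):
        # Nothing to learn for this step.
        return self

    def transform(self, X):
        return [x + self.value for x in X]


class Center:
    """Toy transformer that learns the mean in fit() and subtracts it."""

    def fit(self, X, y=None):
        self.mean_ = sum(X) / len(X)
        return self

    def transform(self, X):
        return [x - self.mean_ for x in X]


def fit_sequentially(steps, X, y=None):
    """Fit each step on the output of the previous step (assumed pattern)."""
    for _name, transformer in steps:
        transformer.fit(X, y)
        # Pass the transformed data on, so the next step is fitted on it.
        X = transformer.transform(X)
    return steps


steps = [("shift", AddConstant(10)), ("center", Center())]
fit_sequentially(steps, [1.0, 2.0, 3.0])

# Center was fitted on the shifted data [11.0, 12.0, 13.0], not the raw input.
learned_mean = steps[1][1].mean_
```

Here the second step learns a mean of 12.0 rather than 2.0, which is what "each transformer sees the output of the previous transformer" means in practice.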
- transform(X)
Transform data by applying all transformers in sequence.
- Parameters:
X (DataFrame) – Input DataFrame to transform.
- Returns:
Transformed DataFrame.
- Return type:
DataFrame
- fit_transform(X, y=None)
Fit all transformers and transform the data.
Fits each transformer and applies it to the data in sequence, passing the result to the next transformer. This is more efficient than calling fit() followed by transform() separately.
- Parameters:
X (DataFrame) – Input DataFrame to fit and transform.
y (Series | None) – Target series for supervised transformers.
- Returns:
Transformed DataFrame.
- Return type:
DataFrame
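One plausible reason for the efficiency claim is that fit() already has to transform the data between steps, so a separate transform() call repeats that work. The sketch below counts transform calls under that assumption. CountingStep and the three helper functions are hypothetical illustrations, not Gators code, and the real fit() may avoid transforming after the final step.

```python
class CountingStep:
    """Toy pass-through transformer that counts transform() calls (hypothetical)."""

    def __init__(self):
        self.transform_calls = 0

    def fit(self, X, y=None):
        return self

    def transform(self, X):
        self.transform_calls += 1
        return X


def fit(steps, X, y=None):
    # Assumed pattern: each step is fitted, then applied so the next step
    # is fitted on its output.
    for step in steps:
        step.fit(X, y)
        X = step.transform(X)


def transform(steps, X):
    for step in steps:
        X = step.transform(X)
    return X


def fit_transform(steps, X, y=None):
    # Same loop as fit(), but the final transformed data is returned.
    for step in steps:
        step.fit(X, y)
        X = step.transform(X)
    return X


data = [1, 2, 3]

separate = [CountingStep(), CountingStep()]
fit(separate, data)
transform(separate, data)
separate_calls = sum(step.transform_calls for step in separate)

combined = [CountingStep(), CountingStep()]
fit_transform(combined, data)
combined_calls = sum(step.transform_calls for step in combined)
```

With two steps, the separate calls run transform four times in total, while the combined call runs it twice.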
- set_params(**params)
Set parameters for this estimator.
- Parameters:
**params (dict) – Estimator parameters. Use double underscore notation for nested parameters (e.g., step_name__param_name=value).
- Returns:
The pipeline instance.
- Return type:
- Raises:
ValueError – If an invalid parameter name is provided.
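The double underscore notation can be pictured as splitting each key into a step name and a parameter name. The following is a minimal sketch of that routing with a hypothetical helper, split_nested_params. It is not the library's implementation, and the exact conditions under which the real set_params raises ValueError may differ.

```python
def split_nested_params(params, step_names):
    """Separate top-level parameters from step_name__param_name ones.

    Hypothetical helper for illustration only.
    """
    top_level = {}
    nested = {}
    for key, value in params.items():
        # Split on the first double underscore only.
        step_name, sep, param_name = key.partition("__")
        if not sep:
            # No double underscore: a parameter of the pipeline itself.
            top_level[key] = value
            continue
        if step_name not in step_names:
            raise ValueError(
                f"Invalid parameter {key!r}: no step named {step_name!r}"
            )
        nested.setdefault(step_name, {})[param_name] = value
    return top_level, nested


top_level, nested = split_nested_params(
    {"numeric_imputer__strategy": "mean", "steps": []},
    step_names={"numeric_imputer", "string_imputer"},
)

# An unknown step name is rejected.
try:
    split_nested_params({"missing_step__value": 1}, step_names={"numeric_imputer"})
    raised = False
except ValueError:
    raised = True
```

Applied to the pipeline from the class example, a call such as pipe.set_params(numeric_imputer__strategy='mean') would target the strategy parameter of the step named numeric_imputer, assuming 'mean' is a strategy that NumericImputer accepts.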