gators.pipeline package
Module contents
- class gators.pipeline.Pipeline
Bases: gators.transformer._base_transformer._BaseTransformer
Pipeline of transformers for Polars DataFrames.
Sequentially applies a list of transforms. This is a lightweight alternative to sklearn.pipeline.Pipeline specifically designed for Gators transformers that work with Polars DataFrames.
- Parameters:
steps (list[tuple[str, Any]]) – List of (name, transform) tuples that are chained in the order they are specified. Each transform must implement fit and transform methods.
verbose (bool, default=False) – If True, emits a one-line summary per step to stdout showing the step name, row count, column count, total null count, and wall-clock time. When False, there is zero measurement overhead.
Examples
>>> from gators.pipeline import Pipeline
>>> from gators.imputers import NumericImputer, StringImputer
>>> from gators.encoders import WOEEncoder
>>>
>>> steps = [
...     ('numeric_imputer', NumericImputer(strategy='median')),
...     ('string_imputer', StringImputer(strategy='constant', value='MISSING')),
...     ('woe_encoder', WOEEncoder(subset=['cat_col']))
... ]
>>> pipe = Pipeline(steps=steps)
>>> pipe.fit(X_train, y=y_train)
>>> X_transformed = pipe.transform(X_train)
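The verbose behaviour described in the parameters above can be pictured with a small standalone sketch. It is not the gators implementation: `run_step` and `fill_nulls` are hypothetical helpers, a dict of lists stands in for a Polars DataFrame, and the format of the printed line is an assumption. The point it illustrates is that timing and counting happen only on the verbose path.

```python
import time


def run_step(name, func, table, verbose=False):
    """Apply func to a dict-of-lists table, optionally printing a summary."""
    if not verbose:
        # Quiet path: no timing and no counting, so no measurement overhead.
        return func(table)
    start = time.perf_counter()
    out = func(table)
    elapsed = time.perf_counter() - start
    n_cols = len(out)
    n_rows = len(next(iter(out.values()))) if out else 0
    n_nulls = sum(v is None for col in out.values() for v in col)
    # Hypothetical summary format; the real output layout is not documented here.
    print(f"[{name}] rows={n_rows} cols={n_cols} nulls={n_nulls} time={elapsed:.4f}s")
    return out


def fill_nulls(table):
    """Hypothetical step: replace None with 0 in every column."""
    return {k: [0 if v is None else v for v in col] for k, col in table.items()}


table = {"a": [1, None, 3], "b": [None, None, 6]}
result = run_step("fill", fill_nulls, table, verbose=True)
quiet_result = run_step("fill", fill_nulls, table, verbose=False)
```

Both calls return the same data; only the verbose call measures and prints.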
- fit(X: polars.DataFrame, y: polars.Series | None = None) → gators.pipeline.pipeline.Pipeline
Fit all transformers in the pipeline.
Fits each transformer sequentially, transforming the data before fitting the next transformer. This ensures each transformer sees the output of the previous transformer.
- Parameters:
X (pl.DataFrame) – Input DataFrame to fit.
y (pl.Series, default=None) – Target series for supervised transformers (e.g., WOEEncoder).
- Returns:
The fitted pipeline instance.
- Return type:
Pipeline
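The chaining that fit() performs can be sketched in plain Python. This is an illustrative stand-in, not the gators implementation: `AddOne`, `MeanCenter` and `fit_chain` are hypothetical names, and a list of floats plays the role of the Polars DataFrame.

```python
class AddOne:
    """Hypothetical stateless step: adds 1 to every value."""

    def fit(self, X, y=None):
        return self

    def transform(self, X):
        return [x + 1 for x in X]


class MeanCenter:
    """Hypothetical stateful step: learns the mean of the data it is fitted on."""

    def fit(self, X, y=None):
        self.mean_ = sum(X) / len(X)
        return self

    def transform(self, X):
        return [x - self.mean_ for x in X]


def fit_chain(steps, X, y=None):
    """Fit each step on the output of the steps before it."""
    for _name, step in steps:
        step.fit(X, y)
        X = step.transform(X)  # the next step is fitted on transformed data
    return steps


steps = fit_chain([("add", AddOne()), ("center", MeanCenter())], [1.0, 2.0, 3.0])
learned_mean = steps[1][1].mean_
```

Because `MeanCenter` is fitted on the output of `AddOne`, it learns a mean of 3.0 rather than the 2.0 of the raw input.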
- transform(X: polars.DataFrame) → polars.DataFrame
Transform data by applying all transformers in sequence.
- Parameters:
X (pl.DataFrame) – Input DataFrame to transform.
- Returns:
Transformed DataFrame.
- Return type:
pl.DataFrame
- fit_transform(X: polars.DataFrame, y: polars.Series | None = None) → polars.DataFrame
Fit all transformers and transform the data.
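Fits each transformer and immediately applies it, passing the result to the next step. Because fit() already transforms the intermediate data, this avoids the second pass over the data that calling fit() followed by transform() would make.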
- Parameters:
X (pl.DataFrame) – Input DataFrame to fit and transform.
y (pl.Series, default=None) – Target series for supervised transformers.
- Returns:
Transformed DataFrame.
- Return type:
pl.DataFrame
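One way to see where the saving in fit_transform() comes from is to count transform() calls. The sketch below is an assumption-laden illustration, not gators code: `CountingStep` and the three free functions are hypothetical, and it assumes that fit() does not transform the output of the last step.

```python
class CountingStep:
    """Hypothetical step that counts how often transform() runs."""

    def __init__(self):
        self.transform_calls = 0

    def fit(self, X, y=None):
        return self

    def transform(self, X):
        self.transform_calls += 1
        return X


def fit(steps, X, y=None):
    # Assumed behaviour: intermediate data is transformed so the next step
    # can be fitted on it; the output of the last step is not needed.
    for i, step in enumerate(steps):
        step.fit(X, y)
        if i < len(steps) - 1:
            X = step.transform(X)


def transform(steps, X):
    for step in steps:
        X = step.transform(X)
    return X


def fit_transform(steps, X, y=None):
    # Single pass: each step is fitted, then applied exactly once.
    for step in steps:
        X = step.fit(X, y).transform(X)
    return X


separate = [CountingStep() for _ in range(3)]
fit(separate, [1, 2, 3])
transform(separate, [1, 2, 3])
calls_separate = sum(s.transform_calls for s in separate)

combined = [CountingStep() for _ in range(3)]
fit_transform(combined, [1, 2, 3])
calls_combined = sum(s.transform_calls for s in combined)
```

With three steps, the separate calls run transform() five times in this sketch, while the combined call runs it three times.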
- get_params(deep: bool = True) → dict
Get parameters for this estimator.
- Parameters:
deep (bool, default=True) – If True, returns parameters of all sub-estimators. If False, only returns pipeline-level parameters.
- Returns:
Parameter names mapped to their values. Nested parameters use double underscore notation (e.g., ‘step_name__param’).
- Return type:
dict
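The double underscore layout can be illustrated with a small standalone sketch. `Imputer` and `flatten_params` are hypothetical, and the exact set of keys that gators returns is not documented here; only the `step_name__param` naming is taken from the description above.

```python
class Imputer:
    """Hypothetical transformer with a single hyperparameter."""

    def __init__(self, strategy="mean"):
        self.strategy = strategy

    def get_params(self):
        return {"strategy": self.strategy}


def flatten_params(steps, verbose=False, deep=True):
    """Collect parameters in a 'step_name__param' layout (illustrative only)."""
    params = {"steps": steps, "verbose": verbose}  # pipeline-level parameters
    if deep:
        for name, step in steps:
            for key, value in step.get_params().items():
                params[f"{name}__{key}"] = value
    return params


steps = [("impute", Imputer(strategy="median"))]
shallow = flatten_params(steps, deep=False)
nested = flatten_params(steps, deep=True)
```

Here `nested["impute__strategy"]` is `"median"`, while `shallow` holds only the pipeline-level entries.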
- clone() → gators.pipeline.pipeline.Pipeline
Return a new unfitted pipeline with the same hyperparameters.
Each transformer is re-instantiated using only its public constructor parameters (obtained via get_params()). Private attributes that hold fitted state (e.g., _statistics, mapping_) are not copied, so the returned pipeline is guaranteed to be unfitted.
This is the recommended alternative to copy.deepcopy for cross-validation workflows where you need multiple independent copies of the same pipeline configuration.
- Returns:
A new, unfitted Pipeline instance with identical hyperparameters.
- Return type:
Pipeline
Examples
>>> from gators.pipeline import Pipeline
>>> from gators.imputers import NumericImputer
>>> from gators.scalers import StandardScaler
>>> pipe = Pipeline(steps=[
...     ('impute', NumericImputer(strategy='median')),
...     ('scale', StandardScaler()),
... ])
>>> pipe_clone = pipe.clone()
>>> pipe_clone is pipe
False
>>> pipe_clone.named_steps['impute'] is pipe.named_steps['impute']
False
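The difference between re-instantiating from constructor parameters and copy.deepcopy can be shown with a standalone sketch. `MedianImputer` and `clone_step` are hypothetical stand-ins, not gators classes; the sketch only assumes that a transformer exposes its constructor arguments through get_params().

```python
import copy


class MedianImputer:
    """Hypothetical transformer that stores fitted state in a private attribute."""

    def __init__(self, fill_default=0.0):
        self.fill_default = fill_default

    def get_params(self):
        return {"fill_default": self.fill_default}

    def fit(self, X):
        ordered = sorted(X)
        self._median = ordered[len(ordered) // 2]  # fitted state
        return self


def clone_step(step):
    """Rebuild a step from its constructor parameters only."""
    return type(step)(**step.get_params())


fitted = MedianImputer(fill_default=-1.0).fit([3.0, 1.0, 2.0])
fresh = clone_step(fitted)       # same hyperparameters, no fitted state
copied = copy.deepcopy(fitted)   # same hyperparameters, fitted state carried over
```

`fresh` keeps `fill_default` but has no `_median`, whereas `copied` still carries the median learned by `fitted`.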
- set_params(**params)
Set parameters for this estimator.
- Parameters:
**params (dict) – Estimator parameters. Use double underscore notation for nested parameters (e.g., step_name__param_name=value).
- Returns:
The pipeline instance.
- Return type:
Pipeline
- Raises:
ValueError – If an invalid parameter name is provided.
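A minimal sketch of how double underscore keys might be resolved is shown below. `Imputer` and `set_nested_params` are hypothetical, and the error messages are invented for illustration. The sketch handles only nested keys; pipeline-level keys such as verbose are out of scope.

```python
class Imputer:
    """Hypothetical transformer with a single hyperparameter."""

    def __init__(self, strategy="mean"):
        self.strategy = strategy


def set_nested_params(steps, **params):
    """Route 'step_name__param' keys to the matching step (illustrative only)."""
    by_name = dict(steps)
    for key, value in params.items():
        step_name, sep, param = key.partition("__")
        if not sep or step_name not in by_name:
            raise ValueError(f"Invalid parameter {key!r}")
        step = by_name[step_name]
        if not hasattr(step, param):
            raise ValueError(f"Invalid parameter {param!r} for step {step_name!r}")
        setattr(step, param, value)
    return steps


steps = [("impute", Imputer())]
set_nested_params(steps, impute__strategy="median")

try:
    set_nested_params(steps, impute__unknown=1)
    raised = False
except ValueError:
    raised = True
```

After the first call the `impute` step uses the `"median"` strategy; the second call raises ValueError because the step has no `unknown` attribute.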