Gators is a lightning-fast data preprocessing and feature engineering library built on top of Polars. It is designed to streamline your entire ML workflow, from raw data to production-ready models, by leveraging Polars' multi-core parallel processing.
Built by the PSP Data Team at PayPal, Gators makes data preprocessing and feature engineering both faster and simpler.
Key Features
Lightning Fast: Built on Polars for multi-core parallel processing
Unified API: Consistent sklearn-style `.fit()` and `.transform()` interface
Production Ready: Deploy the same Python code from notebook to production
Comprehensive: 60+ preprocessing transformers covering a wide range of use cases
Pipeline Support: Chain transformers seamlessly with the Pipeline class
Easy to Learn: If you know sklearn, you already know Gators
Quick Start
```python
import polars as pl

from gators.data_cleaning import DropHighNaNRatio, VarianceFilter
from gators.encoders import OneHotEncoder
from gators.imputers import NumericImputer
from gators.scalers import StandardScaler
from gators.pipeline import Pipeline

# Load your data
X = pl.read_csv("data.csv")

# Build a preprocessing pipeline
pipeline = Pipeline([
    ('drop_nan', DropHighNaNRatio(threshold=0.5)),
    ('impute', NumericImputer(strategy='median')),
    ('variance', VarianceFilter(threshold=0.01)),
    ('encode', OneHotEncoder()),
    ('scale', StandardScaler())
])

# Fit and transform
X_processed = pipeline.fit_transform(X)

# Deploy the same pipeline in production!
```
What Can Gators Do?
70+ transformers across 8 categories:
Data Cleaning - Quality filters, deduplication, and more
Clippers - Custom min/max bounds, Gaussian, IQR, MAD, Quantile, and more
Encoders - OneHot, Target, WOE, CatBoost, and more
Numeric Features - Polynomial, rule-based features, and more
String Features - Text properties, pattern detection, n-grams, and more
DateTime Features - Temporal patterns, cyclical encoding, holidays, and more
Imputation - Numeric, string, boolean, and group-based strategies
Discretization - Equal-width, quantile, tree-based binning, and more
Scaling - Standard, min-max, Box-Cox, and more
Pipeline - Chain transformers seamlessly