gators.feature_generation_dt package
Module contents
- class gators.feature_generation_dt.CyclicFeatures
Bases: BaseModel, BaseEstimator, TransformerMixin
Generates cyclic features from datetime columns using sine transformations with multiple phase angles.
Cyclic features are useful for representing periodic temporal patterns (e.g., month, hour) where the start and end of the cycle should be close in feature space.
- Parameters:
subset (Optional[List[str]], optional) – List of datetime columns to extract features from. If None, all datetime columns in the dataframe will be used, by default None.
components (List[str]) – List of date and time components to extract cyclic features from. Valid values: ‘semester’, ‘quarter’, ‘month’, ‘week’, ‘day_of_week’, ‘day_of_month’, ‘day_of_year’, ‘hour’, ‘minute’, ‘second’.
angles (List[float]) – List of phase shift angles in degrees. For each component, a sine feature will be generated for each angle. For example, [0, 45, 90, 135, 180] will create five features with 0°, 45°, 90°, 135°, and 180° phase shifts.
drop_columns (bool, optional) – Whether to drop the original datetime columns after feature extraction, by default False.
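As a rough illustration of the phase-shifted sine encoding described above, the sketch below computes sin(2π · value / period + angle) for each angle. The function name `cyclic_features` and the `PERIODS` table are hypothetical and not part of gators; the formula is an assumption that happens to reproduce the values shown in the example output for month.

```python
import math

# Assumed cycle lengths for illustration only; the periods gators uses
# internally are not documented on this page.
PERIODS = {"month": 12, "day_of_week": 7, "hour": 24}


def cyclic_features(value: float, period: float, angles_deg: list[float]) -> list[float]:
    """Return sin(2*pi*value/period + angle) for each phase angle (in degrees)."""
    base = 2.0 * math.pi * value / period
    return [math.sin(base + math.radians(angle)) for angle in angles_deg]


# January (month = 1) with the five angles used in the example.
january = cyclic_features(1, PERIODS["month"], [0, 45, 90, 135, 180])
print([round(v, 6) for v in january])
```

Using several phase angles gives a model more than one view of the same cycle, so two values with equal sine at one angle (for example months 1 and 5 at 0°) are separated at another.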
Examples
>>> from gators.feature_generation_dt import CyclicFeatures
>>> import polars as pl
>>> X = {'date': ['2023-01-01', '2023-02-01', '2023-03-01'],
...      'datetime': ['2023-01-01T00:00:00', '2023-02-01T12:00:00', '2023-03-01T23:59:59']}
>>> X = pl.DataFrame(X).with_columns([
...     pl.col('date').str.strptime(pl.Date, '%Y-%m-%d'),
...     pl.col('datetime').str.strptime(pl.Datetime, '%Y-%m-%dT%H:%M:%S')
... ])
# Example: Generate cyclic features for month with multiple angles
>>> transformer = CyclicFeatures(subset=['date'], components=['month'], angles=[0, 45, 90, 135, 180], drop_columns=False)
>>> transformer.fit(X)
CyclicFeatures(subset=['date'], components=['month'], angles=[0, 45, 90, 135, 180], drop_columns=False)
>>> result = transformer.transform(X)
>>> result
shape: (3, 7)
┌────────────┬──────────────┬────────────┬────────────┬────────────┬─────────────┬─────────────┐
│ date       │ datetime     │ date__mon… │ date__mon… │ date__mon… │ date__mon…  │ date__mon…  │
│            │              │ th__sin0   │ th__sin45  │ th__sin90  │ th__sin135  │ th__sin180  │
│ date       │ datetime     │ f64        │ f64        │ f64        │ f64         │ f64         │
├────────────┼──────────────┼────────────┼────────────┼────────────┼─────────────┼─────────────┤
│ 2023-01-01 │ 2023-01-01…  │ 0.500000   │ 0.965926   │ 0.866025   │ 0.258819    │ -0.500000   │
│ 2023-02-01 │ 2023-02-01…  │ 0.866025   │ 0.965926   │ 0.500000   │ -0.258819   │ -0.866025   │
│ 2023-03-01 │ 2023-03-01…  │ 1.000000   │ 0.707107   │ 0.000000   │ -0.707107   │ -1.000000   │
└────────────┴──────────────┴────────────┴────────────┴────────────┴─────────────┴─────────────┘
- class gators.feature_generation_dt.OrdinalFeatures
Bases: BaseModel, BaseEstimator, TransformerMixin
Generates ordinal features from datetime columns.
Ordinal features extract standard temporal components (year, month, hour, etc.) as integer values from datetime columns.
- Parameters:
subset (Optional[List[str]], optional) – List of datetime columns to extract features from. If None, all datetime columns in the dataframe will be used, by default None.
components (List[str]) – List of date and time components to extract. Valid values: ‘century’, ‘year’, ‘semester’, ‘quarter’, ‘month’, ‘week’, ‘day_of_week’, ‘day_of_month’, ‘day_of_year’, ‘weekend’, ‘leap_year’, ‘hour’, ‘minute’, ‘second’.
drop_columns (bool, optional) – Whether to drop the original datetime columns after feature extraction, by default False.
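To make the component names concrete, here is a minimal standard-library sketch of how a few of them could be derived from a single date. The helper `ordinal_features` is hypothetical, and conventions such as Monday = 0 for `day_of_week` are assumptions rather than documented gators behaviour.

```python
import calendar
from datetime import date


def ordinal_features(d: date) -> dict:
    """Derive a handful of integer/boolean calendar components from a date."""
    return {
        "year": d.year,
        "quarter": (d.month - 1) // 3 + 1,       # 1..4
        "month": d.month,                        # 1..12
        "day_of_week": d.weekday(),              # assumption: 0 = Monday
        "day_of_month": d.day,
        "day_of_year": d.timetuple().tm_yday,    # 1..366
        "weekend": d.weekday() >= 5,             # Saturday or Sunday
        "leap_year": calendar.isleap(d.year),
    }


print(ordinal_features(date(2023, 1, 1)))
```

2023-01-01 falls on a Sunday, which is why the `weekend` flag is true for that row in the example output.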
Examples
>>> from gators.feature_generation_dt import OrdinalFeatures
>>> import polars as pl
>>> X = {'date': ['2023-01-01', '2023-02-01', '2023-03-01'],
...      'datetime': ['2023-01-01T00:00:00', '2023-02-01T12:00:00', '2023-03-01T23:59:59']}
>>> X = pl.DataFrame(X).with_columns([
...     pl.col('date').str.strptime(pl.Date, '%Y-%m-%d'),
...     pl.col('datetime').str.strptime(pl.Datetime, '%Y-%m-%dT%H:%M:%S')
... ])
Example 1: Extract year and month from all datetime columns
>>> transformer = OrdinalFeatures(components=['year', 'month'], drop_columns=True)
>>> transformer.fit(X)
OrdinalFeatures(components=['year', 'month'], drop_columns=True)
>>> result = transformer.transform(X)
>>> result
shape: (3, 4)
┌────────────┬──────────────┬────────────────┬─────────────────┐
│ date__year │ date__month  │ datetime__year │ datetime__month │
│ i64        │ i64          │ i64            │ i64             │
├────────────┼──────────────┼────────────────┼─────────────────┤
│ 2023       │ 1            │ 2023           │ 1               │
│ 2023       │ 2            │ 2023           │ 2               │
│ 2023       │ 3            │ 2023           │ 3               │
└────────────┴──────────────┴────────────────┴─────────────────┘
Example 2: Extract from specific column, keep original
>>> transformer = OrdinalFeatures(subset=['date'], components=['month', 'weekend'], drop_columns=False)
>>> transformer.fit(X)
OrdinalFeatures(subset=['date'], components=['month', 'weekend'], drop_columns=False)
>>> result = transformer.transform(X)
>>> result
shape: (3, 4)
┌────────────┬─────────────────────┬──────────────┬───────────────┐
│ date       │ datetime            │ date__month  │ date__weekend │
│ date       │ datetime            │ i64          │ bool          │
├────────────┼─────────────────────┼──────────────┼───────────────┤
│ 2023-01-01 │ 2023-01-01T00:00:00 │ 1            │ true          │
│ 2023-02-01 │ 2023-02-01T12:00:00 │ 2            │ false         │
│ 2023-03-01 │ 2023-03-01T23:59:59 │ 3            │ false         │
└────────────┴─────────────────────┴──────────────┴───────────────┘
- class gators.feature_generation_dt.DiffFeatures
Bases: BaseModel, BaseEstimator, TransformerMixin
Generates time difference features between datetime columns or against reference dates.
Calculates differences in various units (days, hours, minutes, seconds) which are particularly useful for tree-based models to capture recency, age, and time elapsed.
- Parameters:
column_pairs (Optional[List[tuple[str, str]]], default=None) – List of column pairs (col_a, col_b) to compute differences (col_a - col_b). If None, no pairwise differences are computed.
reference_dates (Optional[dict[str, Union[str, datetime]]], default=None) – Dictionary mapping column names to reference dates. Computes (column - reference_date). Reference dates can be ISO format strings or datetime objects.
units (List[Literal["d", "h", "m", "s"]], default=["d"]) – Units for computing time differences.
drop_columns (bool, default=False) – Whether to drop the original datetime columns after creating differences.
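The arithmetic behind a difference feature can be sketched with the standard library alone. The helper `time_diff` below is hypothetical; truncating to whole units is an assumption suggested by the integer columns in the examples, not a documented guarantee.

```python
from datetime import datetime

# Seconds per unit, keyed by the short unit codes listed in the parameters.
SECONDS_PER_UNIT = {"d": 86_400, "h": 3_600, "m": 60, "s": 1}


def time_diff(a: datetime, b: datetime, unit: str = "d") -> int:
    """Return (a - b) expressed in whole units, truncated toward zero (assumption)."""
    seconds = (a - b).total_seconds()
    return int(seconds / SECONDS_PER_UNIT[unit])


# Pairwise difference: updated_at - created_at.
print(time_diff(datetime(2023, 1, 10), datetime(2023, 1, 1), "d"))

# Difference against a fixed reference date: negative when the row predates it.
print(time_diff(datetime(2023, 1, 1), datetime(2024, 1, 1), "h"))
```

The sign follows the order of subtraction, so a column that precedes the reference date yields negative values, as in the reference-date example.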
Examples
>>> from gators.feature_generation_dt import DiffFeatures
>>> import polars as pl
>>> from datetime import datetime
>>> X = pl.DataFrame({
...     'created_at': ['2023-01-01', '2023-06-15', '2024-01-01'],
...     'updated_at': ['2023-01-10', '2023-07-01', '2024-02-01'],
...     'value': [100, 200, 300]
... }).with_columns([
...     pl.col('created_at').str.strptime(pl.Datetime, '%Y-%m-%d'),
...     pl.col('updated_at').str.strptime(pl.Datetime, '%Y-%m-%d')
... ])
Example 1: Pairwise difference
>>> transformer = DiffFeatures(
...     column_pairs=[('updated_at', 'created_at')],
...     units=['days']
... )
>>> result = transformer.fit_transform(X)
>>> print(result)
shape: (3, 4)
┌─────────────────────┬─────────────────────┬───────┬──────────────────────────┐
│ created_at          ┆ updated_at          ┆ value ┆ updated_at_minus_created │
│ ---                 ┆ ---                 ┆ ---   ┆ _at__days                │
│ datetime[μs]        ┆ datetime[μs]        ┆ i64   ┆ i64                      │
├─────────────────────┼─────────────────────┼───────┼──────────────────────────┤
│ 2023-01-01 00:00:00 ┆ 2023-01-10 00:00:00 ┆ 100   ┆ 9                        │
│ 2023-06-15 00:00:00 ┆ 2023-07-01 00:00:00 ┆ 200   ┆ 16                       │
│ 2024-01-01 00:00:00 ┆ 2024-02-01 00:00:00 ┆ 300   ┆ 31                       │
└─────────────────────┴─────────────────────┴───────┴──────────────────────────┘
Example 2: Reference date (time since reference)
>>> transformer = DiffFeatures(
...     reference_dates={'created_at': '2024-01-01'},
...     units=['days', 'hours']
... )
>>> result = transformer.fit_transform(X)
>>> print(result)
shape: (3, 5)
┌─────────────────────┬─────────────────────┬───────┬─────────────┬──────────────┐
│ created_at          ┆ updated_at          ┆ value ┆ created_at_ ┆ created_at_  │
│ ---                 ┆ ---                 ┆ ---   ┆ since_ref   ┆ since_ref    │
│ datetime[μs]        ┆ datetime[μs]        ┆ i64   ┆ __days      ┆ __hours      │
├─────────────────────┼─────────────────────┼───────┼─────────────┼──────────────┤
│ 2023-01-01 00:00:00 ┆ 2023-01-10 00:00:00 ┆ 100   ┆ -365        ┆ -8760        │
│ 2023-06-15 00:00:00 ┆ 2023-07-01 00:00:00 ┆ 200   ┆ -200        ┆ -4800        │
│ 2024-01-01 00:00:00 ┆ 2024-02-01 00:00:00 ┆ 300   ┆ 0           ┆ 0            │
└─────────────────────┴─────────────────────┴───────┴─────────────┴──────────────┘
Example 3: Multiple units
>>> transformer = DiffFeatures(
...     column_pairs=[('updated_at', 'created_at')],
...     units=['days', 'hours', 'minutes']
... )
>>> result = transformer.fit_transform(X)
- class gators.feature_generation_dt.DurationToDatetime
Bases: BaseModel, BaseEstimator, TransformerMixin
Converts numeric time offset columns to datetime by adding durations to a reference date.
This transformer is useful when you have numeric time offsets (e.g., seconds, days) that need to be converted to actual datetime values by adding them to a reference date. The reference date can be a fixed datetime literal or a column containing dates.
- Parameters:
subset (List[str]) – List of column names containing numeric time offsets to convert.
reference_date (Union[datetime, str]) –
Reference date to add offsets to. Can be:
A datetime object: Same reference date for all rows
A string (column name): Different reference date per row from that column
unit (Literal["s", "m", "h", "d", "ms", "us"], default="s") –
Time unit of the numeric offset columns:
"s": seconds
"m": minutes
"h": hours
"d": days
"ms": milliseconds
"us": microseconds
drop_columns (bool, default=False) – Whether to drop the original numeric offset columns after conversion.
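The conversion itself amounts to adding a duration to a reference point. A minimal sketch using `datetime.timedelta` follows; the helper `offset_to_datetime` and the `UNIT_TO_KWARG` mapping are hypothetical names introduced here, not gators internals.

```python
from datetime import datetime, timedelta

# Map the short unit codes from the parameter list to timedelta keyword names.
UNIT_TO_KWARG = {
    "s": "seconds",
    "m": "minutes",
    "h": "hours",
    "d": "days",
    "ms": "milliseconds",
    "us": "microseconds",
}


def offset_to_datetime(offset: float, reference: datetime, unit: str = "s") -> datetime:
    """Add a numeric offset, expressed in `unit`, to a reference datetime."""
    return reference + timedelta(**{UNIT_TO_KWARG[unit]: offset})


# Fixed reference date: 86400 seconds after 2017-11-30 is one day later.
print(offset_to_datetime(86_400, datetime(2017, 11, 30), "s"))

# Per-row reference dates: pair each offset with its own base date.
bases = [datetime(2024, 1, 1), datetime(2024, 2, 1)]
offsets = [7, 14]
print([offset_to_datetime(o, b, "d") for o, b in zip(offsets, bases)])
```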
Examples
>>> from gators.feature_generation_dt import DurationToDatetime
>>> import polars as pl
>>> from datetime import datetime
>>> X = pl.DataFrame({
...     'TransactionDT': [86400, 172800, 259200],  # seconds
...     'value': [100, 200, 300]
... })
Example 1: Convert seconds to datetime with fixed reference date
>>> transformer = DurationToDatetime(
...     subset=['TransactionDT'],
...     reference_date=datetime(2017, 11, 30),
...     unit='s',
...     drop_columns=False
... )
>>> result = transformer.fit_transform(X)
>>> print(result)
shape: (3, 3)
┌───────────────┬───────┬──────────────────────────┐
│ TransactionDT ┆ value ┆ TransactionDT__datetime  │
│ ---           ┆ ---   ┆ ---                      │
│ i64           ┆ i64   ┆ datetime[μs]             │
├───────────────┼───────┼──────────────────────────┤
│ 86400         ┆ 100   ┆ 2017-12-01 00:00:00      │
│ 172800        ┆ 200   ┆ 2017-12-02 00:00:00      │
│ 259200        ┆ 300   ┆ 2017-12-03 00:00:00      │
└───────────────┴───────┴──────────────────────────┘
Example 2: Convert with column-based reference dates
>>> X = pl.DataFrame({
...     'BaseDate': [datetime(2024, 1, 1), datetime(2024, 2, 1), datetime(2024, 3, 1)],
...     'offset_days': [7, 14, 21],
...     'value': [100, 200, 300]
... })
>>> transformer = DurationToDatetime(
...     subset=['offset_days'],
...     reference_date='BaseDate',  # column name
...     unit='d',
...     drop_columns=False
... )
>>> result = transformer.fit_transform(X)
>>> print(result)
shape: (3, 4)
┌─────────────────────┬──────────────┬───────┬─────────────────────────┐
│ BaseDate            ┆ offset_days  ┆ value ┆ offset_days__datetime   │
│ ---                 ┆ ---          ┆ ---   ┆ ---                     │
│ datetime[μs]        ┆ i64          ┆ i64   ┆ datetime[μs]            │
├─────────────────────┼──────────────┼───────┼─────────────────────────┤
│ 2024-01-01 00:00:00 ┆ 7            ┆ 100   ┆ 2024-01-08 00:00:00     │
│ 2024-02-01 00:00:00 ┆ 14           ┆ 200   ┆ 2024-02-15 00:00:00     │
│ 2024-03-01 00:00:00 ┆ 21           ┆ 300   ┆ 2024-03-22 00:00:00     │
└─────────────────────┴──────────────┴───────┴─────────────────────────┘
Example 3: Multiple columns with different units
>>> X = pl.DataFrame({
...     'offset_hours': [24, 48, 72],
...     'offset_minutes': [60, 120, 180],
...     'value': [1, 2, 3]
... })
>>> transformer1 = DurationToDatetime(
...     subset=['offset_hours'],
...     reference_date=datetime(2024, 1, 1),
...     unit='h'
... )
>>> transformer2 = DurationToDatetime(
...     subset=['offset_minutes'],
...     reference_date=datetime(2024, 1, 1),
...     unit='m'
... )
>>> result = transformer1.fit_transform(X)
>>> result = transformer2.fit_transform(result)
- class gators.feature_generation_dt.BusinessTimeFeatures
Bases: BaseModel, BaseEstimator, TransformerMixin
Generates business time features from datetime columns.
Creates binary indicators and classifications for business-relevant time periods such as business hours, business days, and time of business day.
- Parameters:
subset (Optional[List[str]], default=None) – List of datetime columns to extract features from. If None, all datetime columns will be used.
business_hours_start (int, default=9) – Start hour for business hours (24-hour format, 0-23).
business_hours_end (int, default=17) – End hour for business hours (24-hour format, 0-23).
weekend_days (List[int], default=[5, 6]) – Days of week considered weekend (0=Monday, 6=Sunday). Default is Saturday and Sunday.
features (List[str], default=["is_business_hour", "is_business_day", "time_of_business_day"]) –
List of features to generate. Options:
"is_business_hour": Boolean for whether time is during business hours
"is_business_day": Boolean for whether day is a business day (not weekend)
"time_of_business_day": Category (before_hours, during_hours, after_hours; Example 2 also shows weekend for non-business days)
"hour_of_business_day": Hour within business day (0-based from start)
drop_columns (bool, default=False) – Whether to drop the original datetime columns after feature extraction.
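The feature definitions above can be sketched for a single timestamp as follows. The helper `business_time_features` is hypothetical; treating the end hour as exclusive and evaluating `is_business_hour` independently of the day of week are assumptions chosen to agree with the example outputs on this page.

```python
from datetime import datetime


def business_time_features(
    ts: datetime,
    start: int = 9,
    end: int = 17,
    weekend_days: tuple = (5, 6),  # 0 = Monday, 6 = Sunday
) -> dict:
    """Classify one timestamp relative to business hours and business days."""
    is_business_day = ts.weekday() not in weekend_days
    # Assumption: the start hour is inclusive and the end hour is exclusive.
    is_business_hour = start <= ts.hour < end

    if not is_business_day:
        period = "weekend"
    elif ts.hour < start:
        period = "before_hours"
    elif ts.hour >= end:
        period = "after_hours"
    else:
        period = "during_hours"

    return {
        "is_business_hour": is_business_hour,
        "is_business_day": is_business_day,
        "time_of_business_day": period,
    }


print(business_time_features(datetime(2024, 1, 15, 10, 30)))  # a Monday
print(business_time_features(datetime(2024, 1, 20, 10, 0)))   # a Saturday
```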
Examples
>>> from gators.feature_generation_dt import BusinessTimeFeatures
>>> import polars as pl
>>> X = pl.DataFrame({
...     'timestamp': [
...         '2024-01-15 08:00:00',  # Monday, before hours
...         '2024-01-15 10:30:00',  # Monday, during hours
...         '2024-01-15 18:00:00',  # Monday, after hours
...         '2024-01-20 10:00:00',  # Saturday, weekend
...     ]
... }).with_columns(
...     pl.col('timestamp').str.strptime(pl.Datetime, '%Y-%m-%d %H:%M:%S')
... )
Example 1: Default business time features
>>> transformer = BusinessTimeFeatures(
...     subset=['timestamp'],
...     features=['is_business_hour', 'is_business_day']
... )
>>> result = transformer.fit_transform(X)
>>> print(result)
shape: (4, 3)
┌─────────────────────┬───────────────────┬─────────────────┐
│ timestamp           ┆ timestamp__is_bus ┆ timestamp__is_b │
│ ---                 ┆ iness_hour        ┆ usiness_day     │
│ datetime[μs]        ┆ ---               ┆ ---             │
│                     ┆ bool              ┆ bool            │
├─────────────────────┼───────────────────┼─────────────────┤
│ 2024-01-15 08:00:00 ┆ false             ┆ true            │
│ 2024-01-15 10:30:00 ┆ true              ┆ true            │
│ 2024-01-15 18:00:00 ┆ false             ┆ true            │
│ 2024-01-20 10:00:00 ┆ true              ┆ false           │
└─────────────────────┴───────────────────┴─────────────────┘
Example 2: Time of business day classification
>>> transformer = BusinessTimeFeatures(
...     subset=['timestamp'],
...     features=['time_of_business_day'],
...     business_hours_start=9,
...     business_hours_end=17
... )
>>> result = transformer.fit_transform(X)
>>> print(result)
shape: (4, 2)
┌─────────────────────┬─────────────────────────────┐
│ timestamp           ┆ timestamp__time_of_business │
│ ---                 ┆ _day                        │
│ datetime[μs]        ┆ ---                         │
│                     ┆ str                         │
├─────────────────────┼─────────────────────────────┤
│ 2024-01-15 08:00:00 ┆ before_hours                │
│ 2024-01-15 10:30:00 ┆ during_hours                │
│ 2024-01-15 18:00:00 ┆ after_hours                 │
│ 2024-01-20 10:00:00 ┆ weekend                     │
└─────────────────────┴─────────────────────────────┘
Example 3: All features with custom hours
>>> transformer = BusinessTimeFeatures(
...     subset=['timestamp'],
...     features=['is_business_hour', 'is_business_day', 'time_of_business_day', 'hour_of_business_day'],
...     business_hours_start=8,
...     business_hours_end=18
... )
>>> result = transformer.fit_transform(X)
- class gators.feature_generation_dt.TimeBinFeatures
Bases: BaseModel, BaseEstimator, TransformerMixin
Generates time bin features by categorizing datetime values into periods.
Bins datetime components into meaningful categories like part of day, season, time of month, etc. These categorical features are particularly useful for tree-based models to capture non-linear temporal patterns.
- Parameters:
subset (Optional[List[str]], default=None) – List of datetime columns to extract features from. If None, all datetime columns will be used.
bin_types (List[Literal["part_of_day", "season", "time_of_month", "time_of_year", "rush_hour"]], default=["part_of_day", "season", "time_of_month", "time_of_year", "rush_hour"]) –
Types of time bins to generate. Options:
"part_of_day": night, morning, afternoon, evening
"season": spring, summer, fall, winter
"time_of_month": beginning, middle, end
"time_of_year": early, mid, late
"rush_hour": morning_rush, evening_rush, off_peak
hemisphere (Literal["northern", "southern"], default="northern") – Hemisphere for season calculation.
drop_columns (bool, default=False) – Whether to drop the original datetime columns after feature extraction.
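As an illustration of binning with a hemisphere switch, the sketch below maps a timestamp to a season. The helper `season` is hypothetical, and the month boundaries (meteorological seasons, December to February as northern winter) are an assumption; the exact cut-offs gators uses for any of the bin types are not stated on this page.

```python
from datetime import datetime

SEASONS = ["winter", "spring", "summer", "fall"]


def season(ts: datetime, hemisphere: str = "northern") -> str:
    """Map a timestamp to a season using three-month blocks (assumed boundaries)."""
    # Dec, Jan, Feb -> 0; Mar-May -> 1; Jun-Aug -> 2; Sep-Nov -> 3
    index = (ts.month % 12) // 3
    if hemisphere == "southern":
        index = (index + 2) % 4  # seasons are offset by half a year
    return SEASONS[index]


print(season(datetime(2024, 1, 15, 6, 0)))               # northern hemisphere
print(season(datetime(2024, 1, 15, 6, 0), "southern"))   # same date, southern
```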
Examples
>>> from gators.feature_generation_dt import TimeBinFeatures
>>> import polars as pl
>>> X = pl.DataFrame({
...     'timestamp': [
...         '2024-01-15 06:00:00',
...         '2024-01-15 10:00:00',
...         '2024-01-15 14:00:00',
...         '2024-01-15 20:00:00',
...         '2024-07-15 14:00:00',
...     ]
... }).with_columns(
...     pl.col('timestamp').str.strptime(pl.Datetime, '%Y-%m-%d %H:%M:%S')
... )
Example 1: Part of day
>>> transformer = TimeBinFeatures(
...     subset=['timestamp'],
...     bin_types=['part_of_day']
... )
>>> result = transformer.fit_transform(X)
>>> print(result)
shape: (5, 2)
┌─────────────────────┬─────────────────────────┐
│ timestamp           ┆ timestamp__part_of_day  │
│ ---                 ┆ ---                     │
│ datetime[μs]        ┆ str                     │
├─────────────────────┼─────────────────────────┤
│ 2024-01-15 06:00:00 ┆ morning                 │
│ 2024-01-15 10:00:00 ┆ morning                 │
│ 2024-01-15 14:00:00 ┆ afternoon               │
│ 2024-01-15 20:00:00 ┆ evening                 │
│ 2024-07-15 14:00:00 ┆ afternoon               │
└─────────────────────┴─────────────────────────┘
Example 2: Season (northern hemisphere)
>>> transformer = TimeBinFeatures(
...     subset=['timestamp'],
...     bin_types=['season'],
...     hemisphere='northern'
... )
>>> result = transformer.fit_transform(X)
>>> print(result)
shape: (5, 2)
┌─────────────────────┬───────────────────┐
│ timestamp           ┆ timestamp__season │
│ ---                 ┆ ---               │
│ datetime[μs]        ┆ str               │
├─────────────────────┼───────────────────┤
│ 2024-01-15 06:00:00 ┆ winter            │
│ 2024-01-15 10:00:00 ┆ winter            │
│ 2024-01-15 14:00:00 ┆ winter            │
│ 2024-01-15 20:00:00 ┆ winter            │
│ 2024-07-15 14:00:00 ┆ summer            │
└─────────────────────┴───────────────────┘
Example 3: Multiple bin types
>>> transformer = TimeBinFeatures(
...     subset=['timestamp'],
...     bin_types=['part_of_day', 'time_of_month', 'rush_hour']
... )
>>> result = transformer.fit_transform(X)
- class gators.feature_generation_dt.TimeWindowFeatures
Bases: BaseModel, BaseEstimator, TransformerMixin
Generates time-based window aggregation features (velocity features).
This transformer creates rolling window statistics over time periods, useful for detecting unusual patterns like transaction velocity, spending bursts, etc.
Features are computed looking backward from each row (excluding the current row), optionally grouped by categorical columns.
- Parameters:
subset (List[str]) – List of numerical column names to aggregate over time windows.
time_column (str) – Name of the datetime column to use for time-based windowing.
windows (List[str]) –
List of time window strings. Supported formats:
'30m' = 30 minutes
'1h' = 1 hour
'24h' = 24 hours
'7d' = 7 days
'1M' = 1 month (30 days)
'1Y' = 1 year (365 days)
by (Optional[List[str]], default=None) – Optional list of columns to group by. Windows are computed within each group. Example: ['card1'] computes "transactions in last 24h for this card".
func (List[str], default=['count', 'mean']) –
List of aggregation functions to apply. Available options:
'count': Count of rows in window
'mean': Mean of values in window
'sum': Sum of values in window
'std': Standard deviation in window
'median': Median in window
'min': Minimum value in window
'max': Maximum value in window
drop_columns (bool, default=False) – Whether to drop the original numerical columns after creating features.
new_column_names (Optional[List[str]], default=None) – List of custom names for the window columns. If None, uses default naming pattern.
Examples
>>> from gators.feature_generation_dt import TimeWindowFeatures
>>> import polars as pl
>>> from datetime import datetime, timedelta
>>> # Sample transaction data
>>> X = {
...     'TransactionDT': [
...         datetime(2024, 1, 1, 10, 0),
...         datetime(2024, 1, 1, 10, 30),
...         datetime(2024, 1, 1, 11, 0),
...         datetime(2024, 1, 1, 12, 0),
...         datetime(2024, 1, 2, 10, 0),
...     ],
...     'TransactionAmt': [100, 150, 200, 120, 180],
...     'card1': ['A', 'A', 'A', 'B', 'A']
... }
>>> X = pl.DataFrame(X)
Example 1: Global time windows (no grouping)
>>> transformer = TimeWindowFeatures(
...     subset=['TransactionAmt'],
...     time_column='TransactionDT',
...     windows=['1h', '24h'],
...     func=['count', 'mean']
... )
>>> result = transformer.fit_transform(X)
>>> result.columns
['TransactionDT', 'TransactionAmt', 'card1', 'count_TransactionAmt_1h', 'mean_TransactionAmt_1h', 'count_TransactionAmt_24h', 'mean_TransactionAmt_24h']
Example 2: Grouped time windows (per card)
>>> transformer = TimeWindowFeatures(
...     subset=['TransactionAmt'],
...     time_column='TransactionDT',
...     windows=['1h', '24h'],
...     by=['card1'],
...     func=['count', 'sum']
... )
>>> result = transformer.fit_transform(X)
>>> # Creates: count/sum of TransactionAmt in last 1h/24h per card1
Example 3: Multiple windows for fraud detection
>>> transformer = TimeWindowFeatures(
...     subset=['TransactionAmt'],
...     time_column='TransactionDT',
...     windows=['30m', '1h', '3h', '24h', '7d'],
...     by=['card1'],
...     func=['count', 'mean', 'sum', 'max']
... )
>>> # Detects velocity: "Card has 50 transactions in last hour"
Notes
Data should be sorted by time_column for correct window calculations
Current row is EXCLUDED from window calculations
First rows (no history) have null values
Windows look backward from current time
Useful for velocity features, spending patterns, anomaly detection
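The backward-looking, current-row-excluded semantics in the notes above can be sketched in plain Python. The helper `backward_window` is hypothetical and ungrouped; including the window's lower bound, and reporting a count of 0 with a mean of None when there is no history, are assumptions rather than documented gators behaviour.

```python
from datetime import datetime, timedelta
from statistics import mean


def backward_window(times: list, values: list, window: timedelta) -> list:
    """For each row, return (count, mean) over earlier rows within `window`.

    A row j contributes to row i when t_i - window <= t_j < t_i, so the
    current row (and any row sharing its timestamp) is excluded.
    """
    results = []
    for t in times:
        past = [v for tj, v in zip(times, values) if t - window <= tj < t]
        results.append((len(past), mean(past) if past else None))
    return results


times = [
    datetime(2024, 1, 1, 10, 0),
    datetime(2024, 1, 1, 10, 30),
    datetime(2024, 1, 1, 11, 0),
    datetime(2024, 1, 1, 12, 0),
    datetime(2024, 1, 2, 10, 0),
]
amounts = [100, 150, 200, 120, 180]

for row in backward_window(times, amounts, timedelta(hours=1)):
    print(row)
```

This quadratic loop is only meant to show which rows fall in each window; a grouped variant would apply the same filter within each `by` group.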
- class gators.feature_generation_dt.HolidayFeatures
Bases: BaseModel, BaseEstimator, TransformerMixin
Generates holiday-related features from datetime columns.
Detects holidays and calculates distance to nearest holidays, which is useful for retail, finance, and other domains where holidays affect patterns. Uses the holidays library for accurate, year-specific holiday dates.
- Parameters:
subset (Optional[List[str]], default=None) – List of datetime columns to extract features from. If None, all datetime columns will be used.
country (str, default="US") – Country code for holidays (e.g., “US”, “UK”, “CA”, “DE”, “FR”, “JP”). Supports any country code from the holidays library. See https://pypi.org/project/holidays/ for full list of supported countries.
years (Optional[List[int]], default=None) – Years to include holidays for. If None, will automatically detect years from data.
features (List[str], default=["is_holiday", "days_to_holiday", "days_from_holiday"]) –
Features to generate. Options:
"is_holiday": Boolean for whether date is a holiday
"days_to_holiday": Days until next holiday (negative if past)
"days_from_holiday": Days since last holiday (negative if future)
"nearest_holiday_distance": Absolute days to nearest holiday
drop_columns (bool, default=False) – Whether to drop the original datetime columns after feature extraction.
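The distance features can be illustrated without the holidays library by using a small hand-written list of dates. Everything below is a sketch: `HOLIDAYS` is an illustrative subset of 2024 US holidays, `holiday_distances` is a hypothetical helper, and returning None when no holiday exists on one side is an assumption that sidesteps the sign conventions described in the parameter list.

```python
from bisect import bisect_left
from datetime import date

# Illustrative subset only; the transformer obtains real dates from the
# holidays library.
HOLIDAYS = sorted([date(2024, 1, 1), date(2024, 1, 15), date(2024, 7, 4)])


def holiday_distances(d: date, holidays: list = HOLIDAYS) -> dict:
    """Compute holiday membership and day distances for one date."""
    i = bisect_left(holidays, d)           # index of first holiday on or after d
    is_holiday = i < len(holidays) and holidays[i] == d
    days_to = (holidays[i] - d).days if i < len(holidays) else None

    j = i if is_holiday else i - 1         # index of last holiday on or before d
    days_from = (d - holidays[j]).days if j >= 0 else None

    known = [x for x in (days_to, days_from) if x is not None]
    return {
        "is_holiday": is_holiday,
        "days_to_holiday": days_to,
        "days_from_holiday": days_from,
        "nearest_holiday_distance": min(known) if known else None,
    }


print(holiday_distances(date(2024, 7, 3)))
print(holiday_distances(date(2024, 7, 4)))
```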
Examples
>>> from gators.feature_generation_dt import HolidayFeatures
>>> import polars as pl
>>> X = pl.DataFrame({
...     'date': [
...         '2024-01-01',  # New Year's Day
...         '2024-01-15',  # MLK Day (third Monday of January)
...         '2024-07-03',  # Day before Independence Day
...         '2024-07-04',  # Independence Day
...         '2024-07-05',  # Day after Independence Day
...     ]
... }).with_columns(
...     pl.col('date').str.strptime(pl.Datetime, '%Y-%m-%d')
... )
Example 1: Is holiday detection
>>> transformer = HolidayFeatures(
...     subset=['date'],
...     features=['is_holiday']
... )
>>> result = transformer.fit_transform(X)
>>> print(result)
shape: (5, 2)
┌─────────────────────┬──────────────────┐
│ date                ┆ date__is_holiday │
│ ---                 ┆ ---              │
│ datetime[μs]        ┆ bool             │
├─────────────────────┼──────────────────┤
│ 2024-01-01 00:00:00 ┆ true             │
│ 2024-01-15 00:00:00 ┆ true             │
│ 2024-07-03 00:00:00 ┆ false            │
│ 2024-07-04 00:00:00 ┆ true             │
│ 2024-07-05 00:00:00 ┆ false            │
└─────────────────────┴──────────────────┘
Example 2: Distance to holidays
>>> transformer = HolidayFeatures(
...     subset=['date'],
...     features=['nearest_holiday_distance']
... )
>>> result = transformer.fit_transform(X)
Example 3: UK holidays
>>> transformer = HolidayFeatures(
...     subset=['date'],
...     country='UK',
...     features=['is_holiday']
... )
>>> result = transformer.fit_transform(X)