gators.feature_generation_dt package

Module contents

class gators.feature_generation_dt.CyclicFeatures

Bases: BaseModel, BaseEstimator, TransformerMixin

Generates cyclic features from datetime columns using sine transformations with multiple phase angles.

Cyclic features are useful for representing periodic temporal patterns (e.g., month, hour) where the start and end of the cycle should be close in feature space.

Parameters:
  • subset (Optional[List[str]], optional) – List of datetime columns to extract features from. If None, all datetime columns in the dataframe will be used, by default None.

  • components (List[str]) – List of date and time components to extract cyclic features from. Valid values: ‘semester’, ‘quarter’, ‘month’, ‘week’, ‘day_of_week’, ‘day_of_month’, ‘day_of_year’, ‘hour’, ‘minute’, ‘second’.

  • angles (List[float]) – List of phase shift angles in degrees. For each component, a sine feature will be generated for each angle. For example, [0, 45, 90, 135, 180] will create five features with 0°, 45°, 90°, 135°, and 180° phase shifts.

  • drop_columns (bool, optional) – Whether to drop the original datetime columns after feature extraction, by default False.

Examples

>>> from gators.feature_generation_dt import CyclicFeatures
>>> import polars as pl
>>> X = {'date': ['2023-01-01', '2023-02-01', '2023-03-01'],
...         'datetime': ['2023-01-01T00:00:00', '2023-02-01T12:00:00', '2023-03-01T23:59:59']}
>>> X = pl.DataFrame(X).with_columns([
...     pl.col('date').str.strptime(pl.Date, '%Y-%m-%d'),
...     pl.col('datetime').str.strptime(pl.Datetime, '%Y-%m-%dT%H:%M:%S')
... ])

Example: Generate cyclic features for month with multiple angles

>>> transformer = CyclicFeatures(subset=['date'], components=['month'], angles=[0, 45, 90, 135, 180], drop_columns=False)
>>> transformer.fit(X)
CyclicFeatures(subset=['date'], components=['month'], angles=[0, 45, 90, 135, 180], drop_columns=False)
>>> result = transformer.transform(X)
>>> result
shape: (3, 7)
┌────────────┬─────────────────────┬───────────────────┬────────────────────┬────────────────────┬─────────────────────┬─────────────────────┐
│ date       │ datetime            │ date__month__sin0 │ date__month__sin45 │ date__month__sin90 │ date__month__sin135 │ date__month__sin180 │
│ date       │ datetime            │ f64               │ f64                │ f64                │ f64                 │ f64                 │
├────────────┼─────────────────────┼───────────────────┼────────────────────┼────────────────────┼─────────────────────┼─────────────────────┤
│ 2023-01-01 │ 2023-01-01 00:00:00 │ 0.500000          │ 0.965926           │ 0.866025           │ 0.258819            │ -0.500000           │
│ 2023-02-01 │ 2023-02-01 12:00:00 │ 0.866025          │ 0.965926           │ 0.500000           │ -0.258819           │ -0.866025           │
│ 2023-03-01 │ 2023-03-01 23:59:59 │ 1.000000          │ 0.707107           │ 0.000000           │ -0.707107           │ -1.000000           │
└────────────┴─────────────────────┴───────────────────┴────────────────────┴────────────────────┴─────────────────────┴─────────────────────┘
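The values in the table above are consistent with placing each month on a 12-step cycle and shifting the sine by each angle, i.e. sin(2π·month/12 + angle). The pure-Python sketch below reproduces them. The period of 12 and the 1-based month are assumptions inferred from the output rather than taken from the gators source, and `cyclic_sine` is a hypothetical helper.

```python
import math


def cyclic_sine(value: float, period: float, angle_deg: float) -> float:
    """Phase-shifted sine encoding of a periodic value (hypothetical helper).

    Assumes the component repeats every `period` steps, e.g. 12 for months.
    """
    return math.sin(2 * math.pi * value / period + math.radians(angle_deg))


angles = [0, 45, 90, 135, 180]
for month in (1, 2, 3):
    # One feature per angle, rounded to the 6 decimals shown in the table
    print(month, [round(cyclic_sine(month, 12, a), 6) for a in angles])

# A 90° shift turns the sine into a cosine, so the sin0 and sin90 columns
# together locate each month on the unit circle.
assert math.isclose(cyclic_sine(1, 12, 90), math.cos(2 * math.pi / 12))
```

Because the 0° and 90° features jointly fix the position on the circle, December (month 12) ends up exactly as far from January as January is from February, which is the "start and end of the cycle should be close" property described above.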

fit(X, y=None)

Fit the transformer.

Parameters:
  • X (DataFrame) – Input DataFrame.

  • y (Series | None) – Target variable. Not used, present here for compatibility.

Returns:

Fitted transformer instance.

Return type:

CyclicFeatures

transform(X)

Transform the input DataFrame by extracting cyclic features.

Parameters:

X (DataFrame) – Input DataFrame to transform.

Returns:

Transformed DataFrame with the phase-shifted sine features added (one column per component and angle).

Return type:

DataFrame

class gators.feature_generation_dt.OrdinalFeatures

Bases: BaseModel, BaseEstimator, TransformerMixin

Generates ordinal features from datetime columns.

Ordinal features extract standard temporal components (year, month, hour, etc.) as integer values from datetime columns.

Parameters:
  • subset (Optional[List[str]], optional) – List of datetime columns to extract features from. If None, all datetime columns in the dataframe will be used, by default None.

  • components (List[str]) – List of date and time components to extract. Valid values: ‘century’, ‘year’, ‘semester’, ‘quarter’, ‘month’, ‘week’, ‘day_of_week’, ‘day_of_month’, ‘day_of_year’, ‘weekend’, ‘leap_year’, ‘hour’, ‘minute’, ‘second’.

  • drop_columns (bool, optional) – Whether to drop the original datetime columns after feature extraction, by default False.
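The components listed above map onto standard calendar arithmetic. The sketch below computes most of them for a single timestamp with the standard library. The definitions of semester (months 1-6 vs 7-12), ISO-8601 week numbering and the 0=Monday weekday origin are assumptions, not confirmed gators behaviour, and 'century' is omitted because its convention is not stated. Note that under ISO numbering 2023-01-01 falls in week 52 of the previous ISO year, a common surprise.

```python
import calendar
from datetime import datetime


def ordinal_components(ts: datetime) -> dict:
    """Integer/boolean calendar components for one timestamp (sketch)."""
    return {
        "year": ts.year,
        "semester": 1 if ts.month <= 6 else 2,   # assumed: Jan-Jun vs Jul-Dec
        "quarter": (ts.month - 1) // 3 + 1,
        "month": ts.month,
        "week": ts.isocalendar()[1],             # ISO-8601 week (assumption)
        "day_of_week": ts.weekday(),             # 0=Monday ... 6=Sunday (assumption)
        "day_of_month": ts.day,
        "day_of_year": ts.timetuple().tm_yday,
        "weekend": ts.weekday() >= 5,            # Saturday or Sunday
        "leap_year": calendar.isleap(ts.year),
        "hour": ts.hour,
        "minute": ts.minute,
        "second": ts.second,
    }


print(ordinal_components(datetime(2023, 1, 1))["week"])  # 52
```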

Examples

>>> from gators.feature_generation_dt import OrdinalFeatures
>>> import polars as pl
>>> X = {'date': ['2023-01-01', '2023-02-01', '2023-03-01'],
...         'datetime': ['2023-01-01T00:00:00', '2023-02-01T12:00:00', '2023-03-01T23:59:59']}
>>> X = pl.DataFrame(X).with_columns([
...     pl.col('date').str.strptime(pl.Date, '%Y-%m-%d'),
...     pl.col('datetime').str.strptime(pl.Datetime, '%Y-%m-%dT%H:%M:%S')
... ])

Example 1: Extract year and month from all datetime columns

>>> transformer = OrdinalFeatures(components=['year', 'month'], drop_columns=True)
>>> transformer.fit(X)
OrdinalFeatures(components=['year', 'month'], drop_columns=True)
>>> result = transformer.transform(X)
>>> result
shape: (3, 4)
┌────────────┬──────────────┬────────────────┬─────────────────┐
│ date__year │ date__month  │ datetime__year │ datetime__month │
│    i64     │     i64      │      i64       │      i64        │
├────────────┼──────────────┼────────────────┼─────────────────┤
│   2023     │      1       │      2023      │       1         │
│   2023     │      2       │      2023      │       2         │
│   2023     │      3       │      2023      │       3         │
└────────────┴──────────────┴────────────────┴─────────────────┘

Example 2: Extract from specific column, keep original

>>> transformer = OrdinalFeatures(subset=['date'], components=['month', 'weekend'], drop_columns=False)
>>> transformer.fit(X)
OrdinalFeatures(subset=['date'], components=['month', 'weekend'], drop_columns=False)
>>> result = transformer.transform(X)
>>> result
shape: (3, 4)
┌────────────┬─────────────────────┬──────────────┬───────────────┐
│    date    │       datetime      │ date__month  │ date__weekend │
│    date    │      datetime       │     i64      │      bool     │
├────────────┼─────────────────────┼──────────────┼───────────────┤
│ 2023-01-01 │ 2023-01-01T00:00:00 │      1       │     true      │
│ 2023-02-01 │ 2023-02-01T12:00:00 │      2       │     false     │
│ 2023-03-01 │ 2023-03-01T23:59:59 │      3       │     false     │
└────────────┴─────────────────────┴──────────────┴───────────────┘

fit(X, y=None)

Fit the transformer.

Parameters:
  • X (DataFrame) – Input DataFrame.

  • y (Series | None) – Target variable. Not used, present here for compatibility.

Returns:

Fitted transformer instance.

Return type:

OrdinalFeatures

transform(X)

Transform the input DataFrame by extracting ordinal features.

Parameters:

X (DataFrame) – Input DataFrame to transform.

Returns:

Transformed DataFrame with ordinal features.

Return type:

DataFrame

class gators.feature_generation_dt.DiffFeatures

Bases: BaseModel, BaseEstimator, TransformerMixin

Generates time difference features between datetime columns or against reference dates.

Calculates differences in various units (days, hours, minutes, seconds) which are particularly useful for tree-based models to capture recency, age, and time elapsed.

Parameters:
  • column_pairs (Optional[List[tuple[str, str]]], default=None) – List of column pairs (col_a, col_b) to compute differences (col_a - col_b). If None, no pairwise differences are computed.

  • reference_dates (Optional[dict[str, Union[str, datetime]]], default=None) – Dictionary mapping column names to reference dates. Computes (column - reference_date). Reference dates can be ISO format strings or datetime objects.

  • units (List[Literal["d", "h", "m", "s"]], default=["d"]) – Units for computing time differences.

  • drop_columns (bool, default=False) – Whether to drop the original datetime columns after creating differences.
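Each difference is the elapsed time between two instants expressed in whole units. The standard-library sketch below reproduces the day and hour values in the examples that follow. `time_diff` is a hypothetical helper, and truncation toward zero is an assumed rounding rule for partial units; the examples only exercise exact multiples.

```python
from datetime import datetime
from typing import Union

SECONDS_PER_UNIT = {"d": 86_400, "h": 3_600, "m": 60, "s": 1}


def time_diff(a: datetime, b: Union[datetime, str], unit: str = "d") -> int:
    """Whole `unit`s in a - b; `b` may be an ISO date string (sketch)."""
    if isinstance(b, str):
        b = datetime.fromisoformat(b)  # "2024-01-01" parses to midnight
    # int() truncates toward zero; the real rounding rule is an assumption
    return int((a - b).total_seconds() / SECONDS_PER_UNIT[unit])


print(time_diff(datetime(2023, 1, 10), datetime(2023, 1, 1)))     # 9
print(time_diff(datetime(2023, 6, 15), "2024-01-01", unit="h"))  # -4800
```

A pairwise feature corresponds to `time_diff(col_a, col_b)` and a reference-date feature to `time_diff(column, reference_date)`, so a column earlier than its reference yields a negative value.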

Examples

>>> from gators.feature_generation_dt import DiffFeatures
>>> import polars as pl
>>> from datetime import datetime
>>> X = pl.DataFrame({
...     'created_at': ['2023-01-01', '2023-06-15', '2024-01-01'],
...     'updated_at': ['2023-01-10', '2023-07-01', '2024-02-01'],
...     'value': [100, 200, 300]
... }).with_columns([
...     pl.col('created_at').str.strptime(pl.Datetime, '%Y-%m-%d'),
...     pl.col('updated_at').str.strptime(pl.Datetime, '%Y-%m-%d')
... ])

Example 1: Pairwise difference

>>> transformer = DiffFeatures(
...     column_pairs=[('updated_at', 'created_at')],
...     units=['d']
... )
>>> result = transformer.fit_transform(X)
>>> print(result)
shape: (3, 4)
┌─────────────────────┬─────────────────────┬───────┬──────────────────────────┐
│ created_at          ┆ updated_at          ┆ value ┆ updated_at_minus_created │
│ ---                 ┆ ---                 ┆ ---   ┆ _at__days                │
│ datetime[μs]        ┆ datetime[μs]        ┆ i64   ┆ i64                      │
├─────────────────────┼─────────────────────┼───────┼──────────────────────────┤
│ 2023-01-01 00:00:00 ┆ 2023-01-10 00:00:00 ┆ 100   ┆ 9                        │
│ 2023-06-15 00:00:00 ┆ 2023-07-01 00:00:00 ┆ 200   ┆ 16                       │
│ 2024-01-01 00:00:00 ┆ 2024-02-01 00:00:00 ┆ 300   ┆ 31                       │
└─────────────────────┴─────────────────────┴───────┴──────────────────────────┘

Example 2: Reference date (time since reference)

>>> transformer = DiffFeatures(
...     reference_dates={'created_at': '2024-01-01'},
...     units=['d', 'h']
... )
>>> result = transformer.fit_transform(X)
>>> print(result)
shape: (3, 5)
┌─────────────────────┬─────────────────────┬───────┬─────────────┬──────────────┐
│ created_at          ┆ updated_at          ┆ value ┆ created_at_ ┆ created_at_  │
│ ---                 ┆ ---                 ┆ ---   ┆ since_ref   ┆ since_ref    │
│ datetime[μs]        ┆ datetime[μs]        ┆ i64   ┆ __days      ┆ __hours      │
├─────────────────────┼─────────────────────┼───────┼─────────────┼──────────────┤
│ 2023-01-01 00:00:00 ┆ 2023-01-10 00:00:00 ┆ 100   ┆ -365        ┆ -8760        │
│ 2023-06-15 00:00:00 ┆ 2023-07-01 00:00:00 ┆ 200   ┆ -200        ┆ -4800        │
│ 2024-01-01 00:00:00 ┆ 2024-02-01 00:00:00 ┆ 300   ┆ 0           ┆ 0            │
└─────────────────────┴─────────────────────┴───────┴─────────────┴──────────────┘

Example 3: Multiple units

>>> transformer = DiffFeatures(
...     column_pairs=[('updated_at', 'created_at')],
...     units=['d', 'h', 'm']
... )
>>> result = transformer.fit_transform(X)

fit(X, y=None)

Fit the transformer by parsing reference dates.

Parameters:
  • X (DataFrame) – Input DataFrame.

  • y (Series | None) – Target variable. Not used, present here for compatibility.

Returns:

Fitted transformer instance.

Return type:

DiffFeatures

transform(X)

Transform the input DataFrame by creating time difference features.

Parameters:

X (DataFrame) – Input DataFrame to transform.

Returns:

Transformed DataFrame with time difference features.

Return type:

DataFrame

class gators.feature_generation_dt.DurationToDatetime

Bases: BaseModel, BaseEstimator, TransformerMixin

Converts numeric time offset columns to datetime by adding durations to a reference date.

This transformer is useful when you have numeric time offsets (e.g., seconds, days) that need to be converted to actual datetime values by adding them to a reference date. The reference date can be a fixed datetime literal or a column containing dates.

Parameters:
  • subset (List[str]) – List of column names containing numeric time offsets to convert.

  • reference_date (Union[datetime, str]) –

    Reference date to add offsets to. Can be:

    • A datetime object: Same reference date for all rows

    • A string (column name): Different reference date per row from that column

  • unit (Literal["s", "m", "h", "d", "ms", "us"], default="s") –

    Time unit of the numeric offset columns:

    • "s": seconds

    • "m": minutes

    • "h": hours

    • "d": days

    • "ms": milliseconds

    • "us": microseconds

  • drop_columns (bool, default=False) – Whether to drop the original numeric offset columns after conversion.
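Conceptually, each output value is the reference date plus a duration built from the offset and its unit. The sketch below shows this with `datetime.timedelta`, mapping each documented unit code to the matching `timedelta` keyword. `offsets_to_datetime` is a hypothetical helper; a Python list stands in for the per-row reference column, and null handling is omitted.

```python
from datetime import datetime, timedelta

# Documented unit codes -> timedelta keyword arguments
TIMEDELTA_KW = {"s": "seconds", "m": "minutes", "h": "hours",
                "d": "days", "ms": "milliseconds", "us": "microseconds"}


def offsets_to_datetime(offsets, reference, unit="s"):
    """Add numeric offsets to one datetime or to a per-row list (sketch)."""
    # A single datetime is broadcast to every row; a list gives one per row
    refs = reference if isinstance(reference, list) else [reference] * len(offsets)
    return [ref + timedelta(**{TIMEDELTA_KW[unit]: off})
            for ref, off in zip(refs, offsets)]


print(offsets_to_datetime([86400, 172800, 259200], datetime(2017, 11, 30)))
```

The two call styles mirror Examples 1 and 2 below: a fixed `datetime` reference, or a column of reference dates with `unit="d"`.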

Examples

>>> from gators.feature_generation_dt import DurationToDatetime
>>> import polars as pl
>>> from datetime import datetime
>>> X = pl.DataFrame({
...     'TransactionDT': [86400, 172800, 259200],  # seconds
...     'value': [100, 200, 300]
... })

Example 1: Convert seconds to datetime with fixed reference date

>>> transformer = DurationToDatetime(
...     subset=['TransactionDT'],
...     reference_date=datetime(2017, 11, 30),
...     unit='s',
...     drop_columns=False
... )
>>> result = transformer.fit_transform(X)
>>> print(result)
shape: (3, 3)
┌───────────────┬───────┬──────────────────────────┐
│ TransactionDT ┆ value ┆ TransactionDT__datetime  │
│ ---           ┆ ---   ┆ ---                      │
│ i64           ┆ i64   ┆ datetime[μs]             │
├───────────────┼───────┼──────────────────────────┤
│ 86400         ┆ 100   ┆ 2017-12-01 00:00:00      │
│ 172800        ┆ 200   ┆ 2017-12-02 00:00:00      │
│ 259200        ┆ 300   ┆ 2017-12-03 00:00:00      │
└───────────────┴───────┴──────────────────────────┘

Example 2: Convert with column-based reference dates

>>> X = pl.DataFrame({
...     'BaseDate': [datetime(2024, 1, 1), datetime(2024, 2, 1), datetime(2024, 3, 1)],
...     'offset_days': [7, 14, 21],
...     'value': [100, 200, 300]
... })
>>> transformer = DurationToDatetime(
...     subset=['offset_days'],
...     reference_date='BaseDate',  # column name
...     unit='d',
...     drop_columns=False
... )
>>> result = transformer.fit_transform(X)
>>> print(result)
shape: (3, 4)
┌─────────────────────┬──────────────┬───────┬─────────────────────────┐
│ BaseDate            ┆ offset_days  ┆ value ┆ offset_days__datetime   │
│ ---                 ┆ ---          ┆ ---   ┆ ---                     │
│ datetime[μs]        ┆ i64          ┆ i64   ┆ datetime[μs]            │
├─────────────────────┼──────────────┼───────┼─────────────────────────┤
│ 2024-01-01 00:00:00 ┆ 7            ┆ 100   ┆ 2024-01-08 00:00:00     │
│ 2024-02-01 00:00:00 ┆ 14           ┆ 200   ┆ 2024-02-15 00:00:00     │
│ 2024-03-01 00:00:00 ┆ 21           ┆ 300   ┆ 2024-03-22 00:00:00     │
└─────────────────────┴──────────────┴───────┴─────────────────────────┘

Example 3: Multiple columns with different units

>>> X = pl.DataFrame({
...     'offset_hours': [24, 48, 72],
...     'offset_minutes': [60, 120, 180],
...     'value': [1, 2, 3]
... })
>>> transformer1 = DurationToDatetime(
...     subset=['offset_hours'],
...     reference_date=datetime(2024, 1, 1),
...     unit='h'
... )
>>> transformer2 = DurationToDatetime(
...     subset=['offset_minutes'],
...     reference_date=datetime(2024, 1, 1),
...     unit='m'
... )
>>> result = transformer1.fit_transform(X)
>>> result = transformer2.fit_transform(result)

fit(X, y=None)

Fit the transformer by preparing the reference date expression.

Parameters:
  • X (DataFrame) – Input DataFrame.

  • y (Series | None) – Target variable. Not used, present here for compatibility.

Returns:

Fitted transformer instance.

Return type:

DurationToDatetime

transform(X)

Transform numeric offset columns to datetime columns.

Parameters:

X (DataFrame) – Input DataFrame to transform.

Returns:

Transformed DataFrame with datetime columns.

Return type:

DataFrame

class gators.feature_generation_dt.BusinessTimeFeatures

Bases: BaseModel, BaseEstimator, TransformerMixin

Generates business time features from datetime columns.

Creates binary indicators and classifications for business-relevant time periods such as business hours, business days, and time of business day.

Parameters:
  • subset (Optional[List[str]], default=None) – List of datetime columns to extract features from. If None, all datetime columns will be used.

  • business_hours_start (int, default=9) – Start hour for business hours (24-hour format, 0-23).

  • business_hours_end (int, default=17) – End hour for business hours (24-hour format, 0-23).

  • weekend_days (List[int], default=[5, 6]) – Days of week considered weekend (0=Monday, 6=Sunday). Default is Saturday and Sunday.

  • features (List[str], default=["is_business_hour", "is_business_day", "time_of_business_day"]) –

    List of features to generate. Options:

    • "is_business_hour": Boolean for whether the hour of day falls within business hours

    • "is_business_day": Boolean for whether the day is a business day (not a weekend day)

    • "time_of_business_day": Category (before_hours, during_hours, after_hours; weekend on weekend days)

    • "hour_of_business_day": Hour within the business day (0-based from business_hours_start)

  • drop_columns (bool, default=False) – Whether to drop the original datetime columns after feature extraction.
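The weekend convention above (0=Monday, 6=Sunday) matches Python's `datetime.weekday()`, so the features can be sketched per timestamp with the standard library. `business_time` is a hypothetical helper. Treating `business_hours_end` as exclusive and returning None for `hour_of_business_day` outside business hours are assumptions. The hour-only check for `is_business_hour` follows Example 1 below, where Saturday 10:00 is flagged true.

```python
from datetime import datetime


def business_time(ts, start=9, end=17, weekend_days=(5, 6)):
    """Business-time features for one timestamp (sketch).

    Assumes `end` is exclusive, so 17:00 already counts as after hours.
    """
    is_business_day = ts.weekday() not in weekend_days  # 0=Monday, as documented
    is_business_hour = start <= ts.hour < end           # hour of day only
    if not is_business_day:
        time_of_day = "weekend"
    elif ts.hour < start:
        time_of_day = "before_hours"
    elif ts.hour >= end:
        time_of_day = "after_hours"
    else:
        time_of_day = "during_hours"
    return {
        "is_business_hour": is_business_hour,
        "is_business_day": is_business_day,
        "time_of_business_day": time_of_day,
        # 0-based offset from the start hour; None outside hours (assumption)
        "hour_of_business_day": ts.hour - start if is_business_hour else None,
    }


print(business_time(datetime(2024, 1, 15, 10, 30)))
```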

Examples

>>> from gators.feature_generation_dt import BusinessTimeFeatures
>>> import polars as pl
>>> X = pl.DataFrame({
...     'timestamp': [
...         '2024-01-15 08:00:00',  # Monday, before hours
...         '2024-01-15 10:30:00',  # Monday, during hours
...         '2024-01-15 18:00:00',  # Monday, after hours
...         '2024-01-20 10:00:00',  # Saturday, weekend
...     ]
... }).with_columns(
...     pl.col('timestamp').str.strptime(pl.Datetime, '%Y-%m-%d %H:%M:%S')
... )

Example 1: Business hour and business day indicators

>>> transformer = BusinessTimeFeatures(
...     subset=['timestamp'],
...     features=['is_business_hour', 'is_business_day']
... )
>>> result = transformer.fit_transform(X)
>>> print(result)
shape: (4, 3)
┌─────────────────────┬───────────────────┬─────────────────┐
│ timestamp           ┆ timestamp__is_bus ┆ timestamp__is_b │
│ ---                 ┆ iness_hour        ┆ usiness_day     │
│ datetime[μs]        ┆ ---               ┆ ---             │
│                     ┆ bool              ┆ bool            │
├─────────────────────┼───────────────────┼─────────────────┤
│ 2024-01-15 08:00:00 ┆ false             ┆ true            │
│ 2024-01-15 10:30:00 ┆ true              ┆ true            │
│ 2024-01-15 18:00:00 ┆ false             ┆ true            │
│ 2024-01-20 10:00:00 ┆ true              ┆ false           │
└─────────────────────┴───────────────────┴─────────────────┘

Example 2: Time of business day classification

>>> transformer = BusinessTimeFeatures(
...     subset=['timestamp'],
...     features=['time_of_business_day'],
...     business_hours_start=9,
...     business_hours_end=17
... )
>>> result = transformer.fit_transform(X)
>>> print(result)
shape: (4, 2)
┌─────────────────────┬─────────────────────────────┐
│ timestamp           ┆ timestamp__time_of_business │
│ ---                 ┆ _day                        │
│ datetime[μs]        ┆ ---                         │
│                     ┆ str                         │
├─────────────────────┼─────────────────────────────┤
│ 2024-01-15 08:00:00 ┆ before_hours                │
│ 2024-01-15 10:30:00 ┆ during_hours                │
│ 2024-01-15 18:00:00 ┆ after_hours                 │
│ 2024-01-20 10:00:00 ┆ weekend                     │
└─────────────────────┴─────────────────────────────┘

Example 3: All features with custom hours

>>> transformer = BusinessTimeFeatures(
...     subset=['timestamp'],
...     features=['is_business_hour', 'is_business_day', 'time_of_business_day', 'hour_of_business_day'],
...     business_hours_start=8,
...     business_hours_end=18
... )
>>> result = transformer.fit_transform(X)

fit(X, y=None)

Fit the transformer by identifying datetime columns if not specified.

Parameters:
  • X (DataFrame) – Input DataFrame.

  • y (Series | None) – Target variable. Not used, present here for compatibility.

Returns:

Fitted transformer instance.

Return type:

BusinessTimeFeatures

transform(X)

Transform the input DataFrame by creating business time features.

Parameters:

X (DataFrame) – Input DataFrame to transform.

Returns:

Transformed DataFrame with business time features.

Return type:

DataFrame

class gators.feature_generation_dt.TimeBinFeatures

Bases: BaseModel, BaseEstimator, TransformerMixin

Generates time bin features by categorizing datetime values into periods.

Bins datetime components into meaningful categories like part of day, season, time of month, etc. These categorical features are particularly useful for tree-based models to capture non-linear temporal patterns.

Parameters:
  • subset (Optional[List[str]], default=None) – List of datetime columns to extract features from. If None, all datetime columns will be used.

  • bin_types (List[Literal["part_of_day", "season", "time_of_month", "time_of_year", "rush_hour"]], default=["part_of_day", "season", "time_of_month", "time_of_year", "rush_hour"]) –

    Types of time bins to generate. Options:

    • "part_of_day": night, morning, afternoon, evening

    • "season": spring, summer, fall, winter

    • "time_of_month": beginning, middle, end

    • "time_of_year": early, mid, late

    • "rush_hour": morning_rush, evening_rush, off_peak

  • hemisphere (Literal["northern", "southern"], default="northern") – Hemisphere for season calculation.

  • drop_columns (bool, default=False) – Whether to drop the original datetime columns after feature extraction.
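Binning is a lookup from a datetime component to a label. The sketch below covers two of the bin types. The hour cut-offs (morning 6-12, afternoon 12-17, evening 17-22) and the meteorological seasons (December to February is winter in the northern hemisphere) are hypothetical choices that agree with the example outputs below; the documentation does not state the exact edges gators uses.

```python
from datetime import datetime

SEASONS = ("winter", "spring", "summer", "fall")


def part_of_day(ts: datetime) -> str:
    """Hypothetical hour cut-offs; gators' real edges may differ."""
    if 6 <= ts.hour < 12:
        return "morning"
    if 12 <= ts.hour < 17:
        return "afternoon"
    if 17 <= ts.hour < 22:
        return "evening"
    return "night"


def season(ts: datetime, hemisphere: str = "northern") -> str:
    """Meteorological seasons: Dec-Feb, Mar-May, Jun-Aug, Sep-Nov (assumption)."""
    idx = ts.month % 12 // 3          # December wraps to 0 (winter)
    if hemisphere == "southern":
        idx = (idx + 2) % 4           # seasons are reversed south of the equator
    return SEASONS[idx]


print(part_of_day(datetime(2024, 1, 15, 20)), season(datetime(2024, 7, 15)))
```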

Examples

>>> from gators.feature_generation_dt import TimeBinFeatures
>>> import polars as pl
>>> X = pl.DataFrame({
...     'timestamp': [
...         '2024-01-15 06:00:00',
...         '2024-01-15 10:00:00',
...         '2024-01-15 14:00:00',
...         '2024-01-15 20:00:00',
...         '2024-07-15 14:00:00',
...     ]
... }).with_columns(
...     pl.col('timestamp').str.strptime(pl.Datetime, '%Y-%m-%d %H:%M:%S')
... )

Example 1: Part of day

>>> transformer = TimeBinFeatures(
...     subset=['timestamp'],
...     bin_types=['part_of_day']
... )
>>> result = transformer.fit_transform(X)
>>> print(result)
shape: (5, 2)
┌─────────────────────┬─────────────────────────┐
│ timestamp           ┆ timestamp__part_of_day  │
│ ---                 ┆ ---                     │
│ datetime[μs]        ┆ str                     │
├─────────────────────┼─────────────────────────┤
│ 2024-01-15 06:00:00 ┆ morning                 │
│ 2024-01-15 10:00:00 ┆ morning                 │
│ 2024-01-15 14:00:00 ┆ afternoon               │
│ 2024-01-15 20:00:00 ┆ evening                 │
│ 2024-07-15 14:00:00 ┆ afternoon               │
└─────────────────────┴─────────────────────────┘

Example 2: Season (northern hemisphere)

>>> transformer = TimeBinFeatures(
...     subset=['timestamp'],
...     bin_types=['season'],
...     hemisphere='northern'
... )
>>> result = transformer.fit_transform(X)
>>> print(result)
shape: (5, 2)
┌─────────────────────┬───────────────────┐
│ timestamp           ┆ timestamp__season │
│ ---                 ┆ ---               │
│ datetime[μs]        ┆ str               │
├─────────────────────┼───────────────────┤
│ 2024-01-15 06:00:00 ┆ winter            │
│ 2024-01-15 10:00:00 ┆ winter            │
│ 2024-01-15 14:00:00 ┆ winter            │
│ 2024-01-15 20:00:00 ┆ winter            │
│ 2024-07-15 14:00:00 ┆ summer            │
└─────────────────────┴───────────────────┘

Example 3: Multiple bin types

>>> transformer = TimeBinFeatures(
...     subset=['timestamp'],
...     bin_types=['part_of_day', 'time_of_month', 'rush_hour']
... )
>>> result = transformer.fit_transform(X)

fit(X, y=None)

Fit the transformer by identifying datetime columns if not specified.

Parameters:
  • X (DataFrame) – Input DataFrame.

  • y (Series | None) – Target variable. Not used, present here for compatibility.

Returns:

Fitted transformer instance.

Return type:

TimeBinFeatures

transform(X)

Transform the input DataFrame by creating time bin features.

Parameters:

X (DataFrame) – Input DataFrame to transform.

Returns:

Transformed DataFrame with time bin features.

Return type:

DataFrame

class gators.feature_generation_dt.TimeWindowFeatures

Bases: BaseModel, BaseEstimator, TransformerMixin

Generates time-based window aggregation features (velocity features).

This transformer creates rolling window statistics over time periods, useful for detecting unusual patterns like transaction velocity, spending bursts, etc.

Features are computed looking backward from each row (excluding the current row), optionally grouped by categorical columns.

Parameters:
  • subset (List[str]) – List of numerical column names to aggregate over time windows.

  • time_column (str) – Name of the datetime column to use for time-based windowing.

  • windows (List[str]) –

    List of time window strings. Supported formats:

    • '30m' = 30 minutes

    • '1h' = 1 hour

    • '24h' = 24 hours

    • '7d' = 7 days

    • '1M' = 1 month (30 days)

    • '1Y' = 1 year (365 days)

  • by (Optional[List[str]], default=None) – Optional list of columns to group by. Windows are computed within each group. Example: [‘card1’] computes “transactions in last 24h for this card”

  • func (List[str], default=['count', 'mean']) –

    List of aggregation functions to apply. Available options:

    • 'count': Count of rows in window

    • 'mean': Mean of values in window

    • 'sum': Sum of values in window

    • 'std': Standard deviation in window

    • 'median': Median in window

    • 'min': Minimum value in window

    • 'max': Maximum value in window

  • drop_columns (bool, default=False) – Whether to drop the original numerical columns after creating features.

  • new_column_names (Optional[List[str]], default=None) – List of custom names for the window columns. If None, uses default naming pattern.

Examples

>>> from gators.feature_generation_dt import TimeWindowFeatures
>>> import polars as pl
>>> from datetime import datetime, timedelta
>>> # Sample transaction data
>>> X = {
...     'TransactionDT': [
...         datetime(2024, 1, 1, 10, 0),
...         datetime(2024, 1, 1, 10, 30),
...         datetime(2024, 1, 1, 11, 0),
...         datetime(2024, 1, 1, 12, 0),
...         datetime(2024, 1, 2, 10, 0),
...     ],
...     'TransactionAmt': [100, 150, 200, 120, 180],
...     'card1': ['A', 'A', 'A', 'B', 'A']
... }
>>> X = pl.DataFrame(X)

Example 1: Global time windows (no grouping)

>>> transformer = TimeWindowFeatures(
...     subset=['TransactionAmt'],
...     time_column='TransactionDT',
...     windows=['1h', '24h'],
...     func=['count', 'mean']
... )
>>> result = transformer.fit_transform(X)
>>> result.columns
['TransactionDT', 'TransactionAmt', 'card1',
 'count_TransactionAmt_1h', 'mean_TransactionAmt_1h',
 'count_TransactionAmt_24h', 'mean_TransactionAmt_24h']

Example 2: Grouped time windows (per card)

>>> transformer = TimeWindowFeatures(
...     subset=['TransactionAmt'],
...     time_column='TransactionDT',
...     windows=['1h', '24h'],
...     by=['card1'],
...     func=['count', 'sum']
... )
>>> result = transformer.fit_transform(X)
>>> # Creates: count/sum of TransactionAmt in last 1h/24h per card1

Example 3: Multiple windows for fraud detection

>>> transformer = TimeWindowFeatures(
...     subset=['TransactionAmt'],
...     time_column='TransactionDT',
...     windows=['30m', '1h', '3h', '24h', '7d'],
...     by=['card1'],
...     func=['count', 'mean', 'sum', 'max']
... )
>>> # Detects velocity: "Card has 50 transactions in last hour"

Notes

  • Data should be sorted by time_column for correct window calculations

  • Current row is EXCLUDED from window calculations

  • First rows (no history) have null values

  • Windows look backward from current time

  • Useful for velocity features, spending patterns, anomaly detection
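The notes above fix the window semantics: rows sorted by time, a backward-looking window, the current row excluded, and null when there is no history. The quadratic pure-Python sketch below spells out those rules for one aggregation and reuses the transaction data from the examples. `backward_window` is a hypothetical helper. Treating the lower bound `t - window` as inclusive is an assumption, and 'std' is omitted because `statistics.stdev` needs at least two values. A polars implementation would avoid the O(n²) loop.

```python
from datetime import datetime, timedelta
from statistics import mean, median

AGGREGATIONS = {"count": len, "sum": sum, "mean": mean,
                "median": median, "min": min, "max": max}


def backward_window(times, values, groups, window, func):
    """Aggregate earlier rows of the same group inside `window` (sketch).

    Assumes rows are sorted by time. Only rows before the current one are
    used, and the lower bound t - window is inclusive (an assumption).
    """
    agg = AGGREGATIONS[func]
    out = []
    for i, (t, g) in enumerate(zip(times, groups)):
        history = [values[j] for j in range(i)
                   if groups[j] == g and times[j] >= t - window]
        out.append(agg(history) if history else None)  # no history -> null
    return out


times = [datetime(2024, 1, 1, 10, 0), datetime(2024, 1, 1, 10, 30),
         datetime(2024, 1, 1, 11, 0), datetime(2024, 1, 1, 12, 0),
         datetime(2024, 1, 2, 10, 0)]
amounts = [100, 150, 200, 120, 180]
cards = ["A", "A", "A", "B", "A"]
print(backward_window(times, amounts, cards, timedelta(hours=24), "sum"))
# [None, 100, 250, None, 450]
```

Passing the same group label for every row (for example `[0] * len(times)`) gives the ungrouped, global window of Example 1.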

fit(X, y=None)

Fit the transformer by generating column name mappings.

Parameters:
  • X (DataFrame) – Input DataFrame.

  • y (Series | None) – Target variable. Not used, present here for compatibility.

Returns:

Fitted transformer instance.

Return type:

TimeWindowFeatures

transform(X)

Transform the input DataFrame by creating time window features.

Parameters:

X (DataFrame) – Input DataFrame to transform. Should be sorted by time_column.

Returns:

Transformed DataFrame with time window features.

Return type:

DataFrame

class gators.feature_generation_dt.HolidayFeatures

Bases: BaseModel, BaseEstimator, TransformerMixin

Generates holiday-related features from datetime columns.

Detects holidays and calculates distance to nearest holidays, which is useful for retail, finance, and other domains where holidays affect patterns. Uses the holidays library for accurate, year-specific holiday dates.

Parameters:
  • subset (Optional[List[str]], default=None) – List of datetime columns to extract features from. If None, all datetime columns will be used.

  • country (str, default="US") – Country code for holidays (e.g., “US”, “UK”, “CA”, “DE”, “FR”, “JP”). Supports any country code from the holidays library. See https://pypi.org/project/holidays/ for full list of supported countries.

  • years (Optional[List[int]], default=None) – Years to include holidays for. If None, will automatically detect years from data.

  • features (List[str], default=["is_holiday", "days_to_holiday", "days_from_holiday"]) –

    Features to generate. Options:

    • "is_holiday": Boolean for whether date is a holiday

    • "days_to_holiday": Days until next holiday (negative if past)

    • "days_from_holiday": Days since last holiday (negative if future)

    • "nearest_holiday_distance": Absolute days to nearest holiday

  • drop_columns (bool, default=False) – Whether to drop the original datetime columns after feature extraction.
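Once a sorted list of holiday dates is available, the distance features reduce to a binary search for the neighbouring holidays. In the sketch below, a few 2024 US federal holidays are hard-coded to keep the example dependency-free; in gators the dates come from the holidays library. `holiday_features` is a hypothetical helper. It uses non-negative distances, counts a holiday as 0 days from itself, and returns None when no holiday exists on one side. These conventions are assumptions and do not reproduce the sign rules described for `days_to_holiday` and `days_from_holiday`.

```python
import bisect
from datetime import date

# Some 2024 US federal holidays (New Year, MLK Day, Presidents' Day,
# Memorial Day, Juneteenth, Independence Day), hard-coded for illustration
HOLIDAYS = sorted([date(2024, 1, 1), date(2024, 1, 15), date(2024, 2, 19),
                   date(2024, 5, 27), date(2024, 6, 19), date(2024, 7, 4)])


def holiday_features(d, holidays=HOLIDAYS):
    """Holiday flag and distances for one date (sketch)."""
    i = bisect.bisect_left(holidays, d)        # first holiday on or after d
    nxt = holidays[i] if i < len(holidays) else None
    j = bisect.bisect_right(holidays, d) - 1   # last holiday on or before d
    prev = holidays[j] if j >= 0 else None
    to_next = (nxt - d).days if nxt is not None else None
    from_prev = (d - prev).days if prev is not None else None
    distances = [x for x in (to_next, from_prev) if x is not None]
    return {
        "is_holiday": to_next == 0,
        "days_to_holiday": to_next,
        "days_from_holiday": from_prev,
        "nearest_holiday_distance": min(distances) if distances else None,
    }


print(holiday_features(date(2024, 7, 3)))
```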

Examples

>>> from gators.feature_generation_dt import HolidayFeatures
>>> import polars as pl
>>> X = pl.DataFrame({
...     'date': [
...         '2024-01-01',  # New Year's Day
...         '2024-01-15',  # Martin Luther King Jr. Day
...         '2024-07-03',  # Day before Independence Day
...         '2024-07-04',  # Independence Day
...         '2024-07-05',  # Day after Independence Day
...     ]
... }).with_columns(
...     pl.col('date').str.strptime(pl.Datetime, '%Y-%m-%d')
... )

Example 1: Is holiday detection

>>> transformer = HolidayFeatures(
...     subset=['date'],
...     features=['is_holiday']
... )
>>> result = transformer.fit_transform(X)
>>> print(result)
shape: (5, 2)
┌─────────────────────┬──────────────────┐
│ date                ┆ date__is_holiday │
│ ---                 ┆ ---              │
│ datetime[μs]        ┆ bool             │
├─────────────────────┼──────────────────┤
│ 2024-01-01 00:00:00 ┆ true             │
│ 2024-01-15 00:00:00 ┆ true             │
│ 2024-07-03 00:00:00 ┆ false            │
│ 2024-07-04 00:00:00 ┆ true             │
│ 2024-07-05 00:00:00 ┆ false            │
└─────────────────────┴──────────────────┘

Example 2: Distance to holidays

>>> transformer = HolidayFeatures(
...     subset=['date'],
...     features=['nearest_holiday_distance']
... )
>>> result = transformer.fit_transform(X)

Example 3: UK holidays

>>> transformer = HolidayFeatures(
...     subset=['date'],
...     country='UK',
...     features=['is_holiday']
... )
>>> result = transformer.fit_transform(X)

fit(X, y=None)

Fit the transformer by identifying datetime columns and building holiday list.

Parameters:
  • X (DataFrame) – Input DataFrame.

  • y (Series | None) – Target variable. Not used, present here for compatibility.

Returns:

Fitted transformer instance.

Return type:

HolidayFeatures

transform(X)

Transform the input DataFrame by creating holiday features.

Parameters:

X (DataFrame) – Input DataFrame to transform.

Returns:

Transformed DataFrame with holiday features.

Return type:

DataFrame