gators.sampling.SupervisedSampling

class gators.sampling.SupervisedSampling(frac_dict: Dict[str, float], random_state: int = 0)

Sample each class of the target according to the fraction given for it in frac_dict.

Parameters
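frac_dict : Dict[str, float]

Fraction of samples to keep for each class, keyed by class label.

random_state : int, default 0

Random state used for the sampling.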

Examples

Imports and initialization:

>>> from gators.sampling import SupervisedSampling
>>> obj = SupervisedSampling(frac_dict={0: 0.5, 1: 0.5, 2: 1, 3: 1})

The fit, transform, and fit_transform methods accept:

  • dask dataframes:

>>> import dask.dataframe as dd
>>> import pandas as pd
>>> X = dd.from_pandas(pd.DataFrame({
... 'A': [0, 3, 6, 9, 12, 15],
... 'B': [1, 4, 7, 10, 13, 16],
... 'C': [2, 5, 8, 11, 14, 17]}), npartitions=1)
>>> y = dd.from_pandas(pd.Series([0, 0, 1, 1, 2, 3], name='TARGET'), npartitions=1)

  • koalas dataframes:

>>> import databricks.koalas as ks
>>> X = ks.DataFrame({
... 'A': [0, 3, 6, 9, 12, 15],
... 'B': [1, 4, 7, 10, 13, 16],
... 'C': [2, 5, 8, 11, 14, 17]})
>>> y = ks.Series([0, 0, 1, 1, 2, 3], name='TARGET')

  • and pandas dataframes:

>>> import pandas as pd
>>> X = pd.DataFrame({
... 'A': [0, 3, 6, 9, 12, 15],
... 'B': [1, 4, 7, 10, 13, 16],
... 'C': [2, 5, 8, 11, 14, 17]})
>>> y = pd.Series([0, 0, 1, 1, 2, 3], name='TARGET')

The result is a sampled dataframe and series from the same dataframe library as the inputs.

>>> X, y = obj.transform(X, y)
>>> X
    A   B   C
0   3   4   5
1   9  10  11
2  12  13  14
3  15  16  17
>>> y
0    0
1    1
2    2
3    3
Name: TARGET, dtype: int64
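
The per-class sampling shown above can be approximated in plain pandas. The sketch below is only an illustration of the idea, not the gators implementation: the helper name sample_by_class is hypothetical, and the rows it picks (and the index it returns) may differ from what SupervisedSampling produces.

```python
import pandas as pd


def sample_by_class(X, y, frac_dict, random_state=0):
    """Keep a fraction of the rows of each class.

    Hypothetical helper for illustration; it is not part of gators.
    """
    kept_labels = []
    for label, frac in frac_dict.items():
        # Rows of y belonging to this class, sampled at the requested fraction.
        sampled = y[y == label].sample(frac=frac, random_state=random_state)
        kept_labels.extend(sampled.index.tolist())
    # Restore the original row order before selecting from X and y.
    kept_labels = sorted(kept_labels)
    return X.loc[kept_labels], y.loc[kept_labels]


X = pd.DataFrame({
    'A': [0, 3, 6, 9, 12, 15],
    'B': [1, 4, 7, 10, 13, 16],
    'C': [2, 5, 8, 11, 14, 17]})
y = pd.Series([0, 0, 1, 1, 2, 3], name='TARGET')

X_s, y_s = sample_by_class(X, y, frac_dict={0: 0.5, 1: 0.5, 2: 1, 3: 1})
print(X_s)
print(y_s)
```

With these fractions, one row is kept from each of classes 0 and 1, and the single rows of classes 2 and 3 are kept in full. Which of the two candidate rows is kept for classes 0 and 1 depends on the random state.
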

transform(X: Union[pd.DataFrame, ks.DataFrame, dd.DataFrame], y: Union[pd.Series, ks.Series, dd.Series]) → Tuple[Union[pd.DataFrame, ks.DataFrame, dd.DataFrame], Union[pd.Series, ks.Series, dd.Series]]

Sample the dataframe X and the series y according to frac_dict.

Parameters
X : DataFrame

Input dataframe.

y : Series

Input target.

Returns
X : DataFrame

Sampled dataframe.

y : Series

Sampled series.
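
As a rough check on the size of the sampled outputs, the number of rows kept per class can be estimated as the class count multiplied by its fraction. The snippet below works this out for the target and frac_dict used in the Examples section. It is a back-of-the-envelope sketch: how gators rounds a fractional result is not stated here, so the example uses fractions that give whole numbers.

```python
from collections import Counter

# Target values and fractions taken from the Examples section.
y = [0, 0, 1, 1, 2, 3]
frac_dict = {0: 0.5, 1: 0.5, 2: 1, 3: 1}

# Number of rows per class before sampling.
class_counts = Counter(y)

# Estimated number of rows kept per class: count * fraction.
expected = {
    label: class_counts[label] * frac
    for label, frac in frac_dict.items()
}
total = sum(expected.values())

print(expected)
print(total)
```

Each class is expected to contribute one row, for four rows in total, which matches the size of the sampled dataframe in the example.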

static check_dataframe(X: Union[pd.DataFrame, ks.DataFrame, dd.DataFrame])

Validate dataframe.

Parameters
X : DataFrame

Input dataframe.

static check_target(X: Union[pd.DataFrame, ks.DataFrame, dd.DataFrame], y: Union[pd.Series, ks.Series, dd.Series])

Validate target.

Parameters
X : DataFrame

Dataframe.

y : Series

Target values.
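
This page does not list the conditions that check_target enforces. The sketch below shows the kind of checks a target validator might plausibly perform for pandas inputs. Every check in it (series type, a non-empty name, matching row counts) is an assumption made for illustration, and validate_target_sketch is a hypothetical name, not a gators function.

```python
import pandas as pd


def validate_target_sketch(X, y):
    """Illustrative target validation for pandas inputs.

    Hypothetical helper; the actual checks in gators may differ.
    """
    # Assumption: the target must be a pandas Series.
    if not isinstance(y, pd.Series):
        raise TypeError("`y` should be a pandas Series.")
    # Assumption: the target must carry a name.
    if y.name is None:
        raise ValueError("`y` should have a name.")
    # Assumption: X and y must have the same number of rows.
    if len(X) != len(y):
        raise ValueError("`X` and `y` should have the same number of rows.")


X = pd.DataFrame({'A': [0, 3, 6], 'B': [1, 4, 7]})
y = pd.Series([0, 1, 1], name='TARGET')

validate_target_sketch(X, y)  # passes silently for consistent inputs
```

Raising early on inconsistent inputs keeps the error close to its cause, rather than letting a mismatch surface later during sampling.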