gators.sampling.SupervisedSampling
class gators.sampling.SupervisedSampling(frac_dict: Dict[str, float], random_state: int = 0)
Sample each class depending on the user input.
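- Parameters
- frac_dict : Dict[str, float]
Fraction of samples to keep for each class of the target, keyed by class label.
- random_state : int, default 0
Random seed used for the sampling.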
Examples
Imports and initialization:
>>> from gators.sampling import SupervisedSampling
>>> obj = SupervisedSampling(frac_dict={0: 0.5, 1: 0.5, 2: 1, 3: 1})
The fit, transform, and fit_transform methods accept:
dask dataframes:
>>> import dask.dataframe as dd
>>> import pandas as pd
>>> X = dd.from_pandas(pd.DataFrame({
... 'A': [0, 3, 6, 9, 12, 15],
... 'B': [1, 4, 7, 10, 13, 16],
... 'C': [2, 5, 8, 11, 14, 17]}), npartitions=1)
>>> y = dd.from_pandas(pd.Series([0, 0, 1, 1, 2, 3], name='TARGET'), npartitions=1)
koalas dataframes:
>>> import databricks.koalas as ks
>>> X = ks.DataFrame({
... 'A': [0, 3, 6, 9, 12, 15],
... 'B': [1, 4, 7, 10, 13, 16],
... 'C': [2, 5, 8, 11, 14, 17]})
>>> y = ks.Series([0, 0, 1, 1, 2, 3], name='TARGET')
and pandas dataframes:
>>> import pandas as pd
>>> X = pd.DataFrame({
... 'A': [0, 3, 6, 9, 12, 15],
... 'B': [1, 4, 7, 10, 13, 16],
... 'C': [2, 5, 8, 11, 14, 17]})
>>> y = pd.Series([0, 0, 1, 1, 2, 3], name='TARGET')
The result is a transformed dataframe and series belonging to the same dataframe library.
>>> X, y = obj.transform(X, y)
>>> X
    A   B   C
0   3   4   5
1   9  10  11
2  12  13  14
3  15  16  17
>>> y
0    0
1    1
2    2
3    3
Name: TARGET, dtype: int64
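The four rows in the result are consistent with the fractions passed in frac_dict: classes 0 and 1 each have two rows and keep half, while classes 2 and 3 each have one row and keep it. The following is a minimal pandas-only sketch of that arithmetic, not the gators implementation; it assumes the number of rows kept per class is round(fraction * count), and the names class_counts, expected_kept, and total_kept are illustrative.

```python
import pandas as pd

frac_dict = {0: 0.5, 1: 0.5, 2: 1, 3: 1}
y = pd.Series([0, 0, 1, 1, 2, 3], name="TARGET")

# Number of rows per class before sampling.
class_counts = y.value_counts().sort_index()

# Expected number of rows kept per class.
# Assumption: each class keeps round(fraction * count) rows.
expected_kept = {
    int(label): int(round(frac_dict[int(label)] * count))
    for label, count in class_counts.items()
}

# Total number of rows expected after sampling.
total_kept = sum(expected_kept.values())
```

Here every class ends up with one row, for a total of four, which matches the length of the sampled X and y shown above.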
transform(X: Union[pd.DataFrame, ks.DataFrame, dd.DataFrame], y: Union[pd.Series, ks.Series, dd.Series]) → Tuple[Union[pd.DataFrame, ks.DataFrame, dd.DataFrame], Union[pd.Series, ks.Series, dd.Series]]
Fit and transform the dataframe X and the series y.
- Parameters
- X : DataFrame
Input dataframe.
- y : Series
Input target.
- Returns
- X : DataFrame
Sampled dataframe.
- y : Series
Sampled series.
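Conceptually, the sampling can be pictured as drawing a fixed fraction of rows from each class and returning the matching rows of X and y together. The sketch below illustrates that idea with pandas only. It is not how gators implements transform, and the helper sample_by_class is a hypothetical name introduced here for illustration.

```python
import pandas as pd


def sample_by_class(X, y, frac_dict, random_state=0):
    """Hypothetical helper: keep a fraction of the rows of each class."""
    kept = []
    for label, frac in frac_dict.items():
        # Rows of the target that belong to the current class.
        y_class = y[y == label]
        # Draw the requested fraction of those rows.
        kept.append(y_class.sample(frac=frac, random_state=random_state))
    # Combine the per-class samples and restore the original row order.
    y_sampled = pd.concat(kept).sort_index()
    # Select the same rows from the dataframe.
    return X.loc[y_sampled.index], y_sampled


X = pd.DataFrame({
    "A": [0, 3, 6, 9, 12, 15],
    "B": [1, 4, 7, 10, 13, 16],
    "C": [2, 5, 8, 11, 14, 17]})
y = pd.Series([0, 0, 1, 1, 2, 3], name="TARGET")

X_sampled, y_sampled = sample_by_class(
    X, y, frac_dict={0: 0.5, 1: 0.5, 2: 1.0, 3: 1.0})
```

Which of the two rows survives in classes 0 and 1 depends on the random seed, so the sketch only guarantees the number of rows per class, not the specific rows shown in the example output.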
static check_dataframe(X: Union[pd.DataFrame, ks.DataFrame, dd.DataFrame])
Validate dataframe.
- Parameters
- X : DataFrame
Input dataframe.
static check_target(X: Union[pd.DataFrame, ks.DataFrame, dd.DataFrame], y: Union[pd.Series, ks.Series, dd.Series])
Validate target.
- Parameters
- X : DataFrame
Dataframe.
- y : Series
Target values.
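The page does not state which conditions check_dataframe and check_target enforce. As a rough illustration of what input validation of this kind might look like, the sketch below checks the container types and that X and y have the same number of rows. Both functions and both checks are assumptions made for this example (pandas only) and are not the gators implementation.

```python
import pandas as pd


def check_dataframe_sketch(X):
    """Hypothetical check: X must be a pandas DataFrame."""
    if not isinstance(X, pd.DataFrame):
        raise TypeError("`X` should be a pandas DataFrame.")


def check_target_sketch(X, y):
    """Hypothetical check: y must be a Series with one value per row of X."""
    if not isinstance(y, pd.Series):
        raise TypeError("`y` should be a pandas Series.")
    if len(X) != len(y):
        raise ValueError("`X` and `y` should have the same number of rows.")


X = pd.DataFrame({"A": [0, 3, 6], "B": [1, 4, 7]})
y = pd.Series([0, 0, 1], name="TARGET")

# Both checks pass silently for consistent inputs.
check_dataframe_sketch(X)
check_target_sketch(X, y)
```

Raising early with a specific exception type keeps a mismatch between X and y from surfacing later as a harder-to-read indexing error.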