gators.sampling.UnsupervisedSampling
class gators.sampling.UnsupervisedSampling(n_samples: int)
Randomly sample rows of the data and the matching target values.
- Parameters
- n_samples : int
Number of samples to keep.
Examples
>>> from gators.sampling import UnsupervisedSampling
>>> obj = UnsupervisedSampling(n_samples=3)
The fit, transform, and fit_transform methods accept:
dask dataframes:
>>> import dask.dataframe as dd
>>> import pandas as pd
>>> X = dd.from_pandas(pd.DataFrame({
... 'A': [0, 3, 6, 9, 12, 15],
... 'B': [1, 4, 7, 10, 13, 16],
... 'C': [2, 5, 8, 11, 14, 17]}), npartitions=1)
>>> y = dd.from_pandas(pd.Series([0, 0, 1, 1, 2, 3], name='TARGET'), npartitions=1)
koalas dataframes:
>>> import databricks.koalas as ks
>>> X = ks.DataFrame({
... 'A': [0, 3, 6, 9, 12, 15],
... 'B': [1, 4, 7, 10, 13, 16],
... 'C': [2, 5, 8, 11, 14, 17]})
>>> y = ks.Series([0, 0, 1, 1, 2, 3], name='TARGET')
and pandas dataframes:
>>> import pandas as pd
>>> X = pd.DataFrame({
... 'A': [0, 3, 6, 9, 12, 15],
... 'B': [1, 4, 7, 10, 13, 16],
... 'C': [2, 5, 8, 11, 14, 17]})
>>> y = pd.Series([0, 0, 1, 1, 2, 3], name='TARGET')
The result is a sampled dataframe and series from the same dataframe library as the inputs.
>>> X, y = obj.transform(X, y)
>>> X
    A   B   C
5  15  16  17
2   6   7   8
1   3   4   5
>>> y
5    3
2    1
1    0
Name: TARGET, dtype: int64
transform(X: Union[pd.DataFrame, ks.DataFrame, dd.DataFrame], y: Union[pd.Series, ks.Series, dd.Series]) → Tuple[Union[pd.DataFrame, ks.DataFrame, dd.DataFrame], Union[pd.Series, ks.Series, dd.Series]]
Fit and transform the dataframe X and the series y.
- Parameters
- X : DataFrame
Input dataframe.
- y : Series
Input target.
- Returns
- X : DataFrame
Sampled dataframe.
- y : Series
Sampled series.
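For pandas inputs, the behaviour described above (keep `n_samples` rows of `X` together with the matching entries of `y`) can be sketched with `DataFrame.sample`. This is a minimal illustration, not gators' implementation: the function name `sample_rows`, the `random_state` argument, sampling without replacement, and capping `n_samples` at the number of rows are all assumptions.

```python
from typing import Optional, Tuple

import pandas as pd


def sample_rows(
    X: pd.DataFrame,
    y: pd.Series,
    n_samples: int,
    random_state: Optional[int] = None,
) -> Tuple[pd.DataFrame, pd.Series]:
    """Hypothetical helper: randomly keep n_samples rows of X and the matching y values."""
    # Assumption: sample without replacement, so never ask for more rows than exist.
    n = min(n_samples, len(X))
    # Draw the rows once from X ...
    X_sampled = X.sample(n=n, random_state=random_state)
    # ... and reuse the same index labels for y so each row keeps its target.
    y_sampled = y.loc[X_sampled.index]
    return X_sampled, y_sampled


X = pd.DataFrame({
    'A': [0, 3, 6, 9, 12, 15],
    'B': [1, 4, 7, 10, 13, 16],
    'C': [2, 5, 8, 11, 14, 17]})
y = pd.Series([0, 0, 1, 1, 2, 3], name='TARGET')
X_s, y_s = sample_rows(X, y, n_samples=3, random_state=0)
print(X_s.index.equals(y_s.index))  # True
```

Reusing `X_sampled.index` for `y` is the key design choice in this sketch: sampling `X` and `y` independently would break the pairing between each row and its target.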
static check_dataframe(X: Union[pd.DataFrame, ks.DataFrame, dd.DataFrame])
Validate the dataframe.
- Parameters
- X : DataFrame
Input dataframe.
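The page does not say what `check_dataframe` verifies. As a rough, pandas-only sketch of what such a validator could look like, the hypothetical `check_dataframe_sketch` below rejects non-DataFrame inputs; the koalas and dask branches are omitted, and the string column-name rule is an assumption, not documented gators behaviour.

```python
import pandas as pd


def check_dataframe_sketch(X) -> None:
    """Hypothetical pandas-only analogue of a dataframe validator."""
    # Reject anything that is not a pandas DataFrame.
    if not isinstance(X, pd.DataFrame):
        raise TypeError('`X` should be a pandas DataFrame.')
    # Assumption: column names are expected to be strings.
    for col in X.columns:
        if not isinstance(col, str):
            raise ValueError('Column names of `X` should be of type str.')
```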
static check_target(X: Union[pd.DataFrame, ks.DataFrame, dd.DataFrame], y: Union[pd.Series, ks.Series, dd.Series])
Validate the target.
- Parameters
- X : DataFrame
Input dataframe.
- y : Series
Target values.
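Similarly, the checks performed by `check_target` are not listed here. A hypothetical pandas-only sketch, `check_target_sketch`, might confirm that `y` is a Series with one value per row of `X`; both rules are assumptions chosen for illustration, and the koalas and dask cases are left out.

```python
import pandas as pd


def check_target_sketch(X: pd.DataFrame, y) -> None:
    """Hypothetical pandas-only analogue of a target validator."""
    # Reject anything that is not a pandas Series.
    if not isinstance(y, pd.Series):
        raise TypeError('`y` should be a pandas Series.')
    # Assumption: the target must provide exactly one value per row of X.
    if len(y) != len(X):
        raise ValueError('`X` and `y` should have the same number of rows.')
```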