gators.feature_generation.PolynomialFeatures
- 
class gators.feature_generation.PolynomialFeatures(columns: List[str], degree=2, interaction_only=False)
- Create new columns by multiplying columns together. - Parameters
- columnsList[str]
- List of columns. 
- degreeint, default = 2
- The degree of the polynomial. With the default degree of 2, features A and B produce A * A, B * B, and A * B. 
- interaction_onlybool, default = False
- Keep only interaction terms. If True, features A and B produce only A * B. 
- dtypetype, default np.float64
- Numpy dtype of the output data. 
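
The effect of degree and interaction_only can be illustrated without gators. The sketch below is not the library's implementation: `polynomial_columns` is a hypothetical helper written with pandas and itertools, and the `__x__` separator is copied from the example output on this page. Column naming for degrees above 2 is an assumption.

```python
from itertools import combinations, combinations_with_replacement

import pandas as pd


def polynomial_columns(X, columns, degree=2, interaction_only=False):
    """Hypothetical helper: append product columns to a copy of X."""
    # interaction_only=True skips repeated factors such as A * A.
    pick = combinations if interaction_only else combinations_with_replacement
    out = X.copy()
    for d in range(2, degree + 1):
        for combo in pick(columns, d):
            product = X[combo[0]]
            for name in combo[1:]:
                product = product * X[name]
            # "__x__" separator taken from the example output on this page.
            out["__x__".join(combo)] = product
    return out


X = pd.DataFrame(
    {"A": [0.0, 3.0, 6.0], "B": [1.0, 4.0, 7.0], "C": [2.0, 5.0, 8.0]})
full = polynomial_columns(X, ["A", "B"])
inter = polynomial_columns(X, ["A", "B"], interaction_only=True)
print(list(full.columns))
print(list(inter.columns))
```

With interaction_only=True the helper uses combinations without replacement, so only the cross term A * B is added.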
 
 - Examples
 - Imports and initialization:

 >>> from gators.feature_generation import PolynomialFeatures
 >>> obj = PolynomialFeatures(columns=['A', 'B'])

 - The fit, transform, and fit_transform methods accept:
 - dask dataframes:

 >>> import dask.dataframe as dd
 >>> import pandas as pd
 >>> X = dd.from_pandas(pd.DataFrame(
 ... {'A': [0.0, 3.0, 6.0], 'B': [1.0, 4.0, 7.0], 'C': [2.0, 5.0, 8.0]}), npartitions=1)

 - koalas dataframes:

 >>> import databricks.koalas as ks
 >>> X = ks.DataFrame(
 ... {'A': [0.0, 3.0, 6.0], 'B': [1.0, 4.0, 7.0], 'C': [2.0, 5.0, 8.0]})

 - and pandas dataframes:

 >>> import pandas as pd
 >>> X = pd.DataFrame(
 ... {'A': [0.0, 3.0, 6.0], 'B': [1.0, 4.0, 7.0], 'C': [2.0, 5.0, 8.0]})

 - The result is a transformed dataframe belonging to the same dataframe library.

 >>> obj.fit_transform(X)
      A    B    C  A__x__A  A__x__B  B__x__B
 0  0.0  1.0  2.0      0.0      0.0      1.0
 1  3.0  4.0  5.0      9.0     12.0     16.0
 2  6.0  7.0  8.0     36.0     42.0     49.0

 >>> X = pd.DataFrame(
 ... {'A': [0.0, 3.0, 6.0], 'B': [1.0, 4.0, 7.0], 'C': [2.0, 5.0, 8.0]})
 >>> _ = obj.fit(X)
 >>> obj.transform_numpy(X.to_numpy())
 array([[ 0.,  1.,  2.,  0.,  0.,  1.],
        [ 3.,  4.,  5.,  9., 12., 16.],
        [ 6.,  7.,  8., 36., 42., 49.]])

 - 
fit(X: Union[pd.DataFrame, ks.DataFrame, dd.DataFrame], y: Union[pd.Series, ks.Series, dd.Series] = None) → gators.feature_generation.polynomial_features.PolynomialFeatures
- Fit the dataframe X. - Parameters
- XDataFrame
- Input dataframe. 
- ySeries, default None
- Target values. 
 
- Returns
- selfPolynomialFeatures
- Instance of itself. 
 
 
 - 
transform(X: Union[pd.DataFrame, ks.DataFrame, dd.DataFrame]) → Union[pd.DataFrame, ks.DataFrame, dd.DataFrame]
- Transform the dataframe X. - Parameters
- XDataFrame.
- Input dataframe. 
 
- Returns
- XDataFrame
- Transformed dataframe. 
 
 
 - 
transform_numpy(X: numpy.ndarray) → numpy.ndarray
- Transform the array X. - Parameters
- Xnp.ndarray
- Input array. 
 
- Returns
- Xnp.ndarray
- Transformed array. 
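
The array transform can be sketched in plain NumPy. This is an illustration under assumptions, not the gators implementation: `polynomial_array` and its `idx_columns` argument (the positions of the selected columns) are hypothetical names. It reproduces the array shown in the class example for columns A and B at positions 0 and 1.

```python
from itertools import combinations_with_replacement

import numpy as np


def polynomial_array(X, idx_columns, degree=2):
    """Hypothetical helper: append column products to a 2-D array."""
    new_columns = []
    for d in range(2, degree + 1):
        for combo in combinations_with_replacement(idx_columns, d):
            # Multiply the selected columns element-wise, one value per row.
            new_columns.append(np.prod(X[:, list(combo)], axis=1))
    # The original columns come first, followed by the new products.
    return np.column_stack([X] + new_columns)


X = np.array([[0.0, 1.0, 2.0], [3.0, 4.0, 5.0], [6.0, 7.0, 8.0]])
X_new = polynomial_array(X, idx_columns=[0, 1])
print(X_new)
```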
 
 
 - 
static check_array(X: numpy.ndarray)
- Validate array. - Parameters
- Xnp.ndarray
- Array. 
 
 
 - 
check_array_is_numerics(X: numpy.ndarray)
- Check if array is only numerics. - Parameters
- Xnp.ndarray
- Array. 
 
 
 - 
static check_binary_target(X: Union[pd.DataFrame, ks.DataFrame, dd.DataFrame], y: Union[pd.Series, ks.Series, dd.Series])
- Raise an error if the target is not binary. - Parameters
- ySeries
- Target values. 
 
 
 - 
static check_dataframe(X: Union[pd.DataFrame, ks.DataFrame, dd.DataFrame])
- Validate dataframe. - Parameters
- XDataFrame
- Dataframe. 
 
 
 - 
static check_dataframe_contains_numerics(X: Union[pd.DataFrame, ks.DataFrame, dd.DataFrame])
- Check if dataframe contains numerical columns. - Parameters
- XDataFrame
- Dataframe. 
 
 
 - 
static check_dataframe_is_numerics(X: Union[pd.DataFrame, ks.DataFrame, dd.DataFrame])
- Check if dataframe is only numerics. - Parameters
- XDataFrame
- Dataframe. 
 
 
 - 
check_dataframe_with_objects(X: Union[pd.DataFrame, ks.DataFrame, dd.DataFrame])
- Check if dataframe contains object columns. - Parameters
- XDataFrame
- Dataframe. 
 
 
 - 
check_datatype(dtype, accepted_dtypes)
- Check if dtype is one of the accepted dtypes. - Parameters
- dtype
- Datatype to check. 
- accepted_dtypes
- Accepted datatypes. 
 
 
 - 
static check_multiclass_target(y: Union[pd.Series, ks.Series, dd.Series])
- Raise an error if the target is not discrete. - Parameters
- ySeries
- Target values. 
 
 
 - 
check_nans(X: Union[pd.DataFrame, ks.DataFrame, dd.DataFrame], columns: List[str])
- Raise an error if X contains NaN values. - Parameters
- XDataFrame
- Dataframe. 
- columnsList[str]
- List of columns. 
 
 
 - 
static check_regression_target(y: Union[pd.Series, ks.Series, dd.Series])
- Raise an error if the target is not continuous. - Parameters
- ySeries
- Target values. 
 
 
 - 
static check_target(X: Union[pd.DataFrame, ks.DataFrame, dd.DataFrame], y: Union[pd.Series, ks.Series, dd.Series])
- Validate target. - Parameters
- XDataFrame
- Dataframe. 
- ySeries
- Target values. 
 
 
 - 
fit_transform(X: Union[pd.DataFrame, ks.DataFrame, dd.DataFrame], y: Union[pd.Series, ks.Series, dd.Series] = None) → Union[pd.DataFrame, ks.DataFrame, dd.DataFrame]
- Fit and transform the dataframe X. - Parameters
- XDataFrame.
- Input dataframe. 
- ySeries, default None.
- Input target. 
 
- Returns
- XDataFrame
- Transformed dataframe. 
 
 
 - 
static get_column_names(inplace: bool, columns: List[str], suffix: str)
- Return the names of the modified columns. - Parameters
- inplacebool
- If True return columns. If False return columns__suffix. 
- columnsList[str]
- List of columns. 
- suffixstr
- Suffix used if inplace is False. 
 
- Returns
- List[str]
- List of column names. 
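
The naming rule described above can be written out as a small function. This is a sketch based only on the parameter descriptions, not the library source: `column_names_sketch` is a hypothetical stand-in, and the double-underscore join follows the "columns__suffix" wording.

```python
from typing import List


def column_names_sketch(inplace: bool, columns: List[str], suffix: str) -> List[str]:
    """Hypothetical stand-in for the rule described in the docs above."""
    if inplace:
        # Columns are modified in place, so their names are unchanged.
        return columns
    # Otherwise each new column is named "<column>__<suffix>".
    return [f"{column}__{suffix}" for column in columns]


same = column_names_sketch(True, ["A", "B"], "log")
suffixed = column_names_sketch(False, ["A", "B"], "log")
print(same, suffixed)
```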
 
 
 - 
get_params(deep=True)
- Get parameters for this estimator. - Parameters
- deepbool, default=True
- If True, will return the parameters for this estimator and contained subobjects that are estimators. 
 
- Returns
- paramsdict
- Parameter names mapped to their values. 
 
 
 - 
set_params(**params)
- Set the parameters of this estimator. - The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter>, so that it’s possible to update each component of a nested object. - Parameters
- **paramsdict
- Estimator parameters. 
 
- Returns
- selfestimator instance
- Estimator instance.
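
The wording of get_params and set_params matches scikit-learn's estimator convention. Assuming that convention applies here, the `<component>__<parameter>` form can be illustrated with plain scikit-learn objects. The step name "scaler" and the choice of StandardScaler are arbitrary and not taken from the gators documentation.

```python
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# A nested object: a pipeline with one step named "scaler".
pipe = Pipeline(steps=[("scaler", StandardScaler())])

# Update a parameter of the nested step via <component>__<parameter>.
pipe.set_params(scaler__with_mean=False)

# deep=True also returns the parameters of contained estimators.
params = pipe.get_params(deep=True)
print(params["scaler__with_mean"])
```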