gators.clipping.Clipping
-
class
gators.clipping.
Clipping
(clip_dict: Dict[str, List[float]], inplace: bool = True)
Trim values using the limits given by the user.
The data should contain only numerical columns. Use gators.encoders to replace categorical columns with numerical ones before using Clipping.
- Parameters
- clip_dict : Dict[str, List[float]]
The keys are the columns to clip; the values are two-element lists:
the first element is the lower limit,
the second element is the upper limit.
- inplace : bool, default True.
If True, the clipped values replace the original columns. If False, the clipped values are written to new columns.
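The clip_dict layout described above can be checked before it is handed to the transformer. The following is a minimal sketch, assuming a hypothetical helper named validate_clip_dict that is not part of gators; it only confirms that each entry holds a lower and an upper limit in that order.

```python
from typing import Dict, List


def validate_clip_dict(clip_dict: Dict[str, List[float]]) -> None:
    """Hypothetical helper (not a gators function): check each entry is [lower, upper]."""
    for column, limits in clip_dict.items():
        # Every value must hold exactly two elements.
        if len(limits) != 2:
            raise ValueError(f"{column!r}: expected [lower, upper], got {limits!r}")
        lower, upper = limits
        # The first element is the lower limit, the second the upper limit.
        if lower > upper:
            raise ValueError(f"{column!r}: lower limit {lower} exceeds upper limit {upper}")


# Passes silently for a well-formed dictionary.
validate_clip_dict({"A": [-0.5, 0.5], "B": [-0.5, 0.5], "C": [-0.0, 1.0]})
```

Whether Clipping performs a similar check internally is not stated here, so treat this as a defensive pre-check only.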
Examples
Imports and initialization:
>>> from gators.clipping import Clipping
>>> clip_dict = {'A': [-0.5, 0.5], 'B': [-0.5, 0.5], 'C': [-0., 1.]}
>>> obj = Clipping(clip_dict=clip_dict)
The fit, transform, and fit_transform methods accept:
dask dataframes:
>>> import dask.dataframe as dd
>>> import pandas as pd
>>> X = dd.from_pandas(pd.DataFrame(
...     {'A': [1.8, 2.2, 1.0, 0.4, 0.8],
...      'B': [0.4, 1.9, -0.2, 0.1, 0.1],
...      'C': [1.0, -1.0, -0.1, 1.5, 0.4]}), npartitions=1)
koalas dataframes:
>>> import databricks.koalas as ks
>>> X = ks.DataFrame(
...     {'A': [1.8, 2.2, 1.0, 0.4, 0.8],
...      'B': [0.4, 1.9, -0.2, 0.1, 0.1],
...      'C': [1.0, -1.0, -0.1, 1.5, 0.4]})
and pandas dataframes:
>>> import pandas as pd
>>> X = pd.DataFrame(
...     {'A': [1.8, 2.2, 1.0, 0.4, 0.8],
...      'B': [0.4, 1.9, -0.2, 0.1, 0.1],
...      'C': [1.0, -1.0, -0.1, 1.5, 0.4]})
The result is a transformed dataframe belonging to the same dataframe library.
>>> obj.fit_transform(X)
     A    B    C
0  0.5  0.4  1.0
1  0.5  0.5 -0.0
2  0.5 -0.2 -0.0
3  0.4  0.1  1.0
4  0.5  0.1  0.4
Independently of the dataframe library used to fit the transformer, the transform_numpy method only accepts NumPy arrays and returns a transformed NumPy array. Note that this method should only be used when the number of rows is small, e.g. in a real-time environment.
>>> obj.transform_numpy(X.to_numpy())
array([[ 0.5,  0.4,  1. ],
       [ 0.5,  0.5, -0. ],
       [ 0.5, -0.2, -0. ],
       [ 0.4,  0.1,  1. ],
       [ 0.5,  0.1,  0.4]])
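For readers without gators installed, the same result can be reproduced with plain NumPy. This is a sketch of the equivalent operation using numpy.clip, not the gators implementation; the lower and upper arrays are simply the clip_dict limits written out in column order A, B, C.

```python
import numpy as np

# Same data as the examples above; columns are ordered A, B, C.
X = np.array([
    [1.8, 0.4, 1.0],
    [2.2, 1.9, -1.0],
    [1.0, -0.2, -0.1],
    [0.4, 0.1, 1.5],
    [0.8, 0.1, 0.4],
])

# Per-column limits taken from clip_dict.
lower = np.array([-0.5, -0.5, -0.0])
upper = np.array([0.5, 0.5, 1.0])

# np.clip broadcasts the per-column limits across every row.
X_clipped = np.clip(X, lower, upper)
print(X_clipped)
```

Values already inside their column's limits, such as 0.4 in column A, pass through unchanged.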
-
fit
(X: Union[pd.DataFrame, ks.DataFrame, dd.DataFrame], y: Union[pd.Series, ks.Series, dd.Series] = None) → gators.clipping.clipping.Clipping
Fit the transformer on the pandas, koalas, or dask dataframe X.
- Parameters
- X : DataFrame
Input dataframe.
- y : Series, default None.
Target values.
- Returns
- self : ‘Clipping’
Instance of itself.
-
transform
(X)
Transform the dataframe X.
- Parameters
- X : DataFrame
Input dataframe.
- Returns
- X : DataFrame
Transformed dataframe.
-
transform_numpy
(X: numpy.ndarray) → numpy.ndarray
Transform the array X.
- Parameters
- X : np.ndarray
Input array.
- Returns
- X : np.ndarray
Transformed array.
-
static
check_array
(X: numpy.ndarray)
Validate array.
- Parameters
- X : np.ndarray
Array.
-
check_array_is_numerics
(X: numpy.ndarray)
Check if the array contains only numerical values.
- Parameters
- X : np.ndarray
Array.
-
static
check_binary_target
(X: Union[pd.DataFrame, ks.DataFrame, dd.DataFrame], y: Union[pd.Series, ks.Series, dd.Series])
Raise an error if the target is not binary.
- Parameters
- y : Series
Target values.
-
static
check_dataframe
(X: Union[pd.DataFrame, ks.DataFrame, dd.DataFrame])
Validate dataframe.
- Parameters
- X : DataFrame
Dataframe.
-
static
check_dataframe_contains_numerics
(X: Union[pd.DataFrame, ks.DataFrame, dd.DataFrame])
Check if the dataframe contains numerical columns.
- Parameters
- X : DataFrame
Dataframe.
-
static
check_dataframe_is_numerics
(X: Union[pd.DataFrame, ks.DataFrame, dd.DataFrame])
Check if the dataframe contains only numerical columns.
- Parameters
- X : DataFrame
Dataframe.
-
check_dataframe_with_objects
(X: Union[pd.DataFrame, ks.DataFrame, dd.DataFrame])
Check if the dataframe contains object columns.
- Parameters
- X : DataFrame
Dataframe.
-
check_datatype
(dtype, accepted_dtypes)
Check if dtype is one of the accepted datatypes.
- Parameters
- dtype
Datatype to check.
- accepted_dtypes
Accepted datatypes.
-
static
check_multiclass_target
(y: Union[pd.Series, ks.Series, dd.Series])
Raise an error if the target is not discrete.
- Parameters
- y : Series
Target values.
-
check_nans
(X: Union[pd.DataFrame, ks.DataFrame, dd.DataFrame], columns: List[str])
Raise an error if X contains NaN values.
- Parameters
- X : DataFrame
Dataframe.
- columns : List[str]
List of columns.
-
static
check_regression_target
(y: Union[pd.Series, ks.Series, dd.Series])
Raise an error if the target is not continuous.
- Parameters
- y : Series
Target values.
-
static
check_target
(X: Union[pd.DataFrame, ks.DataFrame, dd.DataFrame], y: Union[pd.Series, ks.Series, dd.Series])
Validate target.
- Parameters
- X : DataFrame
Dataframe.
- y : Series
Target values.
-
fit_transform
(X: Union[pd.DataFrame, ks.DataFrame, dd.DataFrame], y: Union[pd.Series, ks.Series, dd.Series] = None) → Union[pd.DataFrame, ks.DataFrame, dd.DataFrame]
Fit and transform the dataframe X.
- Parameters
- X : DataFrame
Input dataframe.
- y : Series, default None.
Input target.
- Returns
- X : DataFrame
Transformed dataframe.
-
static
get_column_names
(inplace: bool, columns: List[str], suffix: str)
Return the names of the modified columns.
- Parameters
- inplace : bool
If True, return columns. If False, return columns__suffix.
- columns : List[str]
List of columns.
- suffix : str
Suffix used if inplace is False.
- Returns
- List[str]
List of column names.
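The naming rule described for get_column_names can be sketched in a few lines. This is an illustrative stand-in written from the parameter descriptions above, not the gators source; the function name column_names_sketch and the suffix value "clip" are assumptions made for the example.

```python
from typing import List


def column_names_sketch(inplace: bool, columns: List[str], suffix: str) -> List[str]:
    """Illustrative stand-in for the naming rule, not the gators source."""
    if inplace:
        # In-place transformers keep the original column names.
        return list(columns)
    # Otherwise each name gets a double underscore followed by the suffix.
    return [f"{column}__{suffix}" for column in columns]


names_inplace = column_names_sketch(True, ["A", "B"], "clip")
names_suffixed = column_names_sketch(False, ["A", "B"], "clip")
print(names_inplace)
print(names_suffixed)
```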
-
get_params
(deep=True)
Get parameters for this estimator.
- Parameters
- deep : bool, default=True
If True, will return the parameters for this estimator and contained subobjects that are estimators.
- Returns
- params : dict
Parameter names mapped to their values.
-
set_params
(**params)
Set the parameters of this estimator.
The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.
- Parameters
- **params : dict
Estimator parameters.
- Returns
- self : estimator instance
Estimator instance.
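The <component>__<parameter> convention amounts to splitting each key on its first double underscore. The sketch below only illustrates that key format; split_nested_params is a hypothetical helper, and the component name "clipping" is an assumed pipeline step name, neither taken from gators or scikit-learn.

```python
from typing import Any, Dict, Tuple


def split_nested_params(
    params: Dict[str, Any],
) -> Tuple[Dict[str, Any], Dict[str, Dict[str, Any]]]:
    """Hypothetical helper: separate plain keys from <component>__<parameter> keys."""
    own: Dict[str, Any] = {}
    nested: Dict[str, Dict[str, Any]] = {}
    for key, value in params.items():
        # partition splits on the first "__" only; delimiter is empty if absent.
        component, delimiter, sub_key = key.partition("__")
        if delimiter:
            nested.setdefault(component, {})[sub_key] = value
        else:
            own[key] = value
    return own, nested


own, nested = split_nested_params({"inplace": False, "clipping__inplace": True})
print(own)
print(nested)
```

Here the plain key targets the estimator itself, while the prefixed key is routed to the component named before the double underscore.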