About Gators

Gators was created to help data scientists:

  • Perform in-memory and out-of-core data pre-processing for model building.

  • Run fast real-time pre-processing and model scoring.

History of development

Gators development began at Simility in 2018, and the library was open-sourced in 2021.

Timeline

  • 2018: Development of Gators started.

  • 2020: Support for Dask, Koalas and Cython was added to handle datasets that do not fit in memory and to speed up real-time pre-processing.

  • 2021: Gators was open-sourced.

Library Highlights

  • The same interface handles data pre-processing for both in-memory and out-of-core datasets.

  • With Cython, real-time pre-processing runs on NumPy arrays as compiled C code, giving response times comparable to those of compiled languages.
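The two highlights above can be illustrated with a minimal sketch. The MeanImputer class below is hypothetical and is not part of the gators API. The method names fit, transform and transform_numpy are assumptions chosen for illustration, and plain NumPy stands in for the compiled Cython code. The idea is that a transformer is fitted once on a DataFrame during model building and then applied to raw NumPy arrays at scoring time, with both paths giving the same result.

```python
import numpy as np
import pandas as pd


class MeanImputer:
    """Hypothetical transformer for illustration only (not the gators API)."""

    def fit(self, X: pd.DataFrame) -> "MeanImputer":
        # Learn one mean per column; NaN values are skipped by default.
        self.columns_ = list(X.columns)
        self.means_ = X.mean().to_numpy()
        return self

    def transform(self, X: pd.DataFrame) -> pd.DataFrame:
        # Offline path: operate on a DataFrame during model building.
        return X.fillna(dict(zip(self.columns_, self.means_)))

    def transform_numpy(self, X: np.ndarray) -> np.ndarray:
        # Online path: operate on a raw array, as a compiled routine would.
        return np.where(np.isnan(X), self.means_, X)


train = pd.DataFrame({"a": [1.0, np.nan, 3.0], "b": [np.nan, 4.0, 6.0]})
imputer = MeanImputer().fit(train)

offline = imputer.transform(train)                  # DataFrame in, DataFrame out
online = imputer.transform_numpy(train.to_numpy())  # array in, array out
```

Keeping the fitted state (here, the column means) in plain arrays is what lets the scoring path skip DataFrame overhead entirely.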

Our Vision

A world where data scientists can develop their models and push them to production using only Python, even under a high number of queries per second (QPS).

Python packages leveraged in gators

“If I have seen further it is by standing on the shoulders of giants.”

Sir Isaac Newton

gators relies on several libraries internally at each step of the model-building process. They are listed below.

Data pre-processing

pandas

pandas, the well-known data analysis package, is used for data pre-processing during the model-building phase. It should be used as long as the data fits in memory.

Koalas

Koalas is one of the two libraries chosen to handle pre-processing when the data does not fit in memory.

Dask

Dask is the other library that can handle pre-processing when the data does not fit in memory.

NumPy

NumPy is used in the production environment, where pre-processing needs to be as fast as possible.

Cython

In the production environment, pre-processing is done by pre-compiled Cython code operating on NumPy arrays.

Model building

scikit-learn

scikit-learn, the best-known package for model building, is used for cross-validation and model evaluation.

XGBoost

A decision-tree-based package used for model building. By default, XGBoost grows trees level-wise (depth-wise).

LightGBM

A decision-tree-based package used for model building. LightGBM grows trees leaf-wise (best-first).
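The difference between level-wise and leaf-wise growth can be sketched with a toy example. The code below is not taken from XGBoost or LightGBM. It assumes a small tree whose split gains are known in advance (the GAINS values are made up) and only shows the order in which nodes would be split under each policy.

```python
import heapq

# Made-up split gains for a toy tree, indexed heap-style:
# node i has children 2*i + 1 and 2*i + 2.
GAINS = {0: 10.0, 1: 2.0, 2: 8.0, 3: 1.0, 4: 0.5, 5: 6.0, 6: 7.0}


def children(node, gains):
    """Return the children of `node` that exist in the toy tree."""
    return [c for c in (2 * node + 1, 2 * node + 2) if c in gains]


def level_wise_order(gains, n_splits):
    """Split nodes depth by depth, left to right (breadth-first)."""
    order, frontier = [], [0]
    while frontier and len(order) < n_splits:
        next_frontier = []
        for node in frontier:
            if len(order) == n_splits:
                break
            order.append(node)
            next_frontier.extend(children(node, gains))
        frontier = next_frontier
    return order


def leaf_wise_order(gains, n_splits):
    """Always split the current leaf with the largest gain (best-first)."""
    order, heap = [], [(-gains[0], 0)]  # negate gains to get a max-heap
    while heap and len(order) < n_splits:
        _, node = heapq.heappop(heap)
        order.append(node)
        for child in children(node, gains):
            heapq.heappush(heap, (-gains[child], child))
    return order


level_order = level_wise_order(GAINS, 4)  # [0, 1, 2, 3]
leaf_order = leaf_wise_order(GAINS, 4)    # [0, 2, 6, 5]
```

With the same budget of four splits, the level-wise policy fills the tree evenly, while the leaf-wise policy spends its splits on the high-gain right branch and produces a deeper, unbalanced tree.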

Treelite

Treelite compiles trained models to C before they are deployed in production, and treelite-runtime is used for real-time model scoring.