personalization 0.1.1
personalization
An end-to-end demo machine learning pipeline to provide an artifact for a real-time inference service
Aim
We want to create machine learning training code that, given the input data,
trains a model and saves it as an artifact.
Solution
Our implementation of the package 'personalization'
We use Polars to read the data; it is roughly 2-3 times faster than Pandas and offers a convenient API for
aggregations and feature creation.
For the model, we chose LightGBM for its speed, small size (model artifact size up to 50 MB on 300 million rows of search data) and explainability. Users should choose the LightGBM parameters carefully.
We tested an example set of LightGBM parameters in notebooks/train.ipynb.
Offline evaluation
The offline evaluation was done in notebooks/train.ipynb; it shows a significant increase in NDCG across venues for our model compared with the baseline.
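For readers unfamiliar with the metric, NDCG@k can be sketched in a few lines of plain Python. This is an illustrative sketch only, not the code used in notebooks/train.ipynb: the function names are hypothetical, and it assumes linear gain (for binary relevance labels this coincides with the exponential gain that ranking libraries commonly use).

```python
import math
from typing import Sequence


def dcg_at_k(relevances: Sequence[float], k: int) -> float:
    """Discounted cumulative gain of the first k items, in ranked order."""
    # Position 0 is discounted by log2(2) = 1, position 1 by log2(3), etc.
    return sum(
        rel / math.log2(position + 2)
        for position, rel in enumerate(relevances[:k])
    )


def ndcg_at_k(relevances: Sequence[float], k: int) -> float:
    """DCG of the given ranking divided by the DCG of the ideal ranking."""
    ideal_dcg = dcg_at_k(sorted(relevances, reverse=True), k)
    if ideal_dcg == 0:
        # No relevant items at all: define NDCG as 0 (an assumption).
        return 0.0
    return dcg_at_k(relevances, k) / ideal_dcg


# Relevance labels listed in the order the model ranked the venues
# (hypothetical data: 1 = purchased, 0 = not purchased).
perfect_ranking = [1, 0, 0]
poor_ranking = [0, 0, 1]
```

Here `ndcg_at_k(perfect_ranking, 10)` is 1.0, while `ndcg_at_k(poor_ranking, 10)` is 0.5, because the only relevant venue sits in third position and is discounted by log2(4) = 2.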
CI/CD: code style and PyPI
The code is checked with pre-commit hooks, then tested and published with GitHub Actions; current test coverage is around 80 percent.
The inference service code can be found at https://github.com/ra312/model-server
How to run
Obtain sessions.csv and venues.csv and move them to the root folder
Install personalization
python -m pip install personalization
Run the following command in shell to train pipeline and get artifact:
python3 -m personalization \
--sessions-bucket-path sessions.csv \
--venues-bucket-path venues.csv \
--objective lambdarank \
--num_leaves 100 \
--min_sum_hessian_in_leaf 10 \
--metric ndcg --ndcg_eval_at 10 20 \
--learning_rate 0.8 \
--force_row_wise True \
--num_iterations 10 \
--trained-model-path trained_model.joblib
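If you prefer to launch training from Python (for example from a scheduler or a notebook), the same command can be assembled programmatically. The sketch below is a minimal wrapper, not part of the package: `build_training_command`, `run_training` and `TRAINING_PARAMS` are hypothetical names, and the flags are copied verbatim from the shell command above.

```python
import subprocess
import sys
from typing import Dict, List, Sequence, Union

ParamValue = Union[str, int, float, Sequence[Union[str, int, float]]]

# Flags and values copied from the shell command in this README.
TRAINING_PARAMS: Dict[str, ParamValue] = {
    "--sessions-bucket-path": "sessions.csv",
    "--venues-bucket-path": "venues.csv",
    "--objective": "lambdarank",
    "--num_leaves": 100,
    "--min_sum_hessian_in_leaf": 10,
    "--metric": "ndcg",
    "--ndcg_eval_at": [10, 20],
    "--learning_rate": 0.8,
    "--force_row_wise": "True",
    "--num_iterations": 10,
    "--trained-model-path": "trained_model.joblib",
}


def build_training_command(params: Dict[str, ParamValue]) -> List[str]:
    """Turn a flag -> value mapping into an argv list for the module CLI."""
    command = [sys.executable, "-m", "personalization"]
    for flag, value in params.items():
        command.append(flag)
        if isinstance(value, (list, tuple)):
            # Multi-valued flags such as --ndcg_eval_at take several arguments.
            command.extend(str(item) for item in value)
        else:
            command.append(str(value))
    return command


def run_training(params: Dict[str, ParamValue]) -> None:
    """Run the training CLI; raises CalledProcessError on a non-zero exit."""
    subprocess.run(build_training_command(params), check=True)
```

Calling `run_training(TRAINING_PARAMS)` requires the package to be installed and both CSV files to be present in the working directory.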
TODO
Next steps:
Scalability (e.g. use Flyte)
Data: add support for ingesting sessions and venues data from a database
Versioning: add MLflow integration
Development
Clone this repository
Requirements:
Poetry
Python 3.8.1+
Create a virtual environment and install the dependencies
poetry install
Activate the virtual environment
poetry shell
Testing
pytest
Pre-commit
Pre-commit hooks run all the auto-formatters (e.g. black, isort), linters (e.g. mypy, flake8), and other quality
checks to make sure the changeset is in good shape before a commit/push happens.
You can install the hooks with (runs for each commit):
pre-commit install
Or if you want them to run only for each push:
pre-commit install -t pre-push
Or, if you want to run all checks manually for all files:
pre-commit run --all-files
This project was generated using the wolt-python-package-cookiecutter template.
For personal and professional use. You cannot resell or redistribute these repositories in their original state.