pycave 3.2.1
PyCave
PyCave allows you to run traditional machine learning models on CPU, GPU, and even on multiple
nodes. All models are implemented in PyTorch and provide an Estimator API
that is fully compatible with scikit-learn.
For Gaussian mixture models, PyCave allows for 100x speed-ups when using a GPU and enables training
on markedly larger datasets via mini-batch training. The full suite of benchmarks run to compare
PyCave models against scikit-learn models is available on the
documentation website.
PyCave version 3 is a complete rewrite of PyCave. It is tested much more rigorously, depends on
well-maintained libraries, and is tuned for better performance. You are therefore highly
encouraged to upgrade. If you still need PyCave 2, its documentation is available at
pycave-v2.borchero.com.
Features
Support for GPU and multi-node training by implementing models in PyTorch and relying on
PyTorch Lightning
Mini-batch training for all models such that they can be used on huge datasets
Well-structured implementation of models
High-level Estimator API allows for easy usage such that models feel and behave like in
scikit-learn
Medium-level LightningModule implements the training algorithm
Low-level PyTorch Module manages the model parameters
Installation
PyCave is available via pip:
pip install pycave
If you are using Poetry:
poetry add pycave
Usage
If you've ever used scikit-learn, you'll feel right at home when using PyCave. First, let's create
some artificial data to work with:
import torch

X = torch.cat([
    torch.randn(10000, 8) - 5,
    torch.randn(10000, 8),
    torch.randn(10000, 8) + 5,
])
This dataset consists of three clusters of 8-dimensional datapoints. If you want to fit a K-Means
model to find the clusters' centroids, it's as easy as:
from pycave.clustering import KMeans
estimator = KMeans(3)
estimator.fit(X)
# Once the estimator is fitted, it provides various properties. One of them is
# the `model_` property which yields the PyTorch module with the fitted parameters.
print("Centroids are:")
print(estimator.model_.centroids)
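To illustrate what the fitted centroids can be used for, here is a minimal sketch in plain PyTorch that assigns each datapoint to its nearest centroid. It does not call PyCave: the `centroids` tensor below is a hand-written stand-in for `estimator.model_.centroids`, and its shape of (number of clusters, number of features) is an assumption. The dataset is also a smaller version of the one above so that the snippet runs quickly.

```python
import torch

torch.manual_seed(0)

# Smaller stand-in for the dataset above: three clusters of 8-dimensional points.
X = torch.cat([
    torch.randn(1000, 8) - 5,
    torch.randn(1000, 8),
    torch.randn(1000, 8) + 5,
])

# Stand-in for `estimator.model_.centroids` (assumed shape: [num_clusters, num_features]).
# These are the true cluster centers, not values fitted by PyCave.
centroids = torch.stack([
    torch.full((8,), -5.0),
    torch.zeros(8),
    torch.full((8,), 5.0),
])

# Pairwise Euclidean distances between datapoints and centroids: shape [3000, 3].
distances = torch.cdist(X, centroids)

# Each datapoint is assigned to the index of its closest centroid.
labels = distances.argmin(dim=1)

# Number of datapoints assigned to each cluster.
counts = torch.bincount(labels, minlength=3)
print(counts)
```

Because the clusters are far apart relative to their spread, each one should receive its own 1000 points. In practice you would rely on the methods the estimator itself provides, as listed in the API documentation.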
Due to the high-level estimator API, the usage for all machine learning models is similar. The API
documentation provides more detailed information about parameters that can be passed to estimators
and which methods are available.
GPU and Multi-Node Training
For GPU and multi-node training, PyCave leverages PyTorch Lightning. The hardware that training
runs on is determined by the Trainer class. Its init method provides various configuration options.
If you want to run K-Means on a GPU, you can pass the options accelerator='gpu' and devices=1
through the trainer_params argument of the estimator's initializer:
estimator = KMeans(3, trainer_params=dict(accelerator='gpu', devices=1))
Similarly, if you want to train on 4 nodes simultaneously where each node has one GPU available,
you can specify this as follows:
estimator = KMeans(3, trainer_params=dict(num_nodes=4, accelerator='gpu', devices=1))
In fact, you do not need to change anything else in your code.
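Since only the `trainer_params` dictionary differs between hardware setups, it can be convenient to build it in one place. The following is a minimal sketch: `build_trainer_params` is a hypothetical helper, not part of PyCave, and it uses only the keys shown above (`accelerator`, `devices`, `num_nodes`) plus the `accelerator='cpu'` value from PyTorch Lightning for the fallback case.

```python
from typing import Any, Dict


def build_trainer_params(gpus_per_node: int = 0, num_nodes: int = 1) -> Dict[str, Any]:
    """Hypothetical helper (not part of PyCave) that assembles `trainer_params`."""
    if gpus_per_node < 0:
        raise ValueError("gpus_per_node must be non-negative")
    if num_nodes < 1:
        raise ValueError("num_nodes must be at least 1")

    params: Dict[str, Any] = {}
    if gpus_per_node > 0:
        # Same options as in the examples above.
        params["accelerator"] = "gpu"
        params["devices"] = gpus_per_node
    else:
        # Fall back to CPU training when no GPU is requested.
        params["accelerator"] = "cpu"

    # Only set `num_nodes` for multi-node training.
    if num_nodes > 1:
        params["num_nodes"] = num_nodes
    return params


single_gpu = build_trainer_params(gpus_per_node=1)
four_nodes = build_trainer_params(gpus_per_node=1, num_nodes=4)
print(single_gpu)
print(four_nodes)

# Usage with PyCave (requires pycave to be installed):
# estimator = KMeans(3, trainer_params=four_nodes)
```

The rest of the training code stays the same regardless of which dictionary is passed.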
Implemented Models
Currently, PyCave implements three different models:
GaussianMixture
MarkovChain
KMeans
License
PyCave is licensed under the MIT License.