alaas 0.2.1

ALaaS: Active Learning as a Service.

Active Learning as a Service (ALaaS) is a fast and scalable framework for automatically selecting a subset to be labeled
from a full dataset so as to reduce labeling cost. It provides an out-of-the-box, standalone experience that lets users quickly
apply active learning.
ALaaS features:

:hatching_chick: Easy-to-use: Start the system and employ active learning with fewer than 10 lines of code.
:rocket: Fast: Uses stage-level parallelism to achieve over 10x speedup compared with an under-optimized active learning process.
:collision: Elastic: Scale the number of active workers up and down, depending on the number of GPU devices.

The project is still under active development. You are welcome to join us!

Demo on AWS
Installation
Quick Start
ALaaS Server Customization (for Advanced users)
Strategy Zoo
Citation

Demo on AWS :coffee:
Free ALaaS demo on AWS (supports HTTP & gRPC)
Use least confidence sampling with ResNet-18
to select images to be labeled for your tasks!
We have deployed ALaaS on AWS for demonstration. Try it yourself!

Call ALaaS with HTTP 🌐

curl \
-X POST http://13.213.29.8:8081/post \
-H 'Content-Type: application/json' \
-d '{"data":[{"uri": "https://www.cs.toronto.edu/~kriz/cifar-10-sample/airplane1.png"},
{"uri": "https://www.cs.toronto.edu/~kriz/cifar-10-sample/airplane2.png"},
{"uri": "https://www.cs.toronto.edu/~kriz/cifar-10-sample/airplane3.png"},
{"uri": "https://www.cs.toronto.edu/~kriz/cifar-10-sample/airplane4.png"},
{"uri": "https://www.cs.toronto.edu/~kriz/cifar-10-sample/airplane5.png"}],
"parameters": {"budget": 3},
"execEndpoint":"/query"}'

Call ALaaS with gRPC 🔐

# pip install alaas
from alaas.client import Client

url_list = [
'https://www.cs.toronto.edu/~kriz/cifar-10-sample/airplane1.png',
'https://www.cs.toronto.edu/~kriz/cifar-10-sample/airplane2.png',
'https://www.cs.toronto.edu/~kriz/cifar-10-sample/airplane3.png',
'https://www.cs.toronto.edu/~kriz/cifar-10-sample/airplane4.png',
'https://www.cs.toronto.edu/~kriz/cifar-10-sample/airplane5.png'
]
client = Client('grpc://13.213.29.8:60035')
print(client.query_by_uri(url_list, budget=3))

You will then see that ALaaS has selected 3 data samples (the most informative ones) out of the 5 data points.
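
To make "most informative" concrete, here is a minimal sketch of least confidence selection under a labeling budget. It is not the ALaaS implementation: the function name `least_confidence_select` and the probability values are hypothetical, and it assumes the model has already produced class probabilities for each sample.

```python
from typing import List, Sequence


def least_confidence_select(probs: Sequence[Sequence[float]], budget: int) -> List[int]:
    """Return indices of the `budget` samples whose top-class probability is lowest."""
    if budget < 0:
        raise ValueError("budget must be non-negative")
    # Least confidence score: 1 - max class probability (higher = more uncertain).
    scores = [1.0 - max(p) for p in probs]
    # Rank sample indices from most to least uncertain.
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return ranked[:budget]


# Hypothetical softmax outputs for five images over three classes.
example_probs = [
    [0.98, 0.01, 0.01],
    [0.40, 0.35, 0.25],
    [0.70, 0.20, 0.10],
    [0.34, 0.33, 0.33],
    [0.55, 0.30, 0.15],
]
selected = least_confidence_select(example_probs, budget=3)
print(selected)  # indices of the three least confident samples
```

With a budget of 3, the sketch keeps the three samples for which the model's best guess is weakest, which mirrors the idea behind the demo's selection.
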
Installation :construction:
You can install ALaaS from PyPI:
pip install alaas

The ALaaS package contains both the client and the server. You can build an active data selection service on your own
servers, or use only the client to perform data selection.
:warning: Deep learning frameworks such as TensorFlow and PyTorch may need to be installed manually, since the version required by your deployment can differ (the same applies to transformers if you run models from it).
You can also use Docker to run ALaaS:
docker pull huangyz0918/alaas

and start a service with the following command:
docker run -it --rm -p 8081:8081 \
--mount type=bind,source=<config path>,target=/server/config.yml,readonly huangyz0918/alaas:latest

Quick Start :truck:
After installing ALaaS, you can start a local server. Here is the simplest example, which needs only 2 lines of code:
from alaas.server import Server

Server.start()

By default, the example code starts an HTTP server for image data selection (PyTorch ResNet-18 for an image classification task) on port 8081. You can then request selection results for your own image dataset. A client-side example looks like this:
curl \
-X POST http://0.0.0.0:8081/post \
-H 'Content-Type: application/json' \
-d '{"data":[{"uri": "https://www.cs.toronto.edu/~kriz/cifar-10-sample/airplane1.png"},
{"uri": "https://www.cs.toronto.edu/~kriz/cifar-10-sample/airplane2.png"},
{"uri": "https://www.cs.toronto.edu/~kriz/cifar-10-sample/airplane3.png"},
{"uri": "https://www.cs.toronto.edu/~kriz/cifar-10-sample/airplane4.png"},
{"uri": "https://www.cs.toronto.edu/~kriz/cifar-10-sample/airplane5.png"}],
"parameters": {"budget": 3},
"execEndpoint":"/query"}'

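The same request can be sketched in Python with only the standard library. This is an illustration, not part of the ALaaS package: the helper names `build_query_payload` and `post_query` are hypothetical, and the sketch assumes the server replies with a JSON body, which the text above does not specify.

```python
import json
import urllib.request
from typing import Any, Dict, List


def build_query_payload(uris: List[str], budget: int) -> Dict[str, Any]:
    """Build the JSON body used by the curl example above."""
    return {
        "data": [{"uri": uri} for uri in uris],
        "parameters": {"budget": budget},
        "execEndpoint": "/query",
    }


def post_query(server_url: str, uris: List[str], budget: int, timeout: float = 30.0) -> Any:
    """POST the payload to the server and decode the reply (assumed to be JSON)."""
    body = json.dumps(build_query_payload(uris, budget)).encode("utf-8")
    request = urllib.request.Request(
        server_url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


# Example call against a locally running server (not executed here):
# result = post_query(
#     "http://0.0.0.0:8081/post",
#     ["https://www.cs.toronto.edu/~kriz/cifar-10-sample/airplane1.png"],
#     budget=1,
# )
```
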
You can also use alaas.Client to build the query request (for both the HTTP and gRPC protocols), like this:
from alaas.client import Client

url_list = [
'https://www.cs.toronto.edu/~kriz/cifar-10-sample/airplane1.png',
'https://www.cs.toronto.edu/~kriz/cifar-10-sample/airplane2.png',
'https://www.cs.toronto.edu/~kriz/cifar-10-sample/airplane3.png',
'https://www.cs.toronto.edu/~kriz/cifar-10-sample/airplane4.png',
'https://www.cs.toronto.edu/~kriz/cifar-10-sample/airplane5.png'
]
client = Client('http://0.0.0.0:8081')
print(client.query_by_uri(url_list, budget=3))

The output is a subset of the URIs/data in your input dataset, indicating which samples were selected for labeling.
ALaaS Server Customization :wrench:
We support two ways to start your server: (1) by input parameters and (2) by YAML configuration.
Input Parameters
You can customize your server by setting different input parameters:
from alaas.server import Server

Server.start(
    proto='http',                        # the server protocol, can be 'grpc', 'http' or 'https'.
    port=8081,                           # the access port of your server.
    host='0.0.0.0',                      # the access IP address of your server.
    job_name='default_app',              # the server name.
    model_hub='pytorch/vision:v0.10.0',  # the active learning model hub, the server will automatically download it for data selection.
    model_name='resnet18',               # the active learning model name (should be available in your model hub).
    device='cpu',                        # the deploy location/device (can be something like 'cpu', 'cuda' or 'cuda:0').
    strategy='LeastConfidence',          # the selection strategy (read the document to see what ALaaS supports).
    batch_size=1,                        # the batch size of data processing.
    replica=1,                           # the number of workers to select/query data.
    tokenizer=None,                      # the tokenizer name (should be available in your model hub), only for NLP tasks.
    transformers_task=None,              # the NLP task name (for Hugging Face Pipelines, https://huggingface.co/docs/transformers/main_classes/pipelines), only for NLP tasks.
)
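
As one possible way to use these parameters, the sketch below derives `device` and `replica` from the number of GPUs. The helper `server_kwargs` is hypothetical, and the one-worker-per-GPU rule is an assumption made for illustration; how ALaaS actually places replicas on devices is not described here.

```python
from typing import Any, Dict


def server_kwargs(num_gpus: int, port: int = 8081) -> Dict[str, Any]:
    """Build keyword arguments for Server.start from the number of available GPUs."""
    if num_gpus < 0:
        raise ValueError("num_gpus must be non-negative")
    return {
        "proto": "http",
        "port": port,
        "host": "0.0.0.0",
        "model_hub": "pytorch/vision:v0.10.0",
        "model_name": "resnet18",
        "strategy": "LeastConfidence",
        # Assumption: run on GPU when one is available, otherwise fall back to CPU.
        "device": "cuda" if num_gpus > 0 else "cpu",
        # Assumption: one worker per GPU, and a single worker on CPU.
        "replica": max(1, num_gpus),
    }


kwargs = server_kwargs(num_gpus=2)
print(kwargs["device"], kwargs["replica"])

# To launch the server with these settings (requires alaas to be installed):
# from alaas.server import Server
# Server.start(**kwargs)
```
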

YAML Configuration
You can also start the server from a YAML configuration file, like this:
from alaas import Server

# start the server by an input configuration file.
Server.start_by_config('path_to_your_configuration.yml')

Details about building a configuration for your deployment scenarios can be found here.
Strategy Zoo :art:
Currently, we support the active learning strategies shown in the following table:

| Type | Setting | Abbr | Strategy | Year | Reference |
|---|---|---|---|---|---|
| Random | Pool-based | RS | Random Sampling | - | - |
| Uncertainty | Pool | LC | Least Confidence Sampling | 1994 | DD Lewis et al. |
| Uncertainty | Pool | MC | Margin Confidence Sampling | 2001 | T Scheffer et al. |
| Uncertainty | Pool | RC | Ratio Confidence Sampling | 2009 | B Settles et al. |
| Uncertainty | Pool | VRC | Variation Ratios Sampling | 1965 | EH Johnson et al. |
| Uncertainty | Pool | ES | Entropy Sampling | 2009 | B Settles et al. |
| Uncertainty | Pool | MSTD | Mean Standard Deviation | 2016 | M Kampffmeyer et al. |
| Uncertainty | Pool | BALD | Bayesian Active Learning Disagreement | 2017 | Y Gal et al. |
| Clustering | Pool | KCG | K-Center Greedy Sampling | 2017 | Ozan Sener et al. |
| Clustering | Pool | KM | K-Means Sampling | 2011 | Z Bodó et al. |
| Clustering | Pool | CS | Core-Set Selection Approach | 2018 | Ozan Sener et al. |
| Diversity | Pool | DBAL | Diverse Mini-batch Sampling | 2019 | Fedor Zhdanov |
| Adversarial | Pool | DFAL | DeepFool Active Learning | 2018 | M Ducoffe et al. |
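
For intuition about the uncertainty-based entries (LC, MC, RC, ES), the following sketch computes common textbook formulations of these scores for a single probability vector. The function `uncertainty_scores` is hypothetical, and the exact definitions and normalization used inside ALaaS may differ.

```python
import math
from typing import Dict, Sequence


def uncertainty_scores(probs: Sequence[float]) -> Dict[str, float]:
    """Compute four uncertainty scores for one sample; higher means more uncertain."""
    if len(probs) < 2:
        raise ValueError("need probabilities for at least two classes")
    ordered = sorted(probs, reverse=True)
    top, second = ordered[0], ordered[1]
    return {
        # Least confidence: how far the top prediction is from certainty.
        "LC": 1.0 - top,
        # Margin confidence: a small gap between the top two classes is uncertain.
        "MC": 1.0 - (top - second),
        # Ratio confidence: ratio of the runner-up to the top class.
        "RC": second / top,
        # Entropy: spread of the whole distribution (natural log).
        "ES": -sum(p * math.log(p) for p in probs if p > 0.0),
    }


confident = uncertainty_scores([0.90, 0.05, 0.05])
uncertain = uncertainty_scores([0.34, 0.33, 0.33])
for name in ("LC", "MC", "RC", "ES"):
    print(name, round(confident[name], 3), round(uncertain[name], 3))
```

Each score ranks the near-uniform prediction as more uncertain than the confident one; the strategies differ mainly in how much of the distribution beyond the top class they take into account.
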

Citation
Our tech report on ALaaS is available on arXiv and at NeurIPS 2022. Please cite it as:
@article{huang2022active,
title={Active-Learning-as-a-Service: An Efficient MLOps System for Data-Centric AI},
author={Huang, Yizheng and Zhang, Huaizheng and Li, Yuanming and Lau, Chiew Tong and You, Yang},
journal={arXiv preprint arXiv:2207.09109},
year={2022}
}

Contributors ✨
Thanks go to these wonderful people (emoji key):

Yizheng Huang 🚇 ⚠️ 💻
Huaizheng 🖋 ⚠️ 📖
Yuanming Li ⚠️ 💻

This project follows the all-contributors specification. Contributions of any kind welcome!
Acknowledgement

Jina - Build cross-modal and multimodal applications on the cloud.
Transformers - State-of-the-art Machine Learning for PyTorch, TensorFlow, and JAX.

License
The project is available as open source under the terms of the Apache 2.0 License.