ragbooster 0.1.1

Creator: bradpython12


RAGBooster
RAGBooster improves the performance of retrieval-based large language models by learning which data sources are important for retrieving high-quality data.
We provide an example notebook that shows how we boost RedPajama-INCITE-Instruct-3B-v1, a small LLM with 3 billion parameters, to be on par with OpenAI's GPT-3.5 (175 billion parameters) on a question answering task by using Bing web search and ragbooster:

Demo notebook: Boosting RedPajama-INCITE-Instruct-3B-v1 for question answering

We also provide a second example notebook, which demonstrates how to boost a tiny QA model to within 5% of GPT-3.5's accuracy on a data imputation task:

Demo notebook: Boosting minilm-uncased-squad2 for data imputation

Core classes
At the core of RAGBooster are RetrievalAugmentedModels, which fetch external data to improve prediction quality. Retrieval augmentation requires two components:

A retriever, which retrieves external data for a prediction sample. We currently only implement a BingRetriever, which queries Microsoft's Bing Websearch API.
A generator, which generates the final prediction from the prediction sample and the external data. This is typically a large language model. We provide the Generator interface, which makes it very easy to leverage LLMs available via an API, for example from OpenAI.
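To make the division of labour between the two components concrete, here is a minimal, self-contained sketch. It does not use the ragbooster API: `InMemoryRetriever`, `toy_generate` and `RetrievalAugmentedSketch` are hypothetical names invented for this illustration, the retriever is a keyword lookup over a fixed list instead of a web search, and the "generator" is a trivial function instead of an LLM.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class RetrievedText:
    text: str
    source: str  # e.g. the web domain the snippet came from


class InMemoryRetriever:
    """Hypothetical stand-in for a web search retriever (keyword overlap on a fixed corpus)."""

    def __init__(self, corpus: List[RetrievedText]) -> None:
        self.corpus = corpus

    def retrieve(self, query: str) -> List[RetrievedText]:
        terms = set(query.lower().split())
        return [doc for doc in self.corpus if terms & set(doc.text.lower().split())]


def toy_generate(question: str, context: str) -> str:
    """Hypothetical stand-in for an LLM call: answers with the last word of the context."""
    return context.split()[-1]


class RetrievalAugmentedSketch:
    """Combines a retriever and a generator; not the ragbooster implementation."""

    def __init__(self, retriever: InMemoryRetriever, generate: Callable[[str, str], str]) -> None:
        self.retriever = retriever
        self.generate = generate

    def predict(self, question: str) -> str:
        # One candidate answer per retrieved snippet, then a majority vote.
        answers = [self.generate(question, doc.text) for doc in self.retriever.retrieve(question)]
        return Counter(answers).most_common(1)[0][0] if answers else ""


corpus = [
    RetrievedText("The capital of France is Paris", "encyclopedia.example"),
    RetrievedText("Everyone knows the capital of France is Paris", "news.example"),
    RetrievedText("The capital of France is Lyon", "spam.example"),
]
model = RetrievalAugmentedSketch(InMemoryRetriever(corpus), toy_generate)
print(model.predict("capital of France"))
```

In this toy setup the unreliable source is outvoted. When low-quality sources dominate the retrieved results, the vote fails, which is the situation that learning source importance is meant to address.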

Once you have defined your retrieval-augmented model, you can use RAGBooster to boost its performance by learning the data importance of retrieval sources (e.g., domains on the web). This often increases accuracy by a few percent.
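As a rough intuition for what "learning the data importance of retrieval sources" can mean, the sketch below scores each source with a simple leave-one-out heuristic on a small validation set and then prunes sources whose removal improves accuracy. This heuristic is swapped in for illustration only; it is not the algorithm used by RAGBooster, and all function names and data here are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Sequence, Set, Tuple

# (answer proposed from a retrieved snippet, source domain of that snippet)
Vote = Tuple[str, str]
# (all votes for one validation question, ground-truth answer)
Example = Tuple[Sequence[Vote], str]


def majority_answer(votes: Sequence[Vote], excluded: Set[str]) -> str:
    """Majority vote over the answers whose source is not excluded."""
    counts = Counter(answer for answer, source in votes if source not in excluded)
    return counts.most_common(1)[0][0] if counts else ""


def accuracy(validation: Sequence[Example], excluded: Set[str]) -> float:
    correct = sum(majority_answer(votes, excluded) == truth for votes, truth in validation)
    return correct / len(validation)


def leave_one_out_importance(validation: Sequence[Example]) -> Dict[str, float]:
    """Importance of a source = accuracy with all sources minus accuracy without it."""
    sources = sorted({source for votes, _ in validation for _, source in votes})
    baseline = accuracy(validation, set())
    return {s: baseline - accuracy(validation, {s}) for s in sources}


# Hypothetical validation data, invented for this example.
validation: List[Example] = [
    ([("Paris", "wiki.example"), ("Lyon", "spam.example"), ("Lyon", "spam.example")], "Paris"),
    ([("Berlin", "wiki.example"), ("Berlin", "news.example")], "Berlin"),
    ([("Rome", "wiki.example"), ("Rome", "wiki.example"), ("Milan", "spam.example")], "Rome"),
]

importance = leave_one_out_importance(validation)
pruned = {source for source, value in importance.items() if value < 0}

print(importance)
print("pruned sources:", pruned)
print("accuracy before:", accuracy(validation, set()))
print("accuracy after:", accuracy(validation, pruned))
```

Leave-one-out needs one evaluation pass per source, so it only serves as an illustration on small examples. The paper referenced in the Background section describes the algorithms RAGBooster actually uses.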
Background
Have a look at our paper on Improving Retrieval-Augmented Large Language Models with Data-Centric Refinement for detailed algorithms, proofs and experimental results.
Installation
RAGBooster is available as a pip package and can be installed as follows:
pip install ragbooster
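One way to confirm that the installation succeeded is to ask Python's standard library for the installed package version. The helper name `installed_version` is an assumption made for this sketch; it relies only on `importlib.metadata` and does not import ragbooster itself.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is not installed."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


# Prints the version string if the package is installed, otherwise None.
print(installed_version("ragbooster"))
```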
Installation for Development

Requires Python 3.9 and Rust to be available


Clone the repository: git clone git@github.com:amsterdata/ragbooster.git
Change to the project directory: cd ragbooster
Create a virtualenv: python3.9 -m venv venv
Activate the virtualenv: source venv/bin/activate
Install the dev dependencies: pip install ".[dev]"
Build the project: maturin develop --release


Optional steps:

Run the tests with cargo test --release
Run the benchmarks with RUSTFLAGS="-C target-cpu=native" cargo bench
Run linting for the Python code with flake8 python
Start jupyter with jupyter notebook and run the example notebooks

License

For personal and professional use. You cannot resell or redistribute these repositories in their original state.
