ragbooster 0.1.1


Description:

RAGBooster
RAGBooster improves the performance of retrieval-based large language models by learning which data sources are important for retrieving high-quality data.
We provide an example notebook that shows how we boost RedPajama-INCITE-Instruct-3B-v1, a small LLM with 3 billion parameters, to be on par with OpenAI's GPT-3.5 (175 billion parameters) on a question answering task by using Bing web search and ragbooster:

Demo notebook: Boosting RedPajama-INCITE-Instruct-3B-v1 for question answering

Furthermore, we have an additional example notebook, where we demonstrate how to boost a tiny QA model to get within 5% of GPT-3.5's accuracy on a data imputation task:

Demo notebook: Boosting minilm-uncased-squad2 for data imputation

Core classes
At the core of RAGBooster are RetrievalAugmentedModels, which fetch external data to improve prediction quality. Retrieval augmentation requires two components:

A retriever, which retrieves external data for a prediction sample. We currently only implement a BingRetriever, which queries Microsoft's Bing Websearch API.
A generator, which generates the final prediction from the prediction sample and the external data. This is typically a large language model. We provide the Generator interface, which makes it very easy to leverage LLMs available via an API, for example from OpenAI.
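
The division of labour between the two components can be illustrated with a small, self-contained sketch. It does not use the ragbooster API: KeywordRetriever, LastWordGenerator and ToyRetrievalAugmentedModel are hypothetical names invented for this example, the corpus is an in-memory list instead of a web search, and the "generator" is a trivial rule rather than an LLM.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase a string and split it into word tokens."""
    return re.findall(r"\w+", text.lower())


class KeywordRetriever:
    """Hypothetical stand-in for a web-search retriever.

    Ranks in-memory (text, source) snippets by word overlap with the query.
    """

    def __init__(self, corpus):
        self.corpus = corpus

    def retrieve(self, query, k=3):
        query_words = set(tokenize(query))
        scored = []
        for text, source in self.corpus:
            overlap = len(query_words & set(tokenize(text)))
            if overlap > 0:
                scored.append((overlap, text, source))
        # Stable sort: snippets with equal overlap keep their corpus order.
        scored.sort(key=lambda item: item[0], reverse=True)
        return [(text, source) for _, text, source in scored[:k]]


class LastWordGenerator:
    """Hypothetical stand-in for an LLM: answers with the snippet's last word."""

    def generate(self, sample, context):
        return re.findall(r"\w+", context)[-1]


class ToyRetrievalAugmentedModel:
    """Retrieve snippets, generate one answer per snippet, take a majority vote."""

    def __init__(self, retriever, generator, k=3):
        self.retriever = retriever
        self.generator = generator
        self.k = k

    def predict(self, sample):
        retrieved = self.retriever.retrieve(sample, self.k)
        if not retrieved:
            return None
        answers = [self.generator.generate(sample, text) for text, _ in retrieved]
        return Counter(answers).most_common(1)[0][0]


corpus = [
    ("The capital of France is Paris.", "encyclopedia.example"),
    ("France has its capital in Paris.", "atlas.example"),
    ("The capital of France is Lyon.", "rumors.example"),
    ("Bananas are rich in potassium.", "food.example"),
]
model = ToyRetrievalAugmentedModel(KeywordRetriever(corpus), LastWordGenerator())
print(model.predict("What is the capital of France?"))
```

In this toy setup the wrong snippet from rumors.example is outvoted by the two correct ones. When unreliable sources dominate the retrieved results, a plain vote is no longer enough, which is the situation that learning source importance is meant to address.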

Once you have defined your retrieval-augmented model, you can use RAGBooster to boost its performance by learning the data importance of retrieval sources (e.g., domains on the web). This often increases accuracy by a few percent.
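
To make the idea of source importance concrete, here is a deliberately simplified sketch. It is not the algorithm from the paper and does not call ragbooster: it substitutes a plain leave-one-out estimate on a small validation set, and all function names, domains and data are made up for illustration.

```python
from collections import Counter


def majority_vote(answers):
    """Return the most frequent answer, or None if there are no answers."""
    if not answers:
        return None
    return Counter(answers).most_common(1)[0][0]


def accuracy(validation, kept_sources):
    """Accuracy of majority voting when only answers from kept_sources count.

    validation is a list of (retrieved, label) pairs, where retrieved is a
    list of (answer, source) pairs.
    """
    correct = 0
    for retrieved, label in validation:
        answers = [answer for answer, source in retrieved if source in kept_sources]
        if majority_vote(answers) == label:
            correct += 1
    return correct / len(validation)


def leave_one_out_importance(validation, sources):
    """Importance of a source = accuracy with all sources minus accuracy without it."""
    base = accuracy(validation, sources)
    return {s: base - accuracy(validation, sources - {s}) for s in sources}


def prune_sources(validation, sources):
    """Keep only sources whose removal does not improve validation accuracy."""
    importance = leave_one_out_importance(validation, sources)
    return {s for s in sources if importance[s] >= 0}


sources = {"encyclopedia.example", "atlas.example", "rumors.example"}
validation = [
    ([("Paris", "encyclopedia.example"), ("Lyon", "rumors.example"),
      ("Lyon", "rumors.example")], "Paris"),
    ([("Berlin", "encyclopedia.example"), ("Berlin", "atlas.example")], "Berlin"),
    ([("Rome", "atlas.example"), ("Milan", "rumors.example"),
      ("Milan", "rumors.example")], "Rome"),
    ([("Madrid", "encyclopedia.example")], "Madrid"),
]

importance = leave_one_out_importance(validation, sources)
kept = prune_sources(validation, sources)
print(importance)
print(sorted(kept), accuracy(validation, kept))
```

Leave-one-out is used here only because it fits in a few lines. It can miss interactions between sources, which is one reason to consult the paper for the actual method.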
Background
Have a look at our paper on Improving Retrieval-Augmented Large Language Models with Data-Centric Refinement for detailed algorithms, proofs and experimental results.
Installation
RAGBooster is available as a pip package and can be installed as follows:
pip install ragbooster
Installation for Development

Requires Python 3.9 and Rust to be available
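
Before starting, it can help to confirm that both prerequisites are present. The following sketch is not part of the project. It assumes that "Python 3.9" means exactly the 3.9 series and that a cargo executable on the PATH indicates a working Rust toolchain. The helper name check_dev_prerequisites is hypothetical.

```python
import shutil
import sys


def check_dev_prerequisites(version_info=sys.version_info, which=shutil.which):
    """Return a list of human-readable problems; an empty list means all good.

    Assumption: the requirement is read strictly as Python 3.9.x, and Rust is
    considered available when `cargo` can be found on the PATH.
    """
    problems = []
    major, minor = version_info[0], version_info[1]
    if (major, minor) != (3, 9):
        problems.append(f"Python 3.9 expected, found {major}.{minor}")
    if which("cargo") is None:
        problems.append("cargo (Rust toolchain) not found on PATH")
    return problems


if __name__ == "__main__":
    found = check_dev_prerequisites()
    for problem in found:
        print("Problem:", problem)
    if not found:
        print("Python 3.9 and Rust are available.")
```

The interpreter version and the lookup function are passed as parameters so that the check can be exercised without changing the actual environment.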


Clone the repository: git clone git@github.com:amsterdata/ragbooster.git
Change to the project directory: cd ragbooster
Create a virtualenv: python3.9 -m venv venv
Activate the virtualenv: source venv/bin/activate
Install the dev dependencies: pip install ".[dev]"
Build the project: maturin develop --release


Optional steps:

Run the tests with cargo test --release
Run the benchmarks with RUSTFLAGS="-C target-cpu=native" cargo bench
Run linting for the Python code with flake8 python
Start jupyter with jupyter notebook and run the example notebooks

License:

For personal and professional use. You cannot resell or redistribute these repositories in their original state.
