benchbench 1.0.0

BenchBench is a Python package that provides a suite of tools to evaluate multi-task benchmarks, focusing on
their diversity and their sensitivity to irrelevant variations, such as label noise injection and the addition of irrelevant
candidate models. The package facilitates comprehensive analysis of multi-task benchmarks through a social choice lens,
exposing the fundamental trade-off between diversity and stability in both cardinal and ordinal benchmarks.
For more information, including the motivations behind the measures and our empirical findings, please
see our paper.
Quick Start
To install the package, simply run:
pip install benchbench

Example Usage
To evaluate a cardinal benchmark, you can use the following code:
from benchbench.data import load_cardinal_benchmark
from benchbench.measures.cardinal import get_diversity, get_sensitivity

data, cols = load_cardinal_benchmark('GLUE')
diversity = get_diversity(data, cols)
sensitivity = get_sensitivity(data, cols)

To evaluate an ordinal benchmark, you can use the following code:
from benchbench.data import load_ordinal_benchmark
from benchbench.measures.ordinal import get_diversity, get_sensitivity

data, cols = load_ordinal_benchmark('HELM-accuracy')
diversity = get_diversity(data, cols)
sensitivity = get_sensitivity(data, cols)

To use your own benchmark, you just need to provide a pandas DataFrame and a list of columns indicating the tasks.
Check the documentation for more details.
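For example, here is a minimal sketch for a custom cardinal benchmark; the model names, task columns, and scores below are invented for illustration, and the layout (one row per model, one score column per task) mirrors what the loaders above return:
import pandas as pd

from benchbench.measures.cardinal import get_diversity, get_sensitivity

# Hypothetical benchmark table: one row per candidate model,
# one column per task holding that model's score.
data = pd.DataFrame(
    {
        'task_1': [0.81, 0.74, 0.66],
        'task_2': [0.58, 0.63, 0.49],
        'task_3': [0.92, 0.88, 0.90],
    },
    index=['model-a', 'model-b', 'model-c'],
)
cols = ['task_1', 'task_2', 'task_3']  # the columns indicating the tasks

diversity = get_diversity(data, cols)
sensitivity = get_sensitivity(data, cols)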
Reproduce the Paper
You can check out cardinal.ipynb, ordinal.ipynb, and banner.ipynb to reproduce our results in Google Colab with one click.

