benchbench 1.0.0

BenchBench is a Python package that provides a suite of tools to evaluate multi-task benchmarks, focusing on
their diversity and their sensitivity to irrelevant variations, such as label noise injection and the addition of irrelevant
candidate models. The package facilitates comprehensive analysis of multi-task benchmarks through a social choice lens,
exposing the fundamental trade-off between diversity and stability in both cardinal and ordinal benchmarks.
For more information, including the motivations behind the measures and our empirical findings, please
see our paper.
Quick Start
To install the package, simply run:
pip install benchbench

Example Usage
To evaluate a cardinal benchmark, you can use the following code:
from benchbench.data import load_cardinal_benchmark
from benchbench.measures.cardinal import get_diversity, get_sensitivity

data, cols = load_cardinal_benchmark('GLUE')
diversity = get_diversity(data, cols)
sensitivity = get_sensitivity(data, cols)

To evaluate an ordinal benchmark, you can use the following code:
from benchbench.data import load_ordinal_benchmark
from benchbench.measures.ordinal import get_diversity, get_sensitivity

data, cols = load_ordinal_benchmark('HELM-accuracy')
diversity = get_diversity(data, cols)
sensitivity = get_sensitivity(data, cols)

To use your own benchmark, you just need to provide a pandas DataFrame and a list of columns indicating the tasks.
Check the documentation for more details.
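For example, here is a minimal sketch for a custom cardinal benchmark; the model names, task columns, and scores below are invented for illustration, and the layout (one row per model, one score column per task) mirrors what the loaders above return:
import pandas as pd

from benchbench.measures.cardinal import get_diversity, get_sensitivity

# Hypothetical benchmark table: one row per candidate model,
# one column per task holding that model's score.
data = pd.DataFrame(
    {
        'task_1': [0.81, 0.74, 0.66],
        'task_2': [0.58, 0.63, 0.49],
        'task_3': [0.92, 0.88, 0.90],
    },
    index=['model-a', 'model-b', 'model-c'],
)
cols = ['task_1', 'task_2', 'task_3']  # the columns indicating the tasks

diversity = get_diversity(data, cols)
sensitivity = get_sensitivity(data, cols)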
Reproduce the Paper
You can check out cardinal.ipynb, ordinal.ipynb, and banner.ipynb to reproduce our results in Google Colab with one click.

