gpt3datagen 0.1.0

Creator: bradpython12

Last updated:

0 purchases

gpt3datagen 0.1.0 Image
gpt3datagen 0.1.0 Images
Add to Cart

Description:

gpt3datagen 0.1.0

GPT3DataGen
GPT3DataGen is a python package that generates fake data for fine-tuning your openai models.
_ ___ _ _
( )_ /'_ ) ( ) ( )_
__ _ _ | ,_)(_)_) | _| | _ _ | ,_) _ _ __ __ ___
/'_ `\( '_`\ | | _(_ < /'_` | /'_` )| | /'_` ) /'_ `\ /'__`\/' _ `\
( (_) || (_) )| |_ ( )_) |( (_| |( (_| || |_ ( (_| |( (_) |( ___/| ( ) |
`\__ || ,__/'`\__)`\____)`\__,_)`\__,_)`\__)`\__,_)`\__ |`\____)(_) (_)v1.0.3
( )_) || | ( )_) |
\___/'(_) \___/'

Install with pip. See Install & Usage Guide
pip install -U gpt3datagen

Alternatively, the following command will pull and install the latest commit
from this repository, along with its Python dependencies:
pip install git+https://github.com/donwany/gpt3datagen.git --use-pep517

Or git clone repository:
git clone https://github.com/donwany/gpt3datagen.git
cd gpt3datagen
make install && pip install -e .

To update the package to the latest version of this repository, please run:
pip install --upgrade --no-deps --force-reinstall git+https://github.com/donwany/gpt3datagen.git

Command-Line Usage
Run the following to view all available options:
gpt3datagen --help
gpt3datagen --version

Output formats: jsonl, json, csv, tsv, xlsx
gpt3datagen \
--num_samples 500 \
--max_length 2048 \
--sample_type "classification" \
--output_format "jsonl" \
--output_dir .

gpt3datagen \
--num_samples 500 \
--max_length 2048 \
--sample_type completion \
--output_format csv \
--output_dir .

gpt3datagen \
--sample_type completion \
--output_format jsonl \
--output_dir .

gpt3datagen --sample_type completion -o . -f jsonl
gpt3datagen --sample_type news -o . -f jsonl

Data Format
{"prompt": "<prompt text> \n\n###\n\n", "completion": " <ideal generated text> END"}
{"prompt": "<prompt text> \n\n###\n\n", "completion": " <ideal generated text> END"}
{"prompt": "<prompt text> \n\n###\n\n", "completion": " <ideal generated text> END"}
...

Basic Usage
Only useful if you clone the repository
python prepare.py \
--num_samples 500 \
--max_length 2048 \
--sample_type "classification" \
--output_format "jsonl" \
--output_dir .

python prepare.py \
--num_samples 500 \
--max_length 2048 \
--sample_type "completion" \
--output_format "csv" \
--output_dir .

python prepare.py \
--num_samples 500 \
--max_length 2048 \
--sample_type "completion" \
--output_format "json" \
--output_dir /Users/<tsiameh>/Desktop

Validate Sample Data
pip install --upgrade openai

export OPENAI_API_KEY="<OPENAI_API_KEY>"

# validate sample datasets generated
openai tools fine_tunes.prepare_data -f <SAMPLE_DATA>.jsonl
openai tools fine_tunes.prepare_data -f <SAMPLE_DATA>.csv
openai tools fine_tunes.prepare_data -f <SAMPLE_DATA>.tsv
openai tools fine_tunes.prepare_data -f <SAMPLE_DATA>.json
openai tools fine_tunes.prepare_data -f <SAMPLE_DATA>.xlsx
openai tools fine_tunes.prepare_data -f /Users/<tsiameh>/Desktop/data_prepared.jsonl

# fine-tune
openai api fine_tunes.create \
-t <DATA_PREPARED>.jsonl \
-m <BASE_MODEL: davinci, curie, ada, babbage>

# List all created fine-tunes
openai api fine_tunes.list

Test Runs
# For multiclass classification
openai api fine_tunes.create \
-t <TRAIN_FILE_ID_OR_PATH> \
-v <VALIDATION_FILE_OR_PATH> \
-m <MODEL> \
--compute_classification_metrics \
--classification_n_classes <N_CLASSES>

# For binary classification
openai api fine_tunes.create \
-t <TRAIN_FILE_ID_OR_PATH> \
-v <VALIDATION_FILE_OR_PATH> \
-m <MODEL> \
--compute_classification_metrics \
--classification_n_classes 2 \
--classification_positive_class <POSITIVE_CLASS_FROM_DATASET>

Contribute
Please see CONTRIBUTING.
License
GPT3DataGen is released under the MIT License. See the bundled LICENSE file
for details.
Credits

Theophilus Siameh

License

For personal and professional use. You cannot resell or redistribute these repositories in their original state.

Customer Reviews

There are no reviews.