bigcodeembeddings 0.1.2
# bigcode-embeddings
NOTE: data must be generated with [bigcode-ast-tools][1] before this tool
can be used.
bigcode-embeddings generates and visualizes embeddings for
AST nodes.
## Install
This project should be used with Python 3.
To install the package, either run
`pip install bigcode-embeddings`
or clone the repository and run
`cd bigcode-embeddings && pip install -r requirements.txt && python setup.py install`
NOTE: TensorFlow must be installed separately.
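The two requirements above (Python 3 and a separately installed TensorFlow) can be checked before running the tool. The following is a minimal sketch; `check_environment` is a hypothetical helper name and is not part of bigcode-embeddings.

```python
import importlib.util
import sys


def check_environment():
    """Return a list of human-readable problems; an empty list means OK.

    Hypothetical helper for illustration, not part of bigcode-embeddings.
    """
    problems = []
    # The README states that the project should be used with Python 3.
    if sys.version_info[0] < 3:
        problems.append("Python 3 is required")
    # TensorFlow is not pulled in by the package, so look for it explicitly
    # without importing it (importing TensorFlow can be slow).
    if importlib.util.find_spec("tensorflow") is None:
        problems.append("tensorflow is not installed")
    return problems


if __name__ == "__main__":
    for problem in check_environment():
        print("warning:", problem)
```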
## Usage
### Training embeddings
Training data can be generated using [bigcode-ast-tools][1].
Given a `data.txt.gz` generated from a vocabulary of size 30000,
100D embeddings can be trained using
` ./bin/bigcode-embeddings train -o embeddings/ --vocab-size 30000 --emb-size 100 --l2-value 0.05 --learning-rate 0.01 data.txt.gz `
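For scripted or repeated runs, the same command can be assembled from Python. This is only a sketch: `build_train_command` and `run_training` are hypothetical helpers, the flags are copied from the command above, and the default executable path assumes the script is run from the repository root.

```python
import subprocess


def build_train_command(data_path, output_dir, vocab_size, emb_size,
                        l2_value=0.05, learning_rate=0.01,
                        executable="./bin/bigcode-embeddings"):
    """Build the argument list for the train subcommand.

    Hypothetical helper; it only mirrors the flags shown in this README.
    """
    return [
        executable, "train",
        "-o", output_dir,
        "--vocab-size", str(vocab_size),
        "--emb-size", str(emb_size),
        "--l2-value", str(l2_value),
        "--learning-rate", str(learning_rate),
        data_path,
    ]


def run_training(*args, **kwargs):
    """Launch training and raise CalledProcessError if the command fails."""
    subprocess.run(build_train_command(*args, **kwargs), check=True)


if __name__ == "__main__":
    # Print the command instead of running it, so the sketch is side-effect free.
    print(" ".join(build_train_command("data.txt.gz", "embeddings/", 30000, 100)))
```

Passing the arguments as a list avoids shell quoting issues when paths contain spaces.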
[TensorBoard][2] can be used to monitor training progress:
` tensorboard --logdir embeddings/ `
After the first epoch, the embeddings visualization becomes available in
TensorBoard. The vocabulary TSV file generated by bigcode-ast-tools can
be loaded to label the embeddings.
### Visualizing the embeddings
Trained embeddings can be visualized using the `visualize` subcommand.
If the generated vocabulary file is `vocab.tsv`, the above embeddings
can be visualized with the following command:
` ./bin/data-explorer visualize clusters -m embeddings/embeddings.bin-STEP -l vocab.tsv `
where `STEP` should be the largest value found in the `embeddings/` directory.
The `-i` flag can be passed to generate an interactive plot.
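Finding the largest `STEP` by hand can be tedious after a long run. The sketch below scans a directory for it. `latest_step` is a hypothetical helper, and it assumes files are named `embeddings.bin-STEP`, optionally followed by a suffix such as `.index` or `.meta` as in TensorFlow checkpoints. Adjust the prefix if your output differs.

```python
import os
import re


def latest_step(directory, prefix="embeddings.bin"):
    """Return the largest STEP among files named '<prefix>-STEP[.suffix]'.

    Returns None when no matching file exists. Hypothetical helper; the
    naming scheme is an assumption based on the command shown above.
    """
    # Match the prefix, a dash, the digits of STEP, then either the end of
    # the name or a dot that starts a suffix.
    pattern = re.compile(r"^" + re.escape(prefix) + r"-(\d+)(?:\.|$)")
    steps = []
    for name in os.listdir(directory):
        match = pattern.match(name)
        if match:
            steps.append(int(match.group(1)))
    return max(steps) if steps else None


if __name__ == "__main__":
    if os.path.isdir("embeddings"):
        step = latest_step("embeddings")
        if step is None:
            print("no checkpoint found in embeddings/")
        else:
            print("embeddings/embeddings.bin-{}".format(step))
```

Steps are compared as integers, so `embeddings.bin-900` correctly sorts below `embeddings.bin-2500`.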
[1]: ../bigcode-ast-tools/README.md
[2]: https://github.com/tensorflow/tensorboard