dict-from-dict 0.0.4

Creator: bigcodingguy24

Command-line interface (CLI) to create a pronunciation dictionary from another pronunciation dictionary, with the option to ignore punctuation and to split words on hyphens before lookup.
Features

ignore casing of words during lookup
trim symbols at the start and end of a word before lookup
split words on hyphens before lookup
  if the dictionary contains words with hyphens, they are considered first (see example below)
words with multiple pronunciations are supported
  weights are multiplied for hyphenated words (see example below)
output of out-of-vocabulary (OOV) words
multiprocessing
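
The lookup order described by these features can be sketched in a few lines of Python. This is only an illustration inferred from the example below, not the package's actual implementation; the names `split_trim` and `lookup` and the in-memory dictionary layout are assumptions made for this sketch.

```python
from itertools import product
from typing import Dict, List, Optional, Tuple

# Hypothetical layout: (weight, phoneme symbols)
Pron = Tuple[float, Tuple[str, ...]]


def split_trim(word: str, trim: str) -> Tuple[str, str, str]:
    """Split a word into (leading trim symbols, core, trailing trim symbols)."""
    start = 0
    while start < len(word) and word[start] in trim:
        start += 1
    end = len(word)
    while end > start and word[end - 1] in trim:
        end -= 1
    return word[:start], word[start:end], word[end:]


def lookup(word: str, dictionary: Dict[str, List[Pron]],
           trim: str = "") -> Optional[List[Pron]]:
    """Return weighted pronunciations for a word, or None if it is OOV."""
    # Ignore casing (assumes the dictionary keys are lower-case).
    key = word.lower()
    # 1. Try the word exactly as written.
    if key in dictionary:
        return list(dictionary[key])
    # 2. Trim symbols at start and end, then try the remaining core.
    lead, core, trail = split_trim(key, trim)
    if not core:
        return None
    if core in dictionary:
        # Hyphenated dictionary entries are matched here, before splitting.
        options = [dictionary[core]]
    else:
        # 3. Split on hyphens; every part has to be in the dictionary.
        parts = core.split("-")
        if len(parts) == 1 or any(part not in dictionary for part in parts):
            return None
        options = [dictionary[part] for part in parts]
    result = []
    for combination in product(*options):
        weight = 1.0
        phonemes = list(lead)  # trimmed symbols are kept in the output
        for index, (part_weight, part_phonemes) in enumerate(combination):
            if index > 0:
                phonemes.append("-")
            weight *= part_weight  # weights of the parts are multiplied
            phonemes.extend(part_phonemes)
        phonemes.extend(trail)
        result.append((weight, tuple(phonemes)))
    return result
```

In this sketch, a word that yields `None` would be written to the OOV output.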

Installation
pip install dict-from-dict --user

Usage
dict-from-dict-cli

Example
# Create example vocabulary
cat > /tmp/vocabulary.txt << EOF
Test?
abc,
"def
Test-def.
"xyz?
"uv-w?
EOF

# Create example dictionary
cat > /tmp/dictionary.dict << EOF
test 0.7 T E0 S T
test 0.3 T E1 S T
def 0.4 D E0 F
def 0.6 D E1 F
xyz 2.0 ?
"xyz? 1.0 ' X Y Z ??
uv 2.0 ?
w 2.0 ?
uv-w 1.0 U V - W
EOF

# Create dictionary from vocabulary and example dictionary
dict-from-dict-cli \
/tmp/vocabulary.txt \
/tmp/dictionary.dict --consider-weights \
/tmp/result.dict \
--ignore-case --split-on-hyphen \
--trim "?" "\"" "," "." \
--n-jobs 4 \
--oov-out /tmp/oov.txt

cat /tmp/result.dict
# -------
# Output:
# -------
# Test? 0.7 T E0 S T ?
# Test? 0.3 T E1 S T ?
# "def 0.4 " D E0 F
# "def 0.6 " D E1 F
# Test-def. 0.27999999999999997 T E0 S T - D E0 F .
# Test-def. 0.42 T E0 S T - D E1 F .
# Test-def. 0.12 T E1 S T - D E0 F .
# Test-def. 0.18 T E1 S T - D E1 F .
# "xyz? 1.0 ' X Y Z ??
# "uv-w? 1.0 " U V - W ?
# -------

cat /tmp/oov.txt
# -------
# Output:
# -------
# abc,
# -------
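
To inspect such a result file programmatically, a small parser can help. The following is a minimal sketch that assumes each line consists of a word, a weight and the phoneme symbols, separated by whitespace, as in the output above; `parse_weighted_dict` and `total_weights` are hypothetical helper names and not part of the package.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# Hypothetical layout: (weight, phoneme symbols)
Pron = Tuple[float, Tuple[str, ...]]


def parse_weighted_dict(lines: Iterable[str]) -> Dict[str, List[Pron]]:
    """Parse lines of the assumed form '<word> <weight> <symbol> ...'."""
    entries: Dict[str, List[Pron]] = defaultdict(list)
    for line in lines:
        fields = line.split()
        if len(fields) < 2:
            continue  # skip empty or incomplete lines
        word, weight, *phonemes = fields
        entries[word].append((float(weight), tuple(phonemes)))
    return dict(entries)


def total_weights(entries: Dict[str, List[Pron]]) -> Dict[str, float]:
    """Sum the weights of all pronunciations per word."""
    return {word: sum(weight for weight, _ in prons)
            for word, prons in entries.items()}
```

For example, `parse_weighted_dict(open("/tmp/result.dict", encoding="utf-8"))` would group the four `Test-def.` pronunciations under one key. Their weights sum to roughly 1.0, because each is the product of the weights of one `test` and one `def` pronunciation.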

Development setup
# update
sudo apt update
# install Python 3.8-3.12 so that the tests can be run against all supported versions
sudo apt install python3-pip \
python3.8 python3.8-dev python3.8-distutils python3.8-venv \
python3.9 python3.9-dev python3.9-distutils python3.9-venv \
python3.10 python3.10-dev python3.10-distutils python3.10-venv \
python3.11 python3.11-dev python3.11-distutils python3.11-venv \
python3.12 python3.12-dev python3.12-distutils python3.12-venv
# install pipenv for creating virtual environments
python3.8 -m pip install pipenv --user

# check out repo
git clone https://github.com/stefantaubert/pronunciation-dict-creation.git
cd pronunciation-dict-creation
# create virtual environment
python3.8 -m pipenv install --dev

Running the tests
# first set up the tool as described in "Development setup"
# then, navigate into the directory of the repo (if not already done)
cd pronunciation-dict-creation
# activate environment
python3.8 -m pipenv shell
# run tests
tox

Final lines of test result output:
py38: commands succeeded
py39: commands succeeded
py310: commands succeeded
py311: commands succeeded
py312: commands succeeded
congratulations :)

License
MIT License
Acknowledgments
Funded by the Deutsche Forschungsgemeinschaft (DFG, German Research Foundation) – Project-ID 416228727 – CRC 1410
Citation
If you want to cite this repo, you can use the BibTeX entry generated by GitHub (see About => Cite this repository).
Taubert, S. (2024). dict-from-dict (Version 0.0.4) [Computer software]. https://doi.org/10.5281/zenodo.10560441
