bpc 0.1.3
BURMESE PHONEMIZER AND CLEANER (BPC)
Burmese language data preparation for speech-related tasks.
Installation
$ pip install bpc
or
$ pip install git+https://github.com/1chimaruGin/Burmese_Phomizer_and_Cleaner.git
Usage
For text cleaning
from bpc import Cleaner
cc = Cleaner()
cc.clean_text("မင်္ဂလာပါ? မင်္ဂလာပါ။ ၀န်းရံ ဝ၁၂၃၄ 5B")
# output: မင်္ဂလာပါ မင်္ဂလာပါ။ ဝန်းရံ ၀၁၂၃၄ ၅ဘီ
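The example above swaps the Myanmar digit zero (၀, U+1040) and the letter wa (ဝ, U+101D) back into their expected positions. The following is a minimal standalone sketch of that one normalization step, written with the standard library only. It is not bpc's implementation: the function name fix_zero_wa and the adjacency heuristic are assumptions made for illustration, and a lone zero that is genuinely meant as a digit would be misread as wa.

```python
import re

MYANMAR_ZERO = "\u1040"          # digit zero
MYANMAR_WA = "\u101D"            # letter wa, visually near-identical to zero
ANY_DIGIT = "[\u1040-\u1049]"     # Myanmar digits 0-9
NONZERO_DIGIT = "[\u1041-\u1049]"  # Myanmar digits 1-9


def fix_zero_wa(text: str) -> str:
    """Hypothetical helper: repair zero/wa confusion using neighbouring characters."""
    # A wa directly before a non-zero digit is most likely a mistyped zero.
    text = re.sub(f"{MYANMAR_WA}(?={NONZERO_DIGIT})", MYANMAR_ZERO, text)
    # A wa directly after any digit is most likely a mistyped zero.
    text = re.sub(f"(?<={ANY_DIGIT}){MYANMAR_WA}", MYANMAR_ZERO, text)
    # A zero with no digit on either side is most likely a mistyped wa.
    text = re.sub(f"(?<!{ANY_DIGIT}){MYANMAR_ZERO}(?!{ANY_DIGIT})", MYANMAR_WA, text)
    return text


# Same zero/wa mix-up as in the cleaning example, written with escapes
# so the two look-alike characters are unambiguous.
sample = "\u1040\u1014\u103A\u1038\u101B\u1036 \u101D\u1041\u1042\u1043\u1044"
print(fix_zero_wa(sample))
```

Checking neighbours rather than using a word list keeps the sketch short, at the cost of the edge case noted above.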
For phonemization
from bpc import BurmesePhonemizer
bp = BurmesePhonemizer()
bp.text_to_phone("မင်္ဂလာပါ")
# output: ['m', 'ŋ', 'ɡ', 'l', 't', 's', 'p', 'ˈe']
For data preparation
from bpc.dataset import PrepareDataset
dataset = PrepareDataset()
dataset.prepare_data(path='path/to/dataset', method='kfold', save=True)
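This README does not describe what method='kfold' does inside bpc. As a general illustration of k-fold splitting only, here is a minimal standard-library sketch. The function name kfold_split, its parameters, and the file names are hypothetical and are not part of the bpc API.

```python
import random
from typing import List, Sequence, Tuple


def kfold_split(
    items: Sequence[str], k: int = 5, seed: int = 0
) -> List[Tuple[List[str], List[str]]]:
    """Hypothetical helper: return k (train, validation) pairs over items."""
    if not 2 <= k <= len(items):
        raise ValueError("k must be between 2 and len(items)")
    shuffled = list(items)
    # Seeded shuffle so the same split can be reproduced later.
    random.Random(seed).shuffle(shuffled)
    # Deal the shuffled items into k folds of near-equal size.
    folds = [shuffled[i::k] for i in range(k)]
    splits = []
    for i in range(k):
        # Fold i is held out for validation; the rest form the training set.
        train = [x for j, fold in enumerate(folds) if j != i for x in fold]
        splits.append((train, folds[i]))
    return splits


# Placeholder utterance file names, for illustration only.
utterances = [f"utt_{n:03d}.wav" for n in range(10)]
for train, valid in kfold_split(utterances, k=5):
    print(len(train), len(valid))
```

Each item lands in exactly one validation fold, so every utterance is used for validation once and for training k - 1 times.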
References
https://github.com/espnet/espnet
https://github.com/bootphon/phonemizer
Citations
@inproceedings{watanabe2018espnet,
author={Shinji Watanabe and Takaaki Hori and Shigeki Karita and Tomoki Hayashi and Jiro Nishitoba and Yuya Unno and Nelson {Enrique Yalta Soplin} and Jahn Heymann and Matthew Wiesner and Nanxin Chen and Adithya Renduchintala and Tsubasa Ochiai},
title={{ESPnet}: End-to-End Speech Processing Toolkit},
year={2018},
booktitle={Proceedings of Interspeech},
pages={2207--2211},
doi={10.21437/Interspeech.2018-1456},
url={http://dx.doi.org/10.21437/Interspeech.2018-1456}
}
@article{Bernard2021,
doi = {10.21105/joss.03958},
url = {https://doi.org/10.21105/joss.03958},
year = {2021},
publisher = {The Open Journal},
volume = {6},
number = {68},
pages = {3958},
author = {Mathieu Bernard and Hadrien Titeux},
title = {Phonemizer: Text to Phones Transcription for Multiple Languages in Python},
journal = {Journal of Open Source Software}
}
For personal and professional use. You cannot resell or redistribute these repositories in their original state.