PersianG2p 0.3.2

Creator: railscoder56

Simple Persian (Farsi) grapheme-to-phoneme converter

pip install PersianG2p

It uses a neural net to convert Persian text (written in Arabic script) into phoneme text.

Features of Farsi
How it works
"Tidy" argument
Comparison with epitran
Installation
Usage
Telegram bot @PersianG2Pbot
What you can do better



Features of Farsi

Arabic script is used
characters take different forms depending on their position in the word
the vowels a, e, o are often pronounced but not written; for example:

سس is pronounced sos but written ss
شش is pronounced šeš but written šš
من is pronounced man but written mn
سلام is pronounced salām but written slām
شما is pronounced šomā but written šmā
ممنون is pronounced mamnun but written mmnun


the same symbol can have different pronunciations: in the word مو the symbol و is pronounced u, but in the word میوه it follows a vowel and is pronounced v; the word تو is pronounced to or tu depending on the meaning; the symbol ه (hā-ye docešm) is pronounced like a (e) in the word نه and like h in the word آنها
no overlap of vowel sounds
verbs come at the end of the sentence
no grammatical gender
no grammatical cases
adjectives and other modifiers follow the noun they describe
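
To make the "pronounced but not written" point concrete, here is a minimal sketch that compares a written skeleton with its pronunciation and lists the sounds that have no written counterpart. The function name is hypothetical and not part of PersianG2p; the word pairs are the Latin transcriptions from the examples above.

```python
def unwritten_sounds(written, spoken):
    """Return the sounds in `spoken` that have no counterpart in `written`.

    Both arguments are Latin transcriptions, e.g. written="slām", spoken="salām".
    The comparison walks left to right and assumes `written` is a subsequence
    of `spoken`, which holds for the examples in this section.
    """
    missing = []
    i = 0  # position in the written skeleton
    for sound in spoken:
        if i < len(written) and written[i] == sound:
            i += 1                 # this sound is written
        else:
            missing.append(sound)  # this sound is pronounced but not written
    return missing


for written, spoken in [("ss", "sos"), ("mn", "man"), ("mmnun", "mamnun")]:
    print(written, "->", spoken, "unwritten:", unwritten_sounds(written, spoken))
```

A converter has to supply exactly these missing sounds, which is why a dictionary or a trained model is needed rather than a letter-by-letter mapping.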

How it works
The package ships with a dictionary of 1867 pairs of the form (Persian word, its pronunciation); you can also load a dictionary with over 48 000 words by passing use_large = True to the constructor. Some of these words (in English): water, there, feeling, use, people, throw, he, can, highway, was, hall, guarantee, production, sentence, account, god, self, they know, dollar, mind, novel, earthquake, organizing, weapons, personal, martyr, necessity, opinion, french, legal, london, deprived, people, studies, source, fruit, they take, system, the light, are, and, leg, bridge, what, done, do.
First, your text is normalized by hazm and then tokenized.

If a token is not written in Arabic script, it is left unchanged.
If a token is a word from the dictionary, its pronunciation is taken from the dictionary.
Otherwise, the pronunciation is predicted by the neural net.

If a token was found in the dictionary, its pronunciation is written like ' t h i s ' (with spaces between the symbols and at the beginning and end of the word). If a word is written without spaces, it was predicted. You can disable this marking by setting secret = True.
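
The steps above can be sketched roughly as follows. This is an illustration under stated assumptions, not the package's actual code: `convert`, `dictionary`, and `predict` are hypothetical names, `str.split` stands in for hazm's normalization and tokenization, and the way pieces are joined is only a guess at the real spacing rules.

```python
import re

# Any character from the basic Arabic Unicode block (assumption: this is
# enough to recognise Persian words for the purpose of the sketch).
ARABIC_SCRIPT = re.compile(r"[\u0600-\u06FF]")


def convert(text, dictionary, predict, secret=False):
    """Dictionary lookup first, model prediction as the fallback.

    dictionary: dict mapping a Persian word to its phoneme string
    predict:    callable standing in for the neural net
    secret:     if True, dictionary hits are not spelled out with spaces
    """
    pieces = []
    for token in text.split():  # stand-in for hazm tokenization
        if not ARABIC_SCRIPT.search(token):
            pieces.append(token)  # not Arabic script: left unchanged
        elif token in dictionary:
            phonemes = dictionary[token]
            if secret:
                pieces.append(phonemes)
            else:
                # Mark dictionary words: ' t h i s '
                pieces.append(" " + " ".join(phonemes) + " ")
        else:
            pieces.append(predict(token))  # unknown word: predicted
    # Join the pieces and collapse runs of spaces (assumed formatting).
    return re.sub(r" +", " ", " ".join(pieces))


toy_dictionary = {"ما": "mA", "الان": "alAn"}
print(convert("ما الان درحال", toy_dictionary, predict=lambda word: "darhAl"))
```

Passing the dictionary and the predictor in as arguments keeps the sketch runnable without hazm or a trained model.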
"Tidy" argument

| Persian symbols | sound (tidy = False) | sound (tidy = True) |
|---|---|---|
| آ | A | ā |
| ش | S | š |
| ژ | Z | ž |
| چ | C | č |
| ء، ع | ? | ` |
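
As a rough illustration of what the tidy option changes, the five listed symbol pairs can be applied to an untidy string with a translation table. This is a sketch only: `to_tidy` is a hypothetical helper, not a PersianG2p function, and it assumes the listed pairs are the complete mapping.

```python
# Untidy -> tidy symbol pairs, copied from the listing above.
_TIDY_TABLE = str.maketrans({
    "A": "ā",
    "S": "š",
    "Z": "ž",
    "C": "č",
    "?": "`",
})


def to_tidy(untidy):
    """Rewrite an untidy (ASCII-style) transcription with the tidy symbols."""
    return untidy.translate(_TIDY_TABLE)


print(to_tidy(" m A a l A n darhAl b A z i b u d i m "))
```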



Comparison with epitran



| Persian word | epitran conversion | PersianG2p conversion | expected |
|---|---|---|---|
| سلام | slɒm | salām | salām |
| ممنون | mmnvn | mamnun | mamnun |
| خب | xb | xab | xāb |
| ساحل | sɒhl | sāhel | sāhel |
| یخ | jx | yax | yax |
| لاغر | lɒɣr | lāġar | lāġar |
| پسته | پsth | peste | peste |
| مثلث | msls | mosles | mosles |
| سال ها | sɒl hɒ | sālehā | sālhā |
| لذت | lzt | lazt | lezzat |
| دژ | | dož | dež |
| برف | brf | barf | barf |
| خدا حافظ | xdɒ hɒfz | x o d ā hāfez | xodā hāfez |
| دمپایی | dmپɒjj | dampāyi | dampāyi |
| نشستن | nʃstn | nešastan | nešastan |
| متأسفانه | mtɒʔsfɒnh | motsafe`āne | mota’assefāne |
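
A comparison like this one can be summarised as an exact-match rate. The sketch below shows the arithmetic on four PersianG2p rows copied from the comparison above; `exact_match_rate` is a hypothetical helper, and exact string equality is only one possible way to score a conversion.

```python
def exact_match_rate(pairs):
    """Fraction of (predicted, expected) pairs that are identical strings."""
    if not pairs:
        return 0.0
    hits = sum(1 for predicted, expected in pairs if predicted == expected)
    return hits / len(pairs)


# (PersianG2p conversion, expected) for برف, یخ, لذت, دژ
sample = [("barf", "barf"), ("yax", "yax"), ("lazt", "lezzat"), ("dož", "dež")]
print(exact_match_rate(sample))  # 2 of the 4 sampled rows match: 0.5
```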



Installation
pip install PersianG2p

Usage
from PersianG2p import Persian_g2p_converter

PersianG2Pconverter = Persian_g2p_converter()
# or, to load the large dictionary:
# PersianG2Pconverter = Persian_g2p_converter(use_large = True)

PersianG2Pconverter.transliterate('ما الان درحال بازی بودیم', tidy = False)
# ' m A a l A n darhAl b A z i b u d i m '

PersianG2Pconverter.transliterate('ما الان درحال بازی بودیم')
# ' m ā a l ā n darhāl b ā z i b u d i m '

Persian_g2p_converter().transliterate( "زان یار دلنوازم شکریست با شکایت", secret = True)
# 'zān yār delnavāzam šokrist bā šekāyat'

PersianG2Pconverter.transliterate('نه تنها یک کلمه')
# ' n o h t a n h ā y e k kalame'

# Calling the converter object directly is equivalent to calling .transliterate() with the same arguments
PersianG2Pconverter('نه تنها یک کلمه', secret = True)
# 'noh tanhA yek kalame'

Telegram bot @PersianG2Pbot
This Telegram bot uses the PersianG2p package. Message it to check results.
What you can do better


Fit a better model (with different hyperparameters or a bigger dictionary)


Add many new words to the dictionary. If you want, I will write a Python/C# script for this task or even create a Telegram bot

License

For personal and professional use. You cannot resell or redistribute these repositories in their original state.
