Aquila-Resolve 0.1.4

Creator: codyrutscher


Aquila Resolve - Grapheme-to-Phoneme Converter






Augmented Recurrent Neural G2P with Inflectional Orthography
Aquila Resolve presents a new approach for accurate and efficient English-to-ARPAbet G2P resolution.
The pipeline employs a context layer, multiple transformer and n-gram morpho-orthographical search layers,
and an autoregressive recurrent neural transformer base. The current implementation offers state-of-the-art
accuracy for out-of-vocabulary (OOV) words, as well as contextual analysis for correct inference of English heteronyms.
The package ships pre-trained and is ready for use as a dependency or in notebook environments. No additional
resources are needed beyond the model checkpoint, which is downloaded automatically on first use.
See Installation for more information.
1. Dynamic Word Mappings based on context:
g2p.convert('I read the book, did you read it?')
# >> '{AY1} {R EH1 D} {DH AH0} {B UH1 K}, {D IH1 D} {Y UW1} {R IY1 D} {IH1 T}?'

g2p.convert('The researcher was to subject the subject to a test.')
# >> '{DH AH0} {R IY1 S ER0 CH ER0} {W AA1 Z} {T UW1} {S AH0 B JH EH1 K T} {DH AH0} {S AH1 B JH IH0 K T} {T UW1} {AH0} {T EH1 S T}.'





Input: 'The subject was told to read. Eight records were read in total.'

| Model | Output |
| --- | --- |
| Ground Truth | The S AH1 B JH IH0 K T was told to R IY1 D. Eight R EH1 K ER0 D Z were R EH1 D in total. |
| Aquila Resolve | The S AH1 B JH IH0 K T was told to R IY1 D. Eight R EH1 K ER0 D Z were R EH1 D in total. |
| Deep Phonemizer (en_us_cmudict_forward.pt) | The S AH B JH EH K T was told to R EH D. Eight R AH K AO R D Z were R EH D in total. |
| CMUSphinx Seq2Seq (checkpoint) | The S AH1 B JH IH0 K T was told to R IY1 D. Eight R IH0 K AO1 R D Z were R IY1 D in total. |
| ESpeakNG (with phonecodes) | The S AH1 B JH EH K T was told to R IY1 D. Eight R EH1 K ER0 D Z were R IY1 D in total. |


2. Leading Accuracy for unseen words:
g2p.convert('Did you kalpe the Hevinet?')





| Word | Aquila Resolve | Deep Phonemizer (en_us_cmudict_forward.pt) | CMUSphinx Seq2Seq (checkpoint) | ESpeakNG (with phonecodes) |
| --- | --- | --- | --- | --- |
| "tensorflow" | T EH1 N S ER0 F L OW2 | T EH N S ER F L OW | T EH1 N S ER0 L OW0 F | T EH1 N S OW0 R F L OW2 |
| "agglomerative" | AH0 G L AA1 M ER0 EY2 T IH0 V | AH G L AA M ER AH T IH V | AH0 G L AA1 M ER0 T IH0 V | AA G L AA1 M ER0 R AH0 T IH2 V |
| "necrophages" | N EH1 K R OW0 F EY2 JH IH0 Z | N EH K R OW F EY JH IH Z | N AE1 K R AH0 F IH0 JH IH0 Z | N EH1 K R AH0 F IH JH EH0 Z |


Installation
pip install aquila-resolve


A pre-trained model checkpoint (~106 MB) will be downloaded automatically on the first use of public methods
that require inference, such as instantiating G2p. You can also start this download manually by calling
Aquila_Resolve.download(). If you are in an environment where remote file downloads are not possible, you can
transfer the checkpoint manually by placing model.pt within the Aquila_Resolve.data module folder.

Usage
1. Module
from Aquila_Resolve import G2p

g2p = G2p()

g2p.convert('The book costs $5, will you read it?')
# >> '{DH AH0} {B UH1 K} {K AA1 S T S} {F AY1 V} {D AA1 L ER0 Z}, {W IH1 L} {Y UW1} {R IY1 D} {IH1 T}?'
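The bracketed output is straightforward to post-process. The helper below is my own sketch (not part of the package's API); it splits the converted string back into per-word ARPAbet phoneme lists, dropping the punctuation that sits outside the braces:

```python
import re

def parse_arpabet(converted: str) -> list[list[str]]:
    """Split Aquila Resolve's '{..} {..}' output into per-word phoneme lists.

    Punctuation outside the braces (commas, question marks) is discarded.
    """
    return [chunk.split() for chunk in re.findall(r"\{([^}]*)\}", converted)]

parse_arpabet("{DH AH0} {B UH1 K}?")
# >> [['DH', 'AH0'], ['B', 'UH1', 'K']]
```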


Optional parameters when defining a G2p instance:

| Parameter | Default | Description |
| --- | --- | --- |
| device | 'cpu' | Device for the PyTorch inference model. GPU is supported using 'cuda'. |

Optional parameters when calling convert:

| Parameter | Default | Description |
| --- | --- | --- |
| process_numbers | True | Toggles conversion of some numbers and symbols to their spoken pronunciation forms. See numbers.py for details on what is covered. |
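To give a feel for the kind of rewriting process_numbers toggles, here is a deliberately minimal sketch of my own; the real numbers.py covers many more cases, while this handles only a single-digit dollar amount like the "$5" in the Usage example:

```python
import re

# Spoken forms for single digits (illustrative subset only).
ONES = ["zero", "one", "two", "three", "four",
        "five", "six", "seven", "eight", "nine"]

def expand_dollars(text: str) -> str:
    """Replace '$<digit>' with its spoken form, e.g. '$5' -> 'five dollars'."""
    return re.sub(r"\$(\d)\b", lambda m: ONES[int(m.group(1))] + " dollars", text)

expand_dollars("The book costs $5, will you read it?")
# >> 'The book costs five dollars, will you read it?'
```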



2. Command Line
A simple wrapper for text conversion is available through the aquila-resolve command
~
❯ aquila-resolve
✔ Aquila Resolve v0.1.4
? Text to convert: I read the book, did you read it?
{AY1} {R EH1 D} {DH AH0} {B UH1 K}, {D IH1 D} {Y UW1} {R IY1 D} {IH1 T}?

Model Architecture
In evaluation[^1], neural G2P models have traditionally been extremely sensitive to orthographical variations
in graphemes. Attention-based mapping of contextual recognition has traditionally been poor for languages
like English, which have a weak correspondence between graphemes and phonemes[^2]. Furthermore, both static
methods (e.g. the CMU Dictionary) and dynamic methods (e.g.
G2p-seq2seq,
Phonetisaurus,
DeepPhonemizer)
lose sentence context during tokenization for training and inference, which makes it impossible
to accurately resolve words with multiple pronunciations based on grammatical context
(heteronyms).
This model attempts to address these issues to optimize inference accuracy and run-time speed. The current architecture
employs additional natural language analysis steps, including Part-of-speech (POS) tagging, n-gram segmentation,
lemmatization searches, and word stem analysis. Some layers are universal for all text, such as POS tagging,
while others are activated when deemed required for the requested word. Layer information is retained with the token
in vectorized and tensor operations. This allows morphological variations of seen words, such as plurals, possessives,
compounds, inflectional stem affixes, and lemma variations to be resolved with near ground-truth level of accuracy.
This also improves out-of-vocabulary (OOV) inferencing accuracy, by truncating individual tensor size and
characteristics to be closer to seen data.
The inferencing layer is built as an autoregressive implementation of the forward
DeepPhonemizer model, as a 4-layer transformer with 256 hidden units.
The pre-trained checkpoint for Aquila Resolve
is trained using the CMU Dict v0.7b corpus, with 126,456 unique words. The validation dataset was split as a
uniform 5% sample of unique words, sorted by grapheme length. The learning rate was linearly increased during
the warmup steps, and step-decreased during fine-tuning.
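The learning-rate schedule described above can be sketched as follows; the base rate, warmup length, and decay values here are placeholders for illustration, not the settings used to train the released checkpoint:

```python
def lr_at(step: int, base_lr: float = 1e-3, warmup_steps: int = 1000,
          decay_every: int = 5000, decay: float = 0.5) -> float:
    """Linear warmup followed by step decay (illustrative hyperparameters)."""
    if step < warmup_steps:
        return base_lr * (step + 1) / warmup_steps          # linear warmup
    return base_lr * decay ** ((step - warmup_steps) // decay_every)  # step decay
```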
Symbol Set

The 2 letter ARPAbet symbol set is used, with numbered vowel stress markers.

Vowels

| Phoneme | Example | Phoneme | Example | Phoneme | Example | Phoneme | Example |
| --- | --- | --- | --- | --- | --- | --- | --- |
| AA0 | Balm | AW0 | Ourself | EY0 | Mayday | OY0 | |
| AA1 | Bot | AW1 | Shout | EY1 | Mayday | OY1 | |
| AA2 | Cot | AW2 | Outdo | EY2 | Airfreight | OY2 | |
| AE0 | Bat | AY0 | Ally | IH0 | Cooking | UH0 | |
| AE1 | Fast | AY1 | Bias | IH1 | Exist | UH1 | |
| AE2 | Midland | AY2 | Alibi | IH2 | Outfit | UH2 | |
| AH0 | Central | EH0 | Enroll | IY0 | Lady | UW0 | |
| AH1 | Chunk | EH1 | Bless | IY1 | Beak | UW1 | |
| AH2 | Outcome | EH2 | Telex | IY2 | Turnkey | UW2 | |
| AO0 | Story | ER0 | Chapter | OW0 | Reo | | |
| AO1 | Adore | ER1 | Verb | OW1 | So | | |
| AO2 | Blog | ER2 | Catcher | OW2 | Cargo | | |


License
The code in this project is released under Apache License 2.0.

References
[^1]: r-G2P: Evaluating and Enhancing Robustness of Grapheme to Phoneme Conversion by Controlled noise introducing
and Contextual information incorporation
[^2]: OTEANN: Estimating the Transparency of Orthographies with an Artificial
Neural Network
