tiramisuasr 0.2.3

TiramisuASR :cake:

Almost State-of-the-art Automatic Speech Recognition in TensorFlow 2

TiramisuASR implements several speech recognition architectures, including CTC-based models (Deep Speech 2, etc.) and RNN Transducers (Conformer, etc.). These models can be converted to TFLite to reduce memory use and computation at deployment :smile:

What's New?

(10/10/2020) Update documentation and upload package to PyPI
(10/6/2020) Change nlpaug version to >=1.0.1
(9/18/2020) Support word-pieces (aka subwords) using tensorflow-datasets
Support transducer tflite greedy decoding (conversion and invocation)
Distributed training using tf.distribute.MirroredStrategy

:yum: Supported Models

CTCModel (end-to-end models trained with CTC loss)
Transducer Models (end-to-end models trained with RNNT loss)
Conformer Transducer (Reference: https://arxiv.org/abs/2005.08100)
See examples/conformer

Setup Environment and Datasets
Install TensorFlow: pip3 install -U tensorflow, or pip3 install tf-nightly (for using TFLite)
Install packages (choose one of these options):

Run pip3 install -U tiramisu-asr
Clone the repo and run python3 setup.py install in the repo's directory
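
After installing, it can help to confirm that the package is importable from the interpreter you plan to train with. The snippet below is a minimal sketch using only the standard library; the module name tiramisu_asr is taken from the setup_environment() reference in this README, and the helper name is_installed is hypothetical.

```python
from importlib.util import find_spec


def is_installed(module_name):
    """Return True if a top-level module can be found without importing it."""
    return find_spec(module_name) is not None


# Report the status of the two packages this README asks you to install.
for name in ("tensorflow", "tiramisu_asr"):
    status = "found" if is_installed(name) else "missing"
    print(f"{name}: {status}")
```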

For setting up datasets, see datasets

For training, testing and using CTC Models, run ./scripts/install_ctc_decoders.sh

For training Transducer Models, export CUDA_HOME and run ./scripts/install_rnnt_loss.sh

The tiramisu_asr.utils.setup_environment() method enables mixed_precision if available.

To enable XLA, run TF_XLA_FLAGS=--tf_xla_auto_jit=2 $python_train_script
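
If you launch training from Python rather than from a shell, the same flag can be passed through the child process environment. This is a minimal sketch, not part of TiramisuASR: the helper names xla_env and run_with_xla are hypothetical, and the training script path is whatever script you would otherwise run on the command line.

```python
import os
import subprocess
import sys


def xla_env(base_env=None):
    """Return a copy of the environment with XLA auto-clustering enabled."""
    env = dict(os.environ if base_env is None else base_env)
    # Same flag as in the shell command; this overwrites any existing TF_XLA_FLAGS.
    env["TF_XLA_FLAGS"] = "--tf_xla_auto_jit=2"
    return env


def run_with_xla(train_script, *script_args):
    """Run a training script (path supplied by the caller) with XLA enabled."""
    command = [sys.executable, train_script, *script_args]
    return subprocess.run(command, env=xla_env(), check=True)
```

Copying the environment instead of mutating os.environ keeps the flag scoped to the training process.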

Clean up: python3 setup.py clean --all (this will remove /build contents)
TFLite Conversion
After conversion, the TFLite model acts as a function that maps an audio signal directly to Unicode code points, which can then be converted to a string.
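
The final step, turning code points into text, can be sketched in plain Python. This assumes the model output has already been read into a sequence of integers; the helper name codepoints_to_text is hypothetical and is not part of the TiramisuASR API.

```python
def codepoints_to_text(code_points):
    """Join a sequence of Unicode code points into a Python string."""
    return "".join(chr(int(point)) for point in code_points)


# Code points for a short Vietnamese greeting (224 is U+00E0, "à").
print(codepoints_to_text([120, 105, 110, 32, 99, 104, 224, 111]))  # prints "xin chào"
```

The int() call lets the same helper accept NumPy integer scalars as well as Python ints.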

Install tf-nightly with pip install tf-nightly
Build a model with the same architecture as the trained model (if the model has a tflite argument, you must set it to True), then load the trained model's weights into the newly built model
Attach TFSpeechFeaturizer and TextFeaturizer to the model using the add_featurizers function
Convert the model's function to TFLite as follows:

import tensorflow as tf

# `model` is the built model with trained weights and featurizers attached
func = model.make_tflite_function(greedy=True)  # or False
concrete_func = func.get_concrete_function()
converter = tf.lite.TFLiteConverter.from_concrete_functions([concrete_func])
converter.experimental_new_converter = True
converter.optimizations = [tf.lite.Optimize.DEFAULT]
# Allow TensorFlow ops that have no TFLite builtin equivalent
converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS,
                                       tf.lite.OpsSet.SELECT_TF_OPS]
tflite_model = converter.convert()

Save the converted tflite model as follows:

import os

# Create the output directory if it does not exist yet
if not os.path.exists(os.path.dirname(tflite_path)):
    os.makedirs(os.path.dirname(tflite_path))
with open(tflite_path, "wb") as tflite_out:
    tflite_out.write(tflite_model)

The .tflite model is then ready to be deployed.

Features Extraction
See features_extraction
Augmentations
See augmentations
Training & Testing
Example YAML Config Structure
speech_config: ...
model_config: ...
decoder_config: ...
learning_config:
  augmentations: ...
  dataset_config:
    train_paths: ...
    eval_paths: ...
    test_paths: ...
    tfrecords_dir: ...
  optimizer_config: ...
  running_config:
    batch_size: 8
    num_epochs: 20
    outdir: ...
    log_interval_steps: 500
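
Before starting a long training run, it can be useful to check that a parsed config contains the sections listed above. The following is a minimal sketch, not the library's own validation: it assumes the YAML file has already been parsed into a dict (for example with a YAML loader), that augmentations, dataset_config, optimizer_config and running_config are nested under learning_config, and that every section is required. The helper name missing_sections is hypothetical.

```python
TOP_LEVEL_SECTIONS = ("speech_config", "model_config", "decoder_config", "learning_config")
LEARNING_SECTIONS = ("augmentations", "dataset_config", "optimizer_config", "running_config")


def missing_sections(config):
    """List the expected sections that are absent from a parsed config dict."""
    missing = [name for name in TOP_LEVEL_SECTIONS if name not in config]
    # Treat an absent or empty learning_config as having no nested sections.
    learning = config.get("learning_config") or {}
    missing += [
        f"learning_config.{name}" for name in LEARNING_SECTIONS if name not in learning
    ]
    return missing


example = {
    "speech_config": {},
    "model_config": {},
    "decoder_config": {},
    "learning_config": {
        "augmentations": {},
        "dataset_config": {},
        "optimizer_config": {},
        "running_config": {"batch_size": 8, "num_epochs": 20},
    },
}
print(missing_sections(example))  # prints []
```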

See examples for some predefined ASR models and results
Corpus Sources and Pretrained Models
For pretrained models, go to drive
English

| Name | Source | Hours |
| --- | --- | --- |
| LibriSpeech | LibriSpeech | 970h |
| Common Voice | https://commonvoice.mozilla.org | 1932h |

Vietnamese

| Name | Source | Hours |
| --- | --- | --- |
| Vivos | https://ailab.hcmus.edu.vn/vivos | 15h |
| InfoRe Technology 1 | InfoRe1 (passwd: BroughtToYouByInfoRe) | 25h |
| InfoRe Technology 2 (used in VLSP2019) | InfoRe2 (passwd: BroughtToYouByInfoRe) | 415h |

German

| Name | Source | Hours |
| --- | --- | --- |
| Common Voice | https://commonvoice.mozilla.org/ | 750h |

References & Credits

NVIDIA OpenSeq2Seq Toolkit
https://github.com/noahchalifour/warp-transducer
Sequence Transduction with Recurrent Neural Network
End-to-End Speech Processing Toolkit in PyTorch

License

For personal and professional use. You cannot resell or redistribute these repositories in their original state.
