TensorFlowASR 2.1.0
TensorFlowASR :zap:
Almost State-of-the-art Automatic Speech Recognition in TensorFlow 2
TensorFlowASR implements several automatic speech recognition architectures, including DeepSpeech2, Jasper, RNN Transducer, ContextNet, and Conformer. These models can be converted to TFLite to reduce memory and computation for deployment :smile:
What's New?
Table of Contents
What's New?
Table of Contents
:yum: Supported Models
Baselines
Publications
Installation
Installing from source (recommended)
Installing via PyPI
Installing for development
Install for Apple Silicon
Running in a container
Training & Testing Tutorial
Features Extraction
Augmentations
TFLite Conversion
Pretrained Models
Corpus Sources
English
Vietnamese
How to contribute
References & Credits
Contact
:yum: Supported Models
Baselines
Transducer Models (end-to-end models trained with RNN-T loss; currently supported: Conformer, ContextNet, Streaming Transducer)
CTCModel (end-to-end models trained with CTC loss; currently supported: DeepSpeech2, Jasper)
Publications
Conformer Transducer (Reference: https://arxiv.org/abs/2005.08100)
See examples/models/transducer/conformer
ContextNet (Reference: http://arxiv.org/abs/2005.03191)
See examples/models/transducer/contextnet
RNN Transducer (Reference: https://arxiv.org/abs/1811.06621)
See examples/models/transducer/rnnt
Deep Speech 2 (Reference: https://arxiv.org/abs/1512.02595)
See examples/models/ctc/deepspeech2
Jasper (Reference: https://arxiv.org/abs/1904.03288)
See examples/models/ctc/jasper
Installation
For training and testing, install from a git clone so that the required third-party packages (ctc_decoders, rnnt_loss, etc.) are installed as well.
Installing from source (recommended)
git clone https://github.com/TensorSpeech/TensorFlowASR.git
cd TensorFlowASR
# Tensorflow 2.x (with 2.x.x >= 2.5.1)
pip3 install ".[tf2.x]" # or ".[tf2.x-gpu]"
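The version comment above can be checked before installing. The following is a minimal sketch, not part of TensorFlowASR: `parse_version` and `meets_minimum` are hypothetical helper names, and the default minimum of 2.5.1 is taken from the comment above. Pre-release suffixes such as `-rc0` are ignored, which is an assumption made for simplicity.

```python
import re


def parse_version(version):
    """Return the leading numeric components of a version string as a tuple."""
    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    # A missing patch component is treated as 0, so "2.5" becomes (2, 5, 0).
    return tuple(int(part or 0) for part in match.groups())


def meets_minimum(installed, minimum="2.5.1"):
    """Check whether the installed version is at least the minimum (hypothetical helper)."""
    return parse_version(installed) >= parse_version(minimum)
```

To check an installed copy, pass `tensorflow.__version__` as the `installed` argument.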
For anaconda3:
conda create -y -n tfasr tensorflow-gpu python=3.8 # use tensorflow instead of tensorflow-gpu on CPU; this makes sure conda installs all dependencies for tensorflow
conda activate tfasr
pip install -U tensorflow-gpu # upgrade to latest version of tensorflow
git clone https://github.com/TensorSpeech/TensorFlowASR.git
cd TensorFlowASR
# Tensorflow 2.x (with 2.x.x >= 2.5.1)
pip3 install ".[tf2.x]" # or ".[tf2.x-gpu]"
Installing via PyPI
# Tensorflow 2.x (with 2.x >= 2.3)
pip3 install "TensorFlowASR[tf2.x]" # or pip3 install "TensorFlowASR[tf2.x-gpu]"
Installing for development
git clone https://github.com/TensorSpeech/TensorFlowASR.git
cd TensorFlowASR
pip3 install -e ".[dev]"
pip3 install -e ".[tf2.x]" # or ".[tf2.x-gpu]" or ".[tf2.x-apple]" for apple m1 machine
Install for Apple Silicon
Because tensorflow-text is not built for Apple Silicon, install it from the prebuilt wheel provided by sun1638650145/Libraries-and-Extensions-for-TensorFlow-for-Apple-Silicon.
git clone https://github.com/TensorSpeech/TensorFlowASR.git
cd TensorFlowASR
pip3 install -e "." # or pip3 install -e ".[dev]" for development, or pip3 install "TensorFlowASR[dev]" from PyPI
pip3 install tensorflow~=2.14.0 # change minor version if you want
Run the following after installing TensorFlowASR and tensorflow as shown above:
TF_VERSION="$(python3 -c 'import tensorflow; print(tensorflow.__version__)')" && \
TF_VERSION_MAJOR="$(echo $TF_VERSION | cut -d'.' -f1,2)" && \
PY_VERSION="$(python3 -c 'import platform; major, minor, patch = platform.python_version_tuple(); print(f"{major}{minor}");')" && \
URL="https://github.com/sun1638650145/Libraries-and-Extensions-for-TensorFlow-for-Apple-Silicon" && \
pip3 install "${URL}/releases/download/v${TF_VERSION_MAJOR}/tensorflow_text-${TF_VERSION_MAJOR}.0-cp${PY_VERSION}-cp${PY_VERSION}-macosx_11_0_arm64.whl"
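To make the URL construction in the shell commands above easier to follow, here is a minimal Python sketch of the same string logic. The function name `tensorflow_text_wheel_url` is hypothetical; the base URL and the wheel naming pattern are copied from the commands above. Whether a wheel actually exists for a given TensorFlow and Python combination is not checked.

```python
import platform

# Base URL copied from the shell commands above.
BASE_URL = "https://github.com/sun1638650145/Libraries-and-Extensions-for-TensorFlow-for-Apple-Silicon"


def tensorflow_text_wheel_url(tf_version, py_version=None):
    """Build the tensorflow-text wheel URL the same way the shell commands do."""
    # Keep only "major.minor", like `cut -d'.' -f1,2` in the shell version.
    major_minor = ".".join(tf_version.split(".")[:2])
    if py_version is None:
        # Same as the PY_VERSION step: "3" and "10" become "310".
        major, minor, _patch = platform.python_version_tuple()
        py_version = f"{major}{minor}"
    tag = f"cp{py_version}"
    wheel = f"tensorflow_text-{major_minor}.0-{tag}-{tag}-macosx_11_0_arm64.whl"
    return f"{BASE_URL}/releases/download/v{major_minor}/{wheel}"


if __name__ == "__main__":
    print(tensorflow_text_wheel_url("2.14.0", "310"))
```

Note that the wheel name always uses patch version 0, so TensorFlow 2.14.1 maps to the same wheel as 2.14.0.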
Running in a container
docker-compose up -d
Training & Testing Tutorial
For training, please read tutorial_training
For testing, please read tutorial_testing
Note: Keras built-in training uses an infinite dataset, which avoids a potential partial last batch.
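The effect of an infinite dataset on batch sizes can be illustrated with a small pure-Python sketch. This is not TensorFlowASR's data pipeline; `repeated_batches` is a hypothetical helper that cycles over the samples forever, so every batch it yields is full.

```python
import itertools


def repeated_batches(samples, batch_size):
    """Yield full batches forever by cycling over the samples (illustrative only)."""
    samples = list(samples)
    if not samples:
        raise ValueError("samples must not be empty")
    stream = itertools.cycle(samples)
    while True:
        # The stream never ends, so each batch always has batch_size items.
        yield [next(stream) for _ in range(batch_size)]


if __name__ == "__main__":
    # 10 samples with batch size 4: a finite pass would end with a batch of 2.
    for batch in itertools.islice(repeated_batches(range(10), 4), 3):
        print(batch)
```

With 10 samples and a batch size of 4, the third batch wraps around to the start of the data instead of being cut short.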
See examples for some predefined ASR models and results
Features Extraction
See features_extraction
Augmentations
See augmentations
TFLite Conversion
After conversion, the TFLite model behaves like a single function that maps an audio signal directly to text and tokens.
See tflite_convertion
Pretrained Models
Go to drive
Corpus Sources
English
| Name | Source | Hours |
| --- | --- | --- |
| LibriSpeech | LibriSpeech | 970h |
| Common Voice | https://commonvoice.mozilla.org | 1932h |
Vietnamese
| Name | Source | Hours |
| --- | --- | --- |
| Vivos | https://ailab.hcmus.edu.vn/vivos | 15h |
| InfoRe Technology 1 | InfoRe1 (passwd: BroughtToYouByInfoRe) | 25h |
| InfoRe Technology 2 (used in VLSP2019) | InfoRe2 (passwd: BroughtToYouByInfoRe) | 415h |
How to contribute
Fork the project
Install for development
Create a branch
Make a pull request to this repo
References & Credits
NVIDIA OpenSeq2Seq Toolkit
https://github.com/noahchalifour/warp-transducer
Sequence Transduction with Recurrent Neural Network
End-to-End Speech Processing Toolkit in PyTorch
https://github.com/iankur/ContextNet
Contact
Huy Le Nguyen
Email: [email protected]
For personal and professional use. You cannot resell or redistribute these repositories in their original state.