sherpa-onnx
Supported functions
Function | Supported
Speech recognition | ✔️
Speech synthesis | ✔️
Speaker verification | ✔️
Speaker identification | ✔️
Spoken language identification | ✔️
Audio tagging | ✔️
Voice activity detection | ✔️
Keyword spotting | ✔️
Add punctuation | ✔️
Supported platforms
Architecture | Android | iOS | Windows | macOS | Linux
x64 | ✔️ | | ✔️ | ✔️ | ✔️
x86 | ✔️ | | ✔️ | |
arm64 | ✔️ | ✔️ | ✔️ | ✔️ | ✔️
arm32 | ✔️ | | | | ✔️
riscv64 | | | | | ✔️
Supported programming languages
1. C++ ✔️
2. C ✔️
3. Python ✔️
4. JavaScript ✔️
5. Java ✔️
6. C# ✔️
7. Kotlin ✔️
8. Swift ✔️
9. Go ✔️
10. Dart ✔️
11. Rust ✔️
12. Pascal ✔️
For Rust support, please see sherpa-rs.
sherpa-onnx also supports WebAssembly.
Introduction
This repository supports running the following functions locally:
Speech-to-text (i.e., ASR); both streaming and non-streaming are supported
Text-to-speech (i.e., TTS)
Speaker identification
Speaker verification
Spoken language identification
Audio tagging
VAD (e.g., silero-vad)
Keyword spotting
on the following platforms and operating systems:
x86, x86_64, 32-bit ARM, 64-bit ARM (arm64, aarch64), RISC-V (riscv64)
Linux, macOS, Windows, openKylin
Android, WearOS
iOS
Node.js
WebAssembly
Raspberry Pi
RV1126
LicheePi4A
VisionFive 2
Horizon Sunrise X3 Pi (旭日X3派)
AXera-Pi (爱芯派)
etc.
with the following APIs:
C++, C, Python, Go, C#
Java, Kotlin, JavaScript
Swift, Rust
Dart, Object Pascal
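The introduction lists VAD and non-streaming speech recognition as separate functions, and several of the demos listed later in this README combine them ("VAD + speech recognition"). One way to picture that combination: a VAD cuts the audio into speech segments, and a non-streaming recognizer decodes each segment on its own. The sketch below illustrates only that control flow, under stated assumptions: it does not use the sherpa-onnx API, the frame-energy threshold in `energy_vad` is a simple stand-in for a neural VAD such as silero-vad, and `recognize` is a hypothetical callback in place of a real ASR model.

```python
import math
from typing import Callable, List, Optional, Sequence, Tuple


def energy_vad(
    samples: Sequence[float],
    sample_rate: int,
    frame_ms: int = 30,
    threshold: float = 0.01,
    min_silence_frames: int = 3,
) -> List[Tuple[int, int]]:
    """Return (start, end) sample indices of detected speech segments.

    Hypothetical stand-in for a real VAD (e.g. silero-vad): a frame counts
    as speech when its mean squared amplitude exceeds `threshold`.
    """
    frame_len = max(1, sample_rate * frame_ms // 1000)
    segments: List[Tuple[int, int]] = []
    start: Optional[int] = None  # start index of the open segment, if any
    end = 0                      # end index of the last speech frame seen
    silence = 0                  # non-speech frames since the last speech frame
    for i in range(0, len(samples), frame_len):
        frame = samples[i : i + frame_len]
        energy = sum(x * x for x in frame) / len(frame)
        if energy > threshold:
            if start is None:
                start = i
            end = i + len(frame)
            silence = 0
        elif start is not None:
            silence += 1
            # Close the open segment after enough trailing silence.
            if silence >= min_silence_frames:
                segments.append((start, end))
                start, silence = None, 0
    if start is not None:  # audio ended while a segment was still open
        segments.append((start, end))
    return segments


def transcribe_with_vad(
    samples: Sequence[float],
    sample_rate: int,
    recognize: Callable[[Sequence[float], int], str],
) -> List[Tuple[float, float, str]]:
    """Decode each VAD segment separately with a non-streaming recognizer.

    `recognize` is a hypothetical callback; it receives one segment at a time
    and returns its text. Results are (start_seconds, end_seconds, text).
    """
    results = []
    for start, end in energy_vad(samples, sample_rate):
        text = recognize(samples[start:end], sample_rate)
        results.append((start / sample_rate, end / sample_rate, text))
    return results


if __name__ == "__main__":
    sr = 16000

    def tone(ms: int) -> List[float]:
        # 440 Hz sine at amplitude 0.5 as a crude "speech" placeholder.
        n = sr * ms // 1000
        return [0.5 * math.sin(2 * math.pi * 440 * k / sr) for k in range(n)]

    def quiet(ms: int) -> List[float]:
        return [0.0] * (sr * ms // 1000)

    def fake_asr(segment: Sequence[float], rate: int) -> str:
        # Placeholder for a real model: just reports the segment length.
        return f"<{len(segment)} samples>"

    audio = quiet(500) + tone(500) + quiet(500) + tone(300) + quiet(500)
    for begin, finish, text in transcribe_with_vad(audio, sr, fake_asr):
        print(f"{begin:.2f}s - {finish:.2f}s: {text}")
```

On the synthetic signal in the `__main__` block this reports two segments, 0.48s to 1.02s and 1.50s to 1.80s; the extra padding around the first tone comes from the VAD working on whole 30 ms frames. Decoding each segment independently is what lets a non-streaming model handle long recordings without ever seeing the whole file at once.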
Links for Hugging Face Spaces
You can visit the following Hugging Face Spaces to try sherpa-onnx without
installing anything. All you need is a browser.
Description | URL
Speech recognition | Click me
Speech recognition with Whisper | Click me
Speech synthesis | Click me
Generate subtitles | Click me
Audio tagging | Click me
Spoken language identification with Whisper | Click me
We also have Spaces built using WebAssembly. They are listed below:
Description | Hugging Face space | ModelScope space
Voice activity detection with silero-vad | Click me | Address
Real-time speech recognition (Chinese + English) with Zipformer | Click me | Address
Real-time speech recognition (Chinese + English) with Paraformer | Click me | Address
Real-time speech recognition (Chinese + English + Cantonese) with Paraformer-large | Click me | Address
Real-time speech recognition (English) | Click me | Address
VAD + speech recognition (Chinese + English + Korean + Japanese + Cantonese) with SenseVoice | Click me | Address
VAD + speech recognition (English) with Whisper tiny.en | Click me | Address
VAD + speech recognition (English) with Zipformer trained with GigaSpeech | Click me | Address
VAD + speech recognition (Chinese) with Zipformer trained with WenetSpeech | Click me | Address
VAD + speech recognition (Japanese) with Zipformer trained with ReazonSpeech | Click me | Address
VAD + speech recognition (Thai) with Zipformer trained with GigaSpeech2 | Click me | Address
VAD + speech recognition (Chinese, multiple dialects) with a TeleSpeech-ASR CTC model | Click me | Address
VAD + speech recognition (English + Chinese, and multiple Chinese dialects) with Paraformer-large | Click me | Address
VAD + speech recognition (English + Chinese, and multiple Chinese dialects) with Paraformer-small | Click me | Address
Speech synthesis (English) | Click me | Address
Speech synthesis (German) | Click me | Address
Links for pre-built Android APKs
Description | URL | For users in China
Streaming speech recognition | Address | Click here
Text-to-speech | Address | Click here
Voice activity detection (VAD) | Address | Click here
VAD + non-streaming speech recognition | Address | Click here
Two-pass speech recognition | Address | Click here
Audio tagging | Address | Click here
Audio tagging (WearOS) | Address | Click here
Speaker identification | Address | Click here
Spoken language identification | Address | Click here
Keyword spotting | Address | Click here
Links for pre-built Flutter apps
Real-time speech recognition
Description | URL | For users in China
Streaming speech recognition | Address | Click here
Text-to-speech
Description | URL | For users in China
Android (arm64-v8a, armeabi-v7a, x86_64) | Address | Click here
Linux (x64) | Address | Click here
macOS (x64) | Address | Click here
macOS (arm64) | Address | Click here
Windows (x64) | Address | Click here
Note: For iOS, you need to build from source.
Links for pre-built Lazarus apps
Generating subtitles
Description | URL | For users in China
Generate subtitles | Address | Click here
Links for pre-trained models
Description | URL
Speech recognition (speech to text, ASR) | Address
Text-to-speech (TTS) | Address
VAD | Address
Keyword spotting | Address
Audio tagging | Address
Speaker identification (Speaker ID) | Address
Spoken language identification (Language ID) | See the multilingual Whisper ASR models under Speech recognition
Punctuation | Address
Useful links
Documentation: https://k2-fsa.github.io/sherpa/onnx/
Bilibili demo videos: https://search.bilibili.com/all?keyword=%E6%96%B0%E4%B8%80%E4%BB%A3Kaldi
How to reach us
Please see
https://k2-fsa.github.io/sherpa/social-groups.html
for the Next-gen Kaldi (新一代 Kaldi) WeChat and QQ discussion groups.