sherpa_onnx

Creator: coderz1093


sherpa onnx

Supported functions #



| Function | Supported |
|---|---|
| Speech recognition | ✔️ |
| Speech synthesis | ✔️ |
| Speaker verification | ✔️ |
| Speaker identification | ✔️ |
| Spoken language identification | ✔️ |
| Audio tagging | ✔️ |
| Voice activity detection | ✔️ |
| Keyword spotting | ✔️ |
| Add punctuation | ✔️ |



Supported platforms #



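| Architecture | Android | iOS | Windows | macOS | Linux |
|---|---|---|---|---|---|
| x64 | ✔️ | | ✔️ | ✔️ | ✔️ |
| x86 | ✔️ | | ✔️ | | |
| arm64 | ✔️ | ✔️ | ✔️ | ✔️ | ✔️ |
| arm32 | ✔️ | | | | ✔️ |
| riscv64 | | | | | ✔️ |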
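If you need to check a build target programmatically, the platform matrix above can be written down as a small lookup table. The following is only an illustrative sketch: the names `SUPPORTED` and `is_supported` are hypothetical and are not part of sherpa-onnx; the data is copied from the matrix in this section.

```python
# Hypothetical helper, not part of sherpa-onnx: the support matrix from
# this README encoded as a dictionary of architecture -> operating systems.
SUPPORTED = {
    "x64": {"android", "windows", "macos", "linux"},
    "x86": {"android", "windows"},
    "arm64": {"android", "ios", "windows", "macos", "linux"},
    "arm32": {"android", "linux"},
    "riscv64": {"linux"},
}


def is_supported(arch: str, os_name: str) -> bool:
    """Return True if the README lists os_name as supported on arch."""
    # Unknown architectures map to an empty set, so the lookup never raises.
    return os_name.lower() in SUPPORTED.get(arch.lower(), set())


if __name__ == "__main__":
    for arch, os_name in [("arm64", "iOS"), ("x64", "iOS"), ("riscv64", "Linux")]:
        print(arch, os_name, is_supported(arch, os_name))
```

Matching is case-insensitive, so `is_supported("ARM64", "ios")` and `is_supported("arm64", "iOS")` give the same answer.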



Supported programming languages #



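| Language | Supported |
|---|---|
| 1. C++ | ✔️ |
| 2. C | ✔️ |
| 3. Python | ✔️ |
| 4. JavaScript | ✔️ |
| 5. Java | ✔️ |
| 6. C# | ✔️ |
| 7. Kotlin | ✔️ |
| 8. Swift | ✔️ |
| 9. Go | ✔️ |
| 10. Dart | ✔️ |
| 11. Rust | ✔️ |
| 12. Pascal | ✔️ |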



For Rust support, please see sherpa-rs.
It also supports WebAssembly.
Introduction #
This repository supports running the following functions locally:

Speech-to-text (i.e., ASR); both streaming and non-streaming are supported
Text-to-speech (i.e., TTS)
Speaker identification
Speaker verification
Spoken language identification
Audio tagging
VAD (e.g., silero-vad)
Keyword spotting

on the following platforms and operating systems:

x86, x86_64, 32-bit ARM, 64-bit ARM (arm64, aarch64), RISC-V (riscv64)
Linux, macOS, Windows, openKylin
Android, WearOS
iOS
NodeJS
WebAssembly
Raspberry Pi
RV1126
LicheePi4A
VisionFive 2
Horizon Sunrise X3 Pi (旭日X3派)
AXera-Pi (爱芯派)
etc.

with the following APIs:

C++, C, Python, Go, C#
Java, Kotlin, JavaScript
Swift, Rust
Dart, Object Pascal
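As a rough illustration of the non-streaming speech-to-text function through the Python API, here is a minimal sketch. It assumes the `sherpa_onnx` and `numpy` packages are installed and that you have downloaded a transducer model. The file names `encoder.onnx`, `decoder.onnx`, `joiner.onnx`, and `tokens.txt` are placeholders (real model archives use their own names), and `read_wave` and `transcribe` are hypothetical helpers written for this example.

```python
import array
import sys
import wave
from pathlib import Path


def read_wave(path):
    """Read a 16-bit mono WAV file; return (samples in [-1, 1), sample_rate)."""
    with wave.open(str(path), "rb") as f:
        if f.getnchannels() != 1 or f.getsampwidth() != 2:
            raise ValueError("expected a 16-bit mono WAV file")
        sample_rate = f.getframerate()
        frames = f.readframes(f.getnframes())
    pcm = array.array("h")
    pcm.frombytes(frames)
    if sys.byteorder == "big":
        pcm.byteswap()  # WAV data is little-endian
    return [s / 32768.0 for s in pcm], sample_rate


def transcribe(wav_path, model_dir):
    """Decode one file with a non-streaming transducer model."""
    # Imported here so read_wave stays usable without these packages.
    import numpy as np
    import sherpa_onnx

    model_dir = Path(model_dir)
    # Placeholder file names: replace them with the ones in your model.
    recognizer = sherpa_onnx.OfflineRecognizer.from_transducer(
        encoder=str(model_dir / "encoder.onnx"),
        decoder=str(model_dir / "decoder.onnx"),
        joiner=str(model_dir / "joiner.onnx"),
        tokens=str(model_dir / "tokens.txt"),
        num_threads=1,
    )
    samples, sample_rate = read_wave(wav_path)
    stream = recognizer.create_stream()
    stream.accept_waveform(sample_rate, np.asarray(samples, dtype=np.float32))
    recognizer.decode_stream(stream)
    return stream.result.text
```

A call would then look like `print(transcribe("test.wav", "path/to/model"))`, where both paths are again placeholders. Streaming recognition uses a different recognizer class and feeds audio in chunks; see the documentation for those examples.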

Links for Huggingface Spaces #
You can visit the following Huggingface spaces to try sherpa-onnx without
installing anything. All you need is a browser.



| Description | URL |
|---|---|
| Speech recognition | Click me |
| Speech recognition with Whisper | Click me |
| Speech synthesis | Click me |
| Generate subtitles | Click me |
| Audio tagging | Click me |
| Spoken language identification with Whisper | Click me |



We also have spaces built using WebAssembly. They are listed below:



| Description | Huggingface space | ModelScope space |
|---|---|---|
| Voice activity detection with silero-vad | Click me | Address |
| Real-time speech recognition (Chinese + English) with Zipformer | Click me | Address |
| Real-time speech recognition (Chinese + English) with Paraformer | Click me | Address |
| Real-time speech recognition (Chinese + English + Cantonese) with Paraformer-large | Click me | Address |
| Real-time speech recognition (English) | Click me | Address |
| VAD + speech recognition (Chinese + English + Korean + Japanese + Cantonese) with SenseVoice | Click me | Address |
| VAD + speech recognition (English) with Whisper tiny.en | Click me | Address |
| VAD + speech recognition (English) with Zipformer trained with GigaSpeech | Click me | Address |
| VAD + speech recognition (Chinese) with Zipformer trained with WenetSpeech | Click me | Address |
| VAD + speech recognition (Japanese) with Zipformer trained with ReazonSpeech | Click me | Address |
| VAD + speech recognition (Thai) with Zipformer trained with GigaSpeech2 | Click me | Address |
| VAD + speech recognition (Chinese, multiple dialects) with a TeleSpeech-ASR CTC model | Click me | Address |
| VAD + speech recognition (English + Chinese, plus multiple Chinese dialects) with Paraformer-large | Click me | Address |
| VAD + speech recognition (English + Chinese, plus multiple Chinese dialects) with Paraformer-small | Click me | Address |
| Speech synthesis (English) | Click me | Address |
| Speech synthesis (German) | Click me | Address |



Links for pre-built Android APKs #



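| Description | URL | Users in China |
|---|---|---|
| Streaming speech recognition | Address | Click here |
| Text-to-speech | Address | Click here |
| Voice activity detection (VAD) | Address | Click here |
| VAD + non-streaming speech recognition | Address | Click here |
| Two-pass speech recognition | Address | Click here |
| Audio tagging | Address | Click here |
| Audio tagging (WearOS) | Address | Click here |
| Speaker identification | Address | Click here |
| Spoken language identification | Address | Click here |
| Keyword spotting | Address | Click here |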



Links for pre-built Flutter APPs #
Real-time speech recognition



| Description | URL | Users in China |
|---|---|---|
| Streaming speech recognition | Address | Click here |



Text-to-speech



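| Description | URL | Users in China |
|---|---|---|
| Android (arm64-v8a, armeabi-v7a, x86_64) | Address | Click here |
| Linux (x64) | Address | Click here |
| macOS (x64) | Address | Click here |
| macOS (arm64) | Address | Click here |
| Windows (x64) | Address | Click here |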




Note: You need to build from source for iOS.

Links for pre-built Lazarus APPs #
Generating subtitles



| Description | URL | Users in China |
|---|---|---|
| Generate subtitles | Address | Click here |



Links for pre-trained models #



| Description | URL |
|---|---|
| Speech recognition (speech to text, ASR) | Address |
| Text-to-speech (TTS) | Address |
| VAD | Address |
| Keyword spotting | Address |
| Audio tagging | Address |
| Speaker identification (Speaker ID) | Address |
| Spoken language identification (Language ID) | See multi-lingual Whisper ASR models from Speech recognition |
| Punctuation | Address |



Useful links #

Documentation: https://k2-fsa.github.io/sherpa/onnx/
Bilibili demo videos: https://search.bilibili.com/all?keyword=%E6%96%B0%E4%B8%80%E4%BB%A3Kaldi

How to reach us #
Please see
https://k2-fsa.github.io/sherpa/social-groups.html
for the Next-gen Kaldi (新一代 Kaldi) WeChat and QQ discussion groups.

License

For personal and professional use. You cannot resell or redistribute these repositories in their original state.
