openmmla-audio 0.1.4.post5
🎙️ OpenMMLA Audio
Audio module of mBox, an open multimodal learning analytics (MMLA) platform. For more details, please refer to the mBox System Design.
Table of Contents
Related Modules
Installation
Uber Server Setup
Audio Base & Server Setup
Standalone Setup
Usage
Real-time Audio Analyzer
Post-time Audio Analyzer
Logs & Visualization
FAQ
Citation
References
License
Related Modules
mbox-uber
mbox-video
Installation
Uber Server Setup
Before setting up the audio base, you need to set up a server hosting the InfluxDB, Redis, and Mosquitto services.
Please refer to the mbox-uber module.
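Once the Uber Server is up, a quick TCP check from the audio base machine can confirm that the three services are reachable. This sketch assumes the upstream default ports (Redis 6379, InfluxDB 8086, Mosquitto 1883), which your mbox-uber configuration may override; the helper names and the host name in the usage line are hypothetical. It relies on bash's `/dev/tcp` redirection and the coreutils `timeout` command (on macOS, Homebrew's coreutils provides it as `gtimeout`).

```shell
#!/usr/bin/env bash
# Sketch: check that the Uber Server services accept TCP connections.
# Assumption: default ports for Redis (6379), InfluxDB (8086), Mosquitto (1883).

# Returns 0 if a TCP connection to host:port succeeds within 2 seconds.
port_open() {
  timeout 2 bash -c "exec 3<>/dev/tcp/$1/$2" 2>/dev/null
}

# Prints one status line per service for the given host.
check_uber_services() {
  local host="$1" entry name port
  for entry in redis:6379 influxdb:8086 mosquitto:1883; do
    name="${entry%%:*}"
    port="${entry##*:}"
    if port_open "$host" "$port"; then
      echo "$name ($host:$port): reachable"
    else
      echo "$name ($host:$port): NOT reachable"
    fi
  done
}
```

For example, run `check_uber_services uber.local`, replacing `uber.local` with your Uber Server's host name.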
Audio Base & Server Setup
Clone the repository
git clone https://github.com/ucph-ccs/mbox-audio.git
Install required system dependencies
Mac
brew install ffmpeg portaudio mecab llvm
echo 'export PATH="/opt/homebrew/opt/llvm/bin:$PATH"' >> ~/.zshrc
echo 'export LDFLAGS="-L/opt/homebrew/opt/llvm/lib"' >> ~/.zshrc
echo 'export CPPFLAGS="-I/opt/homebrew/opt/llvm/include"' >> ~/.zshrc
source ~/.zshrc
Ubuntu 24.04
sudo apt update && sudo apt upgrade
sudo apt install build-essential git ffmpeg python3-pyaudio libsndfile1 libasound-dev
wget https://files.portaudio.com/archives/pa_stable_v190700_20210406.tgz
tar -zxvf pa_stable_v190700_20210406.tgz
cd portaudio
./configure && make
sudo make install
Raspberry Pi OS (Bullseye or later)
sudo apt-get install portaudio19-dev
Install openmmla-audio
Set up a Conda environment
# For Raspberry Pi
wget "https://github.com/conda-forge/miniforge/releases/latest/download/Miniforge3-$(uname)-$(uname -m).sh"
bash Miniforge3-$(uname)-$(uname -m).sh
# For Mac and Linux
wget "https://repo.anaconda.com/miniconda/Miniconda3-latest-$(uname)-$(uname -m).sh"
bash Miniconda3-latest-$(uname)-$(uname -m).sh
Install Audio Base
conda create -c conda-forge -n audio-base python==3.10.12 -y
conda activate audio-base
pip install openmmla-audio[base] # for Linux and Raspberry Pi
pip install 'openmmla-audio[base]' # for Mac (the quotes stop zsh from treating the brackets as a glob pattern)
Install Audio Server
conda create -c conda-forge -n audio-server python==3.10.12 -y
conda activate audio-server
pip install openmmla-audio[server] # for Linux and Raspberry Pi
pip install 'openmmla-audio[server]' # for Mac
Set up the folder structure
cd mbox-audio
./reset.sh
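Before moving on, you can confirm that the command-line tools these setup steps rely on are on your `PATH`. The helper below is a generic sketch (the name `missing_tools` is hypothetical); it only checks executables, so libraries such as PortAudio are not covered.

```shell
#!/usr/bin/env bash
# Sketch: report any of the given commands that are not on PATH.
# Prints "missing: <name>" for each absent command and returns 1 if any are missing.
missing_tools() {
  local tool status=0
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "missing: $tool"
      status=1
    fi
  done
  return "$status"
}
```

For example, `missing_tools git ffmpeg conda` prints nothing and returns 0 when all three are installed.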
Standalone Setup
If you want to run the entire mBox Audio system on a single machine (not in distributed mode), follow these steps:
Set up the Uber Server on your machine following the instructions in the mbox-uber module.
Install system dependencies as described in the "Audio Base & Server Setup" section above.
Install openmmla-audio with all dependencies:
conda create -c conda-forge -n mbox-audio python==3.10.12 -y
conda activate mbox-audio
pip install openmmla-audio[all] # for Linux and Raspberry Pi
pip install 'openmmla-audio[all]' # for Mac
Set up the folder structure:
cd mbox-audio
./reset.sh
This setup will allow you to run all components of mBox Audio on a single machine.
Usage
Real-time Audio Analyzer
To run the real-time audio analyzer:
Start Audio Server (optional)
./server.sh
This script runs the distributed audio services on the audio servers. To configure your audio server cluster, please refer to the nginx setup running on your uber server. Default setup: three servers (server-01.local, server-02.local, server-03.local) running five services (transcribe, separate, infer, enhance, vad).
Start Audio Base
./run.sh [-b <num_bases>] [-s <num_synchronizers>] [-l <standalone>] [-p <speech_separation>]
Default parameter settings:
-b: 3 (number of audio bases)
-s: 1 (number of audio base synchronizers)
-l: false (not standalone)
-p: false (no speech separation)
💡 You can switch the operating mode of the audio base at runtime:
| Mode | Description |
|---|---|
| Record | Record audio segments without recognition |
| Recognize | Recognize pre-recorded segments and synchronize |
| Full | Record and recognize simultaneously (default) |
Start Control Base
./control.sh
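As an illustration of how the `run.sh` flags listed above resolve to their defaults, here is a minimal `getopts` sketch. It is not the contents of `run.sh`: the function name `parse_run_args` is hypothetical, and it assumes `-l` and `-p` take `true`/`false` values, which is inferred from the usage line and the listed defaults.

```shell
#!/usr/bin/env bash
# Sketch only: resolves run.sh-style flags to the documented defaults.
# Assumption: -l and -p take "true"/"false"; parse_run_args is a hypothetical name.
parse_run_args() {
  local bases=3 syncs=1 standalone=false separation=false opt OPTIND=1
  while getopts "b:s:l:p:" opt; do
    case "$opt" in
      b) bases="$OPTARG" ;;        # number of audio bases
      s) syncs="$OPTARG" ;;        # number of audio base synchronizers
      l) standalone="$OPTARG" ;;   # standalone mode
      p) separation="$OPTARG" ;;   # speech separation
      *) echo "usage: run.sh [-b n] [-s n] [-l true|false] [-p true|false]" >&2
         return 1 ;;
    esac
  done
  echo "bases=$bases synchronizers=$syncs standalone=$standalone separation=$separation"
}
```

With the real script, `./run.sh -b 2 -s 1` should start two audio bases and one synchronizer, going by the parameter descriptions above.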
Post-time Audio Analyzer
To run the post-time audio analyzer:
Create a speaker corpus folder: /audio_db/post-time/[audio_file_name]/
Add speaker audio files named [speaker_name].wav to the corpus folder
Run the analyzer:
cd examples/
conda activate mbox-audio
# Process a single audio file (supported formats: wav, m4a, mp3). If -f is omitted, all files in the /audio/post-time/origin/ folder are processed.
python3 run_audio_post_analyzer.py -f [audio_file_name.wav]
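The corpus steps and the format rule above can be scripted. The sketch below is hypothetical throughout: `prepare_corpus`, `is_supported_audio`, and `list_supported_audio` are illustrative names, `/audio_db/` is assumed to sit under the `mbox-audio` repository root, and extensions are compared case-insensitively, which the analyzer itself may or may not do. Pass `[audio_file_name]` exactly as the analyzer expects it.

```shell
#!/usr/bin/env bash
# Sketch: helpers for preparing a post-time analysis run (hypothetical names).

# Returns 0 if the file name ends in one of the documented formats: wav, m4a, mp3.
# Assumption: the extension is compared case-insensitively.
is_supported_audio() {
  local ext
  case "$1" in
    *.*) ext="$(printf '%s' "${1##*.}" | tr '[:upper:]' '[:lower:]')" ;;
    *) return 1 ;;  # no extension at all
  esac
  case "$ext" in
    wav|m4a|mp3) return 0 ;;
    *) return 1 ;;
  esac
}

# Prints the names of files in a folder that have a supported extension,
# e.g. to preview the contents of /audio/post-time/origin/.
list_supported_audio() {
  local f
  for f in "$1"/*; do
    [ -f "$f" ] || continue
    if is_supported_audio "$f"; then
      printf '%s\n' "${f##*/}"
    fi
  done
}

# Creates <repo_root>/audio_db/post-time/<audio_file_name>/ and copies the
# given <speaker_name>.wav clips into it; prints the corpus folder path.
# Assumption: /audio_db/ is relative to the mbox-audio repository root.
prepare_corpus() {
  local root="$1" audio_name="$2" corpus clip
  shift 2
  corpus="$root/audio_db/post-time/$audio_name"
  mkdir -p "$corpus" || return 1
  for clip in "$@"; do
    case "$clip" in
      *.wav) cp "$clip" "$corpus/" || return 1 ;;
      *) echo "skipping $clip: speaker files must be named <speaker_name>.wav" >&2 ;;
    esac
  done
  printf '%s\n' "$corpus"
}
```

For example, from the repository root, `prepare_corpus . meeting1 alice.wav bob.wav` (hypothetical names) would create `./audio_db/post-time/meeting1/` and copy both clips into it.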
Logs & Visualization
After running the analyzers, logs and visualizations are stored in the /logs/ and /visualizations/ folders.
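To jump to the output of the most recent run, a small helper can pick the newest file in either folder by modification time. It is a generic sketch (the name `newest_file` is hypothetical) and assumes nothing about how the analyzers name their log or visualization files.

```shell
#!/usr/bin/env bash
# Sketch: print the most recently modified regular file in a directory.
# Returns 1 (and prints nothing) if the directory contains no regular files.
newest_file() {
  local newest="" f
  for f in "$1"/*; do
    [ -f "$f" ] || continue
    if [ -z "$newest" ] || [ "$f" -nt "$newest" ]; then
      newest="$f"
    fi
  done
  [ -n "$newest" ] && printf '%s\n' "$newest"
}
```

For example, `newest_file logs` run from the repository root, assuming the folders are relative to it.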
FAQ
Citation
If you use this code in your research, please cite the following paper:
@inproceedings{inproceedings,
author = {Li, Zaibei and Jensen, Martin and Nolte, Alexander and Spikol, Daniel},
year = {2024},
month = {03},
pages = {785--791},
title = {Field report for Platform mBox: Designing an Open MMLA Platform},
doi = {10.1145/3636555.3636872}
}
References
NeMo
Silero VAD
Denoiser
MossFormer
Whisper
License
This project is licensed under the MIT License - see the LICENSE file for details.