openmmla-audio 0.1.4.post5


🎙️ OpenMMLA Audio

Audio module of the mBox - an open multimodal learning analytics platform. For more details, please refer
to the mBox System Design.
Table of Contents

Related Modules
Installation

Uber Server Setup
Audio Base & Server Setup
Standalone Setup


Usage

Real-time Audio Analyzer
Post-time Audio Analyzer


Logs & Visualization
FAQ
Citation
References
License

Related Modules

mbox-uber
mbox-video

Installation
Uber Server Setup
Before setting up the audio base, you need to set up a server hosting the InfluxDB, Redis, and Mosquitto services.
Please refer to mbox-uber module.
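Before continuing, you can sanity-check that the three services are reachable. A minimal Python sketch, assuming the services listen on their standard default ports (InfluxDB 8086, Redis 6379, Mosquitto 1883); substitute your uber server's hostname for localhost:

```python
import socket

def service_reachable(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

# Default ports are an assumption -- adjust to your uber server's config.
services = {"InfluxDB": 8086, "Redis": 6379, "Mosquitto": 1883}
for name, port in services.items():
    status = "up" if service_reachable("localhost", port) else "unreachable"
    print(f"{name}: {status}")
```

If any service reports unreachable, revisit the mbox-uber setup before starting the audio base.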
Audio Base & Server Setup


Clone the repository
git clone https://github.com/ucph-ccs/mbox-audio.git



Install required system dependencies

Mac
brew install ffmpeg portaudio mecab llvm
echo 'export PATH="/opt/homebrew/opt/llvm/bin:$PATH"' >> ~/.zshrc
echo 'export LDFLAGS="-L/opt/homebrew/opt/llvm/lib"' >> ~/.zshrc
echo 'export CPPFLAGS="-I/opt/homebrew/opt/llvm/include"' >> ~/.zshrc
source ~/.zshrc



Ubuntu 24.04
sudo apt update && sudo apt upgrade
sudo apt install build-essential git ffmpeg python3-pyaudio libsndfile1 libasound-dev
wget https://files.portaudio.com/archives/pa_stable_v190700_20210406.tgz
tar -zxvf pa_stable_v190700_20210406.tgz
cd portaudio
./configure && make
sudo make install



Raspberry Pi Bullseye or later
sudo apt-get install portaudio19-dev




Install openmmla-audio

Set up Conda environment
# For Raspberry Pi
wget "https://github.com/conda-forge/miniforge/releases/latest/download/Miniforge3-$(uname)-$(uname -m).sh"
bash Miniforge3-$(uname)-$(uname -m).sh

# For Mac and Linux
wget "https://repo.anaconda.com/miniconda/Miniconda3-latest-$(uname)-$(uname -m).sh"
bash Miniconda3-latest-$(uname)-$(uname -m).sh



Install Audio Base
conda create -c conda-forge -n audio-base python==3.10.12 -y
conda activate audio-base
pip install openmmla-audio[base]   # bash (Linux and Raspberry Pi)
pip install 'openmmla-audio[base]' # zsh (macOS default shell) requires quoting the extras



Install Audio Server
conda create -c conda-forge -n audio-server python==3.10.12 -y
conda activate audio-server
pip install openmmla-audio[server]   # bash (Linux and Raspberry Pi)
pip install 'openmmla-audio[server]' # zsh (macOS default shell) requires quoting the extras




Set up folder structure
cd mbox-audio
./reset.sh



Standalone Setup
If you want to run the entire mBox Audio system on a single machine (not in distributed mode), follow these steps:


Set up the Uber Server on your machine following the instructions in
the mbox-uber module.


Install system dependencies as described in the "Audio Base & Server Setup" section above.


Install openmmla-audio with all dependencies:
conda create -c conda-forge -n mbox-audio python==3.10.12 -y
conda activate mbox-audio
pip install openmmla-audio[all]   # bash (Linux and Raspberry Pi)
pip install 'openmmla-audio[all]' # zsh (macOS default shell) requires quoting the extras



Set up the folder structure:
cd mbox-audio
./reset.sh



This setup will allow you to run all components of mBox Audio on a single machine.
Usage
Real-time Audio Analyzer

To run the real-time audio analyzer:


Start Audio Server (optional)
./server.sh

This script runs the distributed audio services on your audio servers. To configure your audio server cluster,
refer to the nginx setup running on your uber server. The default setup uses three servers
(server-01.local, server-02.local, server-03.local) hosting five services
(transcribe, separate, infer, enhance, vad).


Start Audio Base
./run.sh [-b <num_bases>] [-s <num_synchronizers>] [-l <standalone>] [-p <speech_separation>]

Default parameter settings:

-b: 3 (number of audio bases)
-s: 1 (number of audio base synchronizers)
-l: false (not standalone)
-p: false (no speech separation)
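run.sh itself is a shell script; purely as an illustration of the interface above, here is a Python sketch of the same flags and defaults (flag names and defaults come from the list above; boolean flags are shown as simple switches here, whereas run.sh takes explicit true/false values):

```python
import argparse

def parse_run_args(argv=None):
    """Mirror run.sh's flags: -b bases, -s synchronizers, -l standalone, -p separation."""
    parser = argparse.ArgumentParser(description="mBox Audio launcher (illustrative)")
    parser.add_argument("-b", type=int, default=3, dest="num_bases",
                        help="number of audio bases")
    parser.add_argument("-s", type=int, default=1, dest="num_synchronizers",
                        help="number of audio base synchronizers")
    parser.add_argument("-l", action="store_true", dest="standalone",
                        help="run in standalone mode")
    parser.add_argument("-p", action="store_true", dest="speech_separation",
                        help="enable speech separation")
    return parser.parse_args(argv)

args = parse_run_args([])  # no flags given: all defaults apply
print(args.num_bases, args.num_synchronizers, args.standalone, args.speech_separation)
# → 3 1 False False
```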


💡 You can switch the operating mode of the audio base during runtime:




| Mode      | Description                                     |
|-----------|-------------------------------------------------|
| Record    | Record audio segments without recognition       |
| Recognize | Recognize pre-recorded segments and synchronize |
| Full      | Default: record and recognize simultaneously    |





Start Control Base
./control.sh



Post-time Audio Analyzer

To run the post-time audio analyzer:

Create a speaker corpus folder: /audio_db/post-time/[audio_file_name]/
Add speaker audio files named [speaker_name].wav to the corpus folder
Run the analyzer:
cd examples/
conda activate mbox-audio

# Process a single audio file (supported formats: wav, m4a, mp3).
# If -f is omitted, all files in the /audio/post-time/origin/ folder are processed.
python3 run_audio_post_analyzer.py -f [audio_file_name.wav]
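The corpus layout from steps 1-2 can be sketched with pathlib. The directory names follow the structure stated above; the speaker names and the helper function are illustrative, and the touched files are empty placeholders you would replace with real recordings:

```python
import tempfile
from pathlib import Path

def make_speaker_corpus(root: Path, audio_file_name: str, speakers: list[str]) -> Path:
    """Create <root>/audio_db/post-time/<audio_file_name>/ with one <speaker>.wav each."""
    corpus = root / "audio_db" / "post-time" / audio_file_name
    corpus.mkdir(parents=True, exist_ok=True)
    for name in speakers:
        (corpus / f"{name}.wav").touch()  # placeholder; put real speaker audio here
    return corpus

# Demonstrate the layout in a throwaway directory
with tempfile.TemporaryDirectory() as tmp:
    corpus = make_speaker_corpus(Path(tmp), "meeting_01", ["alice", "bob"])
    print(sorted(p.name for p in corpus.iterdir()))
# → ['alice.wav', 'bob.wav']
```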



Logs & Visualization
After running the analyzers, logs and visualizations are stored in the /logs/ and /visualizations/ folders.
FAQ
Citation
If you use this code in your research, please cite the following paper:
@inproceedings{inproceedings,
author = {Li, Zaibei and Jensen, Martin and Nolte, Alexander and Spikol, Daniel},
year = {2024},
month = {03},
pages = {785-791},
title = {Field report for Platform mBox: Designing an Open MMLA Platform},
doi = {10.1145/3636555.3636872}
}

References

NeMo
Silero VAD
Denoiser
MossFormer
Whisper

License
This project is licensed under the MIT License - see the LICENSE file for details.
