openmmla-vision 0.1.0.post3


🎥 OpenMMLA Vision

Video module of mBox, an open multimodal learning analytics platform. For more details, please refer
to the mBox System Design.
Table of Contents

Related Modules
Installation

Uber Server Setup
Video Base & Server Setup
Standalone Setup


Usage

Realtime Indoor-Positioning
Video Frame Analyzer


Visualization
FAQ
Citation
References
License

Related Modules

mbox-uber
mbox-audio

Installation
Uber Server Setup
Before setting up the video base, set up a server hosting the InfluxDB, Redis, Mosquitto, and Nginx
services. Please refer to the mbox-uber module.
Video Base & Server Setup


Clone the repository
git clone https://github.com/ucph-ccs/mbox-video.git



Install openmmla-vision

Set up Conda environment
# For Raspberry Pi
wget "https://github.com/conda-forge/miniforge/releases/latest/download/Miniforge3-$(uname)-$(uname -m).sh"
bash Miniforge3-$(uname)-$(uname -m).sh

# For Mac and Linux
wget "https://repo.anaconda.com/miniconda/Miniconda3-latest-$(uname)-$(uname -m).sh"
bash Miniconda3-latest-$(uname)-$(uname -m).sh



Install Video Base
conda create -c conda-forge -n video-base python=3.10.12 -y
conda activate video-base
pip install openmmla-vision[base] # for Linux and Raspberry Pi
pip install 'openmmla-vision[base]' # for Mac (zsh requires quoting the brackets)



Install Video Server
The video server provides video frame analyzer services.
conda create -c conda-forge -n video-server python=3.10.12 -y
conda activate video-server
pip install openmmla-vision[server] # for Linux and Raspberry Pi
pip install 'openmmla-vision[server]' # for Mac (zsh requires quoting the brackets)




Set up folder structure
cd mbox-video
./reset.sh



Standalone Setup
If you want to run the entire mBox Video system on a single machine, follow these steps:


Set up the Uber Server on your machine following the instructions in
the mbox-uber module.


Install openmmla-vision with all dependencies:
conda create -c conda-forge -n mbox-video python=3.10.12 -y
conda activate mbox-video
pip install openmmla-vision[all] # for Linux and Raspberry Pi
pip install 'openmmla-vision[all]' # for Mac (zsh requires quoting the brackets)



Set up the folder structure:
cd mbox-video
./reset.sh



This setup will allow you to run all components of mBox Video on a single machine.
Usage
Realtime Indoor-Positioning




Stream video from camera(s)

Distributed: stream on each camera host machine (e.g. Raspberry Pi, Mac, Linux, etc.)
Centralized: stream to a centralized RTMP server (e.g. client/server, see Raspberry Pi RTMP streaming setup)



Calibrate the camera's intrinsic parameters

Print the chessboard image from ./camera_calib/pattern/ and stick it on a flat surface
Capture chessboard images with your camera and calibrate it by running ./calib_camera.sh



Synchronize the cameras' coordinate systems
Calculate the transformation matrix between the main and alternative cameras:
./sync_camera.sh [-d <num_cameras>] [-s <num_sync_managers>]

Default parameter settings:

-d: 2 (number of cameras to sync)
-s: 1 (number of camera sync managers)

Modes:

Centralized:
./sync_camera.sh -d 2 -s 1


Distributed:
# On camera host (e.g., Raspberry Pi)
./sync_camera.sh -d 1 -s 0
# On synchronizer (e.g., MacBook)
./sync_camera.sh -d 0 -s 1
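To illustrate the idea behind camera synchronization (this is a sketch, not the project's actual implementation): when two cameras observe the same AprilTag, the transform mapping the alternative camera's frame into the main camera's frame falls out of the two tag poses. The example below works in 2D with homogeneous 3x3 matrices for brevity; the real system works in 3D, but the algebra is identical.

```python
# Sketch: deriving a camera-to-camera transform from a shared tag observation.
# All names here (pose, invert, T_main_alt) are illustrative, not from the repo.
import math

def pose(theta, tx, ty):
    """Homogeneous 2D pose: rotation by theta, then translation (tx, ty)."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s, tx], [s, c, ty], [0.0, 0.0, 1.0]]

def matmul(a, b):
    """3x3 matrix product."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

def invert(m):
    """Invert a rigid transform [R t; 0 1] as [R^T, -R^T t; 0 1]."""
    r = [[m[0][0], m[1][0]], [m[0][1], m[1][1]]]  # R^T
    tx = -(r[0][0] * m[0][2] + r[0][1] * m[1][2])
    ty = -(r[1][0] * m[0][2] + r[1][1] * m[1][2])
    return [[r[0][0], r[0][1], tx], [r[1][0], r[1][1], ty], [0.0, 0.0, 1.0]]

# Both cameras see the same tag at the same instant:
T_main_tag = pose(math.radians(30), 1.0, 2.0)   # tag pose in main camera frame
T_alt_tag = pose(math.radians(-15), 0.5, 1.0)   # tag pose in alt camera frame

# Transform mapping alt-camera coordinates into the main camera frame:
T_main_alt = matmul(T_main_tag, invert(T_alt_tag))
```

Once `T_main_alt` is known, any point an alternative camera reports can be mapped into the main camera's coordinate system, which is what lets multiple cameras contribute to one shared positioning map.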





Run real-time indoor-positioning system
./run.sh [-b <num_bases>] [-s <num_synchronizers>] [-v <num_visualizers>] [-g <display_graphics>] [-r <record_frames>] [-v <store_visualizations>]

Default parameter settings:

-b: 1 (number of video bases)
-s: 1 (number of video base synchronizers)
-v: 1 (number of visualizers)
-g: true (display the graphics window)
-r: false (record video frames as images)
-v: false (store real-time visualizations)

Modes:

Centralized:
./run.sh


Distributed:
# On camera host (e.g., Raspberry Pi)
./run.sh -b 1 -s 0 -v 0 -g false
# On synchronizer (e.g., MacBook)
./run.sh -b 0 -s 1 -v 1





Video Frame Analyzer



Serve VLM and LLM on video server

vllm
vllm serve openbmb/MiniCPM-V-2_6 --dtype auto --max-model-len 2048 --port 8000 --api-key token-abc123 --gpu_memory_utilization 1 --trust-remote-code --enforce-eager
vllm serve microsoft/Phi-3-small-128k-instruct --dtype auto --max-model-len 1028 --port 8001 --api-key token-abc123 --gpu_memory_utilization 0.8 --trust-remote-code --enforce-eager



ollama
Install Ollama from its official website.
ollama pull llava:13b
ollama pull llama3.1




Configure conf/video_base.ini
[Server]
backend = ollama
top_p = 0.1
temperature = 0
vlm_model = llava:13b
llm_model = llama3.1
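The `[Server]` section above is standard INI syntax, so it can be read with Python's `configparser`. The snippet below is a sketch of how such a config could be loaded; the project's actual loader may differ, and the inline string merely mirrors the file contents shown above.

```python
# Sketch: reading the [Server] section of conf/video_base.ini with configparser.
import configparser

VIDEO_BASE_INI = """\
[Server]
backend = ollama
top_p = 0.1
temperature = 0
vlm_model = llava:13b
llm_model = llama3.1
"""

config = configparser.ConfigParser()
config.read_string(VIDEO_BASE_INI)  # in practice: config.read('conf/video_base.ini')

server = config['Server']
backend = server.get('backend')
top_p = server.getfloat('top_p')          # typed accessors parse numbers for you
temperature = server.getfloat('temperature')
vlm_model = server.get('vlm_model')
llm_model = server.get('llm_model')
```

Switching between the vllm and ollama backends is then just a matter of editing `backend` and the two model names.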



Serve frame analyzer on video server
cd examples/
python serve_video_frame_analyzer.py



Run client script on video base
python request_video_frame_analyzer.py
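The actual request script lives in examples/; its wire format is not shown here. As an illustration only, the sketch below packs a captured frame into an OpenAI-compatible chat-completions payload of the kind the vLLM endpoint above accepts (the function name and prompt are hypothetical, and the JPEG bytes are a placeholder).

```python
# Sketch: building an OpenAI-style chat request carrying one video frame.
import base64
import json

def build_frame_request(jpeg_bytes, prompt, model="openbmb/MiniCPM-V-2_6"):
    """Encode a JPEG frame as a data URL inside a chat-completions payload."""
    b64 = base64.b64encode(jpeg_bytes).decode("ascii")
    return json.dumps({
        "model": model,
        "messages": [{
            "role": "user",
            "content": [
                {"type": "text", "text": prompt},
                {"type": "image_url",
                 "image_url": {"url": f"data:image/jpeg;base64,{b64}"}},
            ],
        }],
    })

# Placeholder bytes stand in for a real camera frame:
payload = build_frame_request(b"\xff\xd8fake-jpeg-bytes", "Describe the scene.")
```

The resulting JSON string would then be POSTed to the server's /v1/chat/completions endpoint with the `token-abc123` API key from the vllm commands above.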



Visualization
After running the analyzers, logs and visualizations are stored in the /logs/ and /visualizations/ folders.
The following image shows a simple demo of the video frame analyzer:

FAQ
Citation
If you use this code in your research, please cite the following paper:
@inproceedings{inproceedings,
author = {Li, Zaibei and Jensen, Martin and Nolte, Alexander and Spikol, Daniel},
year = {2024},
month = {03},
pages = {785-791},
title = {Field report for Platform mBox: Designing an Open MMLA Platform},
doi = {10.1145/3636555.3636872}
}

References

apriltags

License
This project is licensed under the MIT License - see the LICENSE file for details.
