🍿 Intro
Juxtapose is a 2D multi-person pose detection, tracking, and estimation inference toolbox for sports and kinematics analysis. Visit Docs.

See how we integrated juxtapose into this app: Juxt Space
🍄 Overview
Code mostly adapted from four repos: ultralytics, mmdeploy, mmdetection, mmpose.
Supported Detectors: rtmdet-s, rtmdet-m, rtmdet-l, groundingdino, yolov8
Supported Pose Estimators: rtmpose-s, rtmpose-m, rtmpose-l
Supported Trackers: bytetrack, botsort
Supported Point Trackers: Tapnet
🥒 Updates

2024/05/16 Remove ultralytics dependency, port yolov8 to run in ONNX directly to improve speed.
2024/04/27 Added FastAPI to EXE example with ONNX GPU Runtime in examples/fastapi-pyinstaller.
2024/01/11 Added Nextra docs + deployed to Vercel at sdk.juxt.space.
2024/01/07 Reduce dependencies by removing MMCV, MMDet, MMPose SDK, run fully on ONNX.
2023/11/01 Published juxtapose to PyPI, so it can be installed with pip install juxtapose.
2023/08/25 Added custom region of interest (ROI) drawing tools that enable filtering by multiple ROIs while performing pose estimation/tracking. See usage below.
2023/08/15 Added GroundingDino & YOLOv8 object detector.
2023/08/09 Added keypoints streaming to csv file using csv module.
2023/07/31 Added ByteTrack and BotSORT. Completed the engineering effort for top-down inference on any source. See supported sources below.
2023/06/15 Converted RTMDET (s/m/l) and RTMPOSE (s/m/l) to ONNX using MMDeploy.

👉 Getting Started
Install Using PIP
pip install juxtapose
Note: If you run into any issues, see this GitHub issue.
🧀 Local Development
git clone https://github.com/ziqinyeow/juxtapose
cd juxtapose
pip install .

🤩 Feel The Magic
🌄 Basic Usage
from juxtapose import RTM

# Init a rtm model (including rtmdet, rtmpose, tracker)
model = RTM(
    det="rtmdet-m",  # see type hinting
    pose="rtmpose-m",  # see type hinting
    tracker="bytetrack",  # see type hinting
    device="cpu",  # see type hinting
)

# Inference with a directory (all images and videos in the directory are processed sequentially)
model("data")  # assumption: the data/ directory that holds the sample files used in this README

# Inference with image
model("data/football.jpeg", verbose=False) # verbose -> disable terminal printing

# Inference with video
model("data/bike.mp4")  # assumption: the sample video used in the later examples

# Inference with the YouTube Source
model("https://www.youtube.com/watch?v=1vYvTbDJuFs&ab_channel=PeterGrant", save=True)

🎨 Select Region of Interests (ROIs)
The user is first prompted to draw the ROIs; press r to remove the ROI already drawn.
After drawing, press SPACE, ENTER, or q to accept the ROI. The model then filters
out bounding boxes based on the ROIs.
😁 Note: Press SPACE again to redraw the bounding boxes. See custom implementation with cv2 here.
from juxtapose import RTM

model = RTM(det="groundingdino", pose="rtmpose-l", tracker="none")
model("data/bike.mp4", roi="rect") # rectangle roi

# 1. Draw ROI first
# 2. Press r or R to reset ROI
# 3. Press SPACE or Enter or q or Q to continue with the ROI
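
To make the filtering step concrete, here is a minimal sketch of one way bounding boxes could be filtered against rectangular ROIs. It is not juxtapose's own implementation: the helper names (box_center, filter_by_rois) and the rule "keep a box when its center falls inside at least one ROI" are assumptions made for illustration, and the library may use a different criterion.

```python
from typing import List, Sequence, Tuple

Box = Sequence[float]  # [x1, y1, x2, y2] in pixel coordinates


def box_center(box: Box) -> Tuple[float, float]:
    """Return the (x, y) center of a box given as [x1, y1, x2, y2]."""
    x1, y1, x2, y2 = box
    return (x1 + x2) / 2.0, (y1 + y2) / 2.0


def filter_by_rois(bboxes: Sequence[Box], rois: Sequence[Box]) -> List[int]:
    """Return the indices of boxes whose center lies inside at least one ROI.

    Assumed rule for this sketch only; the real filter may differ.
    """
    kept = []
    for i, box in enumerate(bboxes):
        cx, cy = box_center(box)
        inside = any(
            rx1 <= cx <= rx2 and ry1 <= cy <= ry2 for rx1, ry1, rx2, ry2 in rois
        )
        if inside:
            kept.append(i)
    return kept


# Hypothetical detections and two hand-drawn rectangular ROIs
bboxes = [[10, 10, 50, 90], [200, 40, 260, 160], [400, 300, 450, 420]]
rois = [[0, 0, 100, 100], [380, 280, 500, 450]]
kept = filter_by_rois(bboxes, rois)
print(kept)  # indices of the boxes that survive the ROI filter
```

Returning indices rather than boxes keeps it easy to apply the same selection to the matching keypoints or track IDs.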

🚴‍♂️ Accessing result for each frame: More Flexibility
# Adding custom plot
import cv2
from juxtapose import RTM, Annotator

model = RTM()
annotator = Annotator(thickness=3, font_color=(128, 128, 128)) # see rtm.utils.plotting

# set show to true -> cv2.imshow the frame (you can use cv2 to plot anything in the frame)
# set plot to false -> if you want to ignore default plot -> see rtm.rtm (line `if plot:`)
for result in model("data/bike.mp4", show=True, plot=False, stream=True):
    # do whatever you want with the data
    im, bboxes, kpts = result.im, result.bboxes, result.kpts

    # e.g. custom plot anything using the cv2 API
    cv2.putText(
        im, "custom text", (100, 100), cv2.FONT_HERSHEY_SIMPLEX, 1.0, (128, 128, 128)
    )

    # use the annotator class -> see rtm.utils.plotting
    annotator.draw_bboxes(
        im, bboxes, labels=[f"children_{i}" for i in range(len(bboxes))]
    )
    annotator.draw_kpts(im, kpts, thickness=4)
    annotator.draw_skeletons(im, kpts)
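
The per-frame kpts from the loop above can also be written to a CSV file with the standard csv module. The following is a minimal sketch, assuming kpts is array-like with shape (number of people, 17, 2). The column layout and the helper names (csv_header, write_keypoints) are hypothetical and are not the format juxtapose itself writes. An in-memory buffer and synthetic keypoints stand in for a real file and a real model so the snippet runs on its own.

```python
import csv
import io


def csv_header(num_kpts=17):
    """Build the header row: frame, person, x0, y0, ..., x16, y16."""
    cols = ["frame", "person"]
    for k in range(num_kpts):
        cols += [f"x{k}", f"y{k}"]
    return cols


def write_keypoints(writer, frame_idx, kpts):
    """Write one CSV row per detected person for a single frame."""
    for person_idx, person in enumerate(kpts):
        row = [frame_idx, person_idx]
        for x, y in person:
            row.extend([float(x), float(y)])
        writer.writerow(row)


# Stand-in for: open("keypoints.csv", "w", newline="")
buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow(csv_header())

# Synthetic keypoints for one person; in practice this would be result.kpts
fake_kpts = [[(float(k), float(k) + 0.5) for k in range(17)]]
write_keypoints(writer, 0, fake_kpts)

rows = buffer.getvalue().splitlines()
print(len(rows))  # header row plus one row per person
```

Inside the streaming loop, the same write_keypoints call would run once per frame with an incrementing frame index.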

⚽️ Custom Forward Pass: Full Flexibility
# Custom model forward pass
import cv2
import torch
from juxtapose import RTMDet, RTMPose, Annotator

frame = cv2.imread("data/football.jpeg")
device = "cuda" if torch.cuda.is_available() else "cpu"

# s, m, l
rtmdet = RTMDet("l", device=device)
rtmpose = RTMPose("l", device=device)
annotator = Annotator()

bboxes, scores, labels = rtmdet(frame) # [[x1, y1, x2, y2], ...], [], []
kpts = rtmpose(frame, bboxes=bboxes)  # shape: (number of people, 17, 2)

annotator.draw_bboxes(frame, bboxes, labels=[f"person_{i}" for i in range(len(bboxes))])
annotator.draw_kpts(frame, kpts, thickness=4)
annotator.draw_skeletons(frame, kpts)

cv2.imshow("frame", frame)
cv2.waitKey(0)  # keep the window open until a key is pressed
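
Since the keypoints come back as (x, y) pairs per person, simple kinematic measures can be derived from them directly. Below is a minimal sketch that computes a knee angle from three keypoints. It assumes the 17 keypoints follow the COCO ordering (left hip = 11, left knee = 13, left ankle = 15); that ordering is an assumption here and should be checked against the pose model in use. joint_angle and left_knee_angle are hypothetical helpers, not part of juxtapose.

```python
import math

# Assumed COCO-17 keypoint indices; verify against the pose model you use
LEFT_HIP, LEFT_KNEE, LEFT_ANKLE = 11, 13, 15


def joint_angle(a, b, c):
    """Return the angle in degrees at point b between segments b->a and b->c."""
    v1 = (a[0] - b[0], a[1] - b[1])
    v2 = (c[0] - b[0], c[1] - b[1])
    n1 = math.hypot(*v1)
    n2 = math.hypot(*v2)
    if n1 == 0 or n2 == 0:
        return float("nan")  # degenerate case: two points coincide
    cos = (v1[0] * v2[0] + v1[1] * v2[1]) / (n1 * n2)
    cos = max(-1.0, min(1.0, cos))  # guard against rounding outside [-1, 1]
    return math.degrees(math.acos(cos))


def left_knee_angle(person_kpts):
    """Angle at the left knee for one person's (17, 2) keypoints."""
    return joint_angle(
        person_kpts[LEFT_HIP], person_kpts[LEFT_KNEE], person_kpts[LEFT_ANKLE]
    )


# Synthetic person: every keypoint at the origin except the left leg
person = [(0.0, 0.0)] * 17
person[LEFT_HIP] = (100.0, 100.0)
person[LEFT_KNEE] = (100.0, 200.0)
person[LEFT_ANKLE] = (200.0, 200.0)
print(left_knee_angle(person))  # angle of a leg bent at a right angle
```

With real output, the same function could be applied to each entry of kpts, for example left_knee_angle(kpts[0]) for the first detected person.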

Supported Sources
Adopted from ultralytics repository -> see https://docs.ultralytics.com/modes/predict/


str or Path: Single image file.
str: URL to an image.
str: Capture a screenshot.
PIL.Image: HWC format with RGB channels.
np.ndarray of uint8 (0-255): HWC format with BGR channels.
torch.Tensor of float32 (0.0-1.0): BCHW format with RGB channels.
str or Path: CSV file containing paths to images, videos, or directories.
str or Path: Video file in formats like MP4, AVI, etc.
str or Path: Path to a directory containing images or videos.
str: Glob pattern to match multiple files. Use the * character as a wildcard.
str: URL to a YouTube video.
str: URL for streaming protocols such as RTSP, RTMP, or an IP address.
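
The array layouts in the list above differ in channel order, axis order, and value range. As an illustration, here is a minimal NumPy sketch that converts an OpenCV-style frame (HWC, BGR, uint8 0-255) into the tensor-style layout (BCHW, RGB, float32 0.0-1.0). The helper name bgr_hwc_to_rgb_bchw is hypothetical, and this only demonstrates the layouts; it is not a step juxtapose requires callers to perform.

```python
import numpy as np


def bgr_hwc_to_rgb_bchw(frame):
    """Convert an HWC BGR uint8 image to a BCHW RGB float32 array in [0, 1]."""
    rgb = frame[..., ::-1]  # reverse the channel axis: BGR -> RGB
    chw = np.transpose(rgb, (2, 0, 1))  # HWC -> CHW
    batched = chw[np.newaxis, ...].astype(np.float32) / 255.0  # add batch axis, rescale
    return np.ascontiguousarray(batched)


# Synthetic 480x640 frame that is pure blue in BGR order
frame = np.zeros((480, 640, 3), dtype=np.uint8)
frame[..., 0] = 255

batch = bgr_hwc_to_rgb_bchw(frame)
print(batch.shape, batch.dtype)
```

The resulting array could be wrapped with torch.from_numpy(batch) when a torch.Tensor is needed.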


