maestro 0.1.0
multimodal-maestro
👋 hello
Multimodal-Maestro gives you more control over large multimodal models to get the
outputs you want. With more effective prompting tactics, you can get multimodal models
to do tasks you didn't know (or think!) were possible. Curious how it works? Try our
HF space!
🚧 The project is still under construction and the API is prone to change.
💻 install
⚠️ Our package has been renamed to maestro. Install package in a
3.11>=Python>=3.8 environment.
pip install maestro
🚀 examples
GPT-4 Vision
Find dog.
>>> The dog is prominently featured in the center of the image with the label [9].
👉 read more
load image
import cv2
image = cv2.imread("...")
create and refine marks
import maestro as mm
generator = mm.SegmentAnythingMarkGenerator(device='cuda')
marks = generator.generate(image=image)
marks = mm.refine_marks(marks=marks)
visualize marks
mark_visualizer = mm.MarkVisualizer()
marked_image = mark_visualizer.visualize(image=image, marks=marks)
prompt
prompt = "Find dog."
response = mm.prompt_image(api_key=api_key, image=marked_image, prompt=prompt)
>>> "The dog is prominently featured in the center of the image with the label [9]."
extract related marks
masks = mm.extract_relevant_masks(text=response, detections=refined_marks)
>>> {'6': array([
... [False, False, False, ..., False, False, False],
... [False, False, False, ..., False, False, False],
... [False, False, False, ..., False, False, False],
... ...,
... [ True, True, True, ..., False, False, False],
... [ True, True, True, ..., False, False, False],
... [ True, True, True, ..., False, False, False]])
... }
🚧 roadmap
Documentation page.
Segment Anything guided marks generation.
Non-Max Suppression marks refinement.
LLaVA demo.
💜 acknowledgement
Set-of-Mark Prompting Unleashes Extraordinary Visual Grounding
in GPT-4V by Jianwei Yang, Hao Zhang, Feng Li, Xueyan
Zou, Chunyuan Li, Jianfeng Gao.
🦸 contribution
We would love your help in making this repository even better! If you noticed any bug,
or if you have any suggestions for improvement, feel free to open an
issue or submit a
pull request.
For personal and professional use. You cannot resell or redistribute these repositories in their original state.
There are no reviews.