Vision-to-VibeVoice-en

Public

A Gradio-based demo for end-to-end vision-to-speech inference: it extracts text or descriptions from images using Qwen2.5-VL-7B-Instruct, then converts the result to natural-sounding speech audio via Microsoft VibeVoice-Realtime-0.5B.
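
The two-stage flow described above (image to text, then text to speech) can be sketched as a small composition of two callables. This is only an illustration under stated assumptions: the names `VisionToSpeech`, `describe`, `synthesize`, `fake_describe`, and `fake_synthesize` are hypothetical and do not come from the Space's code, and the stand-in stages replace the real Qwen2.5-VL and VibeVoice model calls so the sketch runs without downloading anything.

```python
from dataclasses import dataclass
from typing import Callable, Tuple

# Hypothetical stage signatures; the Space's real function names are not given.
Describe = Callable[[bytes, str], str]   # (image bytes, prompt) -> text
Synthesize = Callable[[str], bytes]      # text -> audio bytes


@dataclass
class VisionToSpeech:
    """Chains a vision-language stage into a text-to-speech stage."""
    describe: Describe
    synthesize: Synthesize

    def run(self, image: bytes, prompt: str = "Describe this image.") -> Tuple[str, bytes]:
        # Stage 1: get text (OCR output or a description) from the image.
        text = self.describe(image, prompt).strip()
        if not text:
            raise ValueError("vision stage returned no text to speak")
        # Stage 2: turn that text into audio.
        return text, self.synthesize(text)


# Stand-in stages (assumptions) so the sketch runs without any model.
def fake_describe(image: bytes, prompt: str) -> str:
    return f"An image of {len(image)} bytes."


def fake_synthesize(text: str) -> bytes:
    return text.encode("utf-8")


pipeline = VisionToSpeech(fake_describe, fake_synthesize)
text, audio = pipeline.run(b"\x00\x01\x02")
```

In a real deployment the two stand-ins would presumably be replaced by wrappers around the actual models, and `run` could be the function wired to the Gradio interface; keeping the stages as plain callables makes each one testable on its own.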

Created: 2025-12-06T00:33:33
Updated: 2025-12-09T11:22:15
https://huggingface.co/spaces/prithivMLmods/Vision-to-VibeVoice-en
Stars: 3
Stars Increase: 0