vLLM-Omni Goes Open Source: Diffusion Models, ViTs, and LLMs Integrated into One Pipeline for End-to-End Multimodal Inference
vLLM-Omni is the first "full-modal" inference framework, enabling unified generation of text, images, audio, and video. It features a decoupled pipeline of modality encoders, an LLM core, and output generators, supporting multimodal input and output. The project is available on GitHub and installable via pip.
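To make the decoupled encoder/LLM/generator pipeline concrete, here is a minimal usage sketch. It is hypothetical: the package name `vllm_omni`, the `Omni` entry point, the model checkpoint, and all parameter names are assumptions for illustration, not the project's confirmed API; consult the GitHub README for the actual installation and usage instructions.

```python
# Install (package name assumed; check the GitHub README for the published name):
#   pip install vllm-omni

# Hypothetical sketch of the pipeline described above: a modality encoder
# (e.g. a ViT for images) feeds an LLM core, whose outputs are routed to
# per-modality generators (e.g. a diffusion model for images, a vocoder for audio).
from vllm_omni import Omni  # assumed entry point, for illustration only

# Load an omni-modal checkpoint (model name is an example, not prescribed by the source)
omni = Omni(model="Qwen/Qwen2.5-Omni-7B")

# Multimodal input (text + image) in, multimodal output (text + audio) out,
# matching the framework's stated multimodal I/O support
outputs = omni.generate(
    prompt="Describe this image and read the description aloud.",
    images=["photo.jpg"],                 # consumed by the modality encoder
    output_modalities=["text", "audio"],  # routed to the matching generators
)
print(outputs)
```

The design point this sketch illustrates is the decoupling itself: because encoders, the LLM core, and generators are separate pipeline stages, a single `generate` call can mix input and output modalities without swapping frameworks.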