Use late-interaction multi-modal models such as ColPali in just a few lines of code.
Dear ImGui: Bloat-free Graphical User Interface for C++ with minimal dependencies
The Multi-Agent Framework: First AI Software Company, Towards Natural Language Programming
LlamaIndex is the leading framework for building LLM-powered agents over your data.
A deep learning toolkit for Text-to-Speech, battle-tested in research and production
OpenPose: Real-time multi-person keypoint detection library for body, face, hands, and foot estimation
[NeurIPS'23 Oral] Visual Instruction Tuning (LLaVA) built towards GPT-4V level capabilities and beyond.
MiniCPM-o 2.6: A GPT-4o Level MLLM for Vision, Speech and Multimodal Live Streaming on Your Phone
The container platform tailored for Kubernetes multi-cloud, datacenter, and edge management
:sparkles::sparkles: Latest Advances on Multimodal Large Language Models
Multilingual large voice generation model with full-stack support for inference, training, and deployment.