Ovis2-1B is the latest member of the Ovis series of multimodal large language models (MLLMs). The series focuses on structurally aligning visual and textual embeddings. Ovis2-1B offers strong performance for a small model, enhanced reasoning capabilities, video and multi-image processing, and improved multilingual OCR.
Multimodal
Transformers
Multiple Languages
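
The "structural alignment of visual and textual embeddings" in the description can be illustrated with a small sketch. It assumes the idea behind the Ovis design: each image patch is turned into a probability distribution over a learned visual vocabulary, and its embedding is the probability-weighted mix of rows in a visual embedding table, mirroring how a text token indexes a text embedding table. The function names, table size, and values below are hypothetical and purely illustrative; this is not the model's actual code.

```python
import math

def softmax(scores):
    """Turn raw scores into a probability distribution."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def visual_embedding(patch_logits, visual_table):
    """Embed one image patch as a probability-weighted mix of table rows.

    patch_logits: one score per entry of the (hypothetical) visual vocabulary.
    visual_table: list of embedding rows, one per visual vocabulary entry.
    """
    probs = softmax(patch_logits)
    dim = len(visual_table[0])
    return [
        sum(p * row[d] for p, row in zip(probs, visual_table))
        for d in range(dim)
    ]

# Toy visual vocabulary of 3 entries with 2-dimensional embeddings (assumed values).
visual_table = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
patch_logits = [2.0, 0.5, 0.1]

embedding = visual_embedding(patch_logits, visual_table)
print(embedding)
```

A text token would select a single row of its embedding table; here a patch selects a soft mixture of rows, so both modalities are embedded through the same table-lookup structure.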