MiniGemini
A multimodal large language model capable of understanding and generating images
Common Product | Programming | Multimodal | Visual Language Model
Mini-Gemini is a multimodal visual language model that supports a series of dense and MoE large language models ranging from 2B to 34B parameters, with capabilities for image understanding, reasoning, and generation. Built on LLaVA, it uses dual vision encoders: one providing low-resolution visual embeddings and the other providing high-resolution candidate regions. A patch info mining step lets the low-resolution visual queries extract detail from the corresponding high-resolution regions, and the fused visual tokens are combined with text for understanding and generation tasks. It is evaluated on multiple visual understanding benchmarks, including COCO, GQA, OCR-VQA, and VisualGenome.
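The patch info mining step described above can be pictured as a per-patch cross-attention in which each low-resolution visual query attends over the high-resolution candidate tokens covering the same spatial region. The sketch below is a minimal, hypothetical PyTorch illustration of that idea; the module name `PatchInfoMining`, the tensor shapes, and the residual fusion are assumptions for clarity, not the actual Mini-Gemini implementation.

```python
import torch
import torch.nn as nn


class PatchInfoMining(nn.Module):
    """Illustrative cross-attention: each low-res visual query mines detail
    from the high-res candidate tokens of its own region (hypothetical sketch)."""

    def __init__(self, dim: int = 1024):
        super().__init__()
        self.q_proj = nn.Linear(dim, dim)
        self.k_proj = nn.Linear(dim, dim)
        self.v_proj = nn.Linear(dim, dim)
        self.out_proj = nn.Linear(dim, dim)
        self.scale = dim ** -0.5

    def forward(self, low_res_queries: torch.Tensor,
                high_res_regions: torch.Tensor) -> torch.Tensor:
        # low_res_queries:  (B, N, C)    one query per low-res patch
        # high_res_regions: (B, N, M, C) M high-res tokens per patch region
        q = self.q_proj(low_res_queries).unsqueeze(2)   # (B, N, 1, C)
        k = self.k_proj(high_res_regions)               # (B, N, M, C)
        v = self.v_proj(high_res_regions)               # (B, N, M, C)
        attn = (q @ k.transpose(-2, -1)) * self.scale   # (B, N, 1, M)
        attn = attn.softmax(dim=-1)
        mined = (attn @ v).squeeze(2)                   # (B, N, C)
        # Residual fusion keeps the global low-res context in each token.
        return low_res_queries + self.out_proj(mined)


if __name__ == "__main__":
    fuse = PatchInfoMining(dim=1024)
    low = torch.randn(1, 576, 1024)      # e.g. 24x24 low-res visual queries
    high = torch.randn(1, 576, 4, 1024)  # 4 high-res candidate tokens per patch
    print(fuse(low, high).shape)         # torch.Size([1, 576, 1024])
```

The fused tokens would then be concatenated with the text embeddings and fed to the language model, which is how the description above characterizes the understanding and generation pipeline.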