Best LLaVA-1.5 AI Tools & Models - Premium LLaVA-1.5 News

AI News

Microsoft Open Sources Multimodal Model LLaVA-1.5 Comparable to GPT-4V Performance

Microsoft has open-sourced the multimodal model LLaVA-1.5, which builds on the LLaVA architecture and introduces new features. Researchers have tested it on visual question answering, natural language processing, image generation, and other tasks, and report that LLaVA-1.5 performs at the top level among open-source models.

Models

Llava 1.5 7b Hf Q4_K_M GGUF

Marwan02

This model is a GGUF format conversion of llava-hf/llava-1.5-7b-hf, supporting image-to-text generation tasks.

Multimodal, GGUF, English

Llava 1.5 13b Hf I1 GGUF

mradermacher

This project provides weighted/imatrix quantized versions of the llava-1.5-13b-hf model, in a range of quantization types to suit different usage scenarios.

Llava 1.5 13b Hf GGUF

mradermacher

This project provides static quantized versions of the llava-hf/llava-1.5-13b-hf model in multiple quantization types, so that users can run this vision-language model more efficiently. The model supports image understanding and text generation tasks.

LLaVA V1.5 7B Plant Leaf Diseases Detection

YuchengShi

A multimodal foundation model fine-tuned from LLaVA-1.5-7B and optimized for plant leaf disease detection and interpretation.

Prism Qwen25 Extra Dinosiglip 224px 0_5b

Stanford-ILIAD

A multimodal vision-language model trained on the Llava-1.5-Instruct dataset, compatible with the Prismatic version.

Llava 1.5 7b Llara D InBC Aux B VIMA 80k

variante

LLaRA is an open-source visuomotor policy model, fine-tuned from LLaVA-7b-v1.5 on instruction-following data and auxiliary datasets, and primarily used for robotics research.

Math LLaVA

Zhiqiang007

Math-LLaVA-13B is an open-source multimodal large language model fine-tuned from LLaVA-1.5-13B on the MathV360K dataset, suitable for scenarios such as multimodal reasoning and Q&A.

Chinese LLaVA Med 7B

BUAADreamer

A Chinese medical multimodal large language model based on the LLaVA-1.5 architecture, focusing on visual question answering tasks in the medical field.

Multimodal, Transformers, Chinese

Vsft Llava 1.5 7b Hf Trl

HuggingFaceH4

A multimodal vision-language model based on LLaVA-1.5-7B and trained with Visual Supervised Fine-Tuning (VSFT), supporting image understanding and dialogue generation.

Multimodal, Transformers, English

SpaceLLaVA

remyxai

SpaceLLaVA is an improved vision-language model based on LLaVA-1.5, enhanced with LoRA fine-tuning for spatial reasoning capabilities, suitable for both quantitative and qualitative spatial reasoning tasks.

Multimodal, GGUF, English

ChartLlama 13b

listen2you002

ChartLlama is a multimodal model based on the LLaVA-1.5 architecture, specializing in chart understanding and analysis tasks.

Natural Language Processing, Transformers, English
