Researchers from the University of Wisconsin-Madison, Microsoft Research, and Columbia University have jointly developed LLaVA-1.5, a new multimodal AI system that sets new records across 11 benchmarks and rivals GPT-4V in multimodal understanding, establishing itself as a serious contender. LLaVA-1.5 achieves these results with a simple system architecture and publicly available datasets, demonstrating that an appropriately designed open-source model can be formidable and offering a useful reference point for future AI development. Its open-source release fills a gap in multimodal AI, and the model is widely regarded in the industry as a strong contender to "challenge GPT-4."