Researchers from the University of Wisconsin-Madison, Microsoft Research, and Columbia University have open-sourced the multimodal large model LLaVA-1.5, which delivers strong results across 11 benchmarks covering tasks such as visual question answering and image captioning. Training is notably lightweight: LLaVA-1.5 needs only 8 A100 GPUs and finishes in about a day. The researchers also propose adding output format prompts during fine-tuning, which helps the model adapt its responses to different tasks (a rough sketch of this idea follows below). With its strong multimodal understanding, LLaVA-1.5 poses a direct challenge to GPT-4V.
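
To make the "output format prompt" idea more concrete, here is a minimal Python sketch of what that data-preparation step could look like: a format hint is appended to the instruction when building fine-tuning samples, so the model learns when to answer briefly (e.g., for VQA-style benchmarks) and when to respond freely. The prompt strings, task names, and the build_instruction helper are illustrative assumptions, not LLaVA-1.5's actual training code.

```python
# Hypothetical sketch of appending output format prompts during fine-tuning
# data construction. Wording and helper names are assumptions for illustration.

# Format hints keyed by task type (assumed wording).
FORMAT_PROMPTS = {
    "short_vqa": "Answer the question using a single word or phrase.",
    "multiple_choice": "Answer with the option's letter from the given choices directly.",
    "open_ended": "",  # no constraint for free-form conversation data
}


def build_instruction(question: str, task_type: str) -> str:
    """Attach the task-specific output format prompt to a question."""
    hint = FORMAT_PROMPTS.get(task_type, "")
    return f"{question}\n{hint}" if hint else question


if __name__ == "__main__":
    print(build_instruction("What color is the car?", "short_vqa"))
    # -> What color is the car?
    #    Answer the question using a single word or phrase.
```

The point of the hint is to disambiguate the expected answer style at training time, so short-answer benchmarks and open-ended dialogue data can coexist in one fine-tuning mix without the model defaulting to one response length for everything.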