The Peking University Mengchen team has open-sourced Video-LLaVA, a large multimodal model that can instantly grasp the comedic elements in funny videos. The model achieves strong performance on multiple benchmarks without requiring paired image-video data, and it understands both images and videos by mapping them into a unified visual feature space. Comparative experiments show that pre-aligning visual representations improves performance on video question-answering tasks, and that joint training on image and video data benefits the model on both image and video understanding tasks.
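The core idea of the unified visual feature space can be sketched as follows: image patches and video frames are encoded separately, but both are mapped through one shared projection into the language model's embedding space, so the LLM consumes them as the same kind of token. This is a minimal illustrative sketch with made-up dimensions and random stand-ins for the encoders, not Video-LLaVA's actual architecture or weights:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions for illustration only (not Video-LLaVA's real sizes).
D_VIS, D_LLM = 1024, 4096

# One shared projection: the same matrix serves both modalities.
W_proj = rng.standard_normal((D_VIS, D_LLM)) * 0.01

def encode_image(image_patches: np.ndarray) -> np.ndarray:
    # Stand-in for a vision encoder: returns (num_patches, D_VIS) features.
    return image_patches

def encode_video(frame_patches: np.ndarray) -> np.ndarray:
    # Stand-in for a video encoder: (num_frames, num_patches, D_VIS)
    # is flattened into one token sequence over all frames.
    f, p, d = frame_patches.shape
    return frame_patches.reshape(f * p, d)

def to_llm_tokens(visual_features: np.ndarray) -> np.ndarray:
    # The shared projection places both modalities in the LLM embedding space.
    return visual_features @ W_proj

image_feats = encode_image(rng.standard_normal((256, D_VIS)))
video_feats = encode_video(rng.standard_normal((8, 256, D_VIS)))

img_tokens = to_llm_tokens(image_feats)   # shape (256, D_LLM)
vid_tokens = to_llm_tokens(video_feats)   # shape (2048, D_LLM)
print(img_tokens.shape, vid_tokens.shape)
```

Because both token streams live in the same space, a single language model can attend over images and videos interchangeably, which is why joint training on both data types can help each task.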