Apple has confirmed that it will present several research results at the International Conference on Computer Vision (ICCV), to be held in Honolulu from October 19 to 23, 2025. The conference focuses on recent research advances in computer vision, and Apple's contributions include work in active areas such as multimodal models and video generation.
Apple will present eight papers covering a range of topics, including evaluation of text-to-video alignment through fine-grained question generation and answering, three-dimensional spatial understanding in multimodal large language models, and scalable video generation. In addition, Dr. C. Thomas, manager of Apple's Machine Learning Applications Research Division, will take part as a keynote speaker and share his views on current technology trends.
Apple will also participate in the "Women in Computer Vision Workshop," in support of women working in the field. During the conference, Apple researchers Patricia Vitoria Carrera and Tanya Glozman will serve as mentors, exchanging experiences and insights with attendees.
Below are the eight papers Apple will present at ICCV 2025:
1. ETVA: Evaluation of Text-to-Video Alignment through Fine-Grained Question Generation and Answering
2. MM-Spatial: Exploring Three-Dimensional Spatial Understanding in Multimodal Large Language Models
3. Scaling Laws for Native Multimodal Models
4. Stable Diffusion Models are Secretly Good at Visual In-Context Learning
5. STIV: Scalable Text and Image Conditioned Video Generation Method
6. UINavBench: Interactive Digital Agent Comprehensive Evaluation Framework
7. Unified Open-World Segmentation Technology Based on Multimodal Prompts
8. UniVG: A General Diffusion Model for Unified Image Generation and Editing
Apple's participation reflects its continued investment in computer vision and artificial intelligence research. The papers are expected to offer new insights for future work in the field.