Researchers from FAIR Meta, GenAI Meta, HuggingFace, and AutoGPT have jointly introduced the GAIA benchmark, which highlights how far humans still outpace LLMs on tasks requiring complex reasoning and multimodal processing. By grounding its questions in real-world scenarios, GAIA avoids the pitfalls of traditional LLM evaluations and offers guidance for the development of next-generation AI systems. The results show a stark gap: human respondents reach 92% accuracy, while GPT-4 equipped with plugins reaches only 15%. GAIA also demonstrates that giving LLMs API or web access can improve their accuracy and expand their use cases, opening opportunities for collaboration between AI and humans.