In-Depth Analysis of AI Model Performance and Cost: Benchmark Test Results of Grok4 vs. GPT-5

AIbase基地

Published in AI News · 4 min read · Aug 8, 2025

According to the latest results released by ARC Prize, mainstream AI models differ significantly in both performance and cost. On ARC-AGI-2, a benchmark that evaluates a model's general reasoning ability, GPT-5 (Advanced) scored 9.9% at a cost of $0.73 per task. Grok4 (Thinking) scored clearly higher, with an accuracy of 16%, but at a cost of $2 to $4 per task. Grok4 therefore leads on complex reasoning tasks, while its cost-effectiveness is considerably worse than GPT-5's.

Performance and cost comparison of leading language models on the ARC-AGI benchmark. | Image: ARC-AGI

On the less demanding ARC-AGI-1 test, Grok4 again led with an accuracy of 68%, slightly ahead of GPT-5's 65.7%. Its cost of about $1 per task, however, is roughly double GPT-5's $0.51, which makes GPT-5 the more cost-effective model on this test. xAI could still narrow this gap through price adjustments.
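The cost-effectiveness comparison can be made concrete with a little arithmetic. The sketch below divides each accuracy score by its per-task cost, using only the figures quoted in this article. The "accuracy points per dollar" metric and the names `RESULTS` and `points_per_dollar` are assumptions made for illustration; they are not a measure published by ARC Prize, and the Grok4 cost on ARC-AGI-1 is only given as "about $1".

```python
# Scores and per-task costs as quoted in the article.
# Costs are stored as (low, high) ranges because Grok4's ARC-AGI-2 cost
# is reported as $2 to $4 per task.
RESULTS = {
    "ARC-AGI-2": {
        "GPT-5": (9.9, (0.73, 0.73)),
        "Grok4": (16.0, (2.00, 4.00)),
    },
    "ARC-AGI-1": {
        "GPT-5": (65.7, (0.51, 0.51)),
        "Grok4": (68.0, (1.00, 1.00)),  # "about $1" in the article
    },
}


def points_per_dollar(accuracy_pct, cost_range):
    """Return (worst, best) accuracy points per dollar for a cost range.

    This ratio is an illustrative metric, not one defined by ARC Prize.
    """
    low_cost, high_cost = cost_range
    return accuracy_pct / high_cost, accuracy_pct / low_cost


for benchmark, models in RESULTS.items():
    for model, (accuracy, cost) in models.items():
        worst, best = points_per_dollar(accuracy, cost)
        print(f"{benchmark} {model}: {worst:.1f} to {best:.1f} points per dollar")
```

Under these assumptions, GPT-5 comes out at roughly 13.6 points per dollar on ARC-AGI-2 against 4 to 8 for Grok4, and at roughly 129 against 68 on ARC-AGI-1. A simple ratio like this ignores the fact that the last few accuracy points are usually the most expensive to obtain, so it should be read as a rough guide only.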

The report also covers lightweight versions of GPT-5. GPT-5 Mini scored 54.3% on ARC-AGI-1 and 4.4% on ARC-AGI-2, at costs of $0.12 and $0.20, respectively. The smaller GPT-5 Nano reached 16.5% on ARC-AGI-1 ($0.03) and 2.5% on ARC-AGI-2 ($0.03).

Test results for Grok4, GPT-5, and smaller model variants on ARC-AGI-1. | Image: ARC Prize
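One way to read the figures for the GPT-5 family is to ask how much each step up in model size costs per additional accuracy point. The sketch below computes that marginal cost between adjacent tiers from the numbers quoted in this article. The names `TIERS` and `marginal_costs` are hypothetical, and the calculation is an illustration rather than part of the ARC Prize report.

```python
# (model, accuracy in %, cost in dollars) as quoted in the article,
# ordered from the smallest to the largest GPT-5 variant.
TIERS = {
    "ARC-AGI-1": [
        ("GPT-5 Nano", 16.5, 0.03),
        ("GPT-5 Mini", 54.3, 0.12),
        ("GPT-5", 65.7, 0.51),
    ],
    "ARC-AGI-2": [
        ("GPT-5 Nano", 2.5, 0.03),
        ("GPT-5 Mini", 4.4, 0.20),
        ("GPT-5", 9.9, 0.73),
    ],
}


def marginal_costs(tiers):
    """Return (smaller, larger, extra dollars per extra accuracy point)
    for each pair of adjacent tiers."""
    steps = []
    for (name_a, acc_a, cost_a), (name_b, acc_b, cost_b) in zip(tiers, tiers[1:]):
        steps.append((name_a, name_b, (cost_b - cost_a) / (acc_b - acc_a)))
    return steps


for benchmark, tiers in TIERS.items():
    for smaller, larger, dollars_per_point in marginal_costs(tiers):
        print(f"{benchmark}: {smaller} -> {larger}: "
              f"${dollars_per_point:.4f} per extra accuracy point")
```

On these numbers, moving from GPT-5 Nano to GPT-5 Mini on ARC-AGI-1 costs well under a cent per extra accuracy point, while moving from GPT-5 Mini to the full GPT-5 costs about three cents per point. On ARC-AGI-2 both steps cost roughly nine to ten cents per point.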

Notably, on ARC-AGI-1, the o3-preview model released in December 2024 achieved an accuracy of nearly 80%, far ahead of its competitors, though at a much higher cost. OpenAI did not mention ARC Prize in its GPT-5 presentation, but according to The Information, the company may have significantly scaled back the capabilities of o3-preview to adapt it for subsequent chat versions.

Beyond these benchmarks, ARC-AGI-3 is also underway. It requires models to solve tasks in a game-like interactive environment through repeated trials. Humans handle these tasks easily, but most AI agents still struggle with visual puzzle games.

This article is from AIbase Daily
