AIbase
2025-04-18 10:53:12.AIbase

LMArena Officially Launches, Dedicated to Providing a Neutral AI Evaluation Platform

2025-04-16 11:24:23.AIbase

OpenAI Acquires Context.ai Team to Enhance AI Model Evaluation

2025-04-11 11:54:41.AIbase

Vector Institute Releases AI Model Performance Report to Boost Transparency and Trust

2025-04-10 14:35:16.AIbase

ByteDance Open-Sources Multi-SWE-bench to Advance the Coding Capabilities of Large Models

2025-04-10 09:47:04.AIbase

OpenAI Launches Pioneers Program to Redefine AI Model Evaluation

2025-04-09 10:29:31.AIbase

OpenAI Launches Evals API: Ushering in a New Era of Programmatic AI Model Testing

2025-04-07 09:20:30.AIbase

Meta Accused of AI Model Double Standard: Maverick's Performance Varies Widely Between Evaluation and Public Versions

2025-04-03 14:00:32.AIbase

Gemini-2.5-pro Demonstrates Superior Mathematical Abilities in MathArena Evaluation, Surpassing Other Models

2025-04-02 14:47:08.AIbase

Arthur Launches First Open-Source Real-time AI Evaluation Engine: Arthur Engine

2025-03-21 11:48:03.AIbase

High School Student Creates AI Model Evaluation Website Using Minecraft

2025-03-21 09:45:00.AIbase

Minecraft Transformed into an AI Arena: High School Student Builds Innovative Model Evaluation Platform

2025-03-12 15:28:43.AIbase

Ant Group's Medical Large Language Model Wins Double Championship in MedBench Evaluation, Ushering in a New Era for Medical AI

2025-01-16 10:42:26.AIbase

Alibaba Qwen Team Releases New Process Reward Model, Advancing Mathematical Reasoning

2025-01-10 15:49:29.AIbase

GLM-4-9B Model Achieves a Hallucination Rate of Only 1.3%, Taking First Place in a Global Large Model Evaluation

2025-01-02 14:30:40.AIbase

Microsoft Paper Reveals OpenAI Model Parameters? Medical AI Evaluation Unexpectedly Suggests 4o-mini Has Only 8B Parameters

2024-12-19 17:47:00.AIbase

CompassArena Upgrade: Launch of New Judge Copilot Feature

2024-12-19 14:07:19.AIbase

AI is Not Omnipotent: Latest Research Reveals Top AI Models Exhibit Cognitive Impairments Similar to Early Dementia

2024-12-09 17:08:28.AIbase

The AI Evaluation Landscape: How Chatbot Arena is Changing the 'Survival Rules' for Tech Companies

2024-12-05 14:45:53.AIbase

ByteDance's New Code Model Evaluation Benchmark 'FullStack Bench'

2024-11-06 14:17:46.AIbase

CMU and Meta Join Forces to Unveil VQAScore: Evaluating Text-to-Image Models with a Single Question, with Accuracy Far Surpassing Traditional Methods