Google recently announced the release of LMEval, an open-source framework designed to simplify and standardize the evaluation of large language and multimodal models. The tool gives researchers and developers a unified evaluation process, making it easy to compare AI models from different providers, such as GPT-4o, Claude 3.7 Sonnet, Gemini 2.0 Flash, and Llama-3.1-405B.
In the past, comparing new AI models was cumbersome because each provider used its own APIs, data formats, and benchmark settings, making evaluations inefficient and results hard to compare. LMEval standardizes this process: once a benchmark is defined, it can be applied to any supported model with minimal additional effort.
LMEval is not limited to text: it also supports image and code evaluation, and Google states that new input formats can be added easily. The system handles several task types, including true/false questions, multiple-choice questions, and free-text generation. It can also identify "evasive strategies," where a model deliberately gives ambiguous answers to avoid producing problematic or risky content.
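To make the workflow concrete, the sketch below illustrates the "define a benchmark once, run it against any model" idea with mixed task types. It is a minimal illustration only: the class names (`Benchmark`, `Task`) and the naive scoring logic are hypothetical and are not taken from the actual LMEval API.

```python
# Hypothetical sketch, not the real LMEval API: Benchmark/Task and the simple
# substring scoring below stand in for the framework's own abstractions.
from dataclasses import dataclass, field
from typing import Callable, List, Optional

@dataclass
class Task:
    prompt: str                          # question shown to the model
    kind: str                            # "boolean", "multiple_choice", or "free_text"
    choices: Optional[List[str]] = None  # only used for multiple-choice tasks
    answer: Optional[str] = None         # expected answer for scored task types

@dataclass
class Benchmark:
    name: str
    tasks: List[Task] = field(default_factory=list)

    def evaluate(self, model_id: str, ask: Callable[[str, str], str]) -> float:
        """Score one model: fraction of answerable tasks it gets right."""
        scored = [t for t in self.tasks if t.answer is not None]
        correct = 0
        for task in scored:
            reply = ask(model_id, task.prompt)  # provider-agnostic call (e.g. via LiteLLM)
            if task.answer.lower() in reply.lower():
                correct += 1
        return correct / len(scored) if scored else 0.0

# The same benchmark definition can then be reused across models from different providers.
bench = Benchmark("demo-geography", [
    Task("Is Paris the capital of France? Answer yes or no.", "boolean", answer="yes"),
    Task("Which city is the capital of Japan? Options: Kyoto, Tokyo, Osaka.",
         "multiple_choice", choices=["Kyoto", "Tokyo", "Osaka"], answer="Tokyo"),
    Task("In one sentence, explain why benchmarks need a fixed answer key.", "free_text"),
])
```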
The system runs on the LiteLLM framework, which smooths out API differences across providers such as Google, OpenAI, Anthropic, Ollama, and Hugging Face, so the same tests can run on multiple platforms without rewriting code. A standout feature is incremental evaluation: instead of rerunning the entire suite after every change, users can execute only the new tests, saving time and reducing compute costs. LMEval also uses a multithreaded engine to run many evaluation calls in parallel, further speeding up large benchmark runs.
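Because LiteLLM is the abstraction layer, the provider-agnostic call looks roughly like the snippet below. The model identifier strings are illustrative and depend on which providers and API keys are configured in your environment; only the `litellm.completion` call itself is the library's real interface.

```python
# Sketch of the LiteLLM layer LMEval builds on: one call signature, with the
# provider selected by the model string. Model names here are illustrative and
# require the corresponding API keys to be set in the environment.
import litellm

PROMPT = [{"role": "user", "content": "Is the sky blue? Answer yes or no."}]

for model in ["gpt-4o", "anthropic/claude-3-7-sonnet-latest", "gemini/gemini-2.0-flash"]:
    response = litellm.completion(model=model, messages=PROMPT)
    print(model, "->", response.choices[0].message.content)
```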
Google also offers a visualization tool called LMEvalboard for analyzing test results. Radar charts show a model's performance across different categories, and users can drill into individual models or compare them side by side, including graphical comparisons on specific questions, making the differences between models easier to see.
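LMEvalboard's internals aren't detailed in the announcement, but the kind of radar-chart comparison it produces can be approximated with matplotlib, as in the sketch below; the category names and scores are made up for demonstration.

```python
# Illustrative only: this is not LMEvalboard itself, just a matplotlib radar
# chart showing how per-category scores for two models can be compared.
import numpy as np
import matplotlib.pyplot as plt

categories = ["Reasoning", "Code", "Vision", "Safety", "Factuality"]
scores = {
    "Model A": [0.82, 0.74, 0.61, 0.90, 0.77],  # made-up scores
    "Model B": [0.76, 0.81, 0.70, 0.85, 0.72],
}

# Radar charts are closed polygons, so repeat the first angle/value at the end.
angles = np.linspace(0, 2 * np.pi, len(categories), endpoint=False).tolist()
angles += angles[:1]

fig, ax = plt.subplots(subplot_kw={"projection": "polar"})
for name, values in scores.items():
    vals = values + values[:1]
    ax.plot(angles, vals, label=name)
    ax.fill(angles, vals, alpha=0.15)

ax.set_xticks(angles[:-1])
ax.set_xticklabels(categories)
ax.set_ylim(0, 1)
ax.legend(loc="upper right")
plt.show()
```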
The source code and example notebooks for LMEval are available on GitHub for developers and researchers.
Project: https://github.com/google/lmeval
Key Points:
🌟 LMEval is an open-source framework released by Google to unify the evaluation of large AI models from different companies.
🖼️ Supports multimodal evaluation of text, images, and code, and allows easy addition of new input formats.
📊 Provides the LMEvalboard visualization tool to help users deeply analyze and compare model performance.