In the development of large language model (LLM) agents, effectively storing and reusing experience has become a central challenge. Recently, a research team from the University of Illinois Urbana-Champaign and Google DeepMind proposed Evo-Memory, a streaming benchmark and agent framework designed to address this gap. Evo-Memory not only evaluates an agent's ability to learn at test time but also focuses on self-evolving memory, asking whether agents can accumulate and reuse strategies from a continuous stream of tasks rather than relying on static conversation records alone.

Traditional agents rely mainly on conversation recall: they store dialogue history, tool-use records, and retrieved documents, then re-inject this information into future queries. This kind of memory, however, only passively buffers information; it cannot actively change how the agent approaches related tasks. Evo-Memory instead emphasizes experience reuse: it treats each interaction as an experience consisting of input, output, and feedback, and evaluates whether the agent can retrieve these experiences and convert them into reusable strategies on subsequent tasks.
The research team formalizes memory-augmented agents as a tuple (F, U, R, C), where F is the base model, U is the update rule that writes new experiences and evolves the memory after each step, R is the retrieval module, and C is the context-construction step. Evo-Memory evaluates agent performance across a variety of environments by reorganizing existing datasets into ordered task streams.
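For intuition, here is a minimal Python sketch of that (F, U, R, C) decomposition. The class and method names are illustrative stand-ins, not the paper's actual API; F, U, R, and C are passed in as plain callables.

```python
from dataclasses import dataclass, field
from typing import Callable, List

@dataclass
class Experience:
    """One stored interaction: task input, agent output, and feedback."""
    task: str
    output: str
    feedback: str

@dataclass
class MemoryAgent:
    """Illustrative (F, U, R, C) agent; names follow the paper's tuple, not its code."""
    F: Callable[[str], str]                                        # base model: prompt -> response
    U: Callable[[List[Experience], Experience], List[Experience]]  # update: write + evolve memory
    R: Callable[[str, List[Experience]], List[Experience]]         # retrieval over stored experiences
    C: Callable[[str, List[Experience]], str]                      # context construction
    memory: List[Experience] = field(default_factory=list)

    def step(self, task: str, feedback_fn: Callable[[str, str], str]) -> str:
        retrieved = self.R(task, self.memory)   # recall similar past experiences
        prompt = self.C(task, retrieved)        # fold them into the working context
        output = self.F(prompt)                 # query the base model
        feedback = feedback_fn(task, output)    # environment / evaluator signal
        self.memory = self.U(self.memory, Experience(task, output, feedback))
        return output
```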
As a baseline, the team also defines ExpRAG, which converts each interaction into structured experience text. On a new task, the agent retrieves similar past experiences and conditions on them together with the current input, as sketched below.
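Concretely, an ExpRAG-style baseline needs little more than a serializer and a similarity search. The sketch below (reusing the Experience class and imports above) uses toy lexical-overlap retrieval to stay self-contained; an actual implementation would presumably rely on embedding-based search.

```python
def to_experience_text(exp: Experience) -> str:
    """Serialize one interaction into structured experience text (format is illustrative)."""
    return f"Task: {exp.task}\nAction: {exp.output}\nFeedback: {exp.feedback}"

def retrieve_similar(task: str, memory: List[Experience], k: int = 3) -> List[Experience]:
    """Toy Jaccard-overlap retrieval; a real system would likely use embeddings."""
    def score(exp: Experience) -> float:
        a, b = set(task.lower().split()), set(exp.task.lower().split())
        return len(a & b) / max(len(a | b), 1)
    return sorted(memory, key=score, reverse=True)[:k]

def build_context(task: str, retrieved: List[Experience]) -> str:
    """Prepend retrieved experiences to the current task input."""
    past = "\n\n".join(to_experience_text(e) for e in retrieved)
    return f"Relevant past experiences:\n{past}\n\nCurrent task: {task}"
```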
Building on this, the ReMem framework introduces a "think, act, refine memory" control loop that lets the agent actively retrieve, prune, and reorganize its memory while reasoning. Memory thus becomes an explicit object that can be edited dynamically at inference time.
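To make "pruning and reorganizing" concrete, here is a hand-written stand-in for a refinement step. In ReMem itself the model drives these edits during reasoning; this toy version merely deduplicates tasks and drops failed attempts.

```python
def refine_memory(memory: List[Experience], max_size: int = 50) -> List[Experience]:
    """Toy refinement pass: deduplicate by task and discard failures (illustrative only)."""
    kept, seen = [], set()
    for exp in reversed(memory):                      # walk newest-first
        key = exp.task.strip().lower()
        if key in seen or "fail" in exp.feedback.lower():
            continue                                  # prune duplicates and failed attempts
        seen.add(key)
        kept.append(exp)
    kept = kept[:max_size]                            # cap size, keeping the newest entries
    kept.reverse()                                    # restore chronological order
    return kept
```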
The results show that agents with self-evolving memory, such as ReMem and ExpRAG, improve significantly at test time: they complete tasks in fewer steps and achieve higher success rates and accuracy. The work points to new directions for the development of LLM agents.
Paper: https://arxiv.org/pdf/2511.20857
Key Points:
🧠 Evo-Memory is a new streaming benchmark focused on experience reuse in agents.
🚀 The ReMem framework allows agents to dynamically manage memory during reasoning, improving task completion efficiency.
📈 Research shows that agents using self-evolving memory demonstrate significant improvements in accuracy and success rate.