In the development of large language model (LLM) agents, effectively storing and reusing experience has become a central challenge. Recently, a research team from the University of Illinois Urbana-Champaign and Google DeepMind proposed Evo-Memory, a streaming benchmark and agent framework designed to address this gap. Evo-Memory not only evaluates an agent's ability to learn at test time but also focuses on self-evolving memory, asking whether agents can accumulate and reuse strategies from a continuous stream of tasks rather than relying on static conversation records alone.

Traditional agents rely mainly on conversation recall: they store dialogue history, tool-use records, and retrieved documents, then re-inject this information into future queries. This kind of memory, however, only passively buffers information; it cannot actively change how the agent approaches related tasks. Evo-Memory instead emphasizes experience reuse: it treats each interaction as an experience consisting of input, output, and feedback, and evaluates whether the agent can retrieve these experiences and convert them into reusable strategies on subsequent tasks.
The research team formalizes memory-augmented agents as a tuple (F, U, R, C), where F is the base model, U is the update rule that writes new experiences and evolves the memory after each step, R is the retrieval module, and C is the context-construction step. Evo-Memory evaluates agent performance across a variety of environments by reorganizing existing datasets into ordered task streams.
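For intuition, here is a minimal Python sketch of that (F, U, R, C) decomposition. The class and method names are illustrative stand-ins, not the paper's actual API; F, U, R, and C are passed in as plain callables.

```python
from dataclasses import dataclass, field
from typing import Callable, List

@dataclass
class Experience:
    """One stored interaction: task input, agent output, and feedback."""
    task: str
    output: str
    feedback: str

@dataclass
class MemoryAgent:
    """Illustrative (F, U, R, C) agent; names follow the paper's tuple, not its code."""
    F: Callable[[str], str]                                        # base model: prompt -> response
    U: Callable[[List[Experience], Experience], List[Experience]]  # update: write + evolve memory
    R: Callable[[str, List[Experience]], List[Experience]]         # retrieval over stored experiences
    C: Callable[[str, List[Experience]], str]                      # context construction
    memory: List[Experience] = field(default_factory=list)

    def step(self, task: str, feedback_fn: Callable[[str, str], str]) -> str:
        retrieved = self.R(task, self.memory)   # recall similar past experiences
        prompt = self.C(task, retrieved)        # fold them into the working context
        output = self.F(prompt)                 # query the base model
        feedback = feedback_fn(task, output)    # environment / evaluator signal
        self.memory = self.U(self.memory, Experience(task, output, feedback))
        return output
```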
As a baseline, the team also defines ExpRAG, which converts each interaction into structured experience text. On a new task, the agent retrieves similar past experiences and conditions on them together with the current input, as sketched below.
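Concretely, an ExpRAG-style baseline needs little more than a serializer and a similarity search. The sketch below (reusing the Experience class and imports above) uses toy lexical-overlap retrieval to stay self-contained; an actual implementation would presumably rely on embedding-based search.

```python
def to_experience_text(exp: Experience) -> str:
    """Serialize one interaction into structured experience text (format is illustrative)."""
    return f"Task: {exp.task}\nAction: {exp.output}\nFeedback: {exp.feedback}"

def retrieve_similar(task: str, memory: List[Experience], k: int = 3) -> List[Experience]:
    """Toy Jaccard-overlap retrieval; a real system would likely use embeddings."""
    def score(exp: Experience) -> float:
        a, b = set(task.lower().split()), set(exp.task.lower().split())
        return len(a & b) / max(len(a | b), 1)
    return sorted(memory, key=score, reverse=True)[:k]

def build_context(task: str, retrieved: List[Experience]) -> str:
    """Prepend retrieved experiences to the current task input."""
    past = "\n\n".join(to_experience_text(e) for e in retrieved)
    return f"Relevant past experiences:\n{past}\n\nCurrent task: {task}"
```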
Building on this, the ReMem framework introduces a "think, act, refine memory" control loop that lets the agent actively retrieve, prune, and reorganize its memory while reasoning. Memory thus becomes an explicit object that can be edited dynamically at inference time.
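To make "pruning and reorganizing" concrete, here is a hand-written stand-in for a refinement step. In ReMem itself the model drives these edits during reasoning; this toy version merely deduplicates tasks and drops failed attempts.

```python
def refine_memory(memory: List[Experience], max_size: int = 50) -> List[Experience]:
    """Toy refinement pass: deduplicate by task and discard failures (illustrative only)."""
    kept, seen = [], set()
    for exp in reversed(memory):                      # walk newest-first
        key = exp.task.strip().lower()
        if key in seen or "fail" in exp.feedback.lower():
            continue                                  # prune duplicates and failed attempts
        seen.add(key)
        kept.append(exp)
    kept = kept[:max_size]                            # cap size, keeping the newest entries
    kept.reverse()                                    # restore chronological order
    return kept
```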
The results show that agents with self-evolving memory, such as ReMem and ExpRAG, improve significantly at test time: they complete tasks in fewer steps and achieve higher success rates and accuracy. The work points to new directions for the development of LLM agents.
Paper: https://arxiv.org/pdf/2511.20857
Key Points:
🧠 Evo-Memory is a new streaming benchmark focused on experience reuse in agents.
🚀 The ReMem framework allows agents to dynamically manage memory during reasoning, improving task completion efficiency.
📈 Research shows that agents using self-evolving memory demonstrate significant improvements in accuracy and success rate.