Recently, a research team from Stanford University released AgentFlow, a trainable agent framework designed to improve AI decision-making through modular design and tool integration. AgentFlow consists of four modules, the Planner, Executor, Verifier, and Generator, coordinated through an explicit memory. In each step, the Planner proposes a sub-goal and selects an appropriate tool and context, the Executor calls the tool, and the Verifier decides whether to continue; once the task is complete, the Generator produces the final answer.
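The control flow described above can be sketched as a simple loop. This is a minimal toy illustration, not AgentFlow's actual code: the function names (plan, execute, verify, generate, run_agent), the Memory class, the word_count tool, and the stopping rule are all hypothetical stand-ins, and in the real framework these modules are backed by language models.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

Tool = Callable[[str], str]


@dataclass
class Memory:
    """Explicit memory shared by all modules (hypothetical structure)."""
    # Each record is (sub_goal, tool_name, result).
    records: List[Tuple[str, str, str]] = field(default_factory=list)


def plan(query: str, memory: Memory, tools: Dict[str, Tool]) -> Tuple[str, str, str]:
    """Planner: propose a sub-goal, pick a tool, and build the tool input."""
    # Toy policy (assumption): always pick the first tool and pass the query through.
    tool_name = next(iter(tools))
    return f"answer '{query}'", tool_name, query


def execute(tool_name: str, tool_input: str, tools: Dict[str, Tool]) -> str:
    """Executor: call the selected tool."""
    return tools[tool_name](tool_input)


def verify(memory: Memory, max_turns: int) -> bool:
    """Verifier: return True to continue, False to stop."""
    # Toy rule (assumption): stop on a non-empty result or when the turn budget is spent.
    has_result = bool(memory.records) and memory.records[-1][2] != ""
    return not has_result and len(memory.records) < max_turns


def generate(query: str, memory: Memory) -> str:
    """Generator: produce the final answer from memory."""
    return memory.records[-1][2] if memory.records else "no answer"


def run_agent(query: str, tools: Dict[str, Tool], max_turns: int = 3) -> str:
    memory = Memory()
    while True:
        sub_goal, tool_name, tool_input = plan(query, memory, tools)
        result = execute(tool_name, tool_input, tools)
        memory.records.append((sub_goal, tool_name, result))
        if not verify(memory, max_turns):
            break
    return generate(query, memory)


def word_count(text: str) -> str:
    """Hypothetical tool: count whitespace-separated words."""
    return str(len(text.split()))


if __name__ == "__main__":
    print(run_agent("count these four words", {"word_count": word_count}))
```

Keeping the memory as an explicit object passed between modules, rather than hidden state, mirrors the article's point that the four modules are coordinated through explicit memory.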
The core innovation of this framework lies in its training method, Flow-GRPO (Flow-based Group Refined Policy Optimization). This method turns a long-horizon, sparse-reward optimization problem into a series of manageable single-turn updates. Specifically, Flow-GRPO broadcasts a single verifiable trajectory-level outcome signal to every step, aligning local decisions with global success. It also uses token-level importance ratios, combined with PPO-style clipping and a KL penalty, to prevent policy drift.
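To make these ingredients concrete, here is a rough numerical sketch of such an objective in plain Python. It is an illustration under stated assumptions, not the authors' implementation: the group-normalized advantage follows the usual GRPO convention, and the function names, the clip range eps=0.2, the KL weight beta=0.01, the particular KL estimator, and the per-turn averaging are all assumed for the example.

```python
import math
from statistics import mean, pstdev
from typing import List, Tuple

# One token is (logp_new, logp_old, logp_ref): log-probabilities under the
# current policy, the rollout policy, and a frozen reference policy.
Token = Tuple[float, float, float]
Turn = List[Token]
Trajectory = List[Turn]


def group_advantages(rewards: List[float]) -> List[float]:
    """Normalize trajectory-level rewards within a group (GRPO-style assumption)."""
    mu, sigma = mean(rewards), pstdev(rewards)
    return [(r - mu) / (sigma + 1e-8) for r in rewards]


def clipped_token_objective(logp_new: float, logp_old: float, logp_ref: float,
                            advantage: float, eps: float = 0.2,
                            beta: float = 0.01) -> float:
    """PPO-style clipped surrogate for one token, minus a KL penalty."""
    ratio = math.exp(logp_new - logp_old)            # token-level importance ratio
    clipped = max(1.0 - eps, min(1.0 + eps, ratio))  # PPO-style clipping
    surrogate = min(ratio * advantage, clipped * advantage)
    # Non-negative KL estimate against the reference policy (assumed estimator).
    log_r = logp_ref - logp_new
    kl = math.exp(log_r) - log_r - 1.0
    return surrogate - beta * kl


def flow_grpo_objective(group: List[Trajectory], rewards: List[float]) -> float:
    """Average objective over all turns; each turn reuses its trajectory's advantage."""
    advantages = group_advantages(rewards)
    turn_values = []
    for trajectory, adv in zip(group, advantages):
        for turn in trajectory:
            # The same trajectory-level signal is broadcast to every turn.
            turn_values.append(mean(
                clipped_token_objective(*tok, adv) for tok in turn
            ))
    return mean(turn_values)


if __name__ == "__main__":
    # Two toy trajectories: the first succeeds (reward 1), the second fails (reward 0).
    group = [
        [[(-1.0, -1.1, -1.0)], [(-0.5, -0.5, -0.6)]],
        [[(-2.0, -1.9, -2.0)]],
    ]
    print(flow_grpo_objective(group, [1.0, 0.0]))
```

Because every turn in a trajectory shares the same advantage, each turn can be treated as its own single-turn update, which is the simplification the method is described as making.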
The research team evaluated AgentFlow on 10 benchmarks covering four types of tasks: knowledge-intensive search, agentic reasoning, math, and science. With a 7B model optimized by Flow-GRPO, the reported average improvements were 14.9% on search tasks, 14.0% on agentic reasoning, 14.5% on math tasks, and 4.1% on science tasks. The research team stated that the model outperformed existing strong baselines, even surpassing GPT-4o.
Additionally, the study reported that tool calls became significantly more reliable with AgentFlow, with a 28.4% reduction in tool-call errors. The results also indicate that planning quality improves with larger turn budgets and larger model sizes.
The open-source implementation of AgentFlow provides a modular toolkit with quick-start scripts, making it convenient for users to run inference, training, and benchmarking. The project is released under the MIT license, which keeps it open and accessible for further research and development.
Key Points:
🛠️ AgentFlow is a modular AI agent framework, consisting of four modules: Planner, Executor, Verifier, and Generator.
🚀 The Flow-GRPO training method efficiently optimizes the agent's decision-making process, guiding each step with trajectory-level rewards.
📈 Experimental results show that AgentFlow performs well across 10 benchmarks, with average improvements of up to 14.9% (on search tasks), surpassing existing strong baselines.