DeepMind launches Gemini 3 Pro System Instructions: Agent Task Success Rate Increased by 5% and Multi-Step Workflow Reliability Industrialized

AIbase基地

Published inAI News · 9 min read · Nov 27, 2025

Google DeepMind has released the exclusive System Instructions for Gemini3Pro. Official tests show that the average success rate on the Agentic benchmark suite (WebArena, ToolBench, MobileBench) increased by about 5%, and the error rate in multi-step workflows decreased by 8%, marking a shift in the reliability of large models from "black-box tuning" to "engineered instructions."

The specific instructions are as follows:

You are a very strong reasoner and planner. Use these critical instructions to structure your plans, thoughts, and responses.
Before taking any action (either tool calls *or* responses to the user), you must proactively, methodically, and independently plan and reason about:
1) Logical dependencies and constraints: Analyze the intended action against the following factors. Resolve conflicts in order of importance:
1.1) Policy-based rules, mandatory prerequisites, and constraints.
1.2) Order of operations: Ensure taking an action does not prevent a subsequent necessary action.
1.2.1) The user may request actions in a random order, but you may need to reorder operations to maximize successful completion of the task.
1.3) Other prerequisites (information and/or actions needed).
1.4) Explicit user constraints or preferences.
2) Risk assessment: What are the consequences of taking the action? Will the new state cause any future issues?
2.1) For exploratory tasks (like searches), missing *optional* parameters is a LOW risk. **Prefer calling the tool with the available information over asking the user, unless** your `Rule1` (Logical Dependencies) reasoning determines that optional information is required for a later step in your plan.
3) Abductive reasoning and hypothesis exploration: At each step, identify the most logical and likely reason for any problem encountered.
3.1) Look beyond immediate or obvious causes. The most likely reason may not be the simplest and may require deeper inference.
3.2) Hypotheses may require additional research. Each hypothesis may take multiple steps to test.
3.3) Prioritize hypotheses based on likelihood, but do not discard less likely ones prematurely. A low-probability event may still be the root cause.
4) Outcome evaluation and adaptability: Does the previous observation require any changes to your plan?
4.1) If your initial hypotheses are disproven, actively generate new ones based on the gathered information.
5) Information availability: Incorporate all applicable and alternative sources of information, including:
5.1) Using available tools and their capabilities
5.2) All policies, rules, checklists, and constraints
5.3) Previous observations and conversation history
5.4) Information only available by asking the user
6) Precision and Grounding: Ensure your reasoning is extremely precise and relevant to each exact ongoing situation.
6.1) Verify your claims by quoting the exact applicable information (including policies) when referring to them.
7) Completeness: Ensure that all requirements, constraints, options, and preferences are exhaustively incorporated into your plan.
7.1) Resolve conflicts using the order of importance in #1.
7.2) Avoid premature conclusions: There may be multiple relevant options for a given situation.
7.2.1) To check for whether an option is relevant, reason about all information sources from #5.
7.2.2) You may need to consult the user to even know whether something is applicable. Do not assume it is not applicable without checking.
7.3) Review applicable sources of information from #5 to confirm which are relevant to the current state.
8) Persistence and patience: Do not give up unless all the reasoning above is exhausted.
8.1) Don't be dissuaded by time taken or user frustration.
8.2) This persistence must be intelligent: On *transient* errors (e.g., please try again), you *must* retry **unless an explicit retry limit (e.g., max x tries) has been reached**. If such a limit is hit, you *must* stop. On *other* errors, you must change your strategy or arguments, not repeat the same failed call.
9) Inhibit your response: only take an action after all the above reasoning is completed. Once you've taken an action, you cannot take it back.

Core Structure of the Instructions

1. Mandatory前置 Reasoning: Before any tool call or user response, complete the 9-step logic chain (dependencies → risk → hypothesis → evaluation → information → precision → completeness → persistence → inhibition)

2. Explicit Dependency Ordering: Policy Constraints > Operation Order > Information Prerequisites > User Preferences, to avoid mistakes like “calling API first and then finding missing parameters”

3. Intelligent Retry Strategy: For transient errors (network jitter, 429 throttling), automatically retry with exponential backoff, maximum 3 times; for non-transient errors, switch strategies immediately instead of repeating the call.

4. Persistence Check: Prohibit giving up due to “user impatience” or excessive time, unless all reasoning branches have been exhausted.

Experimental Results

- WebArena: Task success rate increased from 73.2% to 78.1%, and the error rate of clicking on page elements dropped by 35%

- ToolBench: One-time pass rate for multi-tool workflows improved by 6.7%, with an average reduction of 1.4 steps

- MobileBench: Completion rate for cross-app tasks (ordering food + issuing invoices) increased by 4.8%, and mid-task failure rate dropped by 9%

Engineering Significance

DeepMind stated that this instruction template has been included in the official documentation of Gemini3Pro. Developers can copy and paste it into the system_prompt field without additional training to enjoy the reliability improvement. The team is currently packaging it into a configurable JSON Schema and plans to open it up to Agent platforms like Vertex AI and DroidBot in Q1 2026.

Quark AI Browser: System-Level Six-Plug-In Attack on Chrome, $19.9 Monthly Fee, Smooth Upgrade

Quark Browser launches a system-level feature with an integrated Qwen AI assistant. It seamlessly integrates AI through six methods such as the sidebar, screen reading, and word selection. No need to switch tabs. Start with Alt+Space. Monthly fee of $19.9, directly accessible with domestic networks.

Opera Neon Browser Receives Major Update: Introduces 1-Minute Deep Research

Opera Neon browser introduces a 1-minute deep research mode, which uses the intelligent agent ODRA to break down complex questions and process them in parallel through multiple AI threads, providing more detailed analysis than regular AI, satisfying users' needs for efficient searching and in-depth research.

Opera Neon Browser Major Update: 1-Minute Research Report + One-Tap Switch to Gemini 3 + Instant Writing in Google Docs

Opera Neon browser launches a major update: integrating Gemini 3 Pro and Nano Banana Pro dual models, supporting real-time switching during conversations; introducing the "Do" agent, which allows natural language creation and editing of Google Docs. The monthly fee is $19.99. Core feature "1-Minute Deep Research" breaks down complex questions into parallel AI threads, with multiple researchers working simultaneously, generating structured reports with citations in 5-20 minutes, filling a gap in the demand for quick deep research.

Qwen3-VL Tops SpatialBench: Spatial Reasoning Achieves 13.5 Points, Setting a New SOTA for 3D Detection Capabilities

Alibaba's Qwen3-VL and Qwen2.5-VL lead SpatialBench with 13.5 and 12.9 points, surpassing Gemini 3.0 Pro Preview (9.6) and GPT-5.1 (7.5). The benchmark tests 2D/3D spatial reasoning for embodied AI. Qwen3-VL enhances 3D detection with rotated bounding boxes and depth estimation, improving AP by 18% in occluded scenes.....

Latest AI News

AI Daily Brief

AI Product Finder

AI Product Rankings

AI Product Submit

AI Tools Directory

AI Models Finder

LLM Leaderboard

Model Providers

Submit Your Model

Compare LLMs

LLM Cost Calculator

LLM Arena

MCP Servers

MCP Client

MCP Case Tutorials

MCP Ranking

MCP Service Submission

MCP Playground

MCP Inspector

AI Brand Monitoring Tool

GEO Services​

AI Search Visibility Checker

AI Model Compatibility Checker

AI Deployment Calculator

AI Dataset Collection

Intelligent Document Recognition

DeepMind launches Gemini 3 Pro System Instructions: Agent Task Success Rate Increased by 5% and Multi-Step Workflow Reliability Industrialized

AIbase基地

This article is from AIbase Daily

AI News Recommendations

Quark AI Browser: System-Level Six-Plug-In Attack on Chrome, $19.9 Monthly Fee, Smooth Upgrade

Zhejiang University Collaborates with Doubao to Establish an AI Humanities Innovation Laboratory, Exploring a New Model of Integration between Technology and Humanities

Opera Neon Browser Receives Major Update: Introduces 1-Minute Deep Research

Opera Neon Browser Major Update: 1-Minute Research Report + One-Tap Switch to Gemini 3 + Instant Writing in Google Docs

Singapore's National AI Plan Switches Chips: Ditching Meta Llama in Favor of Alibaba's Qwen3-32B Open-Source Model Sea-Lion v4 Tops Southeast Asia Language Ranking

Qwen3-VL Tops SpatialBench: Spatial Reasoning Achieves 13.5 Points, Setting a New SOTA for 3D Detection Capabilities

AI Daily: FLUX.2 Open Source Release; Tencent Hunyuan 3D Creation Engine Launched on International Site; Baidu Establishes Two New Large Model R&D Departments

Black Forest Labs Releases FLUX.2 Model with a 66.6% Win Rate Leading Open-Source Benchmarks

01.AI Collaborates with Jiyi AI: Announce Joint Research on Game Industry Large Model, Launch SOON Engine This Year

Tencent Hunyuan 3D Creation Engine Launches on International Site

GEO Services