Google Research recently proposed a novel active learning curation process aimed at drastically reducing the amount of training data required to fine-tune large language models. According to its experiments, the method can cut the training data to as little as 1/10,000 of the original volume while improving the model's agreement with human experts by up to 65%. In practical applications such as advertising content classification and financial data security analysis, the demand for high-fidelity training data has always been high, yet selecting data that meets the bar is both difficult and extremely expensive.
The new method starts from an initial model prompted in a zero-shot or few-shot fashion. Users define the target content through a prompt, for example asking whether an advertisement is "clickbait." The initial model then labels each ad as clickbait or benign, generating a large labeled dataset. However, this initial dataset often suffers from severe class imbalance, leaving the model weak at accurately identifying the target class.
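A minimal sketch of this zero-shot labeling step is below. The `call_model` callable is a placeholder for whatever LLM API is in use, and the prompt wording and label set are illustrative choices, not Google's exact setup.

```python
from typing import Callable

# Illustrative prompt; the real task definition is supplied by the user.
PROMPT = (
    "You are reviewing advertisements for policy compliance.\n"
    "Is the following ad clickbait? Answer with one word, "
    "'clickbait' or 'benign'.\n\nAd: {ad_text}"
)

def label_ads(ads: list[str],
              call_model: Callable[[str], str]) -> list[tuple[str, str]]:
    """Use the initial model to produce a large (but noisy) labeled set."""
    labeled = []
    for ad in ads:
        answer = call_model(PROMPT.format(ad_text=ad)).strip().lower()
        # Default to 'benign' unless the model clearly says 'clickbait'.
        labeled.append((ad, "clickbait" if "clickbait" in answer else "benign"))
    return labeled
```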
To address this issue, the researchers clustered the examples the model labeled as clickbait alongside those it labeled benign, and found that some clusters overlapped, indicating content on which the model tends to err. They could then select sample pairs from these overlapping clusters and send them to experts for evaluation, controlling review cost while prioritizing pairs that cover a variety of situations. The resulting samples are both highly informative and representative of the model's most likely error scenarios, as the sketch below illustrates.
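The following is a simplified sketch of that overlap-based selection, assuming precomputed ad embeddings as NumPy arrays. The cluster count, the overlap test (a centroid-distance percentile), and the pairing rule are illustrative choices, not the exact procedure from Google's paper.

```python
import numpy as np
from sklearn.cluster import KMeans
from scipy.spatial.distance import cdist

def select_review_pairs(emb_clickbait, emb_benign, n_clusters=10, n_pairs=50):
    """Find cross-label example pairs in regions where the two classes overlap."""
    cb = KMeans(n_clusters=n_clusters, n_init="auto").fit(emb_clickbait)
    bn = KMeans(n_clusters=n_clusters, n_init="auto").fit(emb_benign)

    # Treat a clickbait/benign cluster pair as "overlapping" when their
    # centroids are unusually close (here: the closest 10% of pairs).
    centroid_dist = cdist(cb.cluster_centers_, bn.cluster_centers_)
    threshold = np.percentile(centroid_dist, 10)

    pairs = []
    for i, j in zip(*np.where(centroid_dist <= threshold)):
        cb_idx = np.where(cb.labels_ == i)[0]
        bn_idx = np.where(bn.labels_ == j)[0]
        # Pick the closest cross-label pair inside the overlapping region:
        # these are the examples the model most plausibly confuses.
        d = cdist(emb_clickbait[cb_idx], emb_benign[bn_idx])
        a, b = np.unravel_index(np.argmin(d), d.shape)
        pairs.append((cb_idx[a], bn_idx[b]))
    return pairs[:n_pairs]
```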
During fine-tuning, the expert-labeled data is split into two sets: one for evaluating the model's agreement with the experts, the other for fine-tuning the model itself. The process repeats until the model's performance reaches a level comparable to that of the human experts.
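A high-level sketch of this iterative loop follows. All helpers (`fine_tune`, `predict`, `expert_label`, `select_confusable`) are hypothetical stand-ins injected by the caller; Cohen's kappa serves as the agreement metric, and the 0.8 target is illustrative.

```python
from sklearn.metrics import cohen_kappa_score

def curation_loop(model, pool, fine_tune, predict, expert_label,
                  select_confusable, kappa_target=0.8, max_rounds=10):
    for _ in range(max_rounds):
        # Experts label the most confusable example pairs found this round.
        batch = expert_label(select_confusable(model, pool))

        # Split the expert labels: one half is held out to score
        # model-expert agreement, the other half fine-tunes the model.
        half = len(batch) // 2
        eval_split, train_split = batch[:half], batch[half:]
        model = fine_tune(model, train_split)

        kappa = cohen_kappa_score(
            [label for _, label in eval_split],
            [predict(model, text) for text, _ in eval_split],
        )
        if kappa >= kappa_target:  # agreement comparable to experts
            break
    return model
```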
Google's experiments used the Gemini Nano-1 and Nano-2 models, tested on two tasks of differing complexity. Each task had roughly 100,000 crowdsourced labels, though the classes were severely imbalanced. The results showed very high agreement among the experts themselves, but relatively low agreement between the crowdsourced labels and expert judgments. With the new method, the 3.25-billion-parameter model showed a marked improvement in expert alignment on the lower-complexity task while using only 250 to 450 expert-labeled examples instead of the original 100,000.
In summary, Google's new method demonstrates that with a small amount of high-quality data, and provided expert annotation consistency exceeds 0.8, large models can be trained to excellent performance.
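As a concrete illustration of that 0.8 consistency bar, read here as pairwise Cohen's kappa between two expert annotators (a standard chance-corrected agreement metric), the toy labels below fall short of it despite agreeing on 4 of 5 items:

```python
from sklearn.metrics import cohen_kappa_score

# Toy annotations from two hypothetical experts on five ads.
expert_a = ["clickbait", "benign", "benign", "clickbait", "benign"]
expert_b = ["clickbait", "benign", "clickbait", "clickbait", "benign"]
print(cohen_kappa_score(expert_a, expert_b))  # ~0.62: below the 0.8 bar
```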
Key Points:
📉 The amount of training data can be reduced to 1/10,000 of the original while improving the model's alignment with human experts.
🤝 The new method relies on expert judgment and model iteration to ensure sample quality.
📊 Experiments show that a small amount of high-quality data can match or even exceed the results of traditional large-scale datasets.