Recently, DeepSeek released an updated version of its R1 reasoning model, which posted strong results on several math and coding benchmarks. However, DeepSeek did not disclose the source of its training data, prompting some AI researchers to speculate that the model may have been partially trained on outputs from Google's Gemini family.
Sam Paech, a developer based in Melbourne, claimed to have found many similarities in word choice and expression between DeepSeek's R1-0528 model and Google's Gemini 2.5 Pro. While this is not direct evidence, another developer, the anonymous founder of the SpeechMap project, likewise noted that the "thought trajectories" the DeepSeek model produces during reasoning read much like Gemini's. The finding has once again sparked discussion about whether DeepSeek used competitors' data in training.
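For illustration, here is a minimal sketch of the kind of lexical comparison such claims rest on: gather responses from two models to the same prompts and compare their word-frequency profiles. This is not Paech's actual methodology; the sample texts and function names below are placeholders.

```python
# Sketch: compare word-choice profiles of two models' outputs.
# The sample texts are placeholders, not real model responses.
from collections import Counter
import math
import re

def word_freqs(texts):
    """Lowercase word-frequency distribution over a list of responses."""
    counts = Counter()
    for t in texts:
        counts.update(re.findall(r"[a-z']+", t.lower()))
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}

def cosine_similarity(p, q):
    """Cosine similarity between two sparse frequency vectors."""
    dot = sum(p[w] * q.get(w, 0.0) for w in p)
    norm_p = math.sqrt(sum(v * v for v in p.values()))
    norm_q = math.sqrt(sum(v * v for v in q.values()))
    return dot / (norm_p * norm_q)

model_a_outputs = ["Let us delve into the nuances of this problem..."]
model_b_outputs = ["We should delve into the nuances of the question..."]

score = cosine_similarity(word_freqs(model_a_outputs),
                          word_freqs(model_b_outputs))
print(f"lexical similarity: {score:.3f}")  # closer to 1.0 = more similar phrasing
```

A high score on its own proves little, which is why observers treat such overlaps as circumstantial rather than direct evidence.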
As early as last December, DeepSeek drew criticism because its V3 model frequently identified itself as OpenAI's ChatGPT, suggesting it may have been trained on ChatGPT chat logs. Earlier this year, OpenAI told the media it had found evidence that DeepSeek had used "distillation," a technique for training a new model on the outputs of a larger, more capable one. Bloomberg reported that at the end of 2024 Microsoft detected large volumes of data being exfiltrated through OpenAI developer accounts that may be linked to DeepSeek.
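For context, distillation in this sense usually means harvesting a stronger "teacher" model's responses and using them as supervised training data for a smaller "student" model. A minimal sketch of that workflow is below; query_teacher is a hypothetical stand-in for a real model API call, and the prompts are placeholders.

```python
# Sketch: build a synthetic fine-tuning dataset from a teacher model's
# outputs. query_teacher is hypothetical, not any vendor's real API.
import json

def query_teacher(prompt: str) -> str:
    """Hypothetical call to a large teacher model's API."""
    return f"(teacher response to: {prompt})"

def build_distillation_dataset(prompts, path="distill_data.jsonl"):
    """Collect (prompt, teacher_response) pairs in the JSONL format
    commonly used for supervised fine-tuning of a student model."""
    with open(path, "w", encoding="utf-8") as f:
        for prompt in prompts:
            record = {"prompt": prompt, "response": query_teacher(prompt)}
            f.write(json.dumps(record, ensure_ascii=False) + "\n")

prompts = [
    "Prove that the sum of two even integers is even.",
    "Write a function that reverses a linked list.",
]
build_distillation_dataset(prompts)
```

The resulting file can be fed to a standard fine-tuning pipeline, which is precisely the use of model outputs that API providers' terms typically restrict.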
Although "distillation" technology is not uncommon in the AI community, OpenAI explicitly prohibits users from building competitive products using its model outputs. It should be noted that due to the abundance of low-quality content on the open web, many AI models often mistakenly mimic each other's word choices and phrasing during training. This makes it more complex to deeply analyze the source of training data.
AI researcher Nathan Lambert believes it is entirely plausible that DeepSeek trained its models on Gemini data, noting that the company has ample funds to call the best available API models and generate synthetic data. To guard against distillation, AI companies are also tightening their security measures: OpenAI now requires organizations to complete identity verification before accessing certain advanced models, while Google is hardening its AI Studio platform and restricting access to models' raw reasoning traces.