Recently, Oregon author Elizabeth Lyon filed a class-action lawsuit against Adobe, accusing the company of training a small language model called SlimLM on an unlawful dataset that contained pirated copies of her works.

SlimLM is a series of lightweight language models developed by Adobe, optimized for on-device document assistance tasks such as summarization, rewriting, and question answering. Adobe has stated that the model was pre-trained on the SlimPajama-627B dataset, a publicly available, deduplicated, multi-source corpus released by AI chip company Cerebras in June 2023.

However, Lyon's complaint contends that SlimPajama is in fact a derivative of the RedPajama dataset, which in turn directly copied the notorious Books3 dataset. Books3 contains about 191,000 copyrighted books and has long been accused of drawing heavily on pirated online sources, such as the shadow-library torrent tracker Bibliotik. The complaint emphasizes: "Since SlimPajama is a derivative of RedPajama, it includes content from Books3, including the copyrighted works of the plaintiff and class members."

Lyon is the author of several nonfiction writing guides, which are allegedly among the works used for training without permission. She accuses Adobe of using her text for commercial AI product development without authorization, attribution, or payment, in violation of the exclusive rights that copyright law grants to authors.

This is not an isolated incident. Books3 and RedPajama have become recurring subjects of AI-related copyright litigation:

- In September 2025, Apple was sued for allegedly using Books3 to train its Apple Intelligence models;

- That same month, Anthropic reached a $1.5 billion settlement with a group of authors over similar allegations, a deal widely regarded as a milestone in AI copyright cases;

- In October, Salesforce was likewise accused of relying on RedPajama to train its AI systems.

As generative AI grows ever more dependent on massive text corpora, the legality of training data has evolved from a moral controversy into a legal minefield. The lawsuit against Adobe once again highlights an industry-wide dilemma: even when they build on "open-source" datasets, downstream developers may still bear joint liability if the underlying sources contain infringing content.

In the shadow of Anthropic's costly settlement, how Adobe responds to this lawsuit may shape how seriously the entire AI industry treats training-data provenance and compliance review. For content creators, the case is not only about protecting their rights; it is also a pivotal test of who owns creative value in the AI era.