Salesforce has been sued by two novelists who allege it used pirated books to train its xGen series of large language models. The lawsuit, filed October 15 in U.S. District Court in San Francisco by plaintiffs Molly Tanzer and Jennifer Gilmore, accuses Salesforce of downloading, storing, copying, and using a large dataset of copyrighted books without authorization to develop its AI models.
This is not an isolated incident: similar copyright infringement claims have become common in the AI industry. Just last month, generative AI company Anthropic reached a $1.5 billion settlement over allegations that it had trained its models on millions of pirated books. Michael Bennett, vice provost for data science and AI strategy at the University of Illinois Chicago, said the Salesforce case closely resembles the Anthropic case, in which the judge ruled that training models on legally obtained works constitutes "fair use," while illegally obtained works do not enjoy that protection.
The Salesforce case will likely be resolved through a settlement, much like the Anthropic case. Kashyap Kompella, founder and analyst at RPA2AI, said the case shows that copyright holders hold real legal leverage and that the provenance of training data is both a business and a legal issue.
The lawsuit could also harm Salesforce more broadly, particularly by eroding enterprise customers' trust in its models and training datasets. Kompella stressed that enterprise customers need assurance that the data their AI vendors use is licensed, auditable, and legitimately sourced.
Similar lawsuits could become obstacles to broader adoption of AI technology. When selecting AI vendors, companies should scrutinize where training data comes from and what compensation terms apply if that data turns out to be infringing.
Key Points:
- 📚 Salesforce is being sued for allegedly using pirated books to train its AI models.
- ⚖️ The case may be settled, similar to the outcome of the Anthropic case.
- 🔍 Enterprise customers' trust in AI models may suffer; they need assurance that training data sources are legal.