The legal landscape of AI becomes more complex: battles over copyright in machine learning
The question of how artificial intelligence training data is sourced continues to generate significant legal conflict. A recent class-action lawsuit accuses Adobe of misusing authors' literary works in developing its AI systems, raising increasingly urgent questions about how tech companies collect and use copyrighted content.
Adobe in the spotlight: details of the controversy
The case revolves around the company's SlimLM model. According to the complaint filed by Elizabeth Lyon, an Oregon-based writer specializing in non-fiction manuals, Adobe allegedly used pirated copies of numerous books, including works by the plaintiff herself, to pre-train SlimLM. Adobe describes the project as a family of compact language models designed to handle document-processing tasks on mobile devices.
The allegation hinges on a chain of derived datasets. SlimLM is said to have been trained on SlimPajama-627B, a multi-source, open-source dataset released by Cerebras in June 2023. According to the lawsuit, however, SlimPajama is itself a processed derivative of the RedPajama dataset, which in turn incorporates the Books3 collection, a database of 191,000 volumes that has been used to train generative AI systems.
A systemic problem in the industry
The legal action is another chapter in a dispute plaguing the tech sector. Last September, Apple was sued over similar allegations regarding its Apple Intelligence model, and in October the action was extended to include Salesforce. Both cases center on the use of datasets containing copyrighted material without authorization or compensation for the original authors.
The stakes are underscored by the settlement reached between Anthropic and a group of writers: the company agreed to pay $1.5 billion to resolve claims over the use of pirated works in training its chatbot Claude. That settlement could set a significant precedent for future disputes in the industry.
Firefly and Adobe’s AI strategy
Firefly, the generative media suite Adobe launched in 2023, sits at the core of the company's AI strategy. Legal disputes of this kind threaten to undermine confidence in Adobe's entire AI ecosystem and could invite further regulatory scrutiny of the sector.