Meta allegedly used 'crude tactics' to close in on OpenAI’s 2-year lead building AI uncontested — Sam Altman admitted creating ChatGPT without copyrighted data is impossible

Meta reportedly resorted to "crude tactics" to narrow OpenAI’s 2-year AI lead. (Image credit: Getty Images | SOPA)

Meta's lagging AI efforts are making news again. Microsoft CEO Satya Nadella recently admitted that OpenAI had a 2-year runway in the AI race to work uncontested and build ChatGPT. While other top AI labs, such as Anthropic and Google, are swiftly picking up the slack, Meta is seemingly having a long day at the office trying to keep up.

According to internal Meta communications that surfaced during a major copyright lawsuit, the company allegedly used copyrighted content to train its AI models and seemingly tried to cover its tracks to avoid copyright infringement issues (via The Verge).

The alleged tactics were reportedly meant to speed up Meta's effort to catch up with OpenAI's rapid progress in the AI landscape. An email sent to Meta AI researcher Hugo Touvron by the company's VP of gen AI stated that the company's goal “needs to be GPT4,” which would involve learning "how to build frontier and win this race.”

However, the finer details of the Facebook maker's plans to achieve these goals reportedly involved the book piracy site Library Genesis (LibGen), whose data would be used to train its models.

The Verge's damning report further revealed another email from Meta's Director of Product, Sony Theakanath, to Joelle Pineau, VP of AI Research. The email sought clarity on whether to use LibGen's data only internally, for benchmarks included in a blog post, or to use the site's data to train a model. In the email, Theakanath indicated that Gen AI had been approved to use LibGen for Llama3, but with several mitigations, including scrapping data labeled as pirated or stolen and not indicating that the model was trained using data from the site.

According to Theakanath, “Libgen is essential to meet SOTA [state-of-the-art] numbers.” He further indicated that “it is known that OpenAI and Mistral are using the library for their models (through word of mouth)” after escalating the issue to an executive within the organization under MZ, presumably Meta CEO Mark Zuckerberg.

The email also flagged potential policy risks of training the AI models on copyrighted content, including regulatory responses and interventions following media coverage of Meta's copyright infringement practices. “This may undermine our negotiating position with regulators on these issues,” added Theakanath.

Meta reportedly turned to crafty measures to cover its tracks after using LibGen's data to train its AI models, including removing copyright headers and document identifiers such as the copyright symbol. The document also disclosed comments by employees to further blur the lines, including scrapping metadata “to avoid potential legal complications.”


Microsoft and OpenAI have been wrapped up in numerous copyright infringement lawsuits. While some of these cases are still in court, OpenAI CEO Sam Altman admitted that training AI models without copyrighted content is virtually impossible. He further indicated that almost everything on the internet is copyrighted, and deemed the use of copyrighted content to train AI models fair use. He argued that copyright law doesn't categorically prohibit training AI models on copyrighted content.

More recently, reports indicated that top AI labs, including OpenAI and Anthropic, are struggling to develop advanced AI systems due to a lack of high-quality content. However, leaders in the AI landscape, including Sam Altman and the former Google CEO, have disputed the claims, saying there is no evidence that scaling laws have begun to slow down and that "there's no wall."

Kevin Okemwa
Contributor

Kevin Okemwa is a seasoned tech journalist based in Nairobi, Kenya with lots of experience covering the latest trends and developments in the industry at Windows Central. With a passion for innovation and a keen eye for detail, he has written for leading publications such as OnMSFT, MakeUseOf, and Windows Report, providing insightful analysis and breaking news on everything revolving around the Microsoft ecosystem. You'll also catch him occasionally contributing at iMore about Apple and AI. While AFK and not busy following the ever-emerging trends in tech, you can find him exploring the world or listening to music.
