AI Updates Aggregator

📰 New York Times Technology • Jun 25, 2026

NYT Expands Lawsuit Against OpenAI and Microsoft

📰Read original on New York Times Technology

#copyright #legal-risk #data-governance openai-api

💡 A landmark legal shift that could redefine how AI companies source training data and manage copyright risks.

⚡ 30-Second TL;DR

What Changed

NYT alleges that Microsoft actively encouraged OpenAI's data scraping practices.

Why It Matters

This case could set a legal precedent for how AI companies source training data, potentially forcing a shift toward licensed datasets.

What To Do Next

Review your data ingestion pipeline and ensure you have clear licensing or usage rights for all training corpora.
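
One way to start such a review is to audit a manifest of training sources for missing or unrecognized license labels. The sketch below is illustrative only: the `CorpusEntry` fields, the `CLEARED_LICENSES` set, and the sample sources are hypothetical, and which labels count as cleared is a policy and legal decision, not something code can settle.

```python
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical labels treated as cleared for training; replace with your own policy.
CLEARED_LICENSES = {"licensed", "public-domain", "cc0", "cc-by"}


@dataclass
class CorpusEntry:
    source: str              # where the data came from (domain or dataset name)
    license: Optional[str]   # license label recorded at ingestion time, if any


def audit_corpus(entries: List[CorpusEntry]) -> List[CorpusEntry]:
    """Return entries whose license is missing or not on the cleared list."""
    flagged = []
    for entry in entries:
        label = (entry.license or "").strip().lower()
        if label not in CLEARED_LICENSES:
            flagged.append(entry)
    return flagged


# Made-up manifest for illustration.
manifest = [
    CorpusEntry("internal-docs", "licensed"),
    CorpusEntry("news-site.example", None),
    CorpusEntry("open-encyclopedia.example", "CC-BY"),
    CorpusEntry("forum.example", "unknown"),
]

flagged_sources = [e.source for e in audit_corpus(manifest)]
print(flagged_sources)
```

Flagged sources are candidates for follow-up (obtaining a license, excluding the data, or seeking legal review), not automatic proof of a problem.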

Who should care: Founders & Product Leaders

🧠 Deep Insight

AI-generated analysis for this event.

🔑 Enhanced Key Takeaways

• The amended complaint introduces claims under the Digital Millennium Copyright Act (DMCA), specifically alleging that Microsoft and OpenAI removed Copyright Management Information (CMI) from NYT articles during the ingestion process.
• NYT's legal strategy now emphasizes the 'joint venture' nature of the relationship, arguing that Microsoft's infrastructure and financial backing make it a direct participant in the alleged infringement rather than a passive platform provider.
• The lawsuit highlights the integration of 'Browse with Bing' as a mechanism that allegedly bypasses paywalls and robots.txt protocols to harvest content for model fine-tuning.
• Microsoft has countered by arguing that its AI services fall under 'fair use' protections, asserting that the transformative nature of LLMs creates new value rather than serving as a market substitute for journalism.
• Discovery requests in the expanded suit seek internal communications between OpenAI and Microsoft executives regarding the specific datasets used to train GPT-4 and subsequent iterations.

📊 Competitor Analysis

Feature	OpenAI/Microsoft	Google (Gemini)	Anthropic (Claude)
Data Sourcing	Web scraping/Partnerships	Proprietary Search Index	Web scraping/Partnerships
Legal Strategy	Fair Use / Licensing	Fair Use / Licensing	Fair Use / Licensing
Training Data Transparency	Low (Closed)	Low (Closed)	Low (Closed)

🛠️ Technical Deep Dive

The core technical dispute centers on the 'ingestion pipeline' where raw HTML is stripped of metadata, including CMI, before being tokenized for training.
The lawsuit challenges the architecture of Retrieval-Augmented Generation (RAG) systems, claiming that the way models retrieve and present snippets of NYT content constitutes unauthorized reproduction.
The legal focus includes the 'weighting' of training data, where the plaintiffs allege that high-quality journalistic content is prioritized to improve model reasoning and factual accuracy.
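
To make the metadata-stripping dispute concrete, here is a minimal sketch of an extraction step that keeps CMI-style metadata alongside the extracted text instead of discarding it. It does not represent any company's actual pipeline. The `CMI_META_NAMES` set, the `CmiPreservingExtractor` class, and the sample HTML are assumptions made for illustration, and real pages carry CMI in many other places (bylines, footers, structured data).

```python
from html.parser import HTMLParser

# Assumed <meta> names that may carry copyright management information (CMI).
CMI_META_NAMES = {"author", "copyright", "dc.rights", "dc.creator"}


class CmiPreservingExtractor(HTMLParser):
    """Collects visible text and records CMI-style <meta> tags separately."""

    def __init__(self):
        super().__init__()
        self.cmi = {}
        self.text_parts = []
        self._skip_depth = 0  # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1
        elif tag == "meta":
            attr = dict(attrs)
            name = (attr.get("name") or "").lower()
            if name in CMI_META_NAMES and attr.get("content"):
                self.cmi[name] = attr["content"]

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep non-empty text that is not inside <script>/<style>.
        if not self._skip_depth and data.strip():
            self.text_parts.append(data.strip())


def extract(html):
    """Return the page text together with whatever CMI metadata was found."""
    parser = CmiPreservingExtractor()
    parser.feed(html)
    parser.close()
    return {"text": " ".join(parser.text_parts), "cmi": parser.cmi}


sample = """<html><head>
<meta name="author" content="Jane Doe">
<meta name="copyright" content="(c) 2026 Example News">
<style>p { color: red; }</style>
</head><body><p>Headline text.</p><p>Body text.</p></body></html>"""

record = extract(sample)
print(record)
```

Storing the `cmi` dictionary with each record means attribution survives tokenization as provenance data, even though the tokens themselves no longer contain it.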

🔮 Future Implications

AI analysis grounded in cited sources

Establishment of a 'Data Licensing' industry standard.

A court ruling against OpenAI could push AI companies away from 'scraping-first' models and toward paid licensing agreements with major publishers.

Increased technical implementation of 'opt-out' protocols.

Legal pressure is forcing AI developers to adopt more robust, standardized technical methods for publishers to block AI crawlers beyond traditional robots.txt.
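
For reference, the traditional robots.txt baseline that such protocols would extend can be checked with Python's standard library. This is a sketch under stated assumptions: the robots.txt body and the `news.example` URL are invented, `is_allowed` is a hypothetical helper, and `GPTBot` is used as the crawler token that OpenAI publishes for its web crawler. Honoring robots.txt is voluntary on the crawler's side, which is part of why stronger mechanisms are being discussed.

```python
from urllib.robotparser import RobotFileParser

# Invented robots.txt: block one AI crawler, allow everything else.
robots_txt = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Allow: /
"""


def is_allowed(robots_body, user_agent, url):
    """Check whether user_agent may fetch url under the given robots.txt text."""
    parser = RobotFileParser()
    parser.parse(robots_body.splitlines())  # parse in memory, no network request
    return parser.can_fetch(user_agent, url)


gptbot_ok = is_allowed(robots_txt, "GPTBot", "https://news.example/article")
other_ok = is_allowed(robots_txt, "ExampleBrowser", "https://news.example/article")
print(gptbot_ok, other_ok)
```

A crawler that respects the file would skip the article when `gptbot_ok` is false, while other user agents remain unaffected.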

⏳ Timeline

2023-12

The New York Times files initial copyright infringement lawsuit against OpenAI and Microsoft.

2024-02

OpenAI files a motion to dismiss parts of the lawsuit, claiming the NYT 'hacked' ChatGPT to generate evidence.

2024-09

Judge denies parts of OpenAI's motion to dismiss, allowing the core copyright claims to proceed.

2026-06

The New York Times amends the complaint to specifically target Microsoft's role in encouraging data scraping.

AI-curated news aggregator. All content rights belong to original publishers.
Original source: New York Times Technology ↗