AI Updates Aggregator

🤖 Reddit r/MachineLearning • Jul 3, 2026

Recovering verbatim finetuning data from LLM logits without weights

#llm-security #data-privacy #model-extraction #finetuning #contrastive-decoding-diffing-(cdd)

💡New method recovers private training data from LLMs using only logits; major implications for model security.

⚡ 30-Second TL;DR

What Changed

CDD recovers verbatim finetuning data using only grey-box logit access.

Why It Matters

This research highlights a significant privacy vulnerability in finetuned models, suggesting that logit access alone is sufficient to reconstruct sensitive training data. It underscores the risks of using synthetic data from popular LLMs for finetuning.

What To Do Next

Audit your finetuning pipelines to ensure that synthetic training data is sanitized of model-specific artifacts or personas before training.

Who should care: Researchers & Academics

🧠 Deep Insight

AI-generated analysis for this event.

🔑 Enhanced Key Takeaways

• CDD leverages the divergence between the output distributions of a base model and a finetuned model to isolate memorized sequences without needing gradient information.
• The method exploits the 'logit drift' phenomenon, where a finetuned model assigns significantly higher logits to verbatim training tokens than the pre-trained base model does.
• Research indicates that CDD is particularly effective against models finetuned on small, high-quality datasets, where memorization is more prevalent than in large-scale instruction tuning.
• The technique demonstrates that logit-only access is sufficient to reconstruct sensitive personally identifiable information (PII) that was previously thought to be protected by weight-access restrictions.
• CDD's efficiency stems from its ability to perform 'contrastive sampling' in real time, allowing training data to be extracted during inference without expensive backpropagation.
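
The contrastive idea in the takeaways above can be illustrated with a minimal sketch. The post does not give the exact scoring rule, so this assumes each candidate token is scored by its finetuned log-probability minus its base log-probability. The function names and the toy logits are hypothetical.

```python
import math

def log_softmax(logits):
    """Convert raw logits to log-probabilities (numerically stable)."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]

def contrastive_scores(finetuned_logits, base_logits):
    """Assumed scoring rule: finetuned log-prob minus base log-prob per token."""
    ft = log_softmax(finetuned_logits)
    base = log_softmax(base_logits)
    return [f - b for f, b in zip(ft, base)]

def pick_token(finetuned_logits, base_logits):
    """Greedy choice of the token whose probability finetuning raised the most."""
    scores = contrastive_scores(finetuned_logits, base_logits)
    return max(range(len(scores)), key=scores.__getitem__)

# Toy 4-token vocabulary (made-up numbers, not from the post).
base = [2.0, 1.0, 0.5, 0.1]
finetuned = [2.5, 1.0, 2.0, 0.1]
chosen = pick_token(finetuned, base)
```

In this toy case plain greedy decoding on the finetuned logits would pick token 0, while the contrast picks token 2, the token that finetuning boosted the most. Only forward passes are involved, which is consistent with the claim that no backpropagation is needed.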

📊 Competitor Analysis

Feature	CDD (Contrastive Decoding Diffing)	ADL (Activation Difference Lens)	Training Data Extraction (Gradient-based)
Access Level	Grey-box (Logits only)	White-box (Weights required)	White-box (Gradients required)
Computational Cost	Low (Inference-time)	High (Requires backprop)	Very High (Requires training state)
Accuracy	High (19/20 benchmarks)	Moderate	Variable
Weight Access	Not Required	Required	Required

🛠️ Technical Deep Dive

CDD operates by calculating the difference in logit vectors between a reference base model and the target finetuned model at each token position.
It utilizes a thresholding mechanism on the logit difference to identify tokens that deviate significantly from the base model's probability distribution.
The algorithm employs a sliding window approach to reconstruct sequences, effectively filtering out noise by focusing on high-confidence logit spikes.
Implementation does not require access to the model's hidden states or internal activations, relying solely on the final output logits (the values that feed the softmax).
The method is agnostic to the specific architecture of the LLM, provided the base model and finetuned model share the same vocabulary and tokenizer.
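
The steps described above (per-token logit difference, thresholding, sliding window) might look roughly like the following sketch. It is not the authors' implementation: the threshold, window size, `min_hits` rule, and both function names are assumptions chosen for illustration.

```python
def token_logit_diffs(finetuned_logits, base_logits, token_ids):
    """Per position: finetuned-minus-base logit of the token actually emitted.

    Both models must share a vocabulary so that token ids line up.
    Raw logits are only defined up to an additive constant per position,
    so a real pipeline would likely normalize them first (assumption).
    """
    return [ft[t] - b[t]
            for ft, b, t in zip(finetuned_logits, base_logits, token_ids)]

def flag_spans(diffs, threshold=2.0, window=3, min_hits=2):
    """Return half-open (start, end) spans where logit spikes cluster.

    A window is kept when at least `min_hits` of its positions exceed
    `threshold`; overlapping kept windows are merged. Isolated spikes
    are dropped as noise. Spans are window-aligned, so they may include
    a below-threshold token at either edge.
    """
    hot = [d > threshold for d in diffs]
    covered = [False] * len(diffs)
    for start in range(len(diffs) - window + 1):
        if sum(hot[start:start + window]) >= min_hits:
            for i in range(start, start + window):
                covered[i] = True

    spans, span_start = [], None
    for i, is_covered in enumerate(covered):
        if is_covered and span_start is None:
            span_start = i
        elif not is_covered and span_start is not None:
            spans.append((span_start, i))
            span_start = None
    if span_start is not None:
        spans.append((span_start, len(covered)))
    return spans

# Made-up differences: a cluster of spikes around positions 2-5, one stray spike at 9.
diffs = [0.1, 0.2, 3.0, 3.5, 0.3, 4.0, 0.1, 0.0, 0.2, 5.0, 0.1]
spans = flag_spans(diffs)
```

With these toy values the clustered spikes are merged into a single candidate span, while the lone spike at position 9 is filtered out because no window around it contains two hits.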

🔮 Future Implications

AI analysis grounded in cited sources

Logit-based extraction will force a shift toward differential privacy in LLM training.

The vulnerability of logit outputs to CDD makes standard finetuning practices insufficient for protecting sensitive training data.

Model providers will implement logit-masking or noise injection as a standard security defense.

Since CDD relies on precise logit values, adding controlled noise to API outputs can effectively neutralize the contrastive signal.
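
As a rough illustration of the two defenses named above, the sketch below shows what logit masking and noise injection could look like on the provider side. The function names, the top-k rule, and the Gaussian noise model are assumptions, and the sketch does not show that either defense actually defeats CDD.

```python
import random

def top_k_mask(logits, k):
    """Logit masking (assumed form): expose only the k largest logits."""
    top = sorted(range(len(logits)), key=logits.__getitem__, reverse=True)[:k]
    return {i: logits[i] for i in top}

def noisy_logits(logits, sigma, rng):
    """Noise injection (assumed form): add independent Gaussian noise per logit."""
    return [x + rng.gauss(0.0, sigma) for x in logits]

# Hypothetical API response for a 4-token vocabulary.
raw = [0.1, 3.0, 2.0, -1.0]
rng = random.Random(0)  # seeded only to make the example reproducible

masked = top_k_mask(raw, k=2)
perturbed = noisy_logits(raw, sigma=0.5, rng=rng)
```

Both knobs trade utility for privacy: a smaller `k` or a larger `sigma` hides more of the contrastive signal but also degrades the logits for legitimate users.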

⏳ Timeline

2025-09

Initial research on Activation Difference Lens (ADL) highlights white-box extraction risks.

2026-03

Development of Contrastive Decoding Diffing (CDD) begins as a grey-box alternative.

2026-06

CDD methodology is validated across 1B-32B parameter models, demonstrating high recovery rates.

AI-curated news aggregator. All content rights belong to original publishers.
Original source: Reddit r/MachineLearning
