
COCONUT Latent Reasoning Debunked by Controls

Read original on Reddit r/MachineLearning

Debunks COCONUT claims: the curriculum, not hidden-state recycling, drives the gains, and recycling hurts out-of-distribution (OOD) performance (code public)

30-Second TL;DR

What Changed

Trained GPT-2 124M controls on ProsQA: M2 (COCONUT) reaches 97.0% and M3 (fixed embeddings) 96.6%, a difference that is not statistically significant (p=0.845)
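One common way to compare two models scored on the same test items is an exact McNemar test on the items where they disagree. The sketch below is only an illustration of that idea: the source does not say which test produced p=0.845, and the discordant counts used here are hypothetical (a net gap of 2 items, which would match 0.4 points only on an assumed 500-item test set).

```python
from math import comb


def mcnemar_exact_p(b: int, c: int) -> float:
    """Two-sided exact McNemar p-value.

    b: items only model A answered correctly
    c: items only model B answered correctly
    Under the null hypothesis each discordant item is equally likely to
    favor either model, so min(b, c) follows Binomial(b + c, 0.5).
    """
    n = b + c
    if n == 0:
        return 1.0  # no disagreements, so no evidence of a difference
    k = min(b, c)
    # Lower-tail probability P(X <= k) for X ~ Binomial(n, 0.5).
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2.0 * tail)


# Hypothetical discordant counts (NOT reported in the source):
# M2 alone correct on 14 items, M3 alone correct on 12.
p_value = mcnemar_exact_p(14, 12)
print(f"p = {p_value:.3f}")
```

With counts like these the p-value sits far above 0.05, so a gap of a few items would not count as evidence that recycling hidden states beats fixed embeddings.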

Why It Matters

Undermines latent space reasoning claims, highlighting curriculum's role in high ProsQA scores. Reveals risks of recycled states like overconfidence and poor extrapolation. Encourages rigorous controls in reasoning research.

What To Do Next

Replicate experiments with the GitHub code on ProsQA using GPT-2.

Who should care: Researchers & Academics

Deep Insight

Web-grounded analysis with 7 cited sources.

Enhanced Key Takeaways

  • COCONUT enables breadth-first search in latent space by encoding multiple alternative reasoning paths in continuous thoughts, outperforming CoT on tasks requiring backtracking like ProsQA[1][2].
  • Without multi-stage curriculum training, COCONUT models fail to learn effective latent reasoning and perform no better than baselines without CoT[2][3][5].
  • COCONUT exhibits training instability and compute inefficiency, with pretraining costs growing exponentially with thinking tokens and adaptation challenges to hidden state inputs[1][5].

๐Ÿ› ๏ธ Technical Deep Dive

  • COCONUT uses the last hidden state of a pretrained GPT-2 as 'continuous thought', fed directly back as input embedding without token decoding, applied iteratively[1][3].
  • Multi-stage curriculum progressively increases continuous thoughts (k=1 to higher), mixing prior stage data (0.3 probability) to prevent forgetting, with cross-entropy loss on remaining tokens[2][3][4].
  • At inference, increasing thoughts per step (up to 2-6 on ProsQA) improves accuracy via search tree exploration, prioritizing promising paths and pruning others based on implicit probabilities[2][4].
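The mechanics in the bullets above can be sketched in plain Python. This is a toy illustration under stated assumptions, not the authors' implementation: the `<thought>` placeholder, all function names, the uniform choice of earlier stage, the masking of question tokens, and the reading of the fixed-embedding control (M3) as "append the same vector every time" are all hypothetical. `toy_model` is a running mean that merely stands in for a transformer.

```python
import random

LATENT = "<thought>"  # hypothetical placeholder for a continuous-thought slot


def build_stage_example(question, steps, answer, stage, thoughts_per_step=1):
    """Replace the first `stage` reasoning steps with latent slots.

    Returns the token list and a mask that is True where the cross-entropy
    loss applies (assumed here: remaining steps and the answer only).
    """
    stage = min(stage, len(steps))
    latent = [LATENT] * (stage * thoughts_per_step)
    remaining = [tok for step in steps[stage:] for tok in step]
    tokens = question + latent + remaining + answer
    masked = len(question) + len(latent)
    loss_mask = [False] * masked + [True] * (len(remaining) + len(answer))
    return tokens, loss_mask


def sample_stage(current_stage, rng, mix_prob=0.3):
    """With probability `mix_prob`, draw an earlier stage to limit forgetting."""
    if current_stage > 0 and rng.random() < mix_prob:
        return rng.randrange(current_stage)  # assumed: uniform over earlier stages
    return current_stage


def run_latent_steps(model, input_embeds, num_thoughts, fixed_embed=None):
    """Append `num_thoughts` thought vectors to the input sequence.

    `model` maps a list of input vectors to a list of hidden vectors.
    With `fixed_embed=None` the last hidden state is recycled as the next
    input embedding; otherwise the same fixed vector is appended each time.
    """
    seq = [list(v) for v in input_embeds]
    for _ in range(num_thoughts):
        hidden = model(seq)
        seq.append(list(hidden[-1]) if fixed_embed is None else list(fixed_embed))
    return seq


def toy_model(seq):
    """Toy stand-in for a transformer: hidden state i is the mean of inputs 0..i."""
    out, acc = [], [0.0] * len(seq[0])
    for i, vec in enumerate(seq, start=1):
        acc = [a + x for a, x in zip(acc, vec)]
        out.append([a / i for a in acc])
    return out


tokens, loss_mask = build_stage_example(
    ["q"], [["s1a", "s1b"], ["s2"]], ["a"], stage=1
)
recycled = run_latent_steps(toy_model, [[1.0, 0.0], [0.0, 1.0]], num_thoughts=2)
fixed = run_latent_steps(toy_model, [[1.0, 0.0]], num_thoughts=3, fixed_embed=[9.0, 9.0])
```

Keeping the curriculum (`build_stage_example`, `sample_stage`) separate from the recycling switch in `run_latent_steps` mirrors the kind of control the article describes: the same staged training can be run with or without hidden-state recycling.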

Future Implications
AI analysis grounded in cited sources

Latent reasoning paradigms like COCONUT will require curriculum learning for reliable scaling beyond toy benchmarks.
Theoretical analysis links instability to distributional mismatch in continuous representations, resolved only by staged supervised latent states as in Coconut[5].
COCONUT's compute inefficiency will limit adoption in large-scale models without architectural fixes.
Replications show exponential pretraining costs with thinking tokens and slow adaptation to hidden state inputs[1].

โณ Timeline

2024-12
Meta releases COCONUT (Chain of Continuous Thought) paper introducing latent space reasoning via hidden state recycling on GPT-2[1][2]
2026-02
Independent replications confirm COCONUT's token efficiency but highlight compute costs and curriculum necessity[1]
2026-03
Controlled experiments posted on Reddit debunk innate latent reasoning in COCONUT, attributing the gains to multi-stage training[article]