虎嗅 (Huxiu)
DeepSeek's Deleted Visual Primitives Paper

💡 DeepSeek's "finger" trick (pointing with points and boxes) fixes the visual counting failures of top models while working at 7056x compression.
⚡ 30-Second TL;DR
What Changed
Introduces points and bounding boxes as visual primitives embedded in chain-of-thought (CoT) reasoning, so the model can refer to specific image elements precisely.
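A minimal sketch of why explicit primitives help with counting. It assumes a hypothetical trace format in which the model tags each object it mentions with `<box>x1,y1,x2,y2</box>` or `<point>x,y</point>`. This tag syntax and the helper names below are illustrative only; they are not the paper's actual tokens. Because every reference carries coordinates, repeated mentions of the same object can be merged before counting. The merge rule here is plain intersection-over-union (IoU) deduplication, a standard technique used for illustration, not something the paper is confirmed to use.

```python
import re
from dataclasses import dataclass

# Hypothetical tag syntax for illustration; not DeepSeek's documented format.
BOX_RE = re.compile(r"<box>\s*(\d+)\s*,\s*(\d+)\s*,\s*(\d+)\s*,\s*(\d+)\s*</box>")
POINT_RE = re.compile(r"<point>\s*(\d+)\s*,\s*(\d+)\s*</point>")


@dataclass(frozen=True)
class Box:
    x1: int
    y1: int
    x2: int
    y2: int

    def area(self) -> int:
        return max(0, self.x2 - self.x1) * max(0, self.y2 - self.y1)


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    ix = max(0, min(a.x2, b.x2) - max(a.x1, b.x1))
    iy = max(0, min(a.y2, b.y2) - max(a.y1, b.y1))
    inter = ix * iy
    union = a.area() + b.area() - inter
    return inter / union if union > 0 else 0.0


def count_grounded_objects(trace: str, iou_threshold: float = 0.5) -> int:
    """Count distinct objects referenced by <box>/<point> tags in a CoT trace.

    A box whose IoU with an already-kept box is >= iou_threshold is treated as
    a repeated reference to the same object. A point counts as a new object
    unless it falls inside a kept box or duplicates an earlier point.
    """
    kept: list[Box] = []
    for m in BOX_RE.finditer(trace):
        box = Box(*map(int, m.groups()))
        if all(iou(box, k) < iou_threshold for k in kept):
            kept.append(box)

    points: list[tuple[int, int]] = []
    for m in POINT_RE.finditer(trace):
        x, y = map(int, m.groups())
        inside = any(k.x1 <= x <= k.x2 and k.y1 <= y <= k.y2 for k in kept)
        if not inside and (x, y) not in points:
            points.append((x, y))

    return len(kept) + len(points)


if __name__ == "__main__":
    trace = (
        "Apple 1 <box>10,10,50,50</box>. Apple 2 <box>60,10,100,50</box>. "
        "Re-checking apple 1 <box>12,11,51,49</box>. "
        "A third apple at <point>200,200</point>."
    )
    print(count_grounded_objects(trace))  # prints 3
```

In the sample trace, the re-check of apple 1 overlaps the first box with an IoU of about 0.88, so it is merged instead of being counted twice. A free-text trace that just says "apple, apple, apple, apple" gives no such handle.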
Why It Matters
This approach could outperform GPT, Claude, and Gemini on visual tasks while using far less compute, making efficient multimodal AI more widely accessible. It also highlights DeepSeek's edge in mixture-of-experts (MoE) efficiency for vision-language models.
What To Do Next
Try embedding bounding-box coordinates in your multimodal CoT prompts to improve visual counting.
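A minimal sketch of what that could look like on the prompt side. It assumes you already have candidate boxes in pixel coordinates, for example from an object detector. The 0-1000 integer grid, the `<box>` tag syntax, and the function names `normalize_box` and `build_counting_prompt` are hypothetical choices for illustration. They are not a documented DeepSeek or vendor prompt format, so adapt them to whatever coordinate convention your model was trained on.

```python
from typing import Sequence, Tuple

# A pixel-space box: (x1, y1, x2, y2).
PixelBox = Tuple[float, float, float, float]


def normalize_box(box: PixelBox, width: int, height: int,
                  scale: int = 1000) -> Tuple[int, int, int, int]:
    """Map a pixel-space box onto an integer 0..scale grid (assumed convention)."""
    x1, y1, x2, y2 = box

    def clamp(v: float) -> int:
        # Round to the nearest grid cell and keep it inside the image.
        return max(0, min(scale, round(v)))

    return (clamp(x1 / width * scale), clamp(y1 / height * scale),
            clamp(x2 / width * scale), clamp(y2 / height * scale))


def build_counting_prompt(question: str, boxes: Sequence[PixelBox],
                          width: int, height: int) -> str:
    """Build a text prompt that lists candidate regions as <box> tags and asks
    for region-by-region reasoning before a final count (hypothetical format)."""
    lines = [question, "", "Candidate regions (x1,y1,x2,y2 on a 0-1000 grid):"]
    for i, box in enumerate(boxes, start=1):
        x1, y1, x2, y2 = normalize_box(box, width, height)
        lines.append(f"  [{i}] <box>{x1},{y1},{x2},{y2}</box>")
    lines += [
        "",
        "Think step by step. For each region, say whether it contains the target",
        "object and cite its <box> tag. Then give the final count as 'Count: N'.",
    ]
    return "\n".join(lines)


if __name__ == "__main__":
    prompt = build_counting_prompt(
        "How many apples are on the table?",
        [(64, 48, 128, 96), (320, 240, 400, 300)],
        width=640,
        height=480,
    )
    print(prompt)
```

Asking the model to cite a tag for every region it accepts keeps the count tied to explicit locations, which is the same idea the grounded-trace approach relies on. The tags in the answer can then be parsed and deduplicated before you trust the final number.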
Who should care: Researchers & Academics
AI-curated news aggregator. All content rights belong to original publishers.
Original source: 虎嗅 (Huxiu)
