
DeepSeek's Deleted Visual Primitives Paper

🐯Read original on 虎嗅

💡DeepSeek's 'finger' trick (pointing at image regions with points and boxes) fixes the visual counting failures of top models at 7056x compression.

⚡ 30-Second TL;DR

What Changed

Introduces points and boxes as visual primitives embedded in the chain of thought (CoT), so the model can reference image elements precisely.

Why It Matters

This approach could outperform GPT, Claude, and Gemini on visual tasks with far less compute, making efficient multimodal AI more widely accessible. It also highlights DeepSeek's edge in MoE efficiency for vision-language models.

What To Do Next

Test embedding bounding box coordinates in your multimodal CoT prompts for better visual counting.
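
One way to try this is sketched below. It is a minimal illustration, not DeepSeek's method: the `<box>x1,y1,x2,y2</box>` tag format, the pixel-coordinate convention, and the helper names `build_counting_prompt` and `parse_boxes` are all assumptions made for this example. The prompt asks the model to emit one box per object while reasoning, and the parser counts the distinct boxes in the reply so the total can be checked against the model's stated answer.

```python
import re
from typing import List, Tuple

# (x1, y1, x2, y2) in integer pixels. This layout is an assumption for the
# sketch, not a format confirmed by the source.
Box = Tuple[int, int, int, int]

# Hypothetical tag format: <box>x1,y1,x2,y2</box>
BOX_RE = re.compile(
    r"<box>\s*(\d+)\s*,\s*(\d+)\s*,\s*(\d+)\s*,\s*(\d+)\s*</box>"
)


def build_counting_prompt(target: str, width: int, height: int) -> str:
    """Build a CoT prompt that asks for one bounding box per counted object."""
    return (
        f"The image is {width}x{height} pixels.\n"
        f"Count every {target} in the image. Think step by step: for each "
        f"{target} you find, write <box>x1,y1,x2,y2</box> with integer pixel "
        f"coordinates, then state the total as 'Total: N'."
    )


def parse_boxes(response: str) -> List[Box]:
    """Extract distinct, well-formed boxes from a model reply, in order."""
    boxes: List[Box] = []
    for match in BOX_RE.finditer(response):
        x1, y1, x2, y2 = map(int, match.groups())
        box = (x1, y1, x2, y2)
        # Skip degenerate boxes and exact repeats so they do not inflate the count.
        if x1 < x2 and y1 < y2 and box not in boxes:
            boxes.append(box)
    return boxes
```

Send the string from `build_counting_prompt` together with the image to whichever multimodal model you are testing, then compare `len(parse_boxes(reply))` with the total the model reports; a mismatch suggests the model counted something it did not point at, or the reverse.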

Who should care: Researchers & Academics

AI-curated news aggregator. All content rights belong to original publishers.
Original source: 虎嗅