Qwen3.5-35B Crashes with Claude Code on llama.cpp
Qwen3.5-35B + llama.cpp Claude Code crashes: fix or workaround?
30-Second TL;DR
What Changed
Qwen3.5-35B-A3B crashes or unloads on the second or third prompt when driven by Claude Code on llama.cpp.
Why It Matters
Highlights compatibility issues for Qwen3.5-35B-A3B in code interpreter workflows on llama.cpp, pushing users toward alternatives like OpenCode.
What To Do Next
Switch to OpenCode interpreter for Qwen3.5-35B-A3B on llama.cpp to avoid crashes.
Deep Insight
Web-grounded analysis with 9 cited sources.
Enhanced Key Takeaways
- Qwen3.5 models in llama.cpp force full prompt reprocessing on every generation due to KV cache truncation failures, which makes multi-turn conversations inefficient[1][8].
- Multiple GitHub issues document Qwen3-Coder-Next crashes from empty grammar stacks during long-context (80K+ tokens) tool calls across the CUDA, ROCm, and Vulkan backends[1][2].
- ROCm support for newer Qwen3.5 models is broken in recent llama.cpp builds and causes failures, while Vulkan works reliably[9].
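To make the cost of the truncation failure concrete, here is a minimal sketch of prefix reuse. It is a simplified model, not llama.cpp's actual code: the function names and the `can_truncate` flag are hypothetical, and the token IDs are made up. The idea is that a server normally reuses the cached tokens that match the start of the new prompt and only processes the rest. If the stale tail of the cache cannot be dropped, the whole prompt has to be processed again.

```python
from typing import List, Tuple


def common_prefix_len(cached: List[int], prompt: List[int]) -> int:
    """Return how many leading tokens the cache and the new prompt share."""
    n = 0
    for a, b in zip(cached, prompt):
        if a != b:
            break
        n += 1
    return n


def tokens_to_process(
    cached: List[int], prompt: List[int], can_truncate: bool
) -> Tuple[int, int]:
    """Return (reused, reprocessed) token counts for one turn.

    can_truncate is a hypothetical flag standing in for whether the
    backend can drop cache entries past the shared prefix.
    """
    shared = common_prefix_len(cached, prompt)
    if len(cached) > shared and not can_truncate:
        # The stale tail cannot be removed: discard the cache and start over.
        return 0, len(prompt)
    return shared, len(prompt) - shared


# The cache holds a tail token (99) that the next prompt does not contain.
cached = [1, 2, 3, 4, 5, 99]
prompt = [1, 2, 3, 4, 5, 6, 7, 8]

print(tokens_to_process(cached, prompt, can_truncate=True))   # (5, 3)
print(tokens_to_process(cached, prompt, can_truncate=False))  # (0, 8)
```

In an agentic session where the prompt runs to tens of thousands of tokens, the second case means every turn pays for the full prompt rather than only the newly appended messages.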
Technical Deep Dive
- A Qwen3.5MoE bug triggers a full prompt reprocess after each message because of 'failed to truncate' errors in the token cache, confirmed in llama.cpp issue #19858[1][8].
- Long-prompt crashes (20K+ tokens) with Qwen3.5MoE on multi-GPU setups stem from CUDA graph optimizations; the fix replaced in-place tensor ops with non-inplace versions[1].
- The Qwen3-Coder-Next GGUF fails on Windows with unknown-architecture errors and crashes on tool calls when the model emits invalid JSON with duplicate fields[1].
- Tool calling requires specific flags such as --tool-call-parser qwen3_coder and a higher-precision quantization (Q6_K or above) for proper schema adherence[4].
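The duplicate-field failure mentioned above can be caught on the client side before a tool call is executed. The following is a sketch under assumptions, not part of llama.cpp or Claude Code: `parse_tool_call`, `reject_duplicates`, and the example field names are hypothetical. It relies only on the standard library's `json.loads` and its `object_pairs_hook` parameter, which by default would silently keep the last value of a repeated key.

```python
import json
from typing import Any, List, Tuple


class DuplicateKeyError(ValueError):
    """Raised when a JSON object repeats a field name."""


def reject_duplicates(pairs: List[Tuple[str, Any]]) -> dict:
    """object_pairs_hook that fails on repeated keys instead of keeping the last one."""
    seen = {}
    for key, value in pairs:
        if key in seen:
            raise DuplicateKeyError(f"duplicate field: {key!r}")
        seen[key] = value
    return seen


def parse_tool_call(raw: str) -> dict:
    """Parse tool-call arguments strictly.

    Raises ValueError (json.JSONDecodeError or DuplicateKeyError)
    on malformed JSON or duplicate fields.
    """
    return json.loads(raw, object_pairs_hook=reject_duplicates)


# Hypothetical tool-call argument payloads.
good = '{"path": "main.py", "mode": "read"}'
bad = '{"path": "main.py", "path": "other.py"}'

print(parse_tool_call(good))
try:
    parse_tool_call(bad)
except ValueError as err:
    print("rejected:", err)
```

Rejecting the payload and asking the model to retry is one possible policy. A wrapper could instead log the duplicate and keep the first value, depending on how strict the agent loop needs to be.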
Sources (9)
Factual claims are grounded in the sources below. Forward-looking analysis is AI-generated interpretation.
- [1] buttondown.com: Weekly Github Report for Llamacpp February 16 8993
- [2] buttondown.com: Weekly Github Report for Llamacpp February 01 2667
- [3] GitHub: 19860
- [4] dev.to: Qwen3 Coder Next the Complete 2026 Guide to Running Powerful AI Coding Agents Locally 1k95
- [5] GitHub: 14419
- [6] forums.developer.nvidia.com: 360892
- [7] forums.developer.nvidia.com: 361639
- [8] GitHub: 19858
- [9] GitHub: 19880
Original source: Reddit r/LocalLLaMA