Qwen3.5-35B Crashes with Claude Code on llama.cpp

🦙Read original on Reddit r/LocalLLaMA

#compatibility #llama-cpp #bug-report #code-interpreter #qwen3.5-35b-a3b

💡 Qwen3.5-35B on llama.cpp crashes with Claude Code: fix or workaround?

⚡ 30-Second TL;DR

What Changed

The model crashes or unloads on the second or third prompt when driven by Claude Code.

Why It Matters

Highlights compatibility issues for Qwen3.5-35B-A3B in code interpreter workflows on llama.cpp, pushing users toward alternatives like OpenCode.

What To Do Next

As a workaround, switch to OpenCode when running Qwen3.5-35B-A3B on llama.cpp to avoid the crashes.

Who should care: Developers & AI Engineers

🧠 Deep Insight

Web-grounded analysis with 9 cited sources.

🔑 Enhanced Key Takeaways

•Qwen3.5 models in llama.cpp force full prompt reprocessing on every generation due to KV cache truncation failures, leading to inefficiency in multi-turn conversations[1][8].
•Multiple GitHub issues document Qwen3-Coder-Next crashes from empty grammar stacks during long-context (80K+ tokens) tool calls across CUDA, ROCm, and Vulkan backends[1][2].
•ROCm support for newer Qwen3.5 models is broken in recent llama.cpp builds, causing failures while Vulkan works reliably[9].
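One way to check whether a given setup shows the full-reprocessing behaviour described above is to compare, for each turn, how many prompt tokens were sent with how many the server actually evaluated. The sketch below is a minimal illustration under stated assumptions: the `Turn` record, the `reprocessed_turns` helper, and the 0.9 threshold are hypothetical names and values chosen for this example, and how the two counts are obtained (for instance from the server's timing output) depends on the build in use.

```python
from dataclasses import dataclass


@dataclass
class Turn:
    """Token counts for one request in a multi-turn chat (hypothetical record)."""
    prompt_tokens: int      # total tokens in the prompt sent for this turn
    processed_tokens: int   # tokens the server actually had to evaluate


def reprocessed_turns(turns, threshold=0.9):
    """Return indices of turns (after the first) that re-evaluated most of the prompt.

    With a working prefix cache, only the newly appended tokens should be
    evaluated, so processed/prompt stays well below 1 on later turns.
    The threshold is an arbitrary cut-off chosen for illustration.
    """
    flagged = []
    for i, turn in enumerate(turns):
        if i == 0 or turn.prompt_tokens == 0:
            continue  # the first turn always processes the full prompt
        if turn.processed_tokens / turn.prompt_tokens >= threshold:
            flagged.append(i)
    return flagged


# Illustrative numbers only, not measurements.
healthy = [Turn(1000, 1000), Turn(1200, 200), Turn(1500, 300)]
broken = [Turn(1000, 1000), Turn(1200, 1200), Turn(1500, 1500)]
print(reprocessed_turns(healthy))
print(reprocessed_turns(broken))
```

A flagged turn only suggests that the cache was not reused; it does not identify the cause, which the linked issues attribute to the truncation failure.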

🛠️ Technical Deep Dive

•A Qwen3.5 MoE bug triggers a full prompt reprocess after each message because of 'failed to truncate' errors in the token cache, confirmed in llama.cpp issue #19858[1][8].
•Long-prompt crashes (20K+ tokens) with Qwen3.5 MoE on multi-GPU setups stem from CUDA graph optimizations; the fix replaced in-place tensor ops with non-in-place versions[1].
•The Qwen3-Coder-Next GGUF fails to load on Windows with unknown-architecture errors, and crashes on tool calls when the model emits invalid JSON with duplicate fields[1].
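•Reliable tool calling depends on server-specific flags such as --tool-call-parser qwen3_coder (a vLLM option; llama.cpp's llama-server enables tool calls with --jinja instead) and on higher-precision quantization (Q6_K or above) for proper schema adherence[4].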
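The duplicate-field problem noted above is easy to miss on the client side, because Python's `json.loads` silently keeps the last value when a key repeats. The following is a minimal sketch of a stricter parser for tool-call arguments; `parse_tool_arguments` and `DuplicateKeyError` are hypothetical names introduced for this example, and this is a client-side guard, not the fix applied in llama.cpp.

```python
import json


class DuplicateKeyError(ValueError):
    """Raised when a JSON object repeats a key."""


def _reject_duplicates(pairs):
    # json.loads passes each object's (key, value) pairs in document order.
    seen = {}
    for key, value in pairs:
        if key in seen:
            raise DuplicateKeyError(f"duplicate field: {key!r}")
        seen[key] = value
    return seen


def parse_tool_arguments(raw):
    """Parse a tool-call arguments string, refusing duplicate fields."""
    return json.loads(raw, object_pairs_hook=_reject_duplicates)


# Well-formed arguments parse as usual.
print(parse_tool_arguments('{"path": "a.py", "mode": "r"}'))

# A repeated field is rejected instead of being silently overwritten.
try:
    parse_tool_arguments('{"path": "a.py", "path": "b.py"}')
except DuplicateKeyError as err:
    print("rejected:", err)
```

Because the hook runs for every object, duplicates nested inside the arguments are caught as well. Rejecting the call and asking the model to retry is one possible policy; keeping the first or last value is another.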

🔮 Future Implications

AI analysis grounded in cited sources

llama.cpp b8179+ will patch Qwen3.5 full reprocessing via improved KV cache truncation

Ongoing GitHub issues like #19858 detail the truncation failure as the root cause, with community bisecting and patches already resolving similar bugs in prior builds[1][8].

Qwen3.5 stability on ROCm will lag Vulkan by 1-2 months

Issue #19880 confirms ROCm breakage for new Qwen models while Vulkan succeeds, mirroring patterns in prior Qwen3-Coder-Next ROCm crashes fixed via targeted updates[1][9].

⏳ Timeline

2026-02

llama.cpp weekly report documents the Qwen3.5 MoE full-reprocess bug and long-prompt multi-GPU crashes

2026-02

Qwen3-Coder-Next GGUF crashes reported on ROCm, Windows, and tool calls in llama.cpp issues

2026-02

Issue #19858 opened: Qwen3.5 forces full prompt reprocessing due to cache truncation failure

2026-02

Issue #19860 opened: CUDA errors and crashes with Qwen3.5-27B in llama-bench/server

2026-02

Issue #19880 opened: ROCm support broken for newer Qwen3.5 models in llama.cpp

📎 Sources (9)

Factual claims are grounded in the sources below. Forward-looking analysis is AI-generated interpretation.
