Unsloth Fixes Qwen3.5-35B Tool Calling

Post LinkedIn

🦙Read original on Reddit r/LocalLLaMA

#tool-calling #quantization #rocm-inferenceqwen3.5-35b-a3b

💡Fixed Qwen3.5-35B crushes research tasks locally, beats cloud giants

⚡ 30-Second TL;DR

What Changed

Unsloth releases fixed GGUF quants resolving tool calling bugs

Why It Matters

Empowers local AI practitioners with a high-parameter model for advanced research without cloud dependency. Boosts open-source LLM viability for complex tool-using workflows.

What To Do Next

Download unsloth/Qwen3.5-35B-A3B-GGUF from Hugging Face and test tool calling via llama.cpp.

Who should care:Researchers & Academics

🧠 Deep Insight

Web-grounded analysis with 6 cited sources.

🔑 Enhanced Key Takeaways

•Unsloth's Dynamic quants for Qwen3.5-35B achieve state-of-the-art (SOTA) performance on nearly all bit widths, validated by over 150 KL Divergence benchmarks across 9TB of GGUFs[1][2].
•The tool-calling fix addresses a chat template bug universally, impacting all Qwen3.5 formats and uploaders, not limited to Unsloth's GGUF quants[1][2].
•Unsloth is retiring MXFP4 layers from specific Qwen3.5 GGUFs (Q2_K_XL, Q3_K_XL, Q4_K_XL) due to benchmark findings[1].

🛠️ Technical Deep Dive

•Unsloth Dynamic 2.0 GGUFs for Qwen3.5-35B show 99.9% KL Divergence on the Pareto Frontier, outperforming alternatives in accuracy and size (e.g., dynamic 4bit version is 2GB smaller with +1% accuracy vs QAT)[2][3].
•Optimal quantization targets ffn_up_exps, ffn_gate_exps at 3bit (e.g., iq3_xxs) for best disk space and KLD balance; avoid heavy quantization on ssm_out due to high KLD increase[2].
•Qwen3.5-35B-A3B recommended over 27B for faster inference when hardware fits, while 27B prioritizes slight accuracy gains[1].

🔮 Future ImplicationsAI analysis grounded in cited sources

Unsloth Dynamic quants will dominate low-bit local inference for Qwen3.5 models.

Benchmarks demonstrate SOTA KL Divergence across bits with uploaded 9TB artifacts confirming superior accuracy-efficiency tradeoffs[2].

Tool-calling reliability will improve across all Qwen3.5 deployments.

Universal chat template fix applies to any format or uploader, resolving a widespread bug reported in community tooling[1][2].

⏳ Timeline

2026-02

Qwen3.5 model series released by Alibaba

2026-02-27

Unsloth Dynamic 2.0 GGUFs updated with Qwen3.5 support and initial tool-calling fixes

2026-03-03

Unsloth releases updated Qwen3.5-35B-A3B GGUFs with finalized tool-calling fixes and SOTA benchmarks

📎 Sources (6)

Factual claims are grounded in the sources below. Forward-looking analysis is AI-generated interpretation.

🦙Read original article on Reddit r/LocalLLaMA

📰

Weekly AI Recap

Read this week's curated digest of top AI events →

👉Related Updates

Same topic

Explore #tool-calling

Same product