Unsloth Fixes Qwen3.5-35B Tool Calling
๐กFixed Qwen3.5-35B crushes research tasks locally, beats cloud giants
โก 30-Second TL;DR
What Changed
Unsloth releases fixed GGUF quants resolving tool calling bugs
Why It Matters
Empowers local AI practitioners with a high-parameter model for advanced research without cloud dependency. Boosts open-source LLM viability for complex tool-using workflows.
What To Do Next
Download unsloth/Qwen3.5-35B-A3B-GGUF from Hugging Face and test tool calling via llama.cpp.
๐ง Deep Insight
Web-grounded analysis with 6 cited sources.
๐ Enhanced Key Takeaways
- โขUnsloth's Dynamic quants for Qwen3.5-35B achieve state-of-the-art (SOTA) performance on nearly all bit widths, validated by over 150 KL Divergence benchmarks across 9TB of GGUFs[1][2].
- โขThe tool-calling fix addresses a chat template bug universally, impacting all Qwen3.5 formats and uploaders, not limited to Unsloth's GGUF quants[1][2].
- โขUnsloth is retiring MXFP4 layers from specific Qwen3.5 GGUFs (Q2_K_XL, Q3_K_XL, Q4_K_XL) due to benchmark findings[1].
๐ ๏ธ Technical Deep Dive
- โขUnsloth Dynamic 2.0 GGUFs for Qwen3.5-35B show 99.9% KL Divergence on the Pareto Frontier, outperforming alternatives in accuracy and size (e.g., dynamic 4bit version is 2GB smaller with +1% accuracy vs QAT)[2][3].
- โขOptimal quantization targets ffn_up_exps, ffn_gate_exps at 3bit (e.g., iq3_xxs) for best disk space and KLD balance; avoid heavy quantization on ssm_out due to high KLD increase[2].
- โขQwen3.5-35B-A3B recommended over 27B for faster inference when hardware fits, while 27B prioritizes slight accuracy gains[1].
๐ฎ Future ImplicationsAI analysis grounded in cited sources
โณ Timeline
๐ Sources (6)
Factual claims are grounded in the sources below. Forward-looking analysis is AI-generated interpretation.
Weekly AI Recap
Read this week's curated digest of top AI events โ
๐Related Updates
AI-curated news aggregator. All content rights belong to original publishers.
Original source: Reddit r/LocalLLaMA โ