📦 Reddit r/LocalLLaMA • collected in 22h
Qwen3.5-35B-A3B Runs on Raspberry Pi 5

💡 35B Qwen model hits 3 t/s on Pi 5: an edge LLM inference breakthrough
⚡ 30-Second TL;DR
What Changed
3+ tokens/s on a Raspberry Pi 5 (16 GB) with 2-bit quants
Why It Matters
Demonstrates feasibility of massive LLMs on low-power ARM hardware, opening doors for affordable edge AI deployments.
What To Do Next
Replicate Qwen3.5-35B-A3B 2-bit quant on your Pi 5 with llama.cpp.
Who should care: Developers & AI engineers
🧠 Deep Insight
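Before replicating the result, it helps to see why a 35B model fits on a 16 GB Pi at all. The sketch below is a back-of-envelope estimate only; the bits-per-weight figures are approximate averages for common llama.cpp quant formats, not measured file sizes for this model.

```python
# Rough check: does a 2-bit quant of a 35B-parameter model fit in a
# Raspberry Pi 5's 16 GiB of RAM? (Weights only; KV cache and runtime
# overhead add more, so the margin matters.)

GIB = 1024 ** 3

def weight_footprint_gib(n_params: float, bits_per_weight: float) -> float:
    """Approximate in-RAM size of the quantized weights, in GiB."""
    return n_params * bits_per_weight / 8 / GIB

N_PARAMS = 35e9  # total parameters (A3B: only ~3B are active per token)

# Approximate average bits-per-weight for some llama.cpp quant types.
for name, bpw in [("Q4_K_M", 4.8), ("Q2_K", 2.6), ("IQ2_XXS", 2.1)]:
    size = weight_footprint_gib(N_PARAMS, bpw)
    verdict = "fits" if size < 16 else "does NOT fit"
    print(f"{name:8s} ~{size:5.1f} GiB -> {verdict} in 16 GiB")
```

Only the 2-bit-class quants leave headroom in 16 GiB, which is consistent with the post using 2-bit quantization rather than a more conventional 4-bit build.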
Web-grounded analysis with 6 cited sources.
📊 Enhanced Key Takeaways
- Qwen3.5-35B-A3B is a native vision-language model supporting text, image, and video inputs with a 262,144-token context length.
- The model employs a hybrid architecture combining linear attention mechanisms and a sparse mixture-of-experts (MoE) design for improved inference efficiency.
- On Alibaba's API, it achieves 173.6 tokens per second output speed and scores 37 on the Artificial Analysis Intelligence Index, surpassing the median for similar open-weight models.
🛠️ Technical Deep Dive
- Hybrid architecture: integrates linear attention with sparse MoE, where A3B likely indicates ~3B active parameters per token for efficiency.
- Multimodal capabilities: native support for text, image, and video inputs; output is text-only.
- Context window: 262,144 tokens, enabling long-context processing.
- Performance benchmarks: 173.6 t/s output speed via Alibaba's API; Intelligence Index of 37 (above the median of 15 for ~35B models).
- Reasoning mode: supports step-by-step reasoning via API parameters like 'reasoning' for detailed internal thought processes.
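The reasoning-mode bullet above can be sketched as request assembly for an OpenAI-compatible chat endpoint. This is a hypothetical sketch: the model id `qwen3.5-35b-a3b` and the shape of the `reasoning` field are assumptions inferred from the summary, not verified API fields, so check the provider's docs before use.

```python
# Hypothetical request builder for an OpenAI-compatible
# /chat/completions endpoint with an optional reasoning switch.
# Model id and the 'reasoning' parameter are ASSUMED, not verified.

def build_chat_request(prompt: str, reasoning: bool = True) -> dict:
    """Assemble the JSON body for a chat completion call."""
    body = {
        "model": "qwen3.5-35b-a3b",  # assumed model id
        "messages": [{"role": "user", "content": prompt}],
    }
    if reasoning:
        # The summary mentions a 'reasoning' parameter; the exact
        # name and value schema may differ in the real API.
        body["reasoning"] = {"effort": "high"}
    return body

req = build_chat_request("Summarize the trade-offs of 2-bit quantization.")
```

Keeping the reasoning toggle in one place makes it easy to A/B the model with and without step-by-step thought, which is how the Intelligence Index style benchmarks are typically run.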
🔮 Future Implications
AI analysis grounded in cited sources
- Edge AI devices will increasingly run 30B+ parameter VLMs at usable speeds: the demonstrated 3+ t/s on a Raspberry Pi 5 with 2-bit quantization shows how MoE efficiency can enable agentic tasks on low-power ARM hardware.
- MoE architectures like A3B will dominate efficient large-model deployments: Qwen3.5-35B-A3B matches or exceeds the performance of denser 27B models while using fewer active parameters per inference.
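The MoE-efficiency claim can be made concrete with the standard transformer estimate that decoding costs roughly 2 FLOPs per active parameter per token. The parameter counts below come from the summary; the 2·N rule is the usual first-order approximation, not an exact measurement.

```python
# Why MoE helps at the edge: per-token compute scales with ACTIVE
# parameters, not total parameters (~2 FLOPs per parameter per token,
# the standard first-order decoding estimate).

def flops_per_token(active_params: float) -> float:
    """Approximate decoding FLOPs per generated token."""
    return 2.0 * active_params

moe_a3b  = flops_per_token(3e9)   # Qwen3.5-35B-A3B: ~3B active params
dense27b = flops_per_token(27e9)  # dense 27B comparison point

print(f"MoE (3B active): {moe_a3b:.1e} FLOPs/token")
print(f"Dense 27B:       {dense27b:.1e} FLOPs/token")
print(f"Compute ratio:   {dense27b / moe_a3b:.0f}x")
```

Under this estimate the MoE model needs roughly 9x less compute per token than a dense 27B model, which is why 3 t/s on a Pi-class CPU is plausible even though all 35B weights must still fit in memory.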
⏳ Timeline
- 2026-02: Alibaba releases the Qwen3.5 series, including the 35B-A3B MoE model.
- 2026-02-25: Qwen3.5-35B-A3B is added to model directories with full specs published.
📚 Sources (6)
Factual claims are grounded in the sources below. Forward-looking analysis is AI-generated interpretation.
AI-curated news aggregator. All content rights belong to original publishers.
Original source: Reddit r/LocalLLaMA →
