๐ฆReddit r/LocalLLaMAโขStalecollected in 53m
Qwen3 9B runs 6+ t/s on Android phones

๐ก9B LLM hits 6t/s on phonesโunlock mobile AI now
โก 30-Second TL;DR
What Changed
Runs at q4_0 on S25 Ultra with 12GB RAM
Why It Matters
Shows large LLMs like 9B models are feasible on high-end Android phones, enabling edge AI apps without cloud dependency.
What To Do Next
Quantize Qwen3 9B to q4_0 and benchmark on your Android device with llama.cpp.
Who should care:Developers & AI Engineers
๐ง Deep Insight
Web-grounded analysis with 7 cited sources.
๐ Enhanced Key Takeaways
- โขQwen3.5-9B employs a hybrid architecture combining Gated DeltaNet with Sparse MoE, using a 3:1 ratio of linear to softmax attention for reduced memory and compute costs[1][3][7].
- โขThe model supports a 262K token context window and native multimodality, processing text and visual data in a unified latent space[2][6][7].
- โขQwen3.5-9B outperforms prior Qwen3-30B on MMLU and math benchmarks like GSM8K/MATH due to Scaled RL training[1][3].
- โขAt Q4 quantization, it runs on ~5GB RAM with CPU-only inference at 20-30 t/s, and higher speeds like 115-167 t/s in optimized desktop tests[3][5].
๐ ๏ธ Technical Deep Dive
- โขHybrid architecture: Gated DeltaNet + Sparse Mixture-of-Experts (MoE), with 3:1 linear attention to softmax attention ratio, reducing computational cost for long contexts[3][7].
- โขParameter count: 9.65B, natively multimodal vision-language model[6][7].
- โขTraining: Scaled Reinforcement Learning (RL) optimizes logical reasoning, closing gap with 30B+ models[1][2][3].
- โขContext window: 262K tokens[6].
- โขQuantized (Q4 GGUF): ~5GB RAM footprint, supports CUDA/NVIDIA GPU, Metal/Apple Silicon, or CPU inference[3].
๐ฎ Future ImplicationsAI analysis grounded in cited sources
On-device AI apps will deploy 9B-scale models on smartphones by mid-2026
Hybrid attention architectures will become standard in sub-10B models
โณ Timeline
2025-04
Qwen3 released with dense and MoE models up to 235B parameters
2026-03
Qwen3.5 Small series launched including 0.8B-9B models optimized for edge devices
๐ Sources (7)
Factual claims are grounded in the sources below. Forward-looking analysis is AI-generated interpretation.
- marktechpost.com โ Alibaba Just Released Qwen 3 5 Small Models a Family of 0 8b to 9b Parameters Built for on Device Applications
- mlq.ai โ Alibaba Releases Open Source Qwen35 Small Models for Edge Devices
- oflight.co.jp โ Qwen35 9b Complete Guide
- dev.to โ Qwen 3 Benchmarks Comparisons Model Specifications and More 4hoa
- youtube.com โ Watch
- rdworldonline.com โ We Installed Alibabas 9 Billion Parameter Qwen3 5 9b AI on a Usb Hard Drive It Said It Was Made by Google
- llm-stats.com โ Qwen3.5 9b
๐ฐ
Weekly AI Recap
Read this week's curated digest of top AI events โ
๐Related Updates
AI-curated news aggregator. All content rights belong to original publishers.
Original source: Reddit r/LocalLLaMA โ
