A developer ported Microsoft's BitNet to iOS, achieving 45-46 tokens per second with the 0.7B model on an iPhone 14 Pro Max while using only about 200MB of memory. BitNet stores weights as ternary values (-1, 0, +1), roughly 1.58 bits each, which keeps models tiny and fast. The developer plans to open-source an instruction-tuned 2B model soon.
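For context on how the ternary format works, below is a minimal sketch of the absmean quantization scheme described in the BitNet papers. The function name and layout are illustrative assumptions, not code from the iOS port:

```c
#include <math.h>
#include <stddef.h>
#include <stdint.h>

/* Sketch of absmean ternary quantization in the style of the BitNet papers
 * (not the port's actual code): scale by the mean absolute weight, then
 * round each weight to the nearest of {-1, 0, +1}. */
void quantize_ternary(const float *w, int8_t *q, float *scale, size_t n) {
    float sum_abs = 0.0f;
    for (size_t i = 0; i < n; i++)
        sum_abs += fabsf(w[i]);
    float s = sum_abs / (float)n + 1e-8f;    /* guard against all-zero rows */

    for (size_t i = 0; i < n; i++) {
        float r = roundf(w[i] / s);
        q[i] = (int8_t)(r > 1.0f ? 1.0f : (r < -1.0f ? -1.0f : r));
    }
    *scale = s;  /* kept in float to rescale the layer's output at inference */
}
```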
Key Points
- 45-46 tok/s on iPhone 14 Pro Max with the 0.7B model
- Ternary (~1.58-bit) weights cut the model to ~200MB and speed up inference (see the size sketch after this list)
- ARM NEON kernels ported from M-series Macs to iOS
- Base model running now; an instruction-tuned 2B model is next
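The ~200MB figure lines up with simple arithmetic. A hedged sketch, assuming a straightforward packing of ternary weights (the real runtime adds embeddings, scales, and activation buffers on top):

```c
#include <stdio.h>

/* Back-of-the-envelope check of the ~200MB figure (assumed packing scheme,
 * not necessarily the port's actual on-disk format). */
int main(void) {
    double params  = 0.7e9;                 /* 0.7B weights                   */
    double two_bit = params * 2.0 / 8.0;    /* naive 2-bits-per-weight packing */
    double ideal   = params * 1.58 / 8.0;   /* information-theoretic minimum   */
    printf("2-bit packed:   %.0f MB\n", two_bit / 1e6);  /* ~175 MB */
    printf("1.58-bit ideal: %.0f MB\n", ideal / 1e6);    /* ~138 MB */
    return 0;
}
```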
Impact Analysis
This brings high-speed local LLM inference to mobile devices, reducing reliance on cloud services and opening the door to fully on-device AI apps.
Technical Details
BitNet replaces 16-bit floating-point weights with ternary values; the iOS port reused the existing ARM NEON optimizations from the M-series Mac build, so most of the work came down to build-system tweaks.
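Because both M-series Macs and iPhones are AArch64, the same intrinsics compile for either target. The sketch below is a simplified stand-in for such a kernel, assuming ternary weights stored one per int8 byte; the real bitnet.cpp kernels pack weights more tightly and use lookup tables:

```c
#include <arm_neon.h>
#include <stddef.h>
#include <stdint.h>

/* Illustrative AArch64 NEON dot product between int8 activations and ternary
 * weights held as int8 values in {-1, 0, +1}. Shows why kernels written with
 * NEON intrinsics on M-series Macs run unchanged on iPhone hardware. */
int32_t dot_ternary_neon(const int8_t *act, const int8_t *w, size_t n) {
    int32x4_t acc = vdupq_n_s32(0);
    size_t i = 0;
    for (; i + 16 <= n; i += 16) {
        int8x16_t a = vld1q_s8(act + i);
        int8x16_t b = vld1q_s8(w + i);
        /* widen to 16-bit products, then pairwise-accumulate into 32 bits */
        acc = vpadalq_s16(acc, vmull_s8(vget_low_s8(a),  vget_low_s8(b)));
        acc = vpadalq_s16(acc, vmull_s8(vget_high_s8(a), vget_high_s8(b)));
    }
    int32_t sum = vaddvq_s32(acc);           /* horizontal reduce (AArch64) */
    for (; i < n; i++)                        /* scalar tail                 */
        sum += (int32_t)act[i] * (int32_t)w[i];
    return sum;
}
```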




