
Apple M4 Neural Engine Unlocked via Reverse Engineering

🇨🇳Read original on cnBeta (Full RSS)

💡Unlock 15.8 TFLOPS of AI compute on M4 chips by bypassing Apple's software restrictions.

⚡ 30-Second TL;DR

What Changed

Developers successfully bypassed Apple's software restrictions on the M4 Neural Engine.

Why It Matters

This breakthrough allows developers to run high-performance AI models directly on M4 hardware without Apple's proprietary software stack. It opens new possibilities for local LLM inference and edge computing on Mac devices.

What To Do Next

Explore the custom MIL implementation to test local model inference performance on your M4-based Mac hardware.

Who should care: Developers & AI Engineers

🧠 Deep Insight

Web-grounded analysis with 33 cited sources.

🔑 Enhanced Key Takeaways

  • The reverse engineering effort, led by developer maderix, mapped Apple's internal software stack and uncovered private Objective-C APIs (_ANEClient, _ANECompiler, _ANEInMemoryModelDescriptor). Cracking the proprietary Model Intermediate Language (MIL) compilation path and the E5 binary format then gave direct access to the hardware.
  • The 15.8 TFLOPS figure is for FP16 compute. Apple's advertised 38 TOPS (trillion operations per second) is typically quoted for INT8 operations, which can be misleading because the ANE dequantizes INT8 weights to FP16 before computation.
  • The bypass enabled full backpropagation and transformer training directly on the M4's Neural Engine. Apple's official path to the ANE, CoreML, exposes it for inference only; MLX, Apple's other machine learning framework, targets the CPU and GPU rather than the ANE.
  • The custom implementation operated entirely in RAM, avoiding slower NAND flash writes, and demonstrated the ANE as a dedicated graph execution engine optimized for neural network graphs rather than a general-purpose processor.
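
As a rough illustration of the first step described above (finding out whether the private classes exist at all), the sketch below asks the Objective-C runtime for each class name through Python's `ctypes`. The three class names come from the write-up; the framework path `ANE_FRAMEWORK` and the helper `probe_private_classes` are assumptions made for this example, not something the article confirms. It only checks for presence and does not call any private API.

```python
import ctypes
import sys

# Class names reported in the write-up. Their presence on any given
# macOS build is not guaranteed.
PRIVATE_CLASSES = ["_ANEClient", "_ANECompiler", "_ANEInMemoryModelDescriptor"]

# ASSUMPTION: location of the private framework. Not stated in the article.
ANE_FRAMEWORK = (
    "/System/Library/PrivateFrameworks/"
    "AppleNeuralEngine.framework/AppleNeuralEngine"
)


def probe_private_classes(names=PRIVATE_CLASSES):
    """Return {class_name: bool} saying whether the runtime knows each class.

    On non-macOS systems, or if the libraries cannot be loaded, every
    entry is False.
    """
    found = {name: False for name in names}
    if sys.platform != "darwin":
        return found
    try:
        objc = ctypes.CDLL("/usr/lib/libobjc.A.dylib")
        # Loading the framework registers its classes with the runtime.
        ctypes.CDLL(ANE_FRAMEWORK)
    except OSError:
        return found

    # objc_getClass(const char *name) returns the class, or NULL if unknown.
    objc.objc_getClass.restype = ctypes.c_void_p
    objc.objc_getClass.argtypes = [ctypes.c_char_p]
    for name in names:
        found[name] = objc.objc_getClass(name.encode("ascii")) is not None
    return found


if __name__ == "__main__":
    for cls, present in probe_private_classes().items():
        print(f"{cls}: {'present' if present else 'not found'}")
```

Private APIs like these are undocumented and can change or disappear between macOS releases, so a presence check of this kind is a sensible precondition before attempting anything further.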
📊 Competitor Analysis

| Feature / Competitor | Apple M4 Neural Engine | Qualcomm Snapdragon X Elite (NPU) | Intel Core Ultra (NPU) | AMD Ryzen AI (NPU) | NVIDIA Laptop GPUs (e.g., RTX 5070 Ada) |
| --- | --- | --- | --- | --- | --- |
| AI Performance (Peak) | 38 TOPS (INT8), ~19 TFLOPS (FP16) | 45 TOPS (INT4/INT8) | Up to 74 TOPS (INT8, Nova Lake-S) | Up to 50 TOPS (INT8), 25 TFLOPS (BF16, XDNA2) | Up to 798 AI TOPS (INT8), 23.22 TFLOPS (FP16/FP32, RTX 5070) |
| Primary Use Case | On-device inference (officially), now training (via reverse engineering) | On-device AI acceleration, LLMs | On-device AI acceleration, energy efficiency | On-device inference and training, real-time generative AI | High-performance AI training and inference, graphics |
| Software Stack | CoreML, MLX (official), custom MIL (reverse engineered) | Hexagon NPU SDK | OpenVINO, Windows ML | AMD IRON, MLIR-AIR | CUDA, TensorRT |
| Memory Architecture | Unified Memory (shared with CPU/GPU) | LPDDR5x-8448 (part of SoC) | Integrated NPU within SoC | Unified memory (shared with CPU/GPU) | Dedicated VRAM (e.g., 8GB GDDR7) |

🛠️ Technical Deep Dive

  • The M4 Neural Engine (codename H16G) features 16 cores, a queue depth of 127 evaluation requests, independent Dynamic Voltage and Frequency Scaling (DVFS), and hard power gating for efficiency.
  • It functions as a graph execution engine rather than a general-purpose CPU or GPU: it provides fixed-function acceleration for compiled neural network graphs and executes each graph as an atomic operation.
  • The reverse engineering process involved identifying and utilizing private Objective-C APIs, including _ANEClient, _ANECompiler, and _ANEInMemoryModelDescriptor, and deciphering the proprietary E5 binary format.
  • The custom Model Intermediate Language (MIL) implementation enabled direct compilation and execution of compute graphs on the ANE, bypassing Apple's official CoreML framework.
  • The M4 chip is manufactured using TSMC's second-generation 3-nanometer process and integrates 28 billion transistors.
  • It is reportedly Apple's first SoC to adopt the ARMv9 CPU architecture, supporting the Scalable Matrix Extension (SME) but notably lacking Scalable Vector Extension (SVE) support.
  • The Neural Engine dequantizes INT8 weights to FP16 prior to computation, indicating that its true peak performance for AI workloads is approximately 19 TFLOPS (FP16), despite higher INT8 TOPS marketing figures.
  • Apple's unified memory architecture allows the CPU, GPU, and Neural Engine to share a single memory pool, which helps mitigate data transfer bottlenecks. The M4 Max variant offers up to 546 GB/s of memory bandwidth.
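
The relationship between the three headline numbers (38 TOPS, ~19 TFLOPS, 15.8 TFLOPS) can be checked with a few lines of arithmetic. The sketch below assumes that the INT8 marketing figure is simply the FP16 rate counted twice, which appears to be how the ~19 TFLOPS peak is derived; that 2:1 ratio and the function names are assumptions of this example, not figures confirmed by Apple.

```python
def fp16_peak_tflops(int8_tops: float, int8_to_fp16_ratio: float = 2.0) -> float:
    """Estimate FP16 peak from an INT8 TOPS figure.

    ASSUMPTION: the INT8 number is the FP16 rate multiplied by
    `int8_to_fp16_ratio` (2.0 by default).
    """
    return int8_tops / int8_to_fp16_ratio


def utilization(measured_tflops: float, peak_tflops: float) -> float:
    """Fraction of the estimated peak that a measured result reaches."""
    return measured_tflops / peak_tflops


advertised_int8_tops = 38.0   # Apple's advertised figure for the M4
measured_fp16_tflops = 15.8   # figure reported for the reverse-engineered path

peak = fp16_peak_tflops(advertised_int8_tops)
util = utilization(measured_fp16_tflops, peak)

print(f"Estimated FP16 peak: {peak:.1f} TFLOPS")   # 19.0 TFLOPS
print(f"Measured / peak:     {util:.1%}")          # 83.2%
```

Under that assumption, the reported 15.8 TFLOPS corresponds to roughly 83% of the estimated FP16 peak.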

🔮 Future Implications

AI analysis grounded in cited sources

Increased accessibility for on-device AI model training.
Bypassing software restrictions could lead to community-driven tools and frameworks that enable developers to leverage the Neural Engine for training, expanding local AI capabilities beyond inference.
Potential for enhanced privacy and efficiency in AI applications.
Performing AI training and inference entirely on-device, without cloud reliance, inherently improves data privacy and can lead to more energy-efficient AI workloads.
Apple may respond with stricter hardware lockdowns or new developer APIs.
The reverse engineering could prompt Apple to either further secure their hardware against such bypasses or, conversely, to release more flexible official APIs to support advanced AI development on their platform.

Timeline

2017-09
Apple introduces the first Neural Engine (0.6 TOPS) in the A11 Bionic chip, along with the Core ML framework for developers.
2020-11
Apple releases the M1 chip, integrating a 16-core Neural Engine capable of 11 TOPS, bringing dedicated AI acceleration to Macs.
2022-06
The M2 chip is introduced, featuring an improved Neural Engine with performance around 15.8 TOPS.
2023-10
Apple launches the M3 chip, with its Neural Engine offering approximately 18 TOPS (INT16 operations), or 35 TOPS (INT8).
2024-05
The Apple M4 chip is unveiled, featuring a Neural Engine capable of 38 TOPS (INT8), marketed as Apple's most powerful to date.
2026-06-16
Developers successfully bypass Apple's software restrictions on the M4 Neural Engine, unlocking 15.8 TFLOPS of FP16 AI compute for training.
Original source: cnBeta (Full RSS)