Meta Plans Four New MTIA Generations

👥Read original on Meta Newsroom

#ai-chips #custom-silicon #hardware-strategy #mtia

💡 Meta's four new AI chips in two years accelerate its custom-silicon push for AI infrastructure

⚡ 30-Second TL;DR

What Changed

Meta plans four new MTIA generations within two years, making its custom silicon central to its AI infrastructure.

Why It Matters

Meta's push strengthens its in-house AI hardware, potentially pressuring Nvidia's dominance and spurring efficiency gains. For AI practitioners, it offers a view into how custom silicon is trending for large-scale deployments.

What To Do Next

Monitor Meta's engineering blog for MTIA benchmark releases to evaluate the chips against Nvidia GPUs.

Who should care: Developers & AI Engineers

🧠 Deep Insight

Web-grounded analysis with 6 cited sources.

🔑 Enhanced Key Takeaways

• First-generation MTIA, announced May 18, 2023, was fabricated on TSMC's 7nm process. It runs at 800 MHz, delivers 102.4 TOPS INT8 within a 25 W TDP, and targets recommendation-system inference[1][2][3].
• Next-generation MTIA uses TSMC's 5nm process and clocks at 1.35 GHz with a 90 W TDP. It is deployed in rack systems holding up to 72 accelerators and achieves a 3x performance improvement over v1[4].
• MTIA development began in 2020, with chips received as early as 2021. The first-generation chip features 64 PEs in an 8x8 grid, 128 MB of on-chip SRAM at 800 GB/s, and up to 128 GB of off-chip LPDDR5[1][3].
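
The generational figures in these takeaways can be put side by side with a few lines of arithmetic. The sketch below is illustrative only: the `ChipSpec` class and its field names are hypothetical, the inputs are the clock, TDP, and rack-size figures quoted above, and the rack total counts accelerator TDP alone (no host CPUs, networking, or cooling).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ChipSpec:
    """Headline figures as quoted in the takeaways (hypothetical container)."""
    name: str
    clock_ghz: float
    tdp_watts: float


# Values copied from the takeaways above; nothing here is measured.
V1 = ChipSpec("MTIA v1", clock_ghz=0.8, tdp_watts=25.0)
NEXT_GEN = ChipSpec("MTIA next-gen", clock_ghz=1.35, tdp_watts=90.0)
ACCELERATORS_PER_RACK = 72  # "up to 72 accelerators" per rack system

# Simple generation-over-generation ratios.
clock_ratio = NEXT_GEN.clock_ghz / V1.clock_ghz
tdp_ratio = NEXT_GEN.tdp_watts / V1.tdp_watts

# Upper bound on accelerator power for a fully populated rack, in kW.
rack_tdp_kw = ACCELERATORS_PER_RACK * NEXT_GEN.tdp_watts / 1000.0

print(f"clock ratio: {clock_ratio:.2f}x")
print(f"TDP ratio: {tdp_ratio:.2f}x")
print(f"accelerator TDP per full rack: {rack_tdp_kw:.2f} kW")
```

Under these inputs the clock rises by roughly 1.69x, which is well short of the quoted 3x performance improvement. That suggests most of the gain comes from sources other than frequency.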

🛠️ Technical Deep Dive

• MTIA v1: TSMC 7nm, 800 MHz, 102.4 TOPS INT8 / 51.2 TFLOPS FP16, 25 W TDP, 128 MB SRAM (800 GB/s), up to 128 GB LPDDR5 (176 GB/s), 8 PCIe 4.0 lanes, 64 PEs in an 8x8 mesh[1][2][3].
• Architecture: 64 Processing Elements (PEs), each with 128 KB of local SRAM. The design supports thread-, data-, instruction-, and memory-level parallelism (TLP/DLP/ILP/MLP), and a mesh network connects the PEs to each other and to memory[3].
• Next-gen (v2): TSMC 5nm, 1.35 GHz, 90 W TDP, 1.12B gates, 373 mm² die area, on-chip 128 MB (800 GB/s), off-chip 64 GB LPDDR5 (176 GB/s), deployed in 72-accelerator racks with a 6x throughput gain[4].
• Deployment: Yosemite V3 servers with 12 accelerators per server, using PCIe switches so that inter-accelerator communication bypasses the host CPU[3].
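
A few derived numbers help put the v1 figures in context. The sketch below applies a simple roofline model (attainable throughput is the lesser of peak compute and bandwidth times arithmetic intensity) to the v1 specs listed above. The constant and function names are hypothetical, and the model ignores caching, data reuse across PEs, and PCIe traffic, so treat the output as a rough bound rather than a benchmark.

```python
# MTIA v1 figures as quoted in the list above.
PEAK_INT8_OPS = 102.4e12   # 102.4 TOPS INT8, in ops/s
TDP_WATTS = 25.0
SRAM_BW = 800e9            # on-chip SRAM bandwidth, bytes/s
LPDDR5_BW = 176e9          # off-chip LPDDR5 bandwidth, bytes/s
NUM_PES = 64
LOCAL_SRAM_KB_PER_PE = 128


def ridge_point(peak_ops: float, bandwidth: float) -> float:
    """Arithmetic intensity (ops/byte) at which a kernel stops being memory-bound."""
    return peak_ops / bandwidth


def attainable_ops(peak_ops: float, bandwidth: float, intensity: float) -> float:
    """Roofline bound: min(peak compute, bandwidth * ops per byte)."""
    return min(peak_ops, bandwidth * intensity)


tops_per_watt = PEAK_INT8_OPS / 1e12 / TDP_WATTS
per_pe_tops = PEAK_INT8_OPS / 1e12 / NUM_PES
total_local_sram_mb = NUM_PES * LOCAL_SRAM_KB_PER_PE / 1024

print(f"peak efficiency: {tops_per_watt:.3f} TOPS/W")
print(f"peak per PE: {per_pe_tops:.2f} TOPS")
print(f"aggregate PE-local SRAM: {total_local_sram_mb:.0f} MB")
print(f"ridge point, on-chip SRAM: {ridge_point(PEAK_INT8_OPS, SRAM_BW):.0f} ops/byte")
print(f"ridge point, LPDDR5: {ridge_point(PEAK_INT8_OPS, LPDDR5_BW):.0f} ops/byte")

# Hypothetical low-intensity kernel (10 ops/byte) streaming from LPDDR5.
low_intensity = attainable_ops(PEAK_INT8_OPS, LPDDR5_BW, 10.0)
print(f"bound at 10 ops/byte from LPDDR5: {low_intensity / 1e12:.2f} TOPS")
```

In this model, INT8 work fed from LPDDR5 needs roughly 582 operations per byte to reach peak compute, compared with 128 when fed from on-chip SRAM. Low-intensity kernels therefore depend heavily on keeping their working set in the 128 MB SRAM.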

🔮 Future Implications

AI analysis grounded in cited sources

Meta will reduce dependence on Nvidia GPUs for inference by 2028

MTIA is deployed at scale for ads/ranking workloads with efficiency gains over vendor silicon, alongside a training chip ramping up and multiple chips in development[6].

Four new MTIA generations will enable denser AI clusters by 2028

The next-generation chip already supports 72 accelerators per rack at a higher clock and power budget to serve a broader range of model sizes, in line with Meta's plans for multiple chips and its 5 GW Hyperion cluster[4][6].