Gallery of LLM Architecture Diagrams

💡 Visual diagrams of LLM architectures to grasp complex designs quickly.
⚡ 30-Second TL;DR
What Changed
Curated gallery of architecture diagrams for modern LLMs, submitted by u/seraschka.
Why It Matters
Visual diagrams make complex architectural designs far quicker to grasp than prose descriptions alone.
What To Do Next
Browse the LLM architecture gallery linked in r/LocalLLaMA for design insights.
🧠 Deep Insight
Web-grounded analysis with 7 cited sources.
📌 Enhanced Key Takeaways
- Modern LLM architectures diverge significantly from the original GPT design in normalization strategies: OLMo 2 adopts Post-Norm with RMSNorm (rather than Pre-LN), while most contemporary models like Llama and Gemma have switched from LayerNorm to RMSNorm for improved gradient behavior[1].
- Architectural trade-offs between model width and depth are critical design decisions: Qwen3 uses 48 transformer blocks (deep architecture) while gpt-oss uses 24 blocks but wider hidden dimensions, reflecting different efficiency and capability priorities[1].
- Vision-language model architectures are achieving frontier-level performance at compact scales through fully unfrozen training and scaled post-training: STEP3-VL-10B integrates a 1.8B perception encoder with a Qwen3-8B decoder and achieves 94.43% on AIME2025 despite its 10B footprint[3].
- The 2026 LLM landscape includes specialized architectural families beyond general-purpose models: reasoning models (o1/o3), vision-language models, small language models (SLM), large action models (LAM), and hierarchical language models (HLM), each with distinct architectural choices[6].
- Open-source LLM architectures built on Llama foundations (Nemotron-4, Orca 2, Vicuna) now compete with proprietary models by leveraging established base architectures and fine-tuning strategies, with Nemotron-4 delivering performance competitive with leading proprietary systems across 340B, 70B, and 15B variants[5].
🛠️ Technical Deep Dive
Key Architectural Innovations in Contemporary LLMs:
- Normalization Techniques: RMSNorm has replaced LayerNorm across most modern architectures (Llama, Gemma, OLMo 2) due to improved computational efficiency and gradient behavior. OLMo 2 uniquely adopts Post-Norm positioning rather than the Pre-LN standard, a placement that historically requires careful learning-rate warm-up but improves training stability in OLMo 2's variant[1] (see the normalization sketch below).
- Attention Mechanisms: Traditional Multi-Head Attention (MHA) persists in some models like OLMo 2, while others adopt Grouped-Query Attention (GQA) for efficiency; OLMo 2's 32B variant later introduced GQA support[1] (see the GQA sketch below).
- Vision-Language Integration: STEP3-VL-10B uses a 1.8B language-optimized Perception Encoder bridged to a Qwen3-8B decoder via a projector with 16× spatial downsampling. The model employs multi-crop strategies for fine-grained visual details and was trained on 1.2T tokens of curated multimodal data including K-12 education, OCR, and GUI interaction tasks[3] (see the projector sketch below).
- Depth vs. Width Trade-offs: Qwen3 employs 48 transformer blocks (deep architecture) while gpt-oss uses 24 blocks with wider hidden dimensions, representing different computational efficiency strategies[1] (a rough parameter-budget comparison follows below).
- Open-Source Base Architectures: Nemotron-4 (Nvidia) and Orca 2 (Microsoft) build on the Llama architecture family, available in multiple sizes (340B, 70B, and 15B for Nemotron-4; 7B and 13B for Orca 2) for varied deployment scenarios[5].
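To make the normalization bullet concrete, here is a minimal PyTorch sketch (an illustration, not any model's actual code) of RMSNorm and of the Pre-Norm versus OLMo-2-style Post-Norm residual placements:

```python
import torch
import torch.nn as nn

class RMSNorm(nn.Module):
    """RMSNorm rescales by the root mean square of the activations with a
    learned gain; unlike LayerNorm it skips mean subtraction and the bias
    term, which is why it is cheaper."""
    def __init__(self, dim: int, eps: float = 1e-6):
        super().__init__()
        self.eps = eps
        self.weight = nn.Parameter(torch.ones(dim))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        rms = torch.rsqrt(x.pow(2).mean(dim=-1, keepdim=True) + self.eps)
        return x * rms * self.weight

def pre_norm_step(x, sublayer, norm):
    # Pre-Norm (GPT-2/Llama style): normalize *before* the sublayer,
    # then add the residual.
    return x + sublayer(norm(x))

def post_norm_step(x, sublayer, norm):
    # OLMo-2-style placement: normalize the sublayer *output*, but keep
    # the result inside the residual stream.
    return x + norm(sublayer(x))
```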
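The MHA-versus-GQA distinction in the attention bullet comes down to how many key/value heads are materialized. A minimal sketch, assuming PyTorch 2.x for `scaled_dot_product_attention`; setting `n_kv_heads == n_q_heads` recovers standard MHA:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class GroupedQueryAttention(nn.Module):
    """Minimal GQA sketch: n_q_heads query heads share n_kv_heads
    key/value heads, shrinking the KV cache by n_q_heads // n_kv_heads."""
    def __init__(self, dim: int, n_q_heads: int, n_kv_heads: int):
        super().__init__()
        assert n_q_heads % n_kv_heads == 0
        self.n_q, self.n_kv = n_q_heads, n_kv_heads
        self.head_dim = dim // n_q_heads
        self.wq = nn.Linear(dim, n_q_heads * self.head_dim, bias=False)
        self.wk = nn.Linear(dim, n_kv_heads * self.head_dim, bias=False)
        self.wv = nn.Linear(dim, n_kv_heads * self.head_dim, bias=False)
        self.wo = nn.Linear(n_q_heads * self.head_dim, dim, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, t, _ = x.shape
        q = self.wq(x).view(b, t, self.n_q, self.head_dim).transpose(1, 2)
        k = self.wk(x).view(b, t, self.n_kv, self.head_dim).transpose(1, 2)
        v = self.wv(x).view(b, t, self.n_kv, self.head_dim).transpose(1, 2)
        # Repeat each KV head so every group of query heads attends to
        # its shared key/value head.
        rep = self.n_q // self.n_kv
        k = k.repeat_interleave(rep, dim=1)
        v = v.repeat_interleave(rep, dim=1)
        out = F.scaled_dot_product_attention(q, k, v, is_causal=True)
        return self.wo(out.transpose(1, 2).reshape(b, t, -1))
```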
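The source gives the projector's 16× spatial downsampling factor but not its internals, so the following is a hypothetical sketch of one common way to achieve it: merging each 4×4 window of vision tokens into a single token before an MLP maps it into the decoder's embedding space. The class name and MLP shape are assumptions, not STEP3-VL's actual design:

```python
import torch
import torch.nn as nn

class DownsamplingProjector(nn.Module):
    """Hypothetical projector: fold each 4x4 window of patch tokens into
    one token (16x fewer tokens), then project into the language
    decoder's embedding space."""
    def __init__(self, vision_dim: int, text_dim: int, window: int = 4):
        super().__init__()
        self.window = window
        merged = vision_dim * window * window
        self.mlp = nn.Sequential(
            nn.Linear(merged, text_dim),
            nn.GELU(),
            nn.Linear(text_dim, text_dim),
        )

    def forward(self, patches: torch.Tensor) -> torch.Tensor:
        # patches: (batch, H, W, vision_dim) grid of encoder outputs
        b, h, w, c = patches.shape
        s = self.window
        x = patches.view(b, h // s, s, w // s, s, c)
        x = x.permute(0, 1, 3, 2, 4, 5).reshape(b, (h // s) * (w // s), s * s * c)
        return self.mlp(x)  # (batch, H*W/16, text_dim) tokens for the decoder
```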
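A back-of-the-envelope calculation shows why depth and width trade off against a similar parameter budget. The hidden dimensions below are assumed for illustration (and gpt-oss's MoE blocks would count differently), not the published configs:

```python
def dense_block_params(d_model: int, ffn_mult: int = 4) -> int:
    """Rough parameter count for one dense transformer block:
    4*d^2 for the attention projections (Q, K, V, O) plus
    2*ffn_mult*d^2 for the two feed-forward matrices.
    Norms, biases, and embeddings are ignored."""
    attn = 4 * d_model * d_model
    ffn = 2 * ffn_mult * d_model * d_model
    return attn + ffn

# Toy comparison: more blocks with a smaller d_model vs. fewer, wider blocks.
deep_narrow = 48 * dense_block_params(2048)
shallow_wide = 24 * dense_block_params(2880)
print(f"deep/narrow : {deep_narrow / 1e9:.2f}B params")
print(f"shallow/wide: {shallow_wide / 1e9:.2f}B params")
```

Both shapes land near the same dense-parameter budget; the deep, narrow stack trades more sequential computation per token for smaller per-layer matrices.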
🔮 Future Implications
AI analysis grounded in cited sources.
⏳ Timeline
📚 Sources (7)
Factual claims are grounded in the sources below. Forward-looking analysis is AI-generated interpretation.
1. magazine.sebastianraschka.com – The Big LLM Architecture Comparison
2. dev.to – Choosing an LLM in 2026: The Practical Comparison Table (Specs, Cost, Latency, Compatibility)
3. todatabeyond.substack.com – Important LLM Papers for the Week
4. ideas2it.com – LLM Comparison
5. techtarget.com – 12 of the Best Large Language Models
6. youtube.com – Watch
7. pluralsight.com – Best AI Models 2026 List
Original source: Reddit r/LocalLLaMA