LLM ARCHITECTURE ANALYSIS

One-pager technical deep dives into modern model architectures
Each page covers the full architecture: parameter breakdowns, per-block estimates, benchmark tables, code-verified config details, SVG architecture diagrams, deep dives into novel components, and live MADL architecture renderings. Built by reading the actual transformers source code.
LFM2.5-VL-450M Liquid AI
Sub-500M hybrid vision-language model. Not a pure transformer: gated short-convolution blocks make up 10 of its 16 layers, with grouped-query attention used only in the remaining 6 layers, where long-range retrieval is needed. SigLIP2 NaFlex vision encoder with PixelUnshuffle projector. 242ms on Jetson Orin. Beats SmolVLM2-500M across all benchmarks with fewer parameters.
Hybrid Conv+Attn Vision Edge 450M MADL
Params: 450M (350M LM + 86M vision + 10M proj) Context: 32K License: lfm1.0
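The conv-heavy layer mix above can be illustrated with a minimal sketch of a gated short convolution: a short causal kernel whose output is scaled by a sigmoid gate computed from the input. This is a toy single-channel formulation for intuition, not the LFM implementation; the kernel length, gating projection, and channel handling here are assumptions.

```python
import math

def gated_short_conv(x, conv_w, gate_w):
    """Gated short convolution over a 1-D sequence (minimal sketch).

    x:       list of floats, length T (single channel for clarity)
    conv_w:  short causal kernel, e.g. length 3 (hypothetical size)
    gate_w:  scalar weight for the sigmoid gate (hypothetical projection)
    """
    T, K = len(x), len(conv_w)
    out = []
    for t in range(T):
        # causal short convolution: only looks back K-1 steps,
        # so compute per token is O(K) regardless of sequence length
        acc = sum(conv_w[k] * x[t - k] for k in range(K) if t - k >= 0)
        # sigmoid gate computed from the current input token
        gate = 1.0 / (1.0 + math.exp(-gate_w * x[t]))
        out.append(gate * acc)
    return out
```

The point of the sketch: unlike attention, the receptive field is a fixed short window, which is why the architecture reserves a handful of attention layers for long-range retrieval.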
GLM-5.1 Zhipu AI / Z.AI
754B-A40B MoE flagship with Multi-head Latent Attention (MLA), a DeepSeek Sparse Attention (DSA) indexer with top-2048 selection, 256 experts with sigmoid+bias-correction routing, and multi-token prediction. Optimized for agentic coding: state-of-the-art on SWE-Bench Pro, CyberGym, BrowseComp.
MoE 256E/8A MLA DSA MADL
Params: 754B total / 40B active Context: 198K License: MIT
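The sigmoid+bias-correction routing mentioned above can be sketched in a few lines, following the DeepSeek-V3-style scheme: experts are *selected* by sigmoid score plus a per-expert load-balancing bias, but the gate weights that scale each expert's output use the uncorrected scores, renormalized. The top_k default and function shape here are illustrative assumptions.

```python
import math

def route_tokens(logits, bias, top_k=8):
    """Sigmoid routing with bias-corrected selection (sketch).

    logits: per-expert router logits for one token
    bias:   per-expert load-balancing correction (affects selection only)
    """
    # affinity scores: sigmoid instead of softmax over experts
    scores = [1.0 / (1.0 + math.exp(-l)) for l in logits]
    # selection ranks by score + bias, so overloaded experts can be
    # demoted without distorting the output mixture weights
    ranked = sorted(range(len(logits)),
                    key=lambda i: scores[i] + bias[i], reverse=True)
    chosen = ranked[:top_k]
    # gate weights use the *unbiased* scores, renormalized over the top-k
    total = sum(scores[i] for i in chosen)
    return {i: scores[i] / total for i in chosen}
```

Keeping the bias out of the gate weights is the key design point: load balancing steers *which* experts fire, not *how much* each selected expert contributes.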
Gemma 4 Google DeepMind
Fourth-generation open model family with hybrid sliding/full attention, proportional RoPE, K=V weight sharing, per-layer embeddings (E-series), and parallel dense+MoE mixing. Four variants spanning server to on-device deployment. First generation where architecture -- not just scale -- is the primary axis of differentiation.
Dense MoE Vision Audio MADL
Variants: 31B, 26B-A4B, E4B, E2B Context: 256K License: Gemma
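The hybrid sliding/full attention pattern can be sketched as a per-layer causal mask: most layers restrict each query to a local window, while every N-th layer attends to the full causal context. The window size and layer ratio below are hypothetical placeholders, not Gemma's actual config values.

```python
def attention_mask(seq_len, layer_idx, window=1024, full_every=6):
    """Causal mask for a hybrid sliding/full attention stack (sketch).

    Returns mask[q][k] == True where query q may attend to key k.
    window and full_every are illustrative, not real config values.
    """
    # every full_every-th layer gets unrestricted causal attention
    is_full = (layer_idx + 1) % full_every == 0
    mask = []
    for q in range(seq_len):
        row = []
        for k in range(seq_len):
            causal = k <= q                 # never attend to the future
            in_window = (q - k) < window    # local band for sliding layers
            row.append(causal and (is_full or in_window))
        mask.append(row)
    return mask
```

Sliding layers keep the KV cache bounded by the window size, so only the sparse full-attention layers pay the full long-context memory cost.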