LLM ARCHITECTURE ANALYSIS

One-pager technical deep dives into modern model architectures
Each page covers the full architecture: parameter breakdowns, per-block estimates, benchmark tables, code-verified config details, SVG architecture diagrams, deep dives into novel components, and live MADL architecture renderings. Built by reading the actual transformers source code.
LFM2.5-VL-450M Liquid AI
Sub-500M hybrid vision-language model. Not a pure transformer: gated short-convolution blocks make up 10 of its 16 layers, with grouped-query attention used only in the remaining 6 layers, where long-range retrieval is needed. SigLIP2 NaFlex vision encoder with PixelUnshuffle projector. 242ms on Jetson Orin. Beats SmolVLM2-500M across all benchmarks with fewer parameters.
Hybrid Conv+Attn Vision Edge 450M MADL
Params: 450M (350M LM + 86M vision + 10M proj) Context: 32K License: lfm1.0
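The conv-heavy layer mix above can be illustrated with a minimal sketch of a gated short convolution: a short causal kernel whose output is scaled by a sigmoid gate computed from the input. This is a toy single-channel formulation for intuition, not the LFM implementation; the kernel length, gating projection, and channel handling here are assumptions.

```python
import math

def gated_short_conv(x, conv_w, gate_w):
    """Gated short convolution over a 1-D sequence (minimal sketch).

    x:       list of floats, length T (single channel for clarity)
    conv_w:  short causal kernel, e.g. length 3 (hypothetical size)
    gate_w:  scalar weight for the sigmoid gate (hypothetical projection)
    """
    T, K = len(x), len(conv_w)
    out = []
    for t in range(T):
        # causal short convolution: only looks back K-1 steps,
        # so compute per token is O(K) regardless of sequence length
        acc = sum(conv_w[k] * x[t - k] for k in range(K) if t - k >= 0)
        # sigmoid gate computed from the current input token
        gate = 1.0 / (1.0 + math.exp(-gate_w * x[t]))
        out.append(gate * acc)
    return out
```

The point of the sketch: unlike attention, the receptive field is a fixed short window, which is why the architecture reserves a handful of attention layers for long-range retrieval.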
GLM-5.1 Zhipu AI / Z.AI
754B-A40B MoE flagship with Multi-head Latent Attention (MLA), a DeepSeek Sparse Attention (DSA) indexer with top-2048 selection, 256 experts with sigmoid+bias-correction routing, and multi-token prediction. Optimized for agentic coding: state-of-the-art on SWE-Bench Pro, CyberGym, BrowseComp.
MoE 256E/8A MLA DSA MADL
Params: 754B total / 40B active Context: 198K License: MIT
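The sigmoid+bias-correction routing mentioned above can be sketched in a few lines, following the DeepSeek-V3-style scheme: experts are *selected* by sigmoid score plus a per-expert load-balancing bias, but the gate weights that scale each expert's output use the uncorrected scores, renormalized. The top_k default and function shape here are illustrative assumptions.

```python
import math

def route_tokens(logits, bias, top_k=8):
    """Sigmoid routing with bias-corrected selection (sketch).

    logits: per-expert router logits for one token
    bias:   per-expert load-balancing correction (affects selection only)
    """
    # affinity scores: sigmoid instead of softmax over experts
    scores = [1.0 / (1.0 + math.exp(-l)) for l in logits]
    # selection ranks by score + bias, so overloaded experts can be
    # demoted without distorting the output mixture weights
    ranked = sorted(range(len(logits)),
                    key=lambda i: scores[i] + bias[i], reverse=True)
    chosen = ranked[:top_k]
    # gate weights use the *unbiased* scores, renormalized over the top-k
    total = sum(scores[i] for i in chosen)
    return {i: scores[i] / total for i in chosen}
```

Keeping the bias out of the gate weights is the key design point: load balancing steers *which* experts fire, not *how much* each selected expert contributes.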
Gemma 4 Google DeepMind
Fourth-generation open model family with hybrid sliding/full attention, proportional RoPE, K=V weight sharing, per-layer embeddings (E-series), and parallel dense+MoE mixing. Four variants spanning server to on-device deployment. First generation where architecture -- not just scale -- is the primary axis of differentiation.
Dense MoE Vision Audio MADL
Variants: 31B, 26B-A4B, E4B, E2B Context: 256K License: Gemma
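The hybrid sliding/full attention pattern can be sketched as a per-layer causal mask: most layers restrict each query to a local window, while every N-th layer attends to the full causal context. The window size and layer ratio below are hypothetical placeholders, not Gemma's actual config values.

```python
def attention_mask(seq_len, layer_idx, window=1024, full_every=6):
    """Causal mask for a hybrid sliding/full attention stack (sketch).

    Returns mask[q][k] == True where query q may attend to key k.
    window and full_every are illustrative, not real config values.
    """
    # every full_every-th layer gets unrestricted causal attention
    is_full = (layer_idx + 1) % full_every == 0
    mask = []
    for q in range(seq_len):
        row = []
        for k in range(seq_len):
            causal = k <= q                 # never attend to the future
            in_window = (q - k) < window    # local band for sliding layers
            row.append(causal and (is_full or in_window))
        mask.append(row)
    return mask
```

Sliding layers keep the KV cache bounded by the window size, so only the sparse full-attention layers pay the full long-context memory cost.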