LFM2.5-VL-450M
Liquid AI
Sub-500M hybrid vision-language model. Not a pure transformer: 10 of its 16 layers are gated short-convolution blocks, with grouped-query attention only in the 6 layers where long-range retrieval is needed. SigLIP2 NaFlex vision encoder with a PixelUnshuffle projector. 242 ms inference on Jetson Orin. Beats SmolVLM2-500M across all reported benchmarks with fewer parameters.
Hybrid Conv+Attn
Vision
Edge
450M
Params: 450M (350M LM + 86M vision + 10M proj)
Context: 32K
License: LFM Open License v1.0
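The gated short-convolution blocks that replace most attention layers can be sketched as follows. This is a minimal illustrative implementation, not Liquid AI's actual code: the double-gating layout (input gate, causal depthwise short conv, output gate) follows published LFM2 descriptions, but all weight names and shapes here are assumptions.

```python
import numpy as np

def _sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gated_short_conv(x, w_in, w_conv, w_out):
    """Illustrative gated short-convolution block (LFM2-style sketch).

    x:      (seq_len, d) token activations.
    w_in:   (d, 3*d) projection producing gate b, gate c, and value v.
    w_conv: (d, k) per-channel causal short-conv kernel with k taps.
    w_out:  (d, d) output projection.
    """
    seq_len, d = x.shape
    b, c, v = np.split(x @ w_in, 3, axis=-1)
    gated = _sigmoid(b) * v                     # input gate
    # Depthwise causal convolution with a short (k-tap) kernel.
    k = w_conv.shape[1]
    padded = np.pad(gated, ((k - 1, 0), (0, 0)))
    conv = np.zeros_like(gated)
    for t in range(k):
        conv += padded[t : t + seq_len] * w_conv[:, t]
    return (_sigmoid(c) * conv) @ w_out         # output gate, then project
```

Because the kernel is causally padded, each position only mixes a fixed short window of past tokens, which is why these blocks are cheap on edge hardware compared with full attention.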
GLM-5.1
Zhipu AI / Z.AI
754B-A40B MoE flagship combining Multi-head Latent Attention (MLA), a DeepSeek Sparse Attention (DSA) indexer with top-2048 token selection, 256 experts routed with sigmoid scores and bias-corrected selection, and multi-token prediction. Optimized for agentic coding: state-of-the-art on SWE-Bench Pro, CyberGym, and BrowseComp.
MoE 256E/8A
MLA
DSA
Params: 754B total / 40B active
Context: 198K
License: MIT
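The sigmoid+bias-correction routing mentioned above can be sketched in a few lines. This is a hedged illustration of the DeepSeek-V3-style scheme the card describes, not GLM's actual router: a per-expert bias steers load balancing during expert *selection* only, while the unbiased sigmoid scores are renormalized to weight expert outputs. Function and argument names are made up for the example.

```python
import numpy as np

def route_tokens(logits, expert_bias, top_k=8):
    """Illustrative sigmoid routing with bias-corrected top-k selection.

    logits:      (n_tokens, n_experts) raw router affinities.
    expert_bias: (n_experts,) load-balancing bias -- used ONLY to pick
                 experts, never to weight their outputs.
    """
    scores = 1.0 / (1.0 + np.exp(-logits))            # sigmoid gate scores
    biased = scores + expert_bias                     # selection signal only
    top = np.argsort(-biased, axis=-1)[:, :top_k]     # top-k expert ids per token
    picked = np.take_along_axis(scores, top, axis=-1) # unbiased scores of winners
    weights = picked / picked.sum(axis=-1, keepdims=True)
    return top, weights
```

Separating selection from weighting lets the bias push traffic toward underused experts without distorting the mixture weights each token actually uses.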
Gemma 4
Google DeepMind
Fourth-generation open model family with hybrid sliding/full attention, proportional RoPE, K=V weight sharing, per-layer embeddings (E-series), and parallel dense+MoE mixing. Four variants span server-scale to on-device deployment. The first generation where architecture -- not just scale -- is the primary axis of differentiation.
Dense
MoE
Vision
Audio
Variants: 31B, 26B-A4B, E4B, E2B
Context: 256K
License: Gemma
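The hybrid sliding/full attention pattern can be illustrated with the masks each layer type would use. This is a generic sketch of interleaved local and global layers, not Gemma 4's actual schedule; the window size and the local:global ratio below are placeholder values.

```python
import numpy as np

def attention_mask(seq_len, layer_idx, window=4, full_every=4):
    """Illustrative hybrid attention schedule: every `full_every`-th layer
    uses full causal attention, the rest use a causal sliding window.

    Returns a boolean (seq_len, seq_len) mask where True = query q may
    attend to key k. Window size and ratio are made-up placeholders.
    """
    q = np.arange(seq_len)[:, None]
    k = np.arange(seq_len)[None, :]
    causal = k <= q
    if (layer_idx + 1) % full_every == 0:
        return causal                      # global layer: full history
    return causal & (q - k < window)       # local layer: recent tokens only
```

Sliding-window layers keep the KV cache bounded by the window size, while the occasional full-attention layer preserves long-range retrieval; that trade-off is what makes long contexts tractable on-device.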