How Mixture of Experts (MoE) Works: Router, Expert Collapse, and the VRAM Tax
machine-learning · 14 min read · May 2026