ProbMoE: Differentiable Probabilistic Routing for Mixture-of-Experts

by Heng Zhao, Zilei Shao, Guy Van den Broeck and Zhe Zeng
Abstract:
Mixture-of-Experts (MoE) models scale by activating only a small subset of experts per token. However, training such models remains challenging because top-$k$ routing is discrete and non-differentiable, requiring gradient estimators for expert selection whose design remains a central open problem. We introduce ProbMoE, a probabilistic routing framework that models expert selection as a distribution over cardinality-constrained expert subsets and formulates routing as probabilistic inference in this discrete subset space. We first propose ProbMoE Exact-$k$ routing, which samples $k$-expert subsets in the forward pass and, in the backward pass, uses gradients through each expert's exact marginal probability as a tractable surrogate for the true gradient. ProbMoE naturally generalizes to a dynamic-$k$ routing setting, where both training and inference constrain the routing cardinality to the same predefined range, allowing adaptive expert allocation per token. Across benchmarks and model backbones, ProbMoE Exact-$k$ achieves strong performance compared to competitive baselines, with improved expert utilization and routing diversity; ProbMoE Dynamic-$k$ achieves comparable performance with fewer activated experts.
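To make the Exact-$k$ idea concrete, here is a minimal sketch of sampling a $k$-expert subset and computing each expert's exact marginal inclusion probability. It is not the authors' implementation. It assumes, since the abstract does not specify the distribution, that a subset $S$ with $|S| = k$ has probability proportional to $\prod_{i \in S} \exp(\ell_i)$ for router logits $\ell_i$. The function names `elementary_symmetric`, `exact_k_marginals`, and `sample_exact_k` are hypothetical. In a training setup, the marginals would be computed inside an autodiff framework so that gradients can flow through them; this pure-Python version only shows the quantities involved.

```python
import math
import random


def elementary_symmetric(weights, k):
    """Return [e_0, ..., e_k], the elementary symmetric polynomials of weights."""
    e = [1.0] + [0.0] * k
    for w in weights:
        # Update from high to low degree so each weight is used at most once.
        for j in range(k, 0, -1):
            e[j] += w * e[j - 1]
    return e


def exact_k_marginals(logits, k):
    """P(i in S) under the ASSUMED p(S) proportional to prod_{i in S} exp(logit_i), |S| = k."""
    shift = max(logits)  # constant shift cancels because every subset has size k
    weights = [math.exp(l - shift) for l in logits]
    z = elementary_symmetric(weights, k)[k]  # normalizer over all k-subsets
    marginals = []
    for i, w in enumerate(weights):
        rest = weights[:i] + weights[i + 1:]
        # Subsets containing i: pick i, then any (k-1)-subset of the rest.
        marginals.append(w * elementary_symmetric(rest, k - 1)[k - 1] / z)
    return marginals


def sample_exact_k(logits, k, rng):
    """Draw one k-subset from the same assumed distribution, expert by expert."""
    shift = max(logits)
    weights = [math.exp(l - shift) for l in logits]
    chosen, need = [], k
    for i, w in enumerate(weights):
        if need == 0:
            break
        tail = elementary_symmetric(weights[i + 1:], need)
        include = w * tail[need - 1]   # mass of completions that use expert i
        exclude = tail[need]           # mass of completions that skip expert i
        if rng.random() < include / (include + exclude):
            chosen.append(i)
            need -= 1
    return chosen


if __name__ == "__main__":
    logits = [0.5, -1.0, 2.0, 0.0]  # illustrative router logits for 4 experts
    print(sample_exact_k(logits, 2, random.Random(0)))
    print(exact_k_marginals(logits, 2))
```

Under this assumed distribution the marginals always sum to $k$, and for $k = 1$ they reduce to the ordinary softmax over the logits.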
Reference:
Heng Zhao, Zilei Shao, Guy Van den Broeck and Zhe Zeng. ProbMoE: Differentiable Probabilistic Routing for Mixture-of-Experts. In Proceedings of the 43rd International Conference on Machine Learning (ICML), 2026.
Bibtex Entry:
@inproceedings{ZhaoICML26,
  title     = {ProbMoE: Differentiable Probabilistic Routing for Mixture-of-Experts},
  author    = {Zhao, Heng and Shao, Zilei and Van den Broeck, Guy and Zeng, Zhe},
  booktitle = {Proceedings of the 43rd International Conference on Machine Learning (ICML)},
  url       = "https://starai.cs.ucla.edu/papers/ZhaoICML26.pdf",
  month     = 7,
  year      = {2026},
  keywords  = {conference,selective}
}