Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Haixi Fan

Design and Behavior of Sparse Mixture-of-Experts Layers in CNN-based Semantic Segmentation

Apr 15, 2026

Svetlana Pavlitska, Haixi Fan, Konstantin Ditschuneit, J. Marius Zöllner

Abstract:Sparse mixture-of-experts (MoE) layers have been shown to substantially increase model capacity without a proportional increase in computational cost and are widely used in transformer architectures, where they typically replace feed-forward network blocks. In contrast, integrating sparse MoE layers into convolutional neural networks (CNNs) remains inconsistent, with most prior work focusing on fine-grained MoEs operating at the filter or channel levels. In this work, we investigate a coarser, patch-wise formulation of sparse MoE layers for semantic segmentation, where local regions are routed to a small subset of convolutional experts. Through experiments on the Cityscapes and BDD100K datasets using encoder-decoder and backbone-based CNNs, we conduct a design analysis to assess how architectural choices affect routing dynamics and expert specialization. Our results demonstrate consistent, architecture-dependent improvements (up to +3.9 mIoU) with little computational overhead, while revealing strong design sensitivity. Our work provides empirical insights into the design and internal dynamics of sparse MoE layers in CNN-based dense prediction. Our code is available at https://github.com/KASTEL-MobilityLab/moe-layers/.

* Accepted for publication at the SAIAD workshop at CVPR 2026

Via

Access Paper or Ask Questions

Robust Experts: the Effect of Adversarial Training on CNNs with Sparse Mixture-of-Experts Layers

Sep 05, 2025

Svetlana Pavlitska, Haixi Fan, Konstantin Ditschuneit, J. Marius Zöllner

Figure 1 for Robust Experts: the Effect of Adversarial Training on CNNs with Sparse Mixture-of-Experts Layers

Figure 2 for Robust Experts: the Effect of Adversarial Training on CNNs with Sparse Mixture-of-Experts Layers

Figure 3 for Robust Experts: the Effect of Adversarial Training on CNNs with Sparse Mixture-of-Experts Layers

Figure 4 for Robust Experts: the Effect of Adversarial Training on CNNs with Sparse Mixture-of-Experts Layers

Abstract:Robustifying convolutional neural networks (CNNs) against adversarial attacks remains challenging and often requires resource-intensive countermeasures. We explore the use of sparse mixture-of-experts (MoE) layers to improve robustness by replacing selected residual blocks or convolutional layers, thereby increasing model capacity without additional inference cost. On ResNet architectures trained on CIFAR-100, we find that inserting a single MoE layer in the deeper stages leads to consistent improvements in robustness under PGD and AutoPGD attacks when combined with adversarial training. Furthermore, we discover that when switch loss is used for balancing, it causes routing to collapse onto a small set of overused experts, thereby concentrating adversarial training on these paths and inadvertently making them more robust. As a result, some individual experts outperform the gated MoE model in robustness, suggesting that robust subpaths emerge through specialization. Our code is available at https://github.com/KASTEL-MobilityLab/robust-sparse-moes.

* Accepted for publication at the STREAM workshop at ICCV 2025

Via

Access Paper or Ask Questions