Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Arush Mehrotra

MoE-Beyond: Learning-Based Expert Activation Prediction on Edge Devices

Aug 23, 2025

Nishant Gavhane, Arush Mehrotra, Rohit Chawla, Peter Proenca

Figure 1 for MoE-Beyond: Learning-Based Expert Activation Prediction on Edge Devices

Figure 2 for MoE-Beyond: Learning-Based Expert Activation Prediction on Edge Devices

Figure 3 for MoE-Beyond: Learning-Based Expert Activation Prediction on Edge Devices

Figure 4 for MoE-Beyond: Learning-Based Expert Activation Prediction on Edge Devices

Abstract:The deployment of large-scale Mixture-of-Experts (MoE) models on edge devices presents significant challenges due to memory constraints. While MoE architectures enable efficient utilization of computational resources by activating only a subset of experts per inference, they require careful memory management to operate efficiently in resource-constrained environments. Traditional heuristic-based expert caching strategies such as MoE-Infinity struggle to maintain high cache hit rates as models parameters scale. In this work, we introduce MoE-Beyond, a learning-based expert activation predictor trained to predict expert activations during autoregressive decoding. By framing the task as a multi-label sequence prediction problem, we train a lightweight transformer model on 66 million expert activation traces extracted from LDJnr-Puffin dataset [5] using DeepSeek-V2-Chat-Lite MoE. Our predictor generalizes effectively across unseen prompts from WebGLM-QA dataset [6], achieving 97.5% accuracy and an 86.6% F1-score. Simulation results show that MoE-Beyond improves GPU cache hit rate from 17% to 72% when only 10% of experts fit in GPU cache, outperforming heuristic baselines.

Via

Access Paper or Ask Questions

Accuracy and Fairness of Facial Recognition Technology in Low-Quality Police Images: An Experiment With Synthetic Faces

May 20, 2025

Maria Cuellar, Hon Kiu, To, Arush Mehrotra

Abstract:Facial recognition technology (FRT) is increasingly used in criminal investigations, yet most evaluations of its accuracy rely on high-quality images, unlike those often encountered by law enforcement. This study examines how five common forms of image degradation--contrast, brightness, motion blur, pose shift, and resolution--affect FRT accuracy and fairness across demographic groups. Using synthetic faces generated by StyleGAN3 and labeled with FairFace, we simulate degraded images and evaluate performance using Deepface with ArcFace loss in 1:n identification tasks. We perform an experiment and find that false positive rates peak near baseline image quality, while false negatives increase as degradation intensifies--especially with blur and low resolution. Error rates are consistently higher for women and Black individuals, with Black females most affected. These disparities raise concerns about fairness and reliability when FRT is used in real-world investigative contexts. Nevertheless, even under the most challenging conditions and for the most affected subgroups, FRT accuracy remains substantially higher than that of many traditional forensic methods. This suggests that, if appropriately validated and regulated, FRT should be considered a valuable investigative tool. However, algorithmic accuracy alone is not sufficient: we must also evaluate how FRT is used in practice, including user-driven data manipulation. Such cases underscore the need for transparency and oversight in FRT deployment to ensure both fairness and forensic validity.

Via

Access Paper or Ask Questions