Abstract: The deployment of large-scale Mixture-of-Experts (MoE) models on edge devices presents significant challenges due to memory constraints. While MoE architectures enable efficient utilization of computational resources by activating only a subset of experts per inference, they require careful memory management to operate efficiently in resource-constrained environments. Traditional heuristic-based expert caching strategies such as MoE-Infinity struggle to maintain high cache hit rates as model parameters scale. In this work, we introduce MoE-Beyond, a learning-based expert activation predictor trained to predict expert activations during autoregressive decoding. By framing the task as a multi-label sequence prediction problem, we train a lightweight transformer model on 66 million expert activation traces extracted from the LDJnr-Puffin dataset [5] using DeepSeek-V2-Chat-Lite MoE. Our predictor generalizes effectively to unseen prompts from the WebGLM-QA dataset [6], achieving 97.5% accuracy and an 86.6% F1-score. Simulation results show that MoE-Beyond improves the GPU cache hit rate from 17% to 72% when only 10% of experts fit in the GPU cache, outperforming heuristic baselines.
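To make the cache hit rate metric concrete, the following is a minimal sketch of how a trace-driven simulation might compare a heuristic cache with predictor-driven prefetching. It is not the authors' simulator: the function names `lru_hit_rate` and `prefetch_hit_rate`, the use of LRU as the heuristic baseline, and the toy trace are all assumptions made for illustration.

```python
from collections import OrderedDict


def lru_hit_rate(trace, capacity):
    """Hit rate of an LRU expert cache (an assumed heuristic baseline).

    trace: list of decoding steps, each a list of activated expert ids.
    capacity: number of experts that fit in the (simulated) GPU cache.
    """
    cache = OrderedDict()
    hits = total = 0
    for step in trace:
        for expert in step:
            total += 1
            if expert in cache:
                hits += 1
                cache.move_to_end(expert)  # mark as most recently used
            else:
                cache[expert] = True
                if len(cache) > capacity:
                    cache.popitem(last=False)  # evict least recently used
    return hits / total if total else 0.0


def prefetch_hit_rate(trace, predictions, capacity):
    """Hit rate when the cache is filled from predicted experts before each step.

    predictions: one list of predicted expert ids per decoding step
    (hypothetical predictor output); only the first `capacity` are loaded.
    """
    hits = total = 0
    for actual, predicted in zip(trace, predictions):
        cache = set(list(predicted)[:capacity])
        for expert in actual:
            total += 1
            hits += expert in cache
    return hits / total if total else 0.0


# Toy trace: the activation pattern alternates, which defeats a small LRU cache.
trace = [[0, 1], [2, 3], [0, 1], [2, 3]]
predictions = [[0, 1], [2, 3], [0, 9], [2, 3]]  # one wrong prediction (9)
baseline = lru_hit_rate(trace, capacity=2)
predicted = prefetch_hit_rate(trace, predictions, capacity=2)
```

On this toy trace the LRU cache never hits, because each step evicts exactly the experts the next step needs, while prefetching from mostly correct predictions hits on 7 of 8 activations. The sketch ignores transfer latency and bandwidth, which a realistic simulation would need to model.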
Abstract: Facial recognition technology (FRT) is increasingly used in criminal investigations, yet most evaluations of its accuracy rely on high-quality images, unlike those often encountered by law enforcement. This study examines how five common forms of image degradation (contrast, brightness, motion blur, pose shift, and resolution) affect FRT accuracy and fairness across demographic groups. Using synthetic faces generated by StyleGAN3 and labeled with FairFace, we simulate degraded images and evaluate performance using Deepface with ArcFace loss in 1:n identification tasks. We find that false positive rates peak near baseline image quality, while false negatives increase as degradation intensifies, especially with blur and low resolution. Error rates are consistently higher for women and Black individuals, with Black females most affected. These disparities raise concerns about fairness and reliability when FRT is used in real-world investigative contexts. Nevertheless, even under the most challenging conditions and for the most affected subgroups, FRT accuracy remains substantially higher than that of many traditional forensic methods. This suggests that, if appropriately validated and regulated, FRT should be considered a valuable investigative tool. However, algorithmic accuracy alone is not sufficient: we must also evaluate how FRT is used in practice, including user-driven data manipulation. Such practices underscore the need for transparency and oversight in FRT deployment to ensure both fairness and forensic validity.
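As an illustration of the per-group error analysis described above, here is a minimal sketch of how false positive and false negative rates might be tabulated by demographic group in a 1:n identification setting. The record layout `(group, mated, matched)` and the function name `error_rates_by_group` are assumptions for this example; the abstract does not specify how the study defines or computes these rates.

```python
from collections import defaultdict


def error_rates_by_group(records):
    """Compute per-group false positive and false negative rates.

    records: iterable of (group, mated, matched) tuples, where
      mated   -- True if the probe's identity is enrolled in the gallery
      matched -- True if the system returned a match for the probe
    Assumed definitions (not taken from the study):
      FNR = unmatched mated probes / all mated probes
      FPR = matched non-mated probes / all non-mated probes
    """
    counts = defaultdict(lambda: {"fp": 0, "nonmated": 0, "fn": 0, "mated": 0})
    for group, mated, matched in records:
        c = counts[group]
        if mated:
            c["mated"] += 1
            c["fn"] += int(not matched)
        else:
            c["nonmated"] += 1
            c["fp"] += int(matched)
    return {
        group: {
            "fpr": c["fp"] / c["nonmated"] if c["nonmated"] else None,
            "fnr": c["fn"] / c["mated"] if c["mated"] else None,
        }
        for group, c in counts.items()
    }


# Made-up toy records, purely to exercise the function.
records = [
    ("A", True, True),
    ("A", True, False),
    ("A", False, False),
    ("A", False, True),
    ("B", True, True),
    ("B", False, False),
]
rates = error_rates_by_group(records)
```

Running the same tabulation separately at each degradation level would show how the two error types move with image quality for each group.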