
Latent Prototype Routing: Achieving Near-Perfect Load Balancing in Mixture-of-Experts

Jun 26, 2025

Who Does What in Deep Learning? Multidimensional Game-Theoretic Attribution of Function of Neural Units

Jun 24, 2025

MoE-GPS: Guidelines for Prediction Strategy for Dynamic Expert Duplication in MoE Load Balancing

Jun 09, 2025

A Dataset for Addressing Patient's Information Needs related to Clinical Course of Hospitalization

Jun 04, 2025

Decoding Knowledge Attribution in Mixture-of-Experts: A Framework of Basic-Refinement Collaboration and Efficiency Analysis

May 30, 2025

From prosthetic memory to prosthetic denial: Auditing whether large language models are prone to mass atrocity denialism

May 27, 2025

FloE: On-the-Fly MoE Inference on Memory-constrained GPU

May 12, 2025

QoS-Efficient Serving of Multiple Mixture-of-Expert LLMs Using Partial Runtime Reconfiguration

May 10, 2025

FloE: On-the-Fly MoE Inference

May 09, 2025

MoE Parallel Folding: Heterogeneous Parallelism Mappings for Efficient Large-Scale MoE Model Training with Megatron Core

Apr 21, 2025