Venmugil Elango

LatentMoE: Toward Optimal Accuracy per FLOP and Parameter in Mixture of Experts

Jan 26, 2026
Via arXiv

NVIDIA Nemotron 3: Efficient and Open Intelligence

Dec 24, 2025
Via arXiv

Nemotron 3 Nano: Open, Efficient Mixture-of-Experts Hybrid Mamba-Transformer Model for Agentic Reasoning

Dec 23, 2025
Via arXiv

ATTENTION2D: Communication Efficient Distributed Self-Attention Mechanism

Mar 20, 2025
Via arXiv

PaSE: Parallelization Strategies for Efficient DNN Training

Jul 04, 2024
Via arXiv

Microscaling Data Formats for Deep Learning

Oct 19, 2023
Via arXiv

Shared Microexponents: A Little Shifting Goes a Long Way

Feb 16, 2023
Via arXiv