Ashwath Aithal

MoE Parallel Folding: Heterogeneous Parallelism Mappings for Efficient Large-Scale MoE Model Training with Megatron Core

Apr 21, 2025
Via arXiv

Efficient Hybrid Language Model Compression through Group-Aware SSM Pruning

Apr 15, 2025
Via arXiv

Nemotron-H: A Family of Accurate and Efficient Hybrid Mamba-Transformer Models

Apr 10, 2025
Via arXiv

Training Video Foundation Models with NVIDIA NeMo

Mar 17, 2025
Via arXiv

Llama 3 Meets MoE: Efficient Upcycling

Dec 13, 2024
Via arXiv

Upcycling Large Language Models into Mixture of Experts

Oct 10, 2024
Via arXiv

Nemotron-4 340B Technical Report

Jun 17, 2024
Via arXiv

NeMo-Aligner: Scalable Toolkit for Efficient Model Alignment

May 02, 2024
Via arXiv

Nemotron-4 15B Technical Report

Feb 27, 2024
Via arXiv

Hierarchical Graph Neural Network with Cross-Attention for Cross-Device User Matching

Apr 06, 2023
Via arXiv