Maxim Naumov

Sid

LoKA: Low-precision Kernel Applications for Recommendation Models At Scale

May 11, 2026
Via arXiv

Training LLMs with Fault Tolerant HSDP on 100,000 GPUs

Jan 30, 2026
Via arXiv

The Llama 4 Herd: Architecture, Training, Evaluation, and Deployment Notes

Jan 15, 2026
Via arXiv

Meta Lattice: Model Space Redesign for Cost-Effective Industry-Scale Ads Recommendations

Dec 15, 2025
Via arXiv

The Llama 3 Herd of Models

Jul 31, 2024
Via arXiv

Wukong: Towards a Scaling Law for Large-Scale Recommendation

Mar 08, 2024
Via arXiv

Disaggregated Multi-Tower: Topology-aware Modeling Technique for Efficient Large-Scale Recommendation

Mar 07, 2024
Via arXiv

Microscaling Data Formats for Deep Learning

Oct 19, 2023
Via arXiv

Shared Microexponents: A Little Shifting Goes a Long Way

Feb 16, 2023
Via arXiv

Learning to Collide: Recommendation System Model Compression with Learned Hash Functions

Mar 28, 2022
Via arXiv