Louis Feng

Towards Universal Performance Modeling for Machine Learning Training on Multi-GPU Platforms

Apr 19, 2024
Via arXiv

Chakra: Advancing Performance Benchmarking and Co-design using Standardized Execution Traces

May 26, 2023
Via arXiv

Pre-train and Search: Efficient Embedding Table Sharding with Pre-trained Neural Cost Models

May 03, 2023
Via arXiv

Mystique: Accurate and Scalable Production AI Benchmarks Generation

Dec 16, 2022
Via arXiv

DreamShard: Generalizable Embedding Table Placement for Recommender Systems

Oct 05, 2022
Via arXiv

AutoShard: Automated Embedding Table Sharding for Recommender Systems

Aug 12, 2022
Via arXiv

Building a Performance Model for Deep Learning Recommendation Model Training on GPUs

Jan 19, 2022
Via arXiv

Alternate Model Growth and Pruning for Efficient Training of Recommendation Systems

May 04, 2021
Via arXiv