Dehao Chen

LaMDA: Language Models for Dialog Applications

Feb 10, 2022
Via arXiv

GSPMD: General and Scalable Parallelization for ML Computation Graphs

May 10, 2021
Via arXiv

Exploring the limits of Concurrency in ML Training on Google TPUs

Nov 07, 2020
Via arXiv

GShard: Scaling Giant Models with Conditional Computation and Automatic Sharding

Jun 30, 2020
Via arXiv

Automatic Cross-Replica Sharding of Weight Update in Data-Parallel Training

Apr 28, 2020
Via arXiv

MLPerf Training Benchmark

Oct 30, 2019
Via arXiv

Scale MLPerf-0.6 models on Google TPU-v3 Pods

Oct 02, 2019
Via arXiv

Lingvo: a Modular and Scalable Framework for Sequence-to-Sequence Modeling

Feb 21, 2019
Via arXiv

GPipe: Efficient Training of Giant Neural Networks using Pipeline Parallelism

Dec 12, 2018
Via arXiv

Image Classification at Supercomputer Scale

Dec 02, 2018
Via arXiv