Juntao Zhao

OVERLORD: Ultimate Scaling of DataLoader for Multi-Source Large Foundation Model Training

Apr 14, 2025

QSpec: Speculative Decoding with Complementary Quantization Schemes

Oct 15, 2024

QSync: Quantization-Minimized Synchronous Distributed Training Across Hybrid Devices

Jul 02, 2024

LLM-PQ: Serving LLM on Heterogeneous Clusters with Phase-Aware Partition and Adaptive Quantization

Mar 02, 2024

CDMPP: A Device-Model Agnostic Framework for Latency Prediction of Tensor Programs

Nov 17, 2023

Adaptive Message Quantization and Parallelization for Distributed Full-graph GNN Training

Jun 02, 2023