Juntao Zhao

LLM-PQ: Serving LLM on Heterogeneous Clusters with Phase-Aware Partition and Adaptive Quantization

Mar 02, 2024
Juntao Zhao, Borui Wan, Yanghua Peng, Haibin Lin, Chuan Wu

Via arXiv

CDMPP: A Device-Model Agnostic Framework for Latency Prediction of Tensor Programs

Nov 17, 2023
Hanpeng Hu, Junwei Su, Juntao Zhao, Yanghua Peng, Yibo Zhu, Haibin Lin, Chuan Wu

Via arXiv

Adaptive Message Quantization and Parallelization for Distributed Full-graph GNN Training

Jun 02, 2023
Borui Wan, Juntao Zhao, Chuan Wu

Via arXiv