Ting Cao

BitDistiller: Unleashing the Potential of Sub-4-Bit LLMs via Self-Distillation

Feb 16, 2024
Dayou Du, Yijia Zhang, Shijie Cao, Jiaqi Guo, Ting Cao, Xiaowen Chu, Ningyi Xu

Exploring the Impact of In-Browser Deep Learning Inference on Quality of User Experience and Performance

Feb 08, 2024
Qipeng Wang, Shiqi Jiang, Zhenpeng Chen, Xu Cao, Yuanchun Li, Aoyu Li, Ying Zhang, Yun Ma, Ting Cao, Xuanzhe Liu

AFPQ: Asymmetric Floating Point Quantization for LLMs

Nov 03, 2023
Yijia Zhang, Sicheng Zhang, Shijie Cao, Dayou Du, Jianyu Wei, Ting Cao, Ningyi Xu

Accelerating In-Browser Deep Learning Inference on Diverse Edge Clients through Just-in-Time Kernel Optimizations

Sep 16, 2023
Fucheng Jia, Shiqi Jiang, Ting Cao, Wei Cui, Tianrui Xia, Xu Cao, Yuanchun Li, Deyu Zhang, Ju Ren, Yunxin Liu, Lili Qiu, Mao Yang

Pre-gated MoE: An Algorithm-System Co-Design for Fast and Scalable Mixture-of-Expert Inference

Aug 23, 2023
Ranggi Hwang, Jianyu Wei, Shijie Cao, Changho Hwang, Xiaohu Tang, Ting Cao, Mao Yang, Minsoo Rhu

Constraint-aware and Ranking-distilled Token Pruning for Efficient Transformer Inference

Jun 26, 2023
Junyan Li, Li Lyna Zhang, Jiahang Xu, Yujing Wang, Shaoguang Yan, Yunqing Xia, Yuqing Yang, Ting Cao, Hao Sun, Weiwei Deng, Qi Zhang, Mao Yang

Adam Accumulation to Reduce Memory Footprints of both Activations and Gradients for Large-scale DNN Training

May 31, 2023
Yijia Zhang, Yibo Han, Shijie Cao, Guohao Dai, Youshan Miao, Ting Cao, Fan Yang, Ningyi Xu

Accurate and Structured Pruning for Efficient Automatic Speech Recognition

May 31, 2023
Huiqiang Jiang, Li Lyna Zhang, Yuang Li, Yu Wu, Shijie Cao, Ting Cao, Yuqing Yang, Jinyu Li, Mao Yang, Lili Qiu

Integer or Floating Point? New Outlooks for Low-Bit Quantization on Large Language Models

May 21, 2023
Yijia Zhang, Lingran Zhao, Shijie Cao, Wenqiang Wang, Ting Cao, Fan Yang, Mao Yang, Shanghang Zhang, Ningyi Xu

ElasticViT: Conflict-aware Supernet Training for Deploying Fast Vision Transformer on Diverse Mobile Devices

Mar 21, 2023
Chen Tang, Li Lyna Zhang, Huiqiang Jiang, Jiahang Xu, Ting Cao, Quanlu Zhang, Yuqing Yang, Zhi Wang, Mao Yang
