Qingru Zhang

GEAR: An Efficient KV Cache Compression Recipe for Near-Lossless Generative Inference of LLM

Mar 11, 2024
Hao Kang, Qingru Zhang, Souvik Kundu, Geonhwa Jeong, Zaoxing Liu, Tushar Krishna, Tuo Zhao

Tell Your Model Where to Attend: Post-hoc Attention Steering for LLMs

Nov 03, 2023
Qingru Zhang, Chandan Singh, Liyuan Liu, Xiaodong Liu, Bin Yu, Jianfeng Gao, Tuo Zhao

Efficient Long-Range Transformers: You Need to Attend More, but Not Necessarily at Every Layer

Oct 19, 2023
Qingru Zhang, Dhananjay Ram, Cole Hawkins, Sheng Zha, Tuo Zhao

Robust Multi-Agent Reinforcement Learning via Adversarial Regularization: Theoretical Foundation and Stable Algorithms

Oct 16, 2023
Alexander Bukharin, Yan Li, Yue Yu, Qingru Zhang, Zhehui Chen, Simiao Zuo, Chao Zhang, Songan Zhang, Tuo Zhao

LoSparse: Structured Compression of Large Language Models based on Low-Rank and Sparse Approximation

Jun 26, 2023
Yixiao Li, Yifan Yu, Qingru Zhang, Chen Liang, Pengcheng He, Weizhu Chen, Tuo Zhao

Adaptive Budget Allocation for Parameter-Efficient Fine-Tuning

Mar 18, 2023
Qingru Zhang, Minshuo Chen, Alexander Bukharin, Pengcheng He, Yu Cheng, Weizhu Chen, Tuo Zhao

Less is More: Task-aware Layer-wise Distillation for Language Model Compression

Oct 05, 2022
Chen Liang, Simiao Zuo, Qingru Zhang, Pengcheng He, Weizhu Chen, Tuo Zhao

PLATON: Pruning Large Transformer Models with Upper Confidence Bound of Weight Importance

Jun 25, 2022
Qingru Zhang, Simiao Zuo, Chen Liang, Alexander Bukharin, Pengcheng He, Weizhu Chen, Tuo Zhao
