Mao Yang

LongRoPE: Extending LLM Context Window Beyond 2 Million Tokens

Feb 21, 2024
Yiran Ding, Li Lyna Zhang, Chengruidong Zhang, Yuanyuan Xu, Ning Shang, Jiahang Xu, Fan Yang, Mao Yang

Boosting LLM Reasoning: Push the Limits of Few-shot Learning with Reinforced In-Context Pruning

Dec 26, 2023
Xijie Huang, Li Lyna Zhang, Kwang-Ting Cheng, Mao Yang

Compresso: Structured Pruning with Collaborative Prompting Learns Compact Large Language Models

Oct 11, 2023
Song Guo, Jiahang Xu, Li Lyna Zhang, Mao Yang

Model-enhanced Vector Index

Sep 23, 2023
Hailin Zhang, Yujing Wang, Qi Chen, Ruiheng Chang, Ting Zhang, Ziming Miao, Yingyan Hou, Yang Ding, Xupeng Miao, Haonan Wang, Bochen Pang, Yuefeng Zhan, Hao Sun, Weiwei Deng, Qi Zhang, Fan Yang, Xing Xie, Mao Yang, Bin Cui

Accelerating In-Browser Deep Learning Inference on Diverse Edge Clients through Just-in-Time Kernel Optimizations

Sep 16, 2023
Fucheng Jia, Shiqi Jiang, Ting Cao, Wei Cui, Tianrui Xia, Xu Cao, Yuanchun Li, Deyu Zhang, Ju Ren, Yunxin Liu, Lili Qiu, Mao Yang

Pre-gated MoE: An Algorithm-System Co-Design for Fast and Scalable Mixture-of-Expert Inference

Aug 23, 2023
Ranggi Hwang, Jianyu Wei, Shijie Cao, Changho Hwang, Xiaohu Tang, Ting Cao, Mao Yang, Minsoo Rhu

Constraint-aware and Ranking-distilled Token Pruning for Efficient Transformer Inference

Jun 26, 2023
Junyan Li, Li Lyna Zhang, Jiahang Xu, Yujing Wang, Shaoguang Yan, Yunqing Xia, Yuqing Yang, Ting Cao, Hao Sun, Weiwei Deng, Qi Zhang, Mao Yang

Accurate and Structured Pruning for Efficient Automatic Speech Recognition

May 31, 2023
Huiqiang Jiang, Li Lyna Zhang, Yuang Li, Yu Wu, Shijie Cao, Ting Cao, Yuqing Yang, Jinyu Li, Mao Yang, Lili Qiu

Integer or Floating Point? New Outlooks for Low-Bit Quantization on Large Language Models

May 21, 2023
Yijia Zhang, Lingran Zhao, Shijie Cao, Wenqiang Wang, Ting Cao, Fan Yang, Mao Yang, Shanghang Zhang, Ningyi Xu

IRGen: Generative Modeling for Image Retrieval

Mar 27, 2023
Yidan Zhang, Ting Zhang, Dong Chen, Yujing Wang, Qi Chen, Xing Xie, Hao Sun, Weiwei Deng, Qi Zhang, Fan Yang, Mao Yang, Qingmin Liao, Baining Guo
