Mao Yang

T-MAC: CPU Renaissance via Table Lookup for Low-Bit LLM Deployment on Edge

Jun 25, 2024
Via arXiv

MS MARCO Web Search: a Large-scale Information-rich Web Dataset with Millions of Real Click Labels

May 13, 2024
Via arXiv

LongRoPE: Extending LLM Context Window Beyond 2 Million Tokens

Feb 21, 2024
Via arXiv

Boosting LLM Reasoning: Push the Limits of Few-shot Learning with Reinforced In-Context Pruning

Dec 26, 2023
Via arXiv

Compresso: Structured Pruning with Collaborative Prompting Learns Compact Large Language Models

Oct 11, 2023
Via arXiv

Model-enhanced Vector Index

Sep 23, 2023
Via arXiv

Accelerating In-Browser Deep Learning Inference on Diverse Edge Clients through Just-in-Time Kernel Optimizations

Sep 16, 2023
Via arXiv

Pre-gated MoE: An Algorithm-System Co-Design for Fast and Scalable Mixture-of-Expert Inference

Aug 23, 2023
Via arXiv

Constraint-aware and Ranking-distilled Token Pruning for Efficient Transformer Inference

Jun 26, 2023
Via arXiv

Accurate and Structured Pruning for Efficient Automatic Speech Recognition

May 31, 2023
Via arXiv