Coleman Hooper

AI and Memory Wall

Mar 21, 2024

KVQuant: Towards 10 Million Context Length LLM Inference with KV Cache Quantization

Feb 07, 2024

Learned Best-Effort LLM Serving

Jan 15, 2024

S-LoRA: Serving Thousands of Concurrent LoRA Adapters

Nov 07, 2023

Property-Aware Multi-Speaker Data Simulation: A Probabilistic Modelling Technique for Synthetic Data Generation

Oct 18, 2023

SPEED: Speculative Pipelined Execution for Efficient Decoding

Oct 18, 2023

SqueezeLLM: Dense-and-Sparse Quantization

Jun 13, 2023

Full Stack Optimization of Transformer Inference: a Survey

Feb 27, 2023

Quantifying and Maximizing the Benefits of Back-End Noise Adaption on Attention-Based Speech Recognition Models

May 03, 2021

EdgeBERT: Optimizing On-Chip Inference for Multi-Task NLP

Dec 01, 2020