Murali Annavaram

DuetServe: Harmonizing Prefill and Decode for LLM Serving via Adaptive GPU Multiplexing

Nov 06, 2025
Via arXiv

Memory-Efficient Differentially Private Training with Gradient Random Projection

Jun 18, 2025
Via arXiv

DEL: Context-Aware Dynamic Exit Layer for Efficient Self-Speculative Decoding

Apr 08, 2025
Via arXiv

Efficient LLM Inference with I/O-Aware Partial KV Cache Recomputation

Nov 26, 2024
Via arXiv

Characterizing Context Influence and Hallucination in Summarization

Oct 03, 2024
Via arXiv

Adaptively Private Next-Token Prediction of Large Language Models

Oct 02, 2024
Via arXiv

CADC: Encoding User-Item Interactions for Compressing Recommendation Model Training Data

Jul 11, 2024
Via arXiv

Ethos: Rectifying Language Models in Orthogonal Parameter Space

Apr 01, 2024
Via arXiv

Differentially Private Next-Token Prediction of Large Language Models

Apr 01, 2024
Via arXiv

Edge Private Graph Neural Networks with Singular Value Perturbation

Mar 16, 2024
Via arXiv