Murali Annavaram

PrivacySIM: Evaluating LLM Simulation of User Privacy Behavior

May 12, 2026
Via arXiv

Fast NF4 Dequantization Kernels for Large Language Model Inference

Apr 02, 2026
Via arXiv

Differentially Private Retrieval-Augmented Generation

Feb 16, 2026
Via arXiv

LRD-MPC: Efficient MPC Inference through Low-rank Decomposition

Feb 16, 2026
Via arXiv

Striking the Right Balance between Compute and Copy: Improving LLM Inferencing Under Speculative Decoding

Nov 15, 2025
Via arXiv

DuetServe: Harmonizing Prefill and Decode for LLM Serving via Adaptive GPU Multiplexing

Nov 06, 2025
Via arXiv

Memory-Efficient Differentially Private Training with Gradient Random Projection

Jun 18, 2025
Via arXiv

DEL: Context-Aware Dynamic Exit Layer for Efficient Self-Speculative Decoding

Apr 08, 2025
Via arXiv

Efficient LLM Inference with I/O-Aware Partial KV Cache Recomputation

Nov 26, 2024
Via arXiv

Characterizing Context Influence and Hallucination in Summarization

Oct 03, 2024
Via arXiv