Sanjiv Kumar

Google Research

Efficient Document Ranking with Learnable Late Interactions

Jun 25, 2024
Via arXiv

Landscape-Aware Growing: The Power of a Little LAG

Jun 04, 2024
Via arXiv

Faster Cascades via Speculative Decoding

May 29, 2024
Via arXiv

Language Model Cascades: Token-level uncertainty and beyond

Apr 15, 2024
Via arXiv

Towards Fast Inference: Exploring and Improving Blockwise Parallel Drafts

Apr 14, 2024
Via arXiv

SOAR: Improved Indexing for Approximate Nearest Neighbor Search

Mar 31, 2024
Via arXiv

Metric-aware LLM inference

Mar 07, 2024
Via arXiv

HiRE: High Recall Approximate Top-$k$ Estimation for Efficient LLM Inference

Feb 14, 2024
Via arXiv

Tandem Transformers for Inference Efficient LLMs

Feb 13, 2024
Via arXiv

Efficient Stagewise Pretraining via Progressive Subnetworks

Feb 08, 2024
Via arXiv