Olivier Tardieu

Data Driven Optimization of GPU efficiency for Distributed LLM Adapter Serving

Feb 27, 2026

Maximizing GPU Efficiency via Optimal Adapter Caching: An Analytical Approach for Multi-Tenant LLM Serving

Aug 11, 2025

Mind the Memory Gap: Unveiling GPU Bottlenecks in Large-Batch LLM Inference

Mar 11, 2025

Towards Pareto Optimal Throughput in Small Language Model Serving

Apr 04, 2024