Ferran Agullo

Data Driven Optimization of GPU efficiency for Distributed LLM Adapter Serving

Feb 27, 2026
Via arXiv

Maximizing GPU Efficiency via Optimal Adapter Caching: An Analytical Approach for Multi-Tenant LLM Serving

Aug 11, 2025
Via arXiv

Mind the Memory Gap: Unveiling GPU Bottlenecks in Large-Batch LLM Inference

Mar 11, 2025
Via arXiv