
Esha Choukse

Power Stabilization for AI Training Datacenters

Aug 21, 2025

Towards Efficient Large Multimodal Model Serving

Feb 02, 2025

Towards Resource-Efficient Compound AI Systems

Jan 29, 2025

TAPAS: Thermal- and Power-Aware Scheduling for LLM Inference in Cloud Platforms

Jan 05, 2025

DroidSpeak: Enhancing Cross-LLM Communication

Nov 05, 2024

Input-Dependent Power Usage in GPUs

Sep 26, 2024

Mnemosyne: Parallelization Strategies for Efficiently Serving Multi-Million Context Length LLM Inference Requests Without Approximations

Sep 25, 2024

DynamoLLM: Designing LLM Inference Clusters for Performance and Energy Efficiency

Aug 01, 2024

Towards Greener LLMs: Bringing Energy-Efficiency to the Forefront of LLM Inference

Mar 29, 2024

POLCA: Power Oversubscription in LLM Cloud Providers

Aug 24, 2023