Seungbeom Choi

FireQ: Fast INT4-FP8 Kernel and RoPE-aware Quantization for LLM Inference Acceleration

May 28, 2025

ELIS: Efficient LLM Iterative Scheduling System with Response Length Predictor

May 14, 2025

Multi-model Machine Learning Inference Serving with GPU Spatial Partitioning

Sep 01, 2021