Yunho Jin

S$^{3}$: Increasing GPU Utilization during Generative Inference for Higher Throughput

Jun 09, 2023
Via arXiv

Bigger&Faster: Two-stage Neural Architecture Search for Quantized Transformer Models

Sep 25, 2022
Via arXiv