Woosuk Kwon

Dima

Gemma 3 Technical Report

Mar 25, 2025
Via arXiv

Gemma 2: Improving Open Language Models at a Practical Size

Aug 02, 2024
Via arXiv

Optimizing Speculative Decoding for Serving Large Language Models Using Goodput

Jun 20, 2024
Via arXiv

Efficient Memory Management for Large Language Model Serving with PagedAttention

Sep 12, 2023
Via arXiv

Nimble: Lightweight and Parallel GPU Task Scheduling for Deep Learning

Dec 04, 2020
Via arXiv