Grigory Sizov

Jack

Efficient Speculative Decoding for Llama at Scale: Challenges and Solutions

Aug 11, 2025
Via arXiv

Context Parallelism for Scalable Million-Token Inference

Nov 04, 2024
Via arXiv

The Llama 3 Herd of Models

Jul 31, 2024
Via arXiv