S3D: A Simple and Cost-Effective Self-Speculative Decoding Scheme for Low-Memory GPUs

Jun 01, 2024
[Figures 1-4]

View paper on arXiv