Title: Training-Free Efficient Video Generation via Dynamic Token Carving

May 22, 2025

Yuechen Zhang, Jinbo Xing, Bin Xia, Shaoteng Liu, Bohao Peng, Xin Tao, Pengfei Wan, Eric Lo, Jiaya Jia

Abstract: Despite the remarkable generation quality of video Diffusion Transformer (DiT) models, their practical deployment is severely hindered by extensive computational requirements. This inefficiency stems from two key challenges: the quadratic complexity of self-attention with respect to token length and the multi-step nature of diffusion models. To address these limitations, we present Jenga, a novel inference pipeline that combines dynamic attention carving with progressive resolution generation. Our approach leverages two key insights: (1) early denoising steps do not require high-resolution latents, and (2) later steps do not require dense attention. Jenga introduces a block-wise attention mechanism that dynamically selects relevant token interactions using 3D space-filling curves, alongside a progressive resolution strategy that gradually increases latent resolution during generation. Experimental results demonstrate that Jenga achieves substantial speedups across multiple state-of-the-art video diffusion models while maintaining comparable generation quality (8.83× speedup with a 0.01% performance drop on VBench). As a plug-and-play solution, Jenga enables practical, high-quality video generation on modern hardware by reducing inference time from minutes to seconds, without requiring model retraining. Code: https://github.com/dvlab-research/Jenga
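To make the two insights in the abstract concrete, here is a back-of-the-envelope cost model. It is only a sketch: it counts self-attention work alone, and the token count, step split, and kept fraction are hypothetical values chosen for easy arithmetic, not figures from the paper. The function names `attention_cost` and `schedule_cost` are likewise illustrative.

```python
def attention_cost(num_tokens: int, kept_fraction: float = 1.0) -> float:
    """Relative self-attention cost: quadratic in the token count, scaled by
    the fraction of query-key block pairs that are actually computed."""
    return kept_fraction * num_tokens ** 2


def schedule_cost(stages) -> float:
    """Sum attention cost over (steps, num_tokens, kept_fraction) stages."""
    return sum(steps * attention_cost(n, keep) for steps, n, keep in stages)


# Hypothetical settings, for illustration only (not taken from the paper).
FULL_TOKENS = 32_000   # tokens at the final latent resolution
TOTAL_STEPS = 50       # denoising steps

# Baseline: every step runs dense attention at full resolution.
baseline = schedule_cost([(TOTAL_STEPS, FULL_TOKENS, 1.0)])

# Insight (1): early steps use a lower-resolution latent (here 1/4 the tokens).
# Insight (2): later steps keep only a fraction of the block pairs (here 25%).
progressive_sparse = schedule_cost([
    (25, FULL_TOKENS // 4, 1.0),
    (25, FULL_TOKENS, 0.25),
])

speedup = baseline / progressive_sparse
print(f"attention-only speedup under these assumptions: {speedup:.2f}x")
```

Because cost is quadratic in token count, quartering the tokens in the early stage cuts that stage's attention cost by 16×, which is why a modest resolution reduction pays off so strongly. Real end-to-end speedups also depend on non-attention layers and kernel efficiency, which this model ignores.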

* Project Page: https://julianjuaner.github.io/projects/jenga/, 24 pages
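The abstract describes block-wise attention that dynamically selects relevant token interactions but does not state the selection rule. The sketch below shows one common way such a selection could work, under assumptions that are mine rather than the authors': tokens are taken to be already reordered (for example along a space-filling curve) so that each contiguous block is spatially local, blocks are summarized by mean pooling, and each query block keeps its `top_k` highest-scoring key blocks plus its own block. The helper names are hypothetical and are not part of the Jenga codebase.

```python
from typing import List

Vector = List[float]


def mean_pool_blocks(vectors: List[Vector], block_size: int) -> List[Vector]:
    """Average each run of `block_size` consecutive token vectors."""
    pooled = []
    for start in range(0, len(vectors), block_size):
        chunk = vectors[start:start + block_size]
        dim = len(chunk[0])
        pooled.append([sum(v[d] for v in chunk) / len(chunk) for d in range(dim)])
    return pooled


def select_key_blocks(queries: List[Vector], keys: List[Vector],
                      block_size: int, top_k: int) -> List[List[int]]:
    """For each query block, return the sorted indices of key blocks to attend to.

    Assumption: relevance is the dot product of mean-pooled block vectors.
    """
    q_blocks = mean_pool_blocks(queries, block_size)
    k_blocks = mean_pool_blocks(keys, block_size)
    selected = []
    for qi, q in enumerate(q_blocks):
        scores = [sum(a * b for a, b in zip(q, k)) for k in k_blocks]
        ranked = sorted(range(len(k_blocks)), key=lambda j: scores[j], reverse=True)
        keep = set(ranked[:top_k])
        keep.add(qi)  # assumption: always keep the local (diagonal) block,
                      # so a row may hold up to top_k + 1 blocks
        selected.append(sorted(keep))
    return selected


# Toy input: four 2-D tokens forming two blocks that point in different directions.
tokens = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]]
mask = select_key_blocks(tokens, tokens, block_size=2, top_k=1)
print(mask)  # [[0], [1]]
```

In this toy case each query block keeps only its own key block, so half of the block pairs are skipped. A real kernel would then compute attention only over the selected block pairs.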
