Sibin Deng

SATB-VR: Training Few-Step Video Restoration Diffusion Model using SNR-Aware Trajectory Blending

Jun 27, 2026

Haoran Bai, Xiaoxu Chen, Xiaoyu Liu, Zongsheng Yue, Sibin Deng, Wangmeng Zuo, Ying Chen

Abstract: While diffusion models excel in video restoration, their reliance on extensive iterative steps limits efficiency. Conversely, aggressive single-step distillation often compromises fine texture recovery. To achieve an optimal balance, we present SATB-VR, a few-step paradigm that jump-starts the denoising process via an auxiliary predictor, explicitly bypassing early low signal-to-noise ratio (SNR) steps. However, naive joint training of the predictor and the denoiser inherently introduces a severe train-inference discrepancy. To resolve this, we propose the SNR-Aware Trajectory Blending (SATB) strategy. During the forward process, SATB constructs the noisy input by dynamically blending the predictor's output with the ground-truth trajectory based on the SNRs. This forces the denoiser to robustly compensate for initial prediction errors while smoothly converging to the clean data manifold. Furthermore, we introduce a Denoiser-Driven Consistency (DDC) loss, leveraging the concurrently updated denoiser as a dynamic evaluator to explicitly align internal features and boost predictor accuracy. Extensive experiments demonstrate that, under flexible few-step inference regimes (e.g., $\le 5$ steps), SATB-VR performs favorably against existing approaches on synthetic, real-world, and AIGC benchmarks.
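
The blending step described in the SATB-VR abstract above can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract only states that the predictor's output is blended with the ground-truth trajectory based on SNR. The weighting rule `blend_weight`, the noise schedule parameters `alpha` and `sigma`, and every function name below are assumptions made for illustration.

```python
import random


def snr(alpha, sigma):
    """Signal-to-noise ratio of a noisy sample alpha * x + sigma * noise."""
    return (alpha * alpha) / (sigma * sigma)


def blend_weight(snr_value):
    """Hypothetical weight on the ground truth (an assumption, not from the paper).

    Near 0 at low SNR, so the input leans on the predictor's output;
    near 1 at high SNR, so the input approaches the clean data.
    """
    return snr_value / (1.0 + snr_value)


def satb_noisy_input(ground_truth, prediction, alpha, sigma, rng):
    """Build a noisy training input from an SNR-weighted blend (illustrative only)."""
    w = blend_weight(snr(alpha, sigma))
    # Blend the clean target with the auxiliary predictor's estimate.
    blended = [w * g + (1.0 - w) * p for g, p in zip(ground_truth, prediction)]
    # Apply the forward noising step to the blended signal.
    return [alpha * b + sigma * rng.gauss(0.0, 1.0) for b in blended]


if __name__ == "__main__":
    rng = random.Random(0)
    gt = [1.0, 2.0, 3.0]      # stand-in for clean latent values
    pred = [0.8, 2.3, 2.9]    # stand-in for the predictor's output
    print(satb_noisy_input(gt, pred, alpha=0.6, sigma=0.8, rng=rng))
```

Under this assumed weighting, a low-SNR step trains the denoiser on inputs close to what the predictor would supply at inference, while a high-SNR step stays close to the ground truth.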

NTIRE 2026 Challenge on Short-form UGC Video Restoration in the Wild with Generative Models: Datasets, Methods and Results

Apr 12, 2026

Xin Li, Jiachao Gong, Xijun Wang, Shiyao Xiong, Bingchen Li, Suhang Yao, Chao Zhou, Zhibo Chen, Radu Timofte, Yuxiang Chen(+68 more)

Abstract: This paper presents an overview of the NTIRE 2026 Challenge on Short-form UGC Video Restoration in the Wild with Generative Models. This challenge utilizes a new short-form UGC (S-UGC) video restoration benchmark, termed KwaiVIR, which is contributed by USTC and Kuaishou Technology. It contains both synthetically distorted videos and real-world short-form UGC videos in the wild. For this edition, the released data include 200 synthetic training videos, 48 wild training videos, 11 validation videos, and 20 testing videos. The primary goal of this challenge is to establish a strong and practical benchmark for restoring short-form UGC videos under complex real-world degradations, especially in the emerging paradigm of generative-model-based S-UGC video restoration. This challenge has two tracks: (i) the primary track is a subjective track, where the evaluation is based on a user study; (ii) the second track is an objective track. These two tracks enable a comprehensive assessment of restoration quality. In total, 95 teams registered for this competition, and 12 teams submitted valid final solutions and fact sheets for the testing phase. The submitted methods achieved strong performance on the KwaiVIR benchmark, demonstrating encouraging progress in short-form UGC video restoration in the wild.

* Accepted by CVPR 2026 workshop; NTIRE 2026

Vivid-VR: Distilling Concepts from Text-to-Video Diffusion Transformer for Photorealistic Video Restoration

Aug 20, 2025

Haoran Bai, Xiaoxu Chen, Canqian Yang, Zongyao He, Sibin Deng, Ying Chen

Abstract: We present Vivid-VR, a DiT-based generative video restoration method built upon an advanced T2V foundation model, where ControlNet is leveraged to control the generation process, ensuring content consistency. However, conventional fine-tuning of such controllable pipelines frequently suffers from distribution drift due to imperfect multimodal alignment, resulting in compromised texture realism and temporal coherence. To tackle this challenge, we propose a concept distillation training strategy that utilizes the pretrained T2V model to synthesize training samples with embedded textual concepts, thereby distilling its conceptual understanding to preserve texture and temporal quality. To enhance generation controllability, we redesign the control architecture with two key components: 1) a control feature projector that filters degradation artifacts from input video latents to minimize their propagation through the generation pipeline, and 2) a new ControlNet connector employing a dual-branch design. This connector combines MLP-based feature mapping with a cross-attention mechanism for dynamic control feature retrieval, enabling both content preservation and adaptive control signal modulation. Extensive experiments show that Vivid-VR performs favorably against existing approaches on both synthetic and real-world benchmarks, as well as AIGC videos, achieving impressive texture realism, visual vividness, and temporal consistency. The codes and checkpoints are publicly available at https://github.com/csbhr/Vivid-VR.

Subjective and Objective Quality-of-Experience Evaluation Study for Live Video Streaming

Sep 26, 2024

Zehao Zhu, Wei Sun, Jun Jia, Wei Wu, Sibin Deng, Kai Li, Ying Chen, Xiongkuo Min, Jia Wang, Guangtao Zhai

Abstract: In recent years, live video streaming has gained widespread popularity across various social media platforms. Quality of experience (QoE), which reflects end-users' satisfaction and overall experience, plays a critical role in helping media service providers optimize large-scale live compression and transmission strategies to achieve a perceptually optimal rate-distortion trade-off. Although many QoE metrics for video-on-demand (VoD) have been proposed, there remain significant challenges in developing QoE metrics for live video streaming. To bridge this gap, we conduct a comprehensive study of subjective and objective QoE evaluations for live video streaming. For the subjective QoE study, we introduce the first live video streaming QoE dataset, TaoLive QoE, which consists of $42$ source videos collected from real live broadcasts and $1,155$ corresponding distorted ones degraded by a variety of streaming distortions, including conventional streaming distortions such as compression and stalling, as well as live streaming-specific distortions like frame skipping and variable frame rate. Subsequently, a human study was conducted to derive subjective QoE scores of videos in the TaoLive QoE dataset. For the objective QoE study, we benchmark existing QoE models on the TaoLive QoE dataset as well as publicly available QoE datasets for VoD scenarios, highlighting that current models struggle to accurately assess video QoE, particularly for live content. Hence, we propose an end-to-end QoE evaluation model, Tao-QoE, which integrates multi-scale semantic features and optical flow-based motion features to predict a retrospective QoE score, eliminating reliance on statistical quality of service (QoS) features.

* 14 pages, 5 figures

NTIRE 2023 Quality Assessment of Video Enhancement Challenge

Jul 19, 2023

Xiaohong Liu, Xiongkuo Min, Wei Sun, Yulun Zhang, Kai Zhang, Radu Timofte, Guangtao Zhai, Yixuan Gao, Yuqin Cao, Tengchuan Kou(+62 more)

Abstract: This paper reports on the NTIRE 2023 Quality Assessment of Video Enhancement Challenge, which will be held in conjunction with the New Trends in Image Restoration and Enhancement Workshop (NTIRE) at CVPR 2023. The challenge addresses a major problem in the field of video processing, namely, video quality assessment (VQA) for enhanced videos. The challenge uses the VQA Dataset for Perceptual Video Enhancement (VDPVE), which has a total of 1211 enhanced videos, including 600 videos with color, brightness, and contrast enhancements, 310 videos with deblurring, and 301 deshaked videos. The challenge has a total of 167 registered participants. During the development phase, 61 participating teams submitted their prediction results, with a total of 3168 submissions. During the final testing phase, 37 participating teams made a total of 176 submissions. Finally, 19 participating teams submitted their models and fact sheets, and detailed the methods they used. Some methods have achieved better results than baseline methods, and the winning methods have demonstrated superior prediction performance.
