Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Shuaicheng Liu

StabStitch++: Unsupervised Online Video Stitching with Spatiotemporal Bidirectional Warps

May 08, 2025

Lang Nie, Chunyu Lin, Kang Liao, Yun Zhang, Shuaicheng Liu, Yao Zhao

Figure 1 for StabStitch++: Unsupervised Online Video Stitching with Spatiotemporal Bidirectional Warps

Figure 2 for StabStitch++: Unsupervised Online Video Stitching with Spatiotemporal Bidirectional Warps

Figure 3 for StabStitch++: Unsupervised Online Video Stitching with Spatiotemporal Bidirectional Warps

Figure 4 for StabStitch++: Unsupervised Online Video Stitching with Spatiotemporal Bidirectional Warps

Abstract:We retarget video stitching to an emerging issue, named warping shake, which unveils the temporal content shakes induced by sequentially unsmooth warps when extending image stitching to video stitching. Even if the input videos are stable, the stitched video can inevitably cause undesired warping shakes and affect the visual experience. To address this issue, we propose StabStitch++, a novel video stitching framework to realize spatial stitching and temporal stabilization with unsupervised learning simultaneously. First, different from existing learning-based image stitching solutions that typically warp one image to align with another, we suppose a virtual midplane between original image planes and project them onto it. Concretely, we design a differentiable bidirectional decomposition module to disentangle the homography transformation and incorporate it into our spatial warp, evenly spreading alignment burdens and projective distortions across two views. Then, inspired by camera paths in video stabilization, we derive the mathematical expression of stitching trajectories in video stitching by elaborately integrating spatial and temporal warps. Finally, a warp smoothing model is presented to produce stable stitched videos with a hybrid loss to simultaneously encourage content alignment, trajectory smoothness, and online collaboration. Compared with StabStitch that sacrifices alignment for stabilization, StabStitch++ makes no compromise and optimizes both of them simultaneously, especially in the online mode. To establish an evaluation benchmark and train the learning framework, we build a video stitching dataset with a rich diversity in camera motions and scenes. Experiments exhibit that StabStitch++ surpasses current solutions in stitching performance, robustness, and efficiency, offering compelling advancements in this field by building a real-time online video stitching system.

* TPAMI2025; https://github.com/nie-lang/StabStitch2. arXiv admin note: text overlap with arXiv:2403.06378

Via

Access Paper or Ask Questions

CodingHomo: Bootstrapping Deep Homography With Video Coding

Apr 16, 2025

Yike Liu, Haipeng Li, Shuaicheng Liu, Bing Zeng

Abstract:Homography estimation is a fundamental task in computer vision with applications in diverse fields. Recent advances in deep learning have improved homography estimation, particularly with unsupervised learning approaches, offering increased robustness and generalizability. However, accurately predicting homography, especially in complex motions, remains a challenge. In response, this work introduces a novel method leveraging video coding, particularly by harnessing inherent motion vectors (MVs) present in videos. We present CodingHomo, an unsupervised framework for homography estimation. Our framework features a Mask-Guided Fusion (MGF) module that identifies and utilizes beneficial features among the MVs, thereby enhancing the accuracy of homography prediction. Additionally, the Mask-Guided Homography Estimation (MGHE) module is presented for eliminating undesired features in the coarse-to-fine homography refinement process. CodingHomo outperforms existing state-of-the-art unsupervised methods, delivering good robustness and generalizability. The code and dataset are available at: \href{github}{https://github.com/liuyike422/CodingHomo

* IEEE Transactions on Circuits and Systems for Video Technology ( Volume: 34, Issue: 11, November 2024)

Via

Access Paper or Ask Questions

Coding-Prior Guided Diffusion Network for Video Deblurring

Apr 16, 2025

Yike Liu, Jianhui Zhang, Haipeng Li, Shuaicheng Liu, Bing Zeng

Abstract:While recent video deblurring methods have advanced significantly, they often overlook two valuable prior information: (1) motion vectors (MVs) and coding residuals (CRs) from video codecs, which provide efficient inter-frame alignment cues, and (2) the rich real-world knowledge embedded in pre-trained diffusion generative models. We present CPGDNet, a novel two-stage framework that effectively leverages both coding priors and generative diffusion priors for high-quality deblurring. First, our coding-prior feature propagation (CPFP) module utilizes MVs for efficient frame alignment and CRs to generate attention masks, addressing motion inaccuracies and texture variations. Second, a coding-prior controlled generation (CPC) module network integrates coding priors into a pretrained diffusion model, guiding it to enhance critical regions and synthesize realistic details. Experiments demonstrate our method achieves state-of-the-art perceptual quality with up to 30% improvement in IQA metrics. Both the code and the codingprior-augmented dataset will be open-sourced.

Via

Access Paper or Ask Questions

Learning Hazing to Dehazing: Towards Realistic Haze Generation for Real-World Image Dehazing

Mar 25, 2025

Ruiyi Wang, Yushuo Zheng, Zicheng Zhang, Chunyi Li, Shuaicheng Liu, Guangtao Zhai, Xiaohong Liu

Abstract:Existing real-world image dehazing methods primarily attempt to fine-tune pre-trained models or adapt their inference procedures, thus heavily relying on the pre-trained models and associated training data. Moreover, restoring heavily distorted information under dense haze requires generative diffusion models, whose potential in dehazing remains underutilized partly due to their lengthy sampling processes. To address these limitations, we introduce a novel hazing-dehazing pipeline consisting of a Realistic Hazy Image Generation framework (HazeGen) and a Diffusion-based Dehazing framework (DiffDehaze). Specifically, HazeGen harnesses robust generative diffusion priors of real-world hazy images embedded in a pre-trained text-to-image diffusion model. By employing specialized hybrid training and blended sampling strategies, HazeGen produces realistic and diverse hazy images as high-quality training data for DiffDehaze. To alleviate the inefficiency and fidelity concerns associated with diffusion-based methods, DiffDehaze adopts an Accelerated Fidelity-Preserving Sampling process (AccSamp). The core of AccSamp is the Tiled Statistical Alignment Operation (AlignOp), which can provide a clean and faithful dehazing estimate within a small fraction of sampling steps to reduce complexity and enable effective fidelity guidance. Extensive experiments demonstrate the superior dehazing performance and visual quality of our approach over existing methods. The code is available at https://github.com/ruiyi-w/Learning-Hazing-to-Dehazing.

* Accepted by CVPR 2025

Via

Access Paper or Ask Questions

ISPDiffuser: Learning RAW-to-sRGB Mappings with Texture-Aware Diffusion Models and Histogram-Guided Color Consistency

Mar 25, 2025

Yang Ren, Hai Jiang, Menglong Yang, Wei Li, Shuaicheng Liu

Figure 1 for ISPDiffuser: Learning RAW-to-sRGB Mappings with Texture-Aware Diffusion Models and Histogram-Guided Color Consistency

Figure 2 for ISPDiffuser: Learning RAW-to-sRGB Mappings with Texture-Aware Diffusion Models and Histogram-Guided Color Consistency

Figure 3 for ISPDiffuser: Learning RAW-to-sRGB Mappings with Texture-Aware Diffusion Models and Histogram-Guided Color Consistency

Figure 4 for ISPDiffuser: Learning RAW-to-sRGB Mappings with Texture-Aware Diffusion Models and Histogram-Guided Color Consistency

Abstract:RAW-to-sRGB mapping, or the simulation of the traditional camera image signal processor (ISP), aims to generate DSLR-quality sRGB images from raw data captured by smartphone sensors. Despite achieving comparable results to sophisticated handcrafted camera ISP solutions, existing learning-based methods still struggle with detail disparity and color distortion. In this paper, we present ISPDiffuser, a diffusion-based decoupled framework that separates the RAW-to-sRGB mapping into detail reconstruction in grayscale space and color consistency mapping from grayscale to sRGB. Specifically, we propose a texture-aware diffusion model that leverages the generative ability of diffusion models to focus on local detail recovery, in which a texture enrichment loss is further proposed to prompt the diffusion model to generate more intricate texture details. Subsequently, we introduce a histogram-guided color consistency module that utilizes color histogram as guidance to learn precise color information for grayscale to sRGB color consistency mapping, with a color consistency loss designed to constrain the learned color information. Extensive experimental results show that the proposed ISPDiffuser outperforms state-of-the-art competitors both quantitatively and visually. The code is available at https://github.com/RenYangSCU/ISPDiffuser.

* Accepted by AAAI 2025

Via

Access Paper or Ask Questions

HybridReg: Robust 3D Point Cloud Registration with Hybrid Motions

Mar 10, 2025

Keyu Du, Hao Xu, Haipeng Li, Hong Qu, Chi-Wing Fu, Shuaicheng Liu

Figure 1 for HybridReg: Robust 3D Point Cloud Registration with Hybrid Motions

Figure 2 for HybridReg: Robust 3D Point Cloud Registration with Hybrid Motions

Figure 3 for HybridReg: Robust 3D Point Cloud Registration with Hybrid Motions

Figure 4 for HybridReg: Robust 3D Point Cloud Registration with Hybrid Motions

Abstract:Scene-level point cloud registration is very challenging when considering dynamic foregrounds. Existing indoor datasets mostly assume rigid motions, so the trained models cannot robustly handle scenes with non-rigid motions. On the other hand, non-rigid datasets are mainly object-level, so the trained models cannot generalize well to complex scenes. This paper presents HybridReg, a new approach to 3D point cloud registration, learning uncertainty mask to account for hybrid motions: rigid for backgrounds and non-rigid/rigid for instance-level foregrounds. First, we build a scene-level 3D registration dataset, namely HybridMatch, designed specifically with strategies to arrange diverse deforming foregrounds in a controllable manner. Second, we account for different motion types and formulate a mask-learning module to alleviate the interference of deforming outliers. Third, we exploit a simple yet effective negative log-likelihood loss to adopt uncertainty to guide the feature extraction and correlation computation. To our best knowledge, HybridReg is the first work that exploits hybrid motions for robust point cloud registration. Extensive experiments show HybridReg's strengths, leading it to achieve state-of-the-art performance on both widely-used indoor and outdoor datasets.

* 2025, Association for the Advancement of Artificial Intelligence

Via

Access Paper or Ask Questions

FlowPolicy: Enabling Fast and Robust 3D Flow-based Policy via Consistency Flow Matching for Robot Manipulation

Dec 06, 2024

Qinglun Zhang, Zhen Liu, Haoqiang Fan, Guanghui Liu, Bing Zeng, Shuaicheng Liu

Abstract:Robots can acquire complex manipulation skills by learning policies from expert demonstrations, which is often known as vision-based imitation learning. Generating policies based on diffusion and flow matching models has been shown to be effective, particularly in robotics manipulation tasks. However, recursion-based approaches are often inference inefficient in working from noise distributions to policy distributions, posing a challenging trade-off between efficiency and quality. This motivates us to propose FlowPolicy, a novel framework for fast policy generation based on consistency flow matching and 3D vision. Our approach refines the flow dynamics by normalizing the self-consistency of the velocity field, enabling the model to derive task execution policies in a single inference step. Specifically, FlowPolicy conditions on the observed 3D point cloud, where consistency flow matching directly defines straight-line flows from different time states to the same action space, while simultaneously constraining their velocity values, that is, we approximate the trajectories from noise to robot actions by normalizing the self-consistency of the velocity field within the action space, thus improving the inference efficiency. We validate the effectiveness of FlowPolicy on Adroit and Metaworld, demonstrating a 7$\times$ increase in inference speed while maintaining competitive average success rates compared to state-of-the-art policy models. Codes will be made publicly available.

Via

Access Paper or Ask Questions

You Only Look Around: Learning Illumination Invariant Feature for Low-light Object Detection

Oct 24, 2024

Mingbo Hong, Shen Cheng, Haibin Huang, Haoqiang Fan, Shuaicheng Liu

Abstract:In this paper, we introduce YOLA, a novel framework for object detection in low-light scenarios. Unlike previous works, we propose to tackle this challenging problem from the perspective of feature learning. Specifically, we propose to learn illumination-invariant features through the Lambertian image formation model. We observe that, under the Lambertian assumption, it is feasible to approximate illumination-invariant feature maps by exploiting the interrelationships between neighboring color channels and spatially adjacent pixels. By incorporating additional constraints, these relationships can be characterized in the form of convolutional kernels, which can be trained in a detection-driven manner within a network. Towards this end, we introduce a novel module dedicated to the extraction of illumination-invariant features from low-light images, which can be easily integrated into existing object detection frameworks. Our empirical findings reveal significant improvements in low-light object detection tasks, as well as promising results in both well-lit and over-lit scenarios. Code is available at \url{https://github.com/MingboHong/YOLA}.

* Accepted by NeurIPS2024

Via

Access Paper or Ask Questions

Neural Spectral Decomposition for Dataset Distillation

Aug 29, 2024

Shaolei Yang, Shen Cheng, Mingbo Hong, Haoqiang Fan, Xing Wei, Shuaicheng Liu

Figure 1 for Neural Spectral Decomposition for Dataset Distillation

Abstract:In this paper, we propose Neural Spectrum Decomposition, a generic decomposition framework for dataset distillation. Unlike previous methods, we consider the entire dataset as a high-dimensional observation that is low-rank across all dimensions. We aim to discover the low-rank representation of the entire dataset and perform distillation efficiently. Toward this end, we learn a set of spectrum tensors and transformation matrices, which, through simple matrix multiplication, reconstruct the data distribution. Specifically, a spectrum tensor can be mapped back to the image space by a transformation matrix, and efficient information sharing during the distillation learning process is achieved through pairwise combinations of different spectrum vectors and transformation matrices. Furthermore, we integrate a trajectory matching optimization method guided by a real distribution. Our experimental results demonstrate that our approach achieves state-of-the-art performance on benchmarks, including CIFAR10, CIFAR100, Tiny Imagenet, and ImageNet Subset. Our code are available at \url{https://github.com/slyang2021/NSD}.

* ECCV 2024

Via

Access Paper or Ask Questions

Efficient Single Image Super-Resolution with Entropy Attention and Receptive Field Augmentation

Aug 08, 2024

Xiaole Zhao, Linze Li, Chengxing Xie, Xiaoming Zhang, Ting Jiang, Wenjie Lin, Shuaicheng Liu, Tianrui Li

Figure 1 for Efficient Single Image Super-Resolution with Entropy Attention and Receptive Field Augmentation

Figure 2 for Efficient Single Image Super-Resolution with Entropy Attention and Receptive Field Augmentation

Figure 3 for Efficient Single Image Super-Resolution with Entropy Attention and Receptive Field Augmentation

Figure 4 for Efficient Single Image Super-Resolution with Entropy Attention and Receptive Field Augmentation

Abstract:Transformer-based deep models for single image super-resolution (SISR) have greatly improved the performance of lightweight SISR tasks in recent years. However, they often suffer from heavy computational burden and slow inference due to the complex calculation of multi-head self-attention (MSA), seriously hindering their practical application and deployment. In this work, we present an efficient SR model to mitigate the dilemma between model efficiency and SR performance, which is dubbed Entropy Attention and Receptive Field Augmentation network (EARFA), and composed of a novel entropy attention (EA) and a shifting large kernel attention (SLKA). From the perspective of information theory, EA increases the entropy of intermediate features conditioned on a Gaussian distribution, providing more informative input for subsequent reasoning. On the other hand, SLKA extends the receptive field of SR models with the assistance of channel shifting, which also favors to boost the diversity of hierarchical features. Since the implementation of EA and SLKA does not involve complex computations (such as extensive matrix multiplications), the proposed method can achieve faster nonlinear inference than Transformer-based SR models while maintaining better SR performance. Extensive experiments show that the proposed model can significantly reduce the delay of model inference while achieving the SR performance comparable with other advanced models.

* Accepted to ACM MM 2024

Via

Access Paper or Ask Questions