Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Jinshan Pan

ColorMNet: A Memory-based Deep Spatial-Temporal Feature Propagation Network for Video Colorization

Apr 09, 2024

Yixin Yang, Jiangxin Dong, Jinhui Tang, Jinshan Pan

Abstract:How to effectively explore spatial-temporal features is important for video colorization. Instead of stacking multiple frames along the temporal dimension or recurrently propagating estimated features that will accumulate errors or cannot explore information from far-apart frames, we develop a memory-based feature propagation module that can establish reliable connections with features from far-apart frames and alleviate the influence of inaccurately estimated features. To extract better features from each frame for the above-mentioned feature propagation, we explore the features from large-pretrained visual models to guide the feature estimation of each frame so that the estimated features can model complex scenarios. In addition, we note that adjacent frames usually contain similar contents. To explore this property for better spatial and temporal feature utilization, we develop a local attention module to aggregate the features from adjacent frames in a spatial-temporal neighborhood. We formulate our memory-based feature propagation module, large-pretrained visual model guided feature estimation module, and local attention module into an end-to-end trainable network (named ColorMNet) and show that it performs favorably against state-of-the-art methods on both the benchmark datasets and real-world scenarios. The source code and pre-trained models will be available at \url{https://github.com/yyang181/colormnet}.

* Project website: \url{https://github.com/yyang181/colormnet}

Via

Access Paper or Ask Questions

Mansformer: Efficient Transformer of Mixed Attention for Image Deblurring and Beyond

Apr 09, 2024

Pin-Hung Kuo, Jinshan Pan, Shao-Yi Chien, Ming-Hsuan Yang

Figure 1 for Mansformer: Efficient Transformer of Mixed Attention for Image Deblurring and Beyond

Figure 2 for Mansformer: Efficient Transformer of Mixed Attention for Image Deblurring and Beyond

Figure 3 for Mansformer: Efficient Transformer of Mixed Attention for Image Deblurring and Beyond

Figure 4 for Mansformer: Efficient Transformer of Mixed Attention for Image Deblurring and Beyond

Abstract:Transformer has made an enormous success in natural language processing and high-level vision over the past few years. However, the complexity of self-attention is quadratic to the image size, which makes it infeasible for high-resolution vision tasks. In this paper, we propose the Mansformer, a Transformer of mixed attention that combines multiple self-attentions, gate, and multi-layer perceptions (MLPs), to explore and employ more possibilities of self-attention. Taking efficiency into account, we design four kinds of self-attention, whose complexities are all linear. By elaborate adjustment of the tensor shapes and dimensions for the dot product, we split the typical self-attention of quadratic complexity into four operations of linear complexity. To adaptively merge these different kinds of self-attention, we take advantage of an architecture similar to Squeeze-and-Excitation Networks. Furthermore, we make it to merge the two-staged Transformer design into one stage by the proposed gated-dconv MLP. Image deblurring is our main target, while extensive quantitative and qualitative evaluations show that this method performs favorably against the state-of-the-art methods far more than simply deblurring. The source codes and trained models will be made available to the public.

Via

Access Paper or Ask Questions

Collaborative Feedback Discriminative Propagation for Video Super-Resolution

Apr 06, 2024

Hao Li, Xiang Chen, Jiangxin Dong, Jinhui Tang, Jinshan Pan

Figure 1 for Collaborative Feedback Discriminative Propagation for Video Super-Resolution

Figure 2 for Collaborative Feedback Discriminative Propagation for Video Super-Resolution

Abstract:The key success of existing video super-resolution (VSR) methods stems mainly from exploring spatial and temporal information, which is usually achieved by a recurrent propagation module with an alignment module. However, inaccurate alignment usually leads to aligned features with significant artifacts, which will be accumulated during propagation and thus affect video restoration. Moreover, propagation modules only propagate the same timestep features forward or backward that may fail in case of complex motion or occlusion, limiting their performance for high-quality frame restoration. To address these issues, we propose a collaborative feedback discriminative (CFD) method to correct inaccurate aligned features and model long -range spatial and temporal information for better video reconstruction. In detail, we develop a discriminative alignment correction (DAC) method to adaptively explore information and reduce the influences of the artifacts caused by inaccurate alignment. Then, we propose a collaborative feedback propagation (CFP) module that employs feedback and gating mechanisms to better explore spatial and temporal information of different timestep features from forward and backward propagation simultaneously. Finally, we embed the proposed DAC and CFP into commonly used VSR networks to verify the effectiveness of our method. Quantitative and qualitative experiments on several benchmarks demonstrate that our method can improve the performance of existing VSR models while maintaining a lower model complexity. The source code and pre-trained models will be available at \url{https://github.com/House-Leo/CFDVSR}.

* Project website: https://github.com/House-Leo/CFDVSR

Via

Access Paper or Ask Questions

Bidirectional Multi-Scale Implicit Neural Representations for Image Deraining

Apr 02, 2024

Xiang Chen, Jinshan Pan, Jiangxin Dong

Abstract:How to effectively explore multi-scale representations of rain streaks is important for image deraining. In contrast to existing Transformer-based methods that depend mostly on single-scale rain appearance, we develop an end-to-end multi-scale Transformer that leverages the potentially useful features in various scales to facilitate high-quality image reconstruction. To better explore the common degradation representations from spatially-varying rain streaks, we incorporate intra-scale implicit neural representations based on pixel coordinates with the degraded inputs in a closed-loop design, enabling the learned features to facilitate rain removal and improve the robustness of the model in complex scenarios. To ensure richer collaborative representation from different scales, we embed a simple yet effective inter-scale bidirectional feedback operation into our multi-scale Transformer by performing coarse-to-fine and fine-to-coarse information communication. Extensive experiments demonstrate that our approach, named as NeRD-Rain, performs favorably against the state-of-the-art ones on both synthetic and real-world benchmark datasets. The source code and trained models are available at https://github.com/cschenxiang/NeRD-Rain.

* CVPR 2024
* Project website: https://github.com/cschenxiang/NeRD-Rain

Via

Access Paper or Ask Questions

Look-Around Before You Leap: High-Frequency Injected Transformer for Image Restoration

Mar 30, 2024

Shihao Zhou, Duosheng Chen, Jinshan Pan, Jufeng Yang

Figure 1 for Look-Around Before You Leap: High-Frequency Injected Transformer for Image Restoration

Figure 2 for Look-Around Before You Leap: High-Frequency Injected Transformer for Image Restoration

Figure 3 for Look-Around Before You Leap: High-Frequency Injected Transformer for Image Restoration

Figure 4 for Look-Around Before You Leap: High-Frequency Injected Transformer for Image Restoration

Abstract:Transformer-based approaches have achieved superior performance in image restoration, since they can model long-term dependencies well. However, the limitation in capturing local information restricts their capacity to remove degradations. While existing approaches attempt to mitigate this issue by incorporating convolutional operations, the core component in Transformer, i.e., self-attention, which serves as a low-pass filter, could unintentionally dilute or even eliminate the acquired local patterns. In this paper, we propose HIT, a simple yet effective High-frequency Injected Transformer for image restoration. Specifically, we design a window-wise injection module (WIM), which incorporates abundant high-frequency details into the feature map, to provide reliable references for restoring high-quality images. We also develop a bidirectional interaction module (BIM) to aggregate features at different scales using a mutually reinforced paradigm, resulting in spatially and contextually improved representations. In addition, we introduce a spatial enhancement unit (SEU) to preserve essential spatial relationships that may be lost due to the computations carried out across channel dimensions in the BIM. Extensive experiments on 9 tasks (real noise, real rain streak, raindrop, motion blur, moir\'e, shadow, snow, haze, and low-light condition) demonstrate that HIT with linear computational complexity performs favorably against the state-of-the-art methods. The source code and pre-trained models will be available at https://github.com/joshyZhou/HIT.

* 19 pages, 7 figures

Via

Access Paper or Ask Questions

Harmonizing Light and Darkness: A Symphony of Prior-guided Data Synthesis and Adaptive Focus for Nighttime Flare Removal

Mar 30, 2024

Lishen Qu, Shihao Zhou, Jinshan Pan, Jinglei Shi, Duosheng Chen, Jufeng Yang

Figure 1 for Harmonizing Light and Darkness: A Symphony of Prior-guided Data Synthesis and Adaptive Focus for Nighttime Flare Removal

Figure 2 for Harmonizing Light and Darkness: A Symphony of Prior-guided Data Synthesis and Adaptive Focus for Nighttime Flare Removal

Figure 3 for Harmonizing Light and Darkness: A Symphony of Prior-guided Data Synthesis and Adaptive Focus for Nighttime Flare Removal

Figure 4 for Harmonizing Light and Darkness: A Symphony of Prior-guided Data Synthesis and Adaptive Focus for Nighttime Flare Removal

Abstract:Intense light sources often produce flares in captured images at night, which deteriorates the visual quality and negatively affects downstream applications. In order to train an effective flare removal network, a reliable dataset is essential. The mainstream flare removal datasets are semi-synthetic to reduce human labour, but these datasets do not cover typical scenarios involving multiple scattering flares. To tackle this issue, we synthesize a prior-guided dataset named Flare7K*, which contains multi-flare images where the brightness of flares adheres to the laws of illumination. Besides, flares tend to occupy localized regions of the image but existing networks perform flare removal on the entire image and sometimes modify clean areas incorrectly. Therefore, we propose a plug-and-play Adaptive Focus Module (AFM) that can adaptively mask the clean background areas and assist models in focusing on the regions severely affected by flares. Extensive experiments demonstrate that our data synthesis method can better simulate real-world scenes and several models equipped with AFM achieve state-of-the-art performance on the real-world test dataset.

Via

Access Paper or Ask Questions

Spread Your Wings: A Radial Strip Transformer for Image Deblurring

Mar 30, 2024

Duosheng Chen, Shihao Zhou, Jinshan Pan, Jinglei Shi, Lishen Qu, Jufeng Yang

Figure 1 for Spread Your Wings: A Radial Strip Transformer for Image Deblurring

Figure 2 for Spread Your Wings: A Radial Strip Transformer for Image Deblurring

Figure 3 for Spread Your Wings: A Radial Strip Transformer for Image Deblurring

Figure 4 for Spread Your Wings: A Radial Strip Transformer for Image Deblurring

Abstract:Exploring motion information is important for the motion deblurring task. Recent the window-based transformer approaches have achieved decent performance in image deblurring. Note that the motion causing blurry results is usually composed of translation and rotation movements and the window-shift operation in the Cartesian coordinate system by the window-based transformer approaches only directly explores translation motion in orthogonal directions. Thus, these methods have the limitation of modeling the rotation part. To alleviate this problem, we introduce the polar coordinate-based transformer, which has the angles and distance to explore rotation motion and translation information together. In this paper, we propose a Radial Strip Transformer (RST), which is a transformer-based architecture that restores the blur images in a polar coordinate system instead of a Cartesian one. RST contains a dynamic radial embedding module (DRE) to extract the shallow feature by a radial deformable convolution. We design a polar mask layer to generate the offsets for the deformable convolution, which can reshape the convolution kernel along the radius to better capture the rotation motion information. Furthermore, we proposed a radial strip attention solver (RSAS) as deep feature extraction, where the relationship of windows is organized by azimuth and radius. This attention module contains radial strip windows to reweight image features in the polar coordinate, which preserves more useful information in rotation and translation motion together for better recovering the sharp images. Experimental results on six synthesis and real-world datasets prove that our method performs favorably against other SOTA methods for the image deblurring task.

Via

Access Paper or Ask Questions

Seeing the Unseen: A Frequency Prompt Guided Transformer for Image Restoration

Mar 30, 2024

Shihao Zhou, Jinshan Pan, Jinglei Shi, Duosheng Chen, Lishen Qu, Jufeng Yang

Figure 1 for Seeing the Unseen: A Frequency Prompt Guided Transformer for Image Restoration

Figure 2 for Seeing the Unseen: A Frequency Prompt Guided Transformer for Image Restoration

Figure 3 for Seeing the Unseen: A Frequency Prompt Guided Transformer for Image Restoration

Figure 4 for Seeing the Unseen: A Frequency Prompt Guided Transformer for Image Restoration

Abstract:How to explore useful features from images as prompts to guide the deep image restoration models is an effective way to solve image restoration. In contrast to mining spatial relations within images as prompt, which leads to characteristics of different frequencies being neglected and further remaining subtle or undetectable artifacts in the restored image, we develop a Frequency Prompting image restoration method, dubbed FPro, which can effectively provide prompt components from a frequency perspective to guild the restoration model address these differences. Specifically, we first decompose input features into separate frequency parts via dynamically learned filters, where we introduce a gating mechanism for suppressing the less informative elements within the kernels. To propagate useful frequency information as prompt, we then propose a dual prompt block, consisting of a low-frequency prompt modulator (LPM) and a high-frequency prompt modulator (HPM), to handle signals from different bands respectively. Each modulator contains a generation process to incorporate prompting components into the extracted frequency maps, and a modulation part that modifies the prompt feature with the guidance of the decoder features. Experimental results on commonly used benchmarks have demonstrated the favorable performance of our pipeline against SOTA methods on 5 image restoration tasks, including deraining, deraindrop, demoir\'eing, deblurring, and dehazing. The source code and pre-trained models will be available at https://github.com/joshyZhou/FPro.

* 18 pages, 10 figrues

Via

Access Paper or Ask Questions

How Powerful Potential of Attention on Image Restoration?

Mar 15, 2024

Cong Wang, Jinshan Pan, Yeying Jin, Liyan Wang, Wei Wang, Gang Fu, Wenqi Ren, Xiaochun Cao

Figure 1 for How Powerful Potential of Attention on Image Restoration?

Figure 2 for How Powerful Potential of Attention on Image Restoration?

Figure 3 for How Powerful Potential of Attention on Image Restoration?

Figure 4 for How Powerful Potential of Attention on Image Restoration?

Abstract:Transformers have demonstrated their effectiveness in image restoration tasks. Existing Transformer architectures typically comprise two essential components: multi-head self-attention and feed-forward network (FFN). The former captures long-range pixel dependencies, while the latter enables the model to learn complex patterns and relationships in the data. Previous studies have demonstrated that FFNs are key-value memories \cite{geva2020transformer}, which are vital in modern Transformer architectures. In this paper, we conduct an empirical study to explore the potential of attention mechanisms without using FFN and provide novel structures to demonstrate that removing FFN is flexible for image restoration. Specifically, we propose Continuous Scaling Attention (\textbf{CSAttn}), a method that computes attention continuously in three stages without using FFN. To achieve competitive performance, we propose a series of key components within the attention. Our designs provide a closer look at the attention mechanism and reveal that some simple operations can significantly affect the model performance. We apply our \textbf{CSAttn} to several image restoration tasks and show that our model can outperform CNN-based and Transformer-based image restoration approaches.

Via

Access Paper or Ask Questions

Scene Prior Filtering for Depth Map Super-Resolution

Feb 23, 2024

Zhengxue Wang, Zhiqiang Yan, Ming-Hsuan Yang, Jinshan Pan, Jian Yang, Ying Tai, Guangwei Gao

Figure 1 for Scene Prior Filtering for Depth Map Super-Resolution

Figure 2 for Scene Prior Filtering for Depth Map Super-Resolution

Figure 3 for Scene Prior Filtering for Depth Map Super-Resolution

Figure 4 for Scene Prior Filtering for Depth Map Super-Resolution

Abstract:Multi-modal fusion is vital to the success of super-resolution of depth maps. However, commonly used fusion strategies, such as addition and concatenation, fall short of effectively bridging the modal gap. As a result, guided image filtering methods have been introduced to mitigate this issue. Nevertheless, it is observed that their filter kernels usually encounter significant texture interference and edge inaccuracy. To tackle these two challenges, we introduce a Scene Prior Filtering network, SPFNet, which utilizes the priors surface normal and semantic map from large-scale models. Specifically, we design an All-in-one Prior Propagation that computes the similarity between multi-modal scene priors, i.e., RGB, normal, semantic, and depth, to reduce the texture interference. In addition, we present a One-to-one Prior Embedding that continuously embeds each single-modal prior into depth using Mutual Guided Filtering, further alleviating the texture interference while enhancing edges. Our SPFNet has been extensively evaluated on both real and synthetic datasets, achieving state-of-the-art performance.

* 14 pages

Via

Access Paper or Ask Questions