Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Chia-Wen Lin

Frequency-Assisted Mamba for Remote Sensing Image Super-Resolution

May 08, 2024

Yi Xiao, Qiangqiang Yuan, Kui Jiang, Yuzeng Chen, Qiang Zhang, Chia-Wen Lin

Figure 1 for Frequency-Assisted Mamba for Remote Sensing Image Super-Resolution

Figure 2 for Frequency-Assisted Mamba for Remote Sensing Image Super-Resolution

Figure 3 for Frequency-Assisted Mamba for Remote Sensing Image Super-Resolution

Abstract:Recent progress in remote sensing image (RSI) super-resolution (SR) has exhibited remarkable performance using deep neural networks, e.g., Convolutional Neural Networks and Transformers. However, existing SR methods often suffer from either a limited receptive field or quadratic computational overhead, resulting in sub-optimal global representation and unacceptable computational costs in large-scale RSI. To alleviate these issues, we develop the first attempt to integrate the Vision State Space Model (Mamba) for RSI-SR, which specializes in processing large-scale RSI by capturing long-range dependency with linear complexity. To achieve better SR reconstruction, building upon Mamba, we devise a Frequency-assisted Mamba framework, dubbed FMSR, to explore the spatial and frequent correlations. In particular, our FMSR features a multi-level fusion architecture equipped with the Frequency Selection Module (FSM), Vision State Space Module (VSSM), and Hybrid Gate Module (HGM) to grasp their merits for effective spatial-frequency fusion. Recognizing that global and local dependencies are complementary and both beneficial for SR, we further recalibrate these multi-level features for accurate feature fusion via learnable scaling adaptors. Extensive experiments on AID, DOTA, and DIOR benchmarks demonstrate that our FMSR outperforms state-of-the-art Transformer-based methods HAT-L in terms of PSNR by 0.11 dB on average, while consuming only 28.05% and 19.08% of its memory consumption and complexity, respectively.

* Frequency-Assisted Mamba for Remote Sensing Image Super-Resolution

Via

Access Paper or Ask Questions

Unmasking Illusions: Understanding Human Perception of Audiovisual Deepfakes

May 07, 2024

Ammarah Hashmi, Sahibzada Adil Shahzad, Chia-Wen Lin, Yu Tsao, Hsin-Min Wang

Figure 1 for Unmasking Illusions: Understanding Human Perception of Audiovisual Deepfakes

Figure 2 for Unmasking Illusions: Understanding Human Perception of Audiovisual Deepfakes

Figure 3 for Unmasking Illusions: Understanding Human Perception of Audiovisual Deepfakes

Figure 4 for Unmasking Illusions: Understanding Human Perception of Audiovisual Deepfakes

Abstract:The emergence of contemporary deepfakes has attracted significant attention in machine learning research, as artificial intelligence (AI) generated synthetic media increases the incidence of misinterpretation and is difficult to distinguish from genuine content. Currently, machine learning techniques have been extensively studied for automatically detecting deepfakes. However, human perception has been less explored. Malicious deepfakes could ultimately cause public and social problems. Can we humans correctly perceive the authenticity of the content of the videos we watch? The answer is obviously uncertain; therefore, this paper aims to evaluate the human ability to discern deepfake videos through a subjective study. We present our findings by comparing human observers to five state-ofthe-art audiovisual deepfake detection models. To this end, we used gamification concepts to provide 110 participants (55 native English speakers and 55 non-native English speakers) with a webbased platform where they could access a series of 40 videos (20 real and 20 fake) to determine their authenticity. Each participant performed the experiment twice with the same 40 videos in different random orders. The videos are manually selected from the FakeAVCeleb dataset. We found that all AI models performed better than humans when evaluated on the same 40 videos. The study also reveals that while deception is not impossible, humans tend to overestimate their detection capabilities. Our experimental results may help benchmark human versus machine performance, advance forensics analysis, and enable adaptive countermeasures.

Via

Access Paper or Ask Questions

PANet: A Physics-guided Parametric Augmentation Net for Image Dehazing by Hazing

Apr 14, 2024

Chih-Ling Chang, Fu-Jen Tsai, Zi-Ling Huang, Lin Gu, Chia-Wen Lin

Abstract:Image dehazing faces challenges when dealing with hazy images in real-world scenarios. A huge domain gap between synthetic and real-world haze images degrades dehazing performance in practical settings. However, collecting real-world image datasets for training dehazing models is challenging since both hazy and clean pairs must be captured under the same conditions. In this paper, we propose a Physics-guided Parametric Augmentation Network (PANet) that generates photo-realistic hazy and clean training pairs to effectively enhance real-world dehazing performance. PANet comprises a Haze-to-Parameter Mapper (HPM) to project hazy images into a parameter space and a Parameter-to-Haze Mapper (PHM) to map the resampled haze parameters back to hazy images. In the parameter space, we can pixel-wisely resample individual haze parameter maps to generate diverse hazy images with physically-explainable haze conditions unseen in the training set. Our experimental results demonstrate that PANet can augment diverse realistic hazy images to enrich existing hazy image benchmarks so as to effectively boost the performances of state-of-the-art image dehazing models.

Via

Access Paper or Ask Questions

A self-supervised CNN for image watermark removal

Mar 09, 2024

Chunwei Tian, Menghua Zheng, Tiancai Jiao, Wangmeng Zuo, Yanning Zhang, Chia-Wen Lin

Figure 1 for A self-supervised CNN for image watermark removal

Figure 2 for A self-supervised CNN for image watermark removal

Figure 3 for A self-supervised CNN for image watermark removal

Figure 4 for A self-supervised CNN for image watermark removal

Abstract:Popular convolutional neural networks mainly use paired images in a supervised way for image watermark removal. However, watermarked images do not have reference images in the real world, which results in poor robustness of image watermark removal techniques. In this paper, we propose a self-supervised convolutional neural network (CNN) in image watermark removal (SWCNN). SWCNN uses a self-supervised way to construct reference watermarked images rather than given paired training samples, according to watermark distribution. A heterogeneous U-Net architecture is used to extract more complementary structural information via simple components for image watermark removal. Taking into account texture information, a mixed loss is exploited to improve visual effects of image watermark removal. Besides, a watermark dataset is conducted. Experimental results show that the proposed SWCNN is superior to popular CNNs in image watermark removal.

Via

Access Paper or Ask Questions

Beyond Night Visibility: Adaptive Multi-Scale Fusion of Infrared and Visible Images

Mar 02, 2024

Shufan Pei, Junhong Lin, Wenxi Liu, Tiesong Zhao, Chia-Wen Lin

Figure 1 for Beyond Night Visibility: Adaptive Multi-Scale Fusion of Infrared and Visible Images

Figure 2 for Beyond Night Visibility: Adaptive Multi-Scale Fusion of Infrared and Visible Images

Figure 3 for Beyond Night Visibility: Adaptive Multi-Scale Fusion of Infrared and Visible Images

Figure 4 for Beyond Night Visibility: Adaptive Multi-Scale Fusion of Infrared and Visible Images

Abstract:In addition to low light, night images suffer degradation from light effects (e.g., glare, floodlight, etc). However, existing nighttime visibility enhancement methods generally focus on low-light regions, which neglects, or even amplifies the light effects. To address this issue, we propose an Adaptive Multi-scale Fusion network (AMFusion) with infrared and visible images, which designs fusion rules according to different illumination regions. First, we separately fuse spatial and semantic features from infrared and visible images, where the former are used for the adjustment of light distribution and the latter are used for the improvement of detection accuracy. Thereby, we obtain an image free of low light and light effects, which improves the performance of nighttime object detection. Second, we utilize detection features extracted by a pre-trained backbone that guide the fusion of semantic features. Hereby, we design a Detection-guided Semantic Fusion Module (DSFM) to bridge the domain gap between detection and semantic features. Third, we propose a new illumination loss to constrain fusion image with normal light intensity. Experimental results demonstrate the superiority of AMFusion with better visual quality and detection accuracy. The source code will be released after the peer review process.

Via

Access Paper or Ask Questions

A Heterogeneous Dynamic Convolutional Neural Network for Image Super-resolution

Feb 24, 2024

Chunwei Tian, Xuanyu Zhang, Jia Ren, Wangmeng Zuo, Yanning Zhang, Chia-Wen Lin

Figure 1 for A Heterogeneous Dynamic Convolutional Neural Network for Image Super-resolution

Figure 2 for A Heterogeneous Dynamic Convolutional Neural Network for Image Super-resolution

Figure 3 for A Heterogeneous Dynamic Convolutional Neural Network for Image Super-resolution

Figure 4 for A Heterogeneous Dynamic Convolutional Neural Network for Image Super-resolution

Abstract:Convolutional neural networks can automatically learn features via deep network architectures and given input samples. However, robustness of obtained models may have challenges in varying scenes. Bigger differences of a network architecture are beneficial to extract more complementary structural information to enhance robustness of an obtained super-resolution model. In this paper, we present a heterogeneous dynamic convolutional network in image super-resolution (HDSRNet). To capture more information, HDSRNet is implemented by a heterogeneous parallel network. The upper network can facilitate more contexture information via stacked heterogeneous blocks to improve effects of image super-resolution. Each heterogeneous block is composed of a combination of a dilated, dynamic, common convolutional layers, ReLU and residual learning operation. It can not only adaptively adjust parameters, according to different inputs, but also prevent long-term dependency problem. The lower network utilizes a symmetric architecture to enhance relations of different layers to mine more structural information, which is complementary with a upper network for image super-resolution. The relevant experimental results show that the proposed HDSRNet is effective to deal with image resolving. The code of HDSRNet can be obtained at https://github.com/hellloxiaotian/HDSRNet.

* 11pages, 7 figures

Via

Access Paper or Ask Questions

ViStripformer: A Token-Efficient Transformer for Versatile Video Restoration

Dec 22, 2023

Fu-Jen Tsai, Yan-Tsung Peng, Chen-Yu Chang, Chan-Yu Li, Yen-Yu Lin, Chung-Chi Tsai, Chia-Wen Lin

Abstract:Video restoration is a low-level vision task that seeks to restore clean, sharp videos from quality-degraded frames. One would use the temporal information from adjacent frames to make video restoration successful. Recently, the success of the Transformer has raised awareness in the computer-vision community. However, its self-attention mechanism requires much memory, which is unsuitable for high-resolution vision tasks like video restoration. In this paper, we propose ViStripformer (Video Stripformer), which utilizes spatio-temporal strip attention to catch long-range data correlations, consisting of intra-frame strip attention (Intra-SA) and inter-frame strip attention (Inter-SA) for extracting spatial and temporal information. It decomposes video frames into strip-shaped features in horizontal and vertical directions for Intra-SA and Inter-SA to address degradation patterns with various orientations and magnitudes. Besides, ViStripformer is an effective and efficient transformer architecture with much lower memory usage than the vanilla transformer. Extensive experiments show that the proposed model achieves superior results with fast inference time on video restoration tasks, including video deblurring, demoireing, and deraining.

Via

Access Paper or Ask Questions

ID-Blau: Image Deblurring by Implicit Diffusion-based reBLurring AUgmentation

Dec 18, 2023

Jia-Hao Wu, Fu-Jen Tsai, Yan-Tsung Peng, Chung-Chi Tsai, Chia-Wen Lin, Yen-Yu Lin

Abstract:Image deblurring aims to remove undesired blurs from an image captured in a dynamic scene. Much research has been dedicated to improving deblurring performance through model architectural designs. However, there is little work on data augmentation for image deblurring. Since continuous motion causes blurred artifacts during image exposure, we aspire to develop a groundbreaking blur augmentation method to generate diverse blurred images by simulating motion trajectories in a continuous space. This paper proposes Implicit Diffusion-based reBLurring AUgmentation (ID-Blau), utilizing a sharp image paired with a controllable blur condition map to produce a corresponding blurred image. We parameterize the blur patterns of a blurred image with their orientations and magnitudes as a pixel-wise blur condition map to simulate motion trajectories and implicitly represent them in a continuous space. By sampling diverse blur conditions, ID-Blau can generate various blurred images unseen in the training set. Experimental results demonstrate that ID-Blau can produce realistic blurred images for training and thus significantly improve performance for state-of-the-art deblurring models.

Via

Access Paper or Ask Questions

AVTENet: Audio-Visual Transformer-based Ensemble Network Exploiting Multiple Experts for Video Deepfake Detection

Oct 19, 2023

Ammarah Hashmi, Sahibzada Adil Shahzad, Chia-Wen Lin, Yu Tsao, Hsin-Min Wang

Figure 1 for AVTENet: Audio-Visual Transformer-based Ensemble Network Exploiting Multiple Experts for Video Deepfake Detection

Figure 2 for AVTENet: Audio-Visual Transformer-based Ensemble Network Exploiting Multiple Experts for Video Deepfake Detection

Figure 3 for AVTENet: Audio-Visual Transformer-based Ensemble Network Exploiting Multiple Experts for Video Deepfake Detection

Figure 4 for AVTENet: Audio-Visual Transformer-based Ensemble Network Exploiting Multiple Experts for Video Deepfake Detection

Abstract:Forged content shared widely on social media platforms is a major social problem that requires increased regulation and poses new challenges to the research community. The recent proliferation of hyper-realistic deepfake videos has drawn attention to the threat of audio and visual forgeries. Most previous work on detecting AI-generated fake videos only utilizes visual modality or audio modality. While there are some methods in the literature that exploit audio and visual modalities to detect forged videos, they have not been comprehensively evaluated on multi-modal datasets of deepfake videos involving acoustic and visual manipulations. Moreover, these existing methods are mostly based on CNN and suffer from low detection accuracy. Inspired by the recent success of Transformer in various fields, to address the challenges posed by deepfake technology, in this paper, we propose an Audio-Visual Transformer-based Ensemble Network (AVTENet) framework that considers both acoustic manipulation and visual manipulation to achieve effective video forgery detection. Specifically, the proposed model integrates several purely transformer-based variants that capture video, audio, and audio-visual salient cues to reach a consensus in prediction. For evaluation, we use the recently released benchmark multi-modal audio-video FakeAVCeleb dataset. For a detailed analysis, we evaluate AVTENet, its variants, and several existing methods on multiple test sets of the FakeAVCeleb dataset. Experimental results show that our best model outperforms all existing methods and achieves state-of-the-art performance on Testset-I and Testset-II of the FakeAVCeleb dataset.

Via

Access Paper or Ask Questions

Survey on Deep Face Restoration: From Non-blind to Blind and Beyond

Sep 27, 2023

Wenjie Li, Mei Wang, Kai Zhang, Juncheng Li, Xiaoming Li, Yuhang Zhang, Guangwei Gao, Weihong Deng, Chia-Wen Lin

Figure 1 for Survey on Deep Face Restoration: From Non-blind to Blind and Beyond

Figure 2 for Survey on Deep Face Restoration: From Non-blind to Blind and Beyond

Figure 3 for Survey on Deep Face Restoration: From Non-blind to Blind and Beyond

Figure 4 for Survey on Deep Face Restoration: From Non-blind to Blind and Beyond

Abstract:Face restoration (FR) is a specialized field within image restoration that aims to recover low-quality (LQ) face images into high-quality (HQ) face images. Recent advances in deep learning technology have led to significant progress in FR methods. In this paper, we begin by examining the prevalent factors responsible for real-world LQ images and introduce degradation techniques used to synthesize LQ images. We also discuss notable benchmarks commonly utilized in the field. Next, we categorize FR methods based on different tasks and explain their evolution over time. Furthermore, we explore the various facial priors commonly utilized in the restoration process and discuss strategies to enhance their effectiveness. In the experimental section, we thoroughly evaluate the performance of state-of-the-art FR methods across various tasks using a unified benchmark. We analyze their performance from different perspectives. Finally, we discuss the challenges faced in the field of FR and propose potential directions for future advancements. The open-source repository corresponding to this work can be found at https:// github.com/ 24wenjie-li/ Awesome-Face-Restoration.

* Face restoration, Survey, Deep learning, Non-blind/Blind, Joint restoration tasks, Facial priors

Via

Access Paper or Ask Questions