Real-world image denoising is a practical image restoration problem that aims to obtain clean images from in-the-wild noisy input. Recently, Vision Transformer (ViT) exhibits a strong ability to capture long-range dependencies and many researchers attempt to apply ViT to image denoising tasks. However, real-world image is an isolated frame that makes the ViT build the long-range dependencies on the internal patches, which divides images into patches and disarranges the noise pattern and gradient continuity. In this article, we propose to resolve this issue by using a continuous Wavelet Sliding-Transformer that builds frequency correspondence under real-world scenes, called DnSwin. Specifically, we first extract the bottom features from noisy input images by using a CNN encoder. The key to DnSwin is to separate high-frequency and low-frequency information from the features and build frequency dependencies. To this end, we propose Wavelet Sliding-Window Transformer that utilizes discrete wavelet transform, self-attention and inverse discrete wavelet transform to extract deep features. Finally, we reconstruct the deep features into denoised images using a CNN decoder. Both quantitative and qualitative evaluations on real-world denoising benchmarks demonstrate that the proposed DnSwin performs favorably against the state-of-the-art methods.
Movie graphs play an important role to bridge heterogenous modalities of videos and texts in human-centric retrieval. In this work, we propose Graph Wasserstein Correlation Analysis (GWCA) to deal with the core issue therein, i.e, cross heterogeneous graph comparison. Spectral graph filtering is introduced to encode graph signals, which are then embedded as probability distributions in a Wasserstein space, called graph Wasserstein metric learning. Such a seamless integration of graph signal filtering together with metric learning results in a surprise consistency on both learning processes, in which the goal of metric learning is just to optimize signal filters or vice versa. Further, we derive the solution of the graph comparison model as a classic generalized eigenvalue decomposition problem, which has an exactly closed-form solution. Finally, GWCA together with movie/text graphs generation are unified into the framework of movie retrieval to evaluate our proposed method. Extensive experiments on MovieGrpahs dataset demonstrate the effectiveness of our GWCA as well as the entire framework.
In this work, we address semi-supervised classification of graph data, where the categories of those unlabeled nodes are inferred from labeled nodes as well as graph structures. Recent works often solve this problem via advanced graph convolution in a conventionally supervised manner, but the performance could degrade significantly when labeled data is scarce. To this end, we propose a Graph Inference Learning (GIL) framework to boost the performance of semi-supervised node classification by learning the inference of node labels on graph topology. To bridge the connection between two nodes, we formally define a structure relation by encapsulating node attributes, between-node paths, and local topological structures together, which can make the inference conveniently deduced from one node to another node. For learning the inference process, we further introduce meta-optimization on structure relations from training nodes to validation nodes, such that the learnt graph inference capability can be better self-adapted to testing nodes. Comprehensive evaluations on four benchmark datasets (including Cora, Citeseer, Pubmed, and NELL) demonstrate the superiority of our proposed GIL when compared against state-of-the-art methods on the semi-supervised node classification task.