Learning-based image dehazing methods are essential to assist autonomous systems in enhancing reliability. Due to the domain gap between synthetic and real domains, the internal information learned from synthesized images is usually sub-optimal in real domains, leading to severe performance drop of dehaizing models. Driven by the ability on exploring internal information from a few unseen-domain samples, meta-learning is commonly adopted to address this issue via test-time training, which is hyperparameter-sensitive and time-consuming. In contrast, we present a domain generalization framework based on meta-learning to dig out representative and discriminative internal properties of real hazy domains without test-time training. To obtain representative domain-specific information, we attach two entities termed adaptation network and distance-aware aggregator to our dehazing network. The adaptation network assists in distilling domain-relevant information from a few hazy samples and caching it into a collection of features. The distance-aware aggregator strives to summarize the generated features and filter out misleading information for more representative internal properties. To enhance the discrimination of distilled internal information, we present a novel loss function called domain-relevant contrastive regularization, which encourages the internal features generated from the same domain more similar and that from diverse domains more distinct. The generated representative and discriminative features are regarded as some external variables of our dehazing network to regress a particular and powerful function for a given domain. The extensive experiments on real hazy datasets, such as RTTS and URHI, validate that our proposed method has superior generalization ability than the state-of-the-art competitors.
Visual semantic segmentation aims at separating a visual sample into diverse blocks with specific semantic attributes and identifying the category for each block, and it plays a crucial role in environmental perception. Conventional learning-based visual semantic segmentation approaches count heavily on large-scale training data with dense annotations and consistently fail to estimate accurate semantic labels for unseen categories. This obstruction spurs a craze for studying visual semantic segmentation with the assistance of few/zero-shot learning. The emergence and rapid progress of few/zero-shot visual semantic segmentation make it possible to learn unseen-category from a few labeled or zero-labeled samples, which advances the extension to practical applications. Therefore, this paper focuses on the recently published few/zero-shot visual semantic segmentation methods varying from 2D to 3D space and explores the commonalities and discrepancies of technical settlements under different segmentation circumstances. Specifically, the preliminaries on few/zero-shot visual semantic segmentation, including the problem definitions, typical datasets, and technical remedies, are briefly reviewed and discussed. Moreover, three typical instantiations are involved to uncover the interactions of few/zero-shot learning with visual semantic segmentation, including image semantic segmentation, video object segmentation, and 3D segmentation. Finally, the future challenges of few/zero-shot visual semantic segmentation are discussed.
Graph Fourier transform (GFT) is one of the fundamental tools in graph signal processing to decompose graph signals into different frequency components and to represent graph signals with strong correlation by different modes of variation effectively. The GFT on undirected graphs has been well studied and several approaches have been proposed to define GFTs on directed graphs. In this paper, based on the singular value decompositions of some graph Laplacians, we propose two GFTs on the Cartesian product graph of two directed graphs. We show that the proposed GFTs could represent spatial-temporal data sets on directed networks with strong correlation efficiently, and in the undirected graph setting they are essentially the joint GFT in the literature. In this paper, we also consider the bandlimiting procedure in the spectral domain of the proposed GFTs, and demonstrate its performance to denoise the temperature data set in the region of Brest (France) on January 2014.
Graph Fourier transform (GFT) is a fundamental concept in graph signal processing. In this paper, based on singular value decomposition of Laplacian, we introduce a novel definition of GFT on directed graphs, and use singular values of Laplacian to carry the notion of graph frequencies. % of the proposed GFT. The proposed GFT is consistent with the conventional GFT in the undirected graph setting, and on directed circulant graphs, the proposed GFT is the classical discrete Fourier transform, up to some rotation, permutation and phase adjustment. We show that frequencies and frequency components of the proposed GFT can be evaluated by solving some constrained minimization problems with low computational cost. Numerical demonstrations indicate that the proposed GFT could represent graph signals with different modes of variation efficiently.
In this paper, we consider Wiener filters to reconstruct deterministic and (wide-band) stationary graph signals from their observations corrupted by random noises, and we propose distributed algorithms to implement Wiener filters and inverse filters on networks in which agents are equipped with a data processing subsystem for limited data storage and computation power, and with a one-hop communication subsystem for direct data exchange only with their adjacent agents. The proposed distributed polynomial approximation algorithm is an exponential convergent quasi-Newton method based on Jacobi polynomial approximation and Chebyshev interpolation polynomial approximation to analytic functions on a cube. Our numerical simulations show that Wiener filtering procedure performs better on denoising (wide-band) stationary signals than the Tikhonov regularization approach does, and that the proposed polynomial approximation algorithms converge faster than the Chebyshev polynomial approximation algorithm and gradient decent algorithm do in the implementation of an inverse filtering procedure associated with a polynomial filter of commutative graph shifts.
The ability to perform aggressive movements, which are called aggressive flights, is important for quadrotors during navigation. However, aggressive quadrotor flights are still a great challenge to practical applications. The existing solutions to aggressive flights heavily rely on a predefined trajectory, which is a time-consuming preprocessing step. To avoid such path planning, we propose a curiosity-driven reinforcement learning method for aggressive flight missions and a similarity-based curiosity module is introduced to speed up the training procedure. A branch structure exploration (BSE) strategy is also applied to guarantee the robustness of the policy and to ensure the policy trained in simulations can be performed in real-world experiments directly. The experimental results in simulations demonstrate that our reinforcement learning algorithm performs well in aggressive flight tasks, speeds up the convergence process and improves the robustness of the policy. Besides, our algorithm shows a satisfactory simulated to real transferability and performs well in real-world experiments.
Monocular depth estimation is one of the fundamental tasks in environmental perception and has achieved tremendous progress in virtue of deep learning. However, the performance of trained models tends to degrade or deteriorate when employed on other new datasets due to the gap between different datasets. Though some methods utilize domain adaptation technologies to jointly train different domains and narrow the gap between them, the trained models cannot generalize to new domains that are not involved in training. To boost the transferability of depth estimation models, we propose an adversarial depth estimation task and train the model in the pipeline of meta-learning. Our proposed adversarial task mitigates the issue of meta-overfitting, since the network is trained in an adversarial manner and aims to extract domain invariant representations. In addition, we propose a constraint to impose upon cross-task depth consistency to compel the depth estimation to be identical in different adversarial tasks, which improves the performance of our method and smoothens the training process. Experiments demonstrate that our method adapts well to new datasets after few training steps during the test procedure.
In this paper, we consider networks with topologies described by some connected undirected graph ${\mathcal{G}}=(V, E)$ and with some agents (fusion centers) equipped with processing power and local peer-to-peer communication, and optimization problem $\min_{{\boldsymbol x}}\big\{F({\boldsymbol x})=\sum_{i\in V}f_i({\boldsymbol x})\big\}$ with local objective functions $f_i$ depending only on neighboring variables of the vertex $i\in V$. We introduce a divide-and-conquer algorithm to solve the above optimization problem in a distributed and decentralized manner. The proposed divide-and-conquer algorithm has exponential convergence, its computational cost is almost linear with respect to the size of the network, and it can be fully implemented at fusion centers of the network. Our numerical demonstrations also indicate that the proposed divide-and-conquer algorithm has superior performance than popular decentralized optimization methods do for the least squares problem with/without $\ell^1$ penalty.
Previous unsupervised monocular depth estimation methods mainly focus on the day-time scenario, and their frameworks are driven by warped photometric consistency. While in some challenging environments, like night, rainy night or snowy winter, the photometry of the same pixel on different frames is inconsistent because of the complex lighting and reflection, so that the day-time unsupervised frameworks cannot be directly applied to these complex scenarios. In this paper, we investigate the problem of unsupervised monocular depth estimation in certain highly complex scenarios. We address this challenging problem by using domain adaptation, and a unified image transfer-based adaptation framework is proposed based on monocular videos in this paper. The depth model trained on day-time scenarios is adapted to different complex scenarios. Instead of adapting the whole depth network, we just consider the encoder network for lower computational complexity. The depth models adapted by the proposed framework to different scenarios share the same decoder, which is practical. Constraints on both feature space and output space promote the framework to learn the key features for depth decoding, and the smoothness loss is introduced into the adaptation framework for better depth estimation performance. Extensive experiments show the effectiveness of the proposed unsupervised framework in estimating the dense depth map from the night-time, rainy night-time and snowy winter images.
Semantic segmentation and depth completion are two challenging tasks in scene understanding, and they are widely used in robotics and autonomous driving. Although several works are proposed to jointly train these two tasks using some small modifications, like changing the last layer, the result of one task is not utilized to improve the performance of the other one despite that there are some similarities between these two tasks. In this paper, we propose multi-task generative adversarial networks (Multi-task GANs), which are not only competent in semantic segmentation and depth completion, but also improve the accuracy of depth completion through generated semantic images. In addition, we improve the details of generated semantic images based on CycleGAN by introducing multi-scale spatial pooling blocks and the structural similarity reconstruction loss. Furthermore, considering the inner consistency between semantic and geometric structures, we develop a semantic-guided smoothness loss to improve depth completion results. Extensive experiments on Cityscapes dataset and KITTI depth completion benchmark show that the Multi-task GANs are capable of achieving competitive performance for both semantic segmentation and depth completion tasks.