The vulnerability in the algorithm supply chain of deep learning has imposed new challenges to image retrieval systems in the downstream. Among a variety of techniques, deep hashing is gaining popularity. As it inherits the algorithmic backend from deep learning, a handful of attacks are recently proposed to disrupt normal image retrieval. Unfortunately, the defense strategies in softmax classification are not readily available to be applied in the image retrieval domain. In this paper, we propose an efficient and unsupervised scheme to identify unique adversarial behaviors in the hamming space. In particular, we design three criteria from the perspectives of hamming distance, quantization loss and denoising to defend against both untargeted and targeted attacks, which collectively limit the adversarial space. The extensive experiments on four datasets demonstrate 2-23% improvements of detection rates with minimum computational overhead for real-time image queries.
To generate high quality rendering images for real time applications, it is often to trace only a few samples-per-pixel (spp) at a lower resolution and then supersample to the high resolution. Based on the observation that the rendered pixels at a low resolution are typically highly aliased, we present a novel method for neural supersampling based on ray tracing 1/4-spp samples at the high resolution. Our key insight is that the ray-traced samples at the target resolution are accurate and reliable, which makes the supersampling an interpolation problem. We present a mask-reinforced neural network to reconstruct and interpolate high-quality image sequences. First, a novel temporal accumulation network is introduced to compute the correlation between current and previous features to significantly improve their temporal stability. Then a reconstruct network based on a multi-scale U-Net with skip connections is adopted for reconstruction and generation of the desired high-resolution image. Experimental results and comparisons have shown that our proposed method can generate higher quality results of supersampling, without increasing the total number of ray-tracing samples, over current state-of-the-art methods.
Panoptic Narrative Grounding (PNG) is an emerging cross-modal grounding task, which locates the target regions of an image corresponding to the text description. Existing approaches for PNG are mainly based on a two-stage paradigm, which is computationally expensive. In this paper, we propose a one-stage network for real-time PNG, termed End-to-End Panoptic Narrative Grounding network (EPNG), which directly generates masks for referents. Specifically, we propose two innovative designs, i.e., Locality-Perceptive Attention (LPA) and a bidirectional Semantic Alignment Loss (SAL), to properly handle the many-to-many relationship between textual expressions and visual objects. LPA embeds the local spatial priors into attention modeling, i.e., a pixel may belong to multiple masks at different scales, thereby improving segmentation. To help understand the complex semantic relationships, SAL proposes a bidirectional contrastive objective to regularize the semantic consistency inter modalities. Extensive experiments on the PNG benchmark dataset demonstrate the effectiveness and efficiency of our method. Compared to the single-stage baseline, our method achieves a significant improvement of up to 9.4% accuracy. More importantly, our EPNG is 10 times faster than the two-stage model. Meanwhile, the generalization ability of EPNG is also validated by zero-shot experiments on other grounding tasks.
Serverless cloud is an innovative cloud service model that frees customers from most cloud management duties. It also offers the same advantages as other cloud models but at much lower costs. As a result, the serverless cloud has been increasingly employed in high-impact areas such as system security, banking, and health care. A big threat to the serverless cloud's performance is cold-start, which is when the time of provisioning the needed cloud resource to serve customers' requests incurs unacceptable costs to the service providers and/or the customers. This paper proposes a novel low-coupling, high-cohesion ensemble policy that addresses the cold-start problem at infrastructure- and function-levels of the serverless cloud stack, while the state of the art policies have a more narrowed focus. This ensemble policy anchors on the prediction of function instance arrivals, 10 to 15 minutes into the future. It is achievable by using the temporal convolutional network (TCN) deep-learning method. Bench-marking results on a real-world dataset from a large-scale serverless cloud provider show that TCN out-performs other popular machine learning algorithms for time series. Going beyond cold-start management, the proposed policy and publicly available codes can be adopted in solving other cloud problems such as optimizing the provisioning of virtual software-defined network assets.
This paper proposes a new 3D gas distribution mapping technique based on the local message passing of Gaussian belief propagation that is capable of resolving in real time, concentration estimates in 3D space whilst accounting for the obstacle information within the scenario, the first of its kind in the literature. The gas mapping problem is formulated as a 3D factor graph of Gaussian potentials, the connections of which are conditioned on local occupancy values. The Gaussian belief propagation framework is introduced as the solver and a new hybrid message scheduler is introduced to increase the rate of convergence. The factor graph problem is then redesigned as a dynamically expanding inference task, coupling the information of consecutive gas measurements with local spatial structure obtained by the robot. The proposed algorithm is compared to the state of the art methods in 2D and 3D simulations and is found to resolve distribution maps orders of magnitude quicker than typical direct solvers. The proposed framework is then deployed for the first time onboard a ground robot in a 3D mapping and exploration task. The system is shown to be able to resolve multiple sensor inputs and output high resolution 3D gas distribution maps in a GPS denied cluttered scenario in real time. This online inference of complicated plume structures provides a new layer of contextual information over its 2D counterparts and enables autonomous systems to take advantage of real time estimates to inform potential next best sampling locations.
A deep neural network solution for time-scale modification (TSM) focused on large stretching factors is proposed, targeting environmental sounds. Traditional TSM artifacts such as transient smearing, loss of presence, and phasiness are heavily accentuated and cause poor audio quality when the TSM factor is four or larger. The weakness of established TSM methods, often based on a phase vocoder structure, lies in the poor description and scaling of the transient and noise components, or nuances, of a sound. Our novel solution combines a sines-transients-noise decomposition with an independent WaveNet synthesizer to provide a better description of the noise component and an improve sound quality for large stretching factors. Results of a subjective listening test against four other TSM algorithms are reported, showing the proposed method to be often superior. The proposed method is stereo compatible and has a wide range of applications related to the slow motion of media content.
The accuracy and complexity of machine learning algorithms based on kernel optimization are determined by the set of kernels over which they are able to optimize. An ideal set of kernels should: admit a linear parameterization (for tractability); be dense in the set of all kernels (for robustness); be universal (for accuracy). Recently, a framework was proposed for using positive matrices to parameterize a class of positive semi-separable kernels. Although this class can be shown to meet all three criteria, previous algorithms for optimization of such kernels were limited to classification and furthermore relied on computationally complex Semidefinite Programming (SDP) algorithms. In this paper, we pose the problem of learning semiseparable kernels as a minimax optimization problem and propose a SVD-QCQP primal-dual algorithm which dramatically reduces the computational complexity as compared with previous SDP-based approaches. Furthermore, we provide an efficient implementation of this algorithm for both classification and regression -- an implementation which enables us to solve problems with 100 features and up to 30,000 datums. Finally, when applied to benchmark data, the algorithm demonstrates the potential for significant improvement in accuracy over typical (but non-convex) approaches such as Neural Nets and Random Forest with similar or better computation time.
Guiding robots can not only detect close-range obstacles like other guiding tools, but also extend its range to perceive the environment when making decisions. However, most existing works over-simplified the interaction between human agents and robots, ignoring the differences between individuals, resulting in poor experiences for different users. To solve the problem, we propose a data-driven guiding system to cope with the effect brighten by individual differences. In our guiding system, we design a Human Motion Predictor (HMP) and a Robot Dynamics Model (RDM) based on deep neural network, the time convolutional network (TCN) is verified to have the best performance, to predict differences in interaction between different human agents and robots. To train our models, we collected datasets that records the interactions from different human agents. Moreover, given the predictive information of the specific user, we propose a waypoints selector that allows the robot to naturally adapt to the user's state changes, which are mainly reflected in the walking speed. We compare the performance of our models with previous works and achieve significant performance improvements. On this basis, our guiding system demonstrated good adaptability to different human agents. Our guiding system is deployed on a real quadruped robot to verify the practicability.
Inverse Reinforcement Learning (IRL) is a powerful set of techniques for imitation learning that aims to learn a reward function that rationalizes expert demonstrations. Unfortunately, traditional IRL methods suffer from a computational weakness: they require repeatedly solving a hard reinforcement learning (RL) problem as a subroutine. This is counter-intuitive from the viewpoint of reductions: we have reduced the easier problem of imitation learning to repeatedly solving the harder problem of RL. Another thread of work has proved that access to the side-information of the distribution of states where a strong policy spends time can dramatically reduce the sample and computational complexities of solving an RL problem. In this work, we demonstrate for the first time a more informed imitation learning reduction where we utilize the state distribution of the expert to alleviate the global exploration component of the RL subroutine, providing an exponential speedup in theory. In practice, we find that we are able to significantly speed up the prior art on continuous control tasks.
Replanning in temporal logic tasks is extremely difficult during the online execution of robots. This study introduces an effective path planner that computes solutions for temporal logic goals and instantly adapts to non-static and partially unknown environments. Given prior knowledge and a task specification, the planner first identifies an initial feasible solution by growing a sampling-based search tree. While carrying out the computed plan, the robot maintains a solution library to continuously enhance the unfinished part of the plan and store backup plans. The planner updates existing plans when meeting unexpected obstacles or recognizing flaws in prior knowledge. Upon a high-level path is obtained, a trajectory generator tracks the path by dividing it into segments of motion primitives. Our planner is integrated into an autonomous mobile robot system, further deployed on a multicopter with limited onboard processing power. In simulation and real-world experiments, our planner is demonstrated to swiftly and effectively adjust to environmental uncertainties.