"Image": models, code, and papers

Towards Query-Efficient Black-Box Adversary with Zeroth-Order Natural Gradient Descent

Feb 18, 2020
Pu Zhao, Pin-Yu Chen, Siyue Wang, Xue Lin

Despite the great achievements of modern deep neural networks (DNNs), the vulnerability of state-of-the-art DNNs raises security concerns in many application domains that require high reliability. Various adversarial attacks have been proposed to sabotage the learning performance of DNN models. Among these, black-box adversarial attacks have received special attention owing to their practicality and simplicity. Black-box attacks usually favor fewer queries in order to remain stealthy and keep costs low. However, most current black-box attack methods adopt first-order gradient descent, which can suffer from deficiencies such as relatively slow convergence and high sensitivity to hyper-parameter settings. In this paper, we propose a zeroth-order natural gradient descent (ZO-NGD) method for designing adversarial attacks, which combines zeroth-order gradient estimation, suited to the black-box attack scenario, with second-order natural gradient descent to achieve higher query efficiency. Empirical evaluations on image classification datasets demonstrate that ZO-NGD achieves significantly lower model query complexity than state-of-the-art attack methods.

* accepted by AAAI 2020
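The abstract does not spell out the estimator, so the following is only a minimal Python sketch, assuming a standard forward-difference zeroth-order gradient estimate with random Gaussian directions and a damped Fisher-preconditioned (natural gradient) step. The function names, `num_dirs`, `mu`, `damping`, and the externally supplied `fisher` matrix are illustrative assumptions, not the authors' ZO-NGD implementation.

```python
import numpy as np

def zo_gradient(f, x, num_dirs=20, mu=1e-3, rng=None):
    """Forward-difference zeroth-order gradient estimate of f at a flat vector x.

    Uses only function values (black-box queries): one query at x plus
    one query per random Gaussian direction, i.e. num_dirs + 1 queries.
    """
    rng = np.random.default_rng() if rng is None else rng
    fx = f(x)
    grad = np.zeros_like(x, dtype=float)
    for _ in range(num_dirs):
        u = rng.standard_normal(x.shape)
        # Directional finite difference, projected back onto the direction.
        grad += (f(x + mu * u) - fx) / mu * u
    return grad / num_dirs

def natural_gradient_step(x, grad, fisher, lr=0.1, damping=1e-3):
    """Precondition the gradient with a damped Fisher matrix and take a step."""
    precond = fisher + damping * np.eye(fisher.shape[0])
    return x - lr * np.linalg.solve(precond, grad)

# Usage on a toy loss standing in for the attack objective (hypothetical).
rng = np.random.default_rng(0)
x = np.ones(5)
g = zo_gradient(lambda z: float(np.sum(z ** 2)), x, num_dirs=2000, rng=rng)
x_next = natural_gradient_step(x, g, fisher=np.eye(5))
```

Each call to `zo_gradient` costs `num_dirs + 1` queries to the black-box model, which is the quantity a query-efficient attack tries to keep small.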

A PCA-Based Super-Resolution Algorithm for Short Image Sequences

Jan 18, 2012
Carlos Miravet, Francisco B. Rodríguez

In this paper, we present a novel, learning-based, two-step super-resolution (SR) algorithm well suited to the especially demanding problem of obtaining SR estimates from short image sequences. The first step, devoted to increasing the sampling rate of the incoming images, fits linear combinations of functions generated from principal components (PCs) to locally reproduce the sparse projected image data, and uses these models to estimate image values at the nodes of the high-resolution grid. The PCs were obtained from local image patches sampled at sub-pixel level, which were in turn generated from a database of high-resolution images by applying a physically realistic observation model. Continuity between local image models is enforced by minimizing a suitable functional in the space of model coefficients. The second step, dealing with restoration, is performed by a linear filter whose coefficients are learned to correct residual interpolation artifacts in addition to low-resolution blurring, providing an effective coupling between the two steps of the method. Results on a demanding five-image scanned sequence of graphics and text are presented, showing the excellent performance of the proposed method compared to several state-of-the-art two-step and Bayesian maximum a posteriori (MAP) SR algorithms.

* 4 pages, 4 figures. A version of this work was submitted to ICIP 2010
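A simplified Python sketch of the first (interpolation) step, assuming the sparse observations fall exactly on a subset of the high-resolution grid nodes rather than at the arbitrary sub-pixel positions produced by an observation model. It learns principal components from training patches and fits their coefficients to the observed samples by least squares; `principal_components` and `fit_patch` are hypothetical names, and the continuity functional and restoration filter are omitted.

```python
import numpy as np

def principal_components(patches, k):
    """Return the patch mean and the top-k principal components.

    patches: (n_patches, patch_size) array of flattened training patches.
    """
    mean = patches.mean(axis=0)
    centered = patches - mean
    # Rows of vt are the principal directions, ordered by singular value.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return mean, vt[:k]

def fit_patch(samples, sample_idx, mean, pcs):
    """Fit PC coefficients to sparse samples and predict the full patch.

    samples: observed values; sample_idx: their positions within the
    flattened high-resolution patch.
    """
    basis = pcs[:, sample_idx].T   # (n_samples, k): PCs restricted to observed nodes
    coeffs, *_ = np.linalg.lstsq(basis, samples - mean[sample_idx], rcond=None)
    # Evaluate the fitted linear combination at every high-resolution node.
    return mean + coeffs @ pcs
```

With at least as many observed samples as components, the least-squares fit is well posed; the full patch is then read off from the same linear model at the unobserved nodes.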

Estimation of Muscle Fascicle Orientation in Ultrasonic Images

Dec 09, 2019
Regina Pohle-Fröhlich, Christoph Dalitz, Charlotte Richter, Benjamin Stäudle, Kirsten Albracht

We compare four different algorithms for automatically estimating the muscle fascicle angle from ultrasonic images: the vesselness filter, the Radon transform, the projection profile method, and the gray-level co-occurrence matrix (GLCM). The algorithm results are compared to ground-truth data generated by three different experts on 425 image frames from two videos recorded during different types of motion. The best agreement with the ground-truth data was achieved by combining pre-processing with a vesselness filter and measuring the angle with the projection profile method. The robustness of the estimation is increased by applying the algorithms to subregions with high gradients and performing a LOESS fit through these estimates.

* 7 pages, 7 figures, accepted for VISAPP 2020
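A minimal Python sketch of a projection-profile angle estimate, in the spirit of classic skew detection: for each candidate angle, pixel intensities are averaged along parallel lines, and the angle whose profile varies most is taken as the fascicle orientation. The count-weighted variance criterion, the 1-degree grid, and the function name are assumptions; the vesselness pre-processing, subregion selection, and LOESS fit described above are not shown.

```python
import numpy as np

def projection_profile_angle(image, angles_deg=np.arange(0.0, 180.0, 1.0)):
    """Estimate the dominant line orientation of a 2-D image, in degrees.

    The angle is measured from the column (x) axis in array coordinates,
    with y = row index increasing downward.
    """
    rows, cols = np.indices(image.shape)
    values = image.ravel().astype(float)
    overall = values.mean()
    best_angle, best_score = None, -np.inf
    for angle in angles_deg:
        theta = np.deg2rad(angle)
        # Signed distance of each pixel from a line through the origin with
        # orientation theta; pixels on the same parallel line share a bin.
        t = -cols * np.sin(theta) + rows * np.cos(theta)
        bins = np.round(t - t.min()).astype(int).ravel()
        counts = np.bincount(bins)
        sums = np.bincount(bins, weights=values)
        used = counts > 0
        means = sums[used] / counts[used]
        # Count-weighted variance of the profile: large when bins follow
        # the fascicle-like stripes, small when each bin averages across them.
        score = np.sum(counts[used] * (means - overall) ** 2) / values.size
        if score > best_score:
            best_angle, best_score = angle, score
    return best_angle
```

Weighting each bin by its pixel count keeps the short lines near image corners, which contain only a few pixels, from dominating the score.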

Visual Navigation Among Humans with Optimal Control as a Supervisor

Mar 20, 2020
Varun Tolani, Somil Bansal, Aleksandra Faust, Claire Tomlin

Real-world navigation requires robots to operate in unfamiliar, dynamic environments, sharing spaces with humans. Navigating around humans is especially difficult because it requires predicting their future motion, which can be quite challenging. We propose a novel framework for navigation around humans that combines learning-based perception with model-based optimal control. Specifically, we train a Convolutional Neural Network (CNN)-based perception module that maps the robot's visual inputs to a waypoint, or next desired state. This waypoint is then fed into planning and control modules that convey the robot safely and efficiently to the goal. To train the CNN, we contribute a photo-realistic benchmarking dataset for autonomous robot navigation in the presence of humans. The CNN is trained using supervised learning on images rendered from this dataset. The proposed framework learns to anticipate and react to people's motion based only on a monocular RGB image, without explicitly predicting future human motion. Our method generalizes well to unseen buildings and humans in both simulation and real-world environments. Furthermore, our experiments demonstrate that combining model-based control and learning leads to better and more data-efficient navigation behaviors than a purely learning-based approach. Videos describing our approach and experiments are available on the project website.

* Project Website: https://smlbansal.github.io/LB-WayPtNav-DH/
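The abstract leaves the planner unspecified. As an illustration only, assuming a planar robot pose (x, y, heading), a smooth path from the current pose to a predicted waypoint can be generated with a cubic Hermite spline whose end tangents follow the two headings. The function name and the default tangent scaling are assumptions; a real system would also enforce the robot's dynamics and track the path with a feedback controller.

```python
import numpy as np

def hermite_path(start, waypoint, num_points=20, tangent_scale=None):
    """Cubic Hermite path between two planar poses (x, y, heading in radians).

    tangent_scale defaults to the straight-line distance between the poses.
    Returns a (num_points, 2) array of (x, y) positions.
    """
    p0 = np.asarray(start[:2], dtype=float)
    p1 = np.asarray(waypoint[:2], dtype=float)
    k = np.linalg.norm(p1 - p0) if tangent_scale is None else tangent_scale
    # End tangents point along the start and waypoint headings.
    m0 = k * np.array([np.cos(start[2]), np.sin(start[2])])
    m1 = k * np.array([np.cos(waypoint[2]), np.sin(waypoint[2])])
    s = np.linspace(0.0, 1.0, num_points)[:, None]
    # Standard cubic Hermite basis functions.
    h00 = 2 * s**3 - 3 * s**2 + 1
    h10 = s**3 - 2 * s**2 + s
    h01 = -2 * s**3 + 3 * s**2
    h11 = s**3 - s**2
    return h00 * p0 + h10 * m0 + h01 * p1 + h11 * m1
```

The path starts exactly at the current position and ends exactly at the waypoint, leaving and arriving along the specified headings.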

Learning Deep Analysis Dictionaries -- Part II: Convolutional Dictionaries

Jan 31, 2020
Jun-Jie Huang, Pier Luigi Dragotti

In this paper, we introduce a Deep Convolutional Analysis Dictionary Model (DeepCAM) by learning convolutional dictionaries instead of the unstructured dictionaries used in the deep analysis dictionary model (DeepAM) introduced in the companion paper. Convolutional dictionaries are better suited to processing high-dimensional signals such as images and have only a small number of free parameters. By exploiting the properties of a convolutional dictionary, we present an efficient convolutional analysis dictionary learning approach. An L-layer DeepCAM consists of L layers of convolutional analysis dictionary and element-wise soft-thresholding pairs, plus a single layer of convolutional synthesis dictionary. As in DeepAM, each convolutional analysis dictionary is composed of a convolutional Information Preserving Analysis Dictionary (IPAD) and a convolutional Clustering Analysis Dictionary (CAD). The IPAD and the CAD are learned using variations of the proposed learning algorithm. We demonstrate that DeepCAM is an effective multilayer convolutional model and, on single-image super-resolution, achieves performance comparable to other methods while also showing good generalization capabilities.
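Each DeepCAM layer pairs a convolutional analysis dictionary with element-wise soft-thresholding. The Python sketch below shows one such pair for a single-channel image, assuming "valid" 2-D correlation and one threshold per filter; the learned IPAD/CAD structure and the final synthesis dictionary are not modeled, and the function names are hypothetical.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def soft_threshold(z, lam):
    """Element-wise soft-thresholding: shrink every value toward zero by lam."""
    return np.sign(z) * np.maximum(np.abs(z) - lam, 0.0)

def analysis_layer(image, filters, thresholds):
    """One convolutional analysis layer followed by soft-thresholding.

    image: (H, W); filters: (K, h, w); thresholds: (K,), one per filter.
    Returns (K, H - h + 1, W - w + 1) sparse feature maps ('valid' correlation).
    """
    _, h, w = filters.shape
    patches = sliding_window_view(image, (h, w))        # (H-h+1, W-w+1, h, w)
    # Correlate every patch with every filter (no kernel flip).
    responses = np.einsum("ijab,kab->kij", patches, filters)
    return soft_threshold(responses, thresholds[:, None, None])
```

An L-layer model would apply L such layers in sequence (on multi-channel feature maps in practice) before the convolutional synthesis dictionary maps the features back to the output.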

Novel View Synthesis of Dynamic Scenes with Globally Coherent Depths from a Monocular Camera

Apr 02, 2020
Jae Shin Yoon, Kihwan Kim, Orazio Gallo, Hyun Soo Park, Jan Kautz

This paper presents a new method to synthesize an image from arbitrary views and times given a collection of images of a dynamic scene. A key challenge for novel view synthesis arises from dynamic scene reconstruction, where epipolar geometry does not apply to the local motion of dynamic content. To address this challenge, we propose to combine the depth from a single view (DSV) and the depth from multi-view stereo (DMV): DSV is complete, i.e., a depth is assigned to every pixel, yet its scale varies across views, while DMV is view-invariant yet incomplete. Our insight is that although its scale and quality are inconsistent with other views, the depth estimated from a single view can be used to reason about the globally coherent geometry of dynamic content. We cast this problem as learning to correct the scale of DSV and to refine each depth with locally consistent motions between views to form a coherent depth estimate. We integrate these tasks into a depth fusion network trained in a self-supervised fashion. Given the fused depth maps, we synthesize a photorealistic virtual view at a specific location and time with our deep blending network, which completes the scene and renders the virtual view. We evaluate our depth estimation and view synthesis on diverse real-world dynamic scenes and show that our method outperforms existing methods.

* This paper is accepted to CVPR 2020
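The paper learns to correct the scale of DSV with a network. As a much simpler, non-learned baseline, the scale ambiguity can be illustrated by a single global least-squares scale that aligns the complete DSV map to the pixels where DMV is available. This Python sketch assumes missing DMV values are marked with NaN; `align_scale` is a hypothetical name.

```python
import numpy as np

def align_scale(depth_single, depth_multi):
    """Least-squares scale aligning a complete single-view depth map to an
    incomplete multi-view depth map (NaN marks missing multi-view depth).

    Returns (scale, rescaled single-view depth map).
    """
    valid = np.isfinite(depth_multi) & np.isfinite(depth_single)
    a = depth_single[valid]
    b = depth_multi[valid]
    # Closed-form minimizer of ||scale * a - b||^2 over the valid pixels.
    scale = np.dot(a, b) / np.dot(a, a)
    return scale, scale * depth_single
```

Because DSV is complete, the rescaled map covers every pixel, while DMV only constrains the scale where it is defined; the paper goes further by refining each depth locally rather than using one global factor.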

On Automation and Medical Image Interpretation, With Applications for Laryngeal Imaging

Jan 14, 2013
H. J. Moukalled

Indeed, these are exciting times. We are in the heart of a digital renaissance. Automation and computer technology allow engineers and scientists to fabricate processes that improve quality of life. We anticipate much growth in medical image interpretation and understanding due to the influx of computer technologies. This work should serve as a guide that introduces the reader to core themes in theoretical computer science, as well as to imaging applications for understanding vocal-fold vibrations. In this work, we motivate the use of automation and review some mathematical models of computation. We present a proof that a classical problem in image analysis cannot be automated by means of algorithms. Furthermore, we discuss some applications for processing medical images of the vocal folds, along with some of the exciting directions the art of automation will take vocal-fold image interpretation and, quite possibly, other areas of biomedical image analysis.

* 18 pages, 9 figures, 41 references

LPaintB: Learning to Paint from Self-Supervision

Jun 17, 2019
Biao Jia, Jonathan Brandt, Radomir Mech, Byungmoon Kim, Dinesh Manocha

We present a novel reinforcement learning-based natural media painting algorithm. Our goal is to reproduce a reference image using brush strokes, and we encode this objective through observations. Our formulation takes into account that the distribution of the reward in the action space is sparse, and that training a reinforcement learning algorithm from scratch can be difficult. We present an approach that combines self-supervised learning and reinforcement learning to effectively transform negative samples into positive ones and change the reward distribution. We demonstrate the benefits of our painting agent in reproducing reference images with brush strokes. The training phase takes about one hour, and the runtime algorithm takes about 30 seconds on a GTX 1080 GPU to reproduce a 1000x800 image with 20,000 strokes.

Conditional Gaussian Distribution Learning for Open Set Recognition

Mar 20, 2020
Xin Sun, Zhenning Yang, Chi Zhang, Guohao Peng, Keck-Voon Ling

Deep neural networks have achieved state-of-the-art performance in a wide range of recognition/classification tasks. However, applying deep learning to real-world applications still poses several challenges. A typical challenge is that unknown samples may be fed into the system during testing, and traditional deep neural networks will wrongly recognize such a sample as one of the known classes. Open set recognition is a potential solution to this problem: an open set classifier should be able to reject unknown samples while maintaining high classification accuracy on known classes. The variational auto-encoder (VAE) is a popular model for detecting unknowns, but it cannot provide discriminative representations for classifying known samples. In this paper, we propose a novel method, Conditional Gaussian Distribution Learning (CGDL), for open set recognition. In addition to detecting unknown samples, this method can also classify known samples by forcing different latent features to approximate different Gaussian models. Meanwhile, to prevent information in the input from vanishing in the middle layers, we also adopt the probabilistic ladder architecture to extract high-level abstract features. Experiments on several standard image datasets show that the proposed method significantly outperforms the baseline method and achieves new state-of-the-art results.

* Accepted to CVPR 2020
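Forcing the latent features of each known class toward its own Gaussian suggests a simple test-time decision rule: pick the class whose Gaussian gives the highest likelihood, and reject the sample as unknown if even that likelihood is too low. The Python sketch below assumes diagonal Gaussians and a hypothetical log-likelihood `threshold`; it covers only this latent-space rule, not the VAE training, the ladder architecture, or any additional rejection criteria the paper may use.

```python
import numpy as np

def gaussian_log_likelihood(z, mean, var):
    """Log-density of z under diagonal Gaussian(s) N(mean, diag(var)).

    Broadcasts over leading axes, so (d,) against (C, d) gives (C,).
    """
    return -0.5 * np.sum(np.log(2 * np.pi * var) + (z - mean) ** 2 / var, axis=-1)

def classify_open_set(z, class_means, class_vars, threshold):
    """Return the most likely known class index, or -1 to reject as unknown.

    z: (d,) latent feature; class_means, class_vars: (C, d).
    threshold: hypothetical log-likelihood cut-off, e.g. tuned on validation data.
    """
    scores = gaussian_log_likelihood(z, class_means, class_vars)  # (C,)
    best = int(np.argmax(scores))
    return best if scores[best] >= threshold else -1
```

A latent vector close to one class mean is assigned to that class, while one far from every class Gaussian falls below the threshold and is rejected.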

Learning from a Teacher using Unlabeled Data

Nov 13, 2019
Gaurav Menghani, Sujith Ravi

Knowledge distillation is a widely used technique for model compression. We posit that the teacher model used in a distillation setup captures relationships between classes that extend beyond the original dataset. We empirically show that a teacher model can transfer this knowledge to a student model even on an out-of-distribution dataset. Using this approach, we show promising results on the MNIST, CIFAR-10, and Caltech-256 datasets using unlabeled image data from different sources. Our results are encouraging and help shed further light on knowledge distillation and on using unlabeled data to improve model quality.
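Distilling from unlabeled data only needs the teacher's soft outputs, not ground-truth labels. The Python sketch below is a standard temperature-softened KL distillation loss; the abstract does not state the exact loss or temperature used in the paper, so `temperature=4.0`, the temperature-squared scaling, and the function names are assumptions.

```python
import numpy as np

def log_softmax(logits, temperature=1.0):
    """Numerically stable log-softmax of temperature-scaled logits."""
    z = np.asarray(logits, dtype=float) / temperature
    z = z - z.max(axis=-1, keepdims=True)
    return z - np.log(np.exp(z).sum(axis=-1, keepdims=True))

def distillation_loss(student_logits, teacher_logits, temperature=4.0):
    """Mean KL(teacher || student) between temperature-softened outputs.

    Uses no labels, so the batch can come from an unlabeled (even
    out-of-distribution) dataset. Scaled by temperature**2, a common
    convention that keeps gradient magnitudes comparable across temperatures.
    """
    log_p_t = log_softmax(teacher_logits, temperature)
    log_p_s = log_softmax(student_logits, temperature)
    kl = np.sum(np.exp(log_p_t) * (log_p_t - log_p_s), axis=-1)
    return float(temperature ** 2 * kl.mean())
```

The loss is zero when the student reproduces the teacher's softened distribution and positive otherwise, so minimizing it over unlabeled inputs transfers the teacher's inter-class relationships.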
