Deep learning has greatly improved visual recognition in recent years. However, recent research has shown that there exist many adversarial examples that can negatively impact the performance of such an architecture. This paper focuses on detecting those adversarial examples by analyzing whether they come from the same distribution as the normal examples. Instead of directly training a deep neural network to detect adversarials, a much simpler approach was proposed based on statistics on outputs from convolutional layers. A cascade classifier was designed to efficiently detect adversarials. Furthermore, trained from one particular adversarial generating mechanism, the resulting classifier can successfully detect adversarials from a completely different mechanism as well. The resulting classifier is non-subdifferentiable, hence creates a difficulty for adversaries to attack by using the gradient of the classifier. After detecting adversarial examples, we show that many of them can be recovered by simply performing a small average filter on the image. Those findings should lead to more insights about the classification mechanisms in deep convolutional neural networks.
We propose a continuous optimization method for solving dense 3D scene flow problems from stereo imagery. As in recent work, we represent the dynamic 3D scene as a collection of rigidly moving planar segments. The scene flow problem then becomes the joint estimation of pixel-to-segment assignment, 3D position, normal vector and rigid motion parameters for each segment, leading to a complex and expensive discrete-continuous optimization problem. In contrast, we propose a purely continuous formulation which can be solved more efficiently. Using a fine superpixel segmentation that is fixed a-priori, we propose a factor graph formulation that decomposes the problem into photometric, geometric, and smoothing constraints. We initialize the solution with a novel, high-quality initialization method, then independently refine the geometry and motion of the scene, and finally perform a global non-linear refinement using Levenberg-Marquardt. We evaluate our method in the challenging KITTI Scene Flow benchmark, ranking in third position, while being 3 to 30 times faster than the top competitors.
A fundamental challenge to sensory processing tasks in perception and robotics is the problem of obtaining data associations across views. We present a robust solution for ascertaining potentially dense surface patch (superpixel) associations, requiring just range information. Our approach involves decomposition of a view into regularized surface patches. We represent them as sequences expressing geometry invariantly over their superpixel neighborhoods, as uniquely consistent partial orderings. We match these representations through an optimal sequence comparison metric based on the Damerau-Levenshtein distance - enabling robust association with quadratic complexity (in contrast to hitherto employed joint matching formulations which are NP-complete). The approach is able to perform under wide baselines, heavy rotations, partial overlaps, significant occlusions and sensor noise. The technique does not require any priors -- motion or otherwise, and does not make restrictive assumptions on scene structure and sensor movement. It does not require appearance -- is hence more widely applicable than appearance reliant methods, and invulnerable to related ambiguities such as textureless or aliased content. We present promising qualitative and quantitative results under diverse settings, along with comparatives with popular approaches based on range as well as RGB-D data.
Sentiment analysis predicts the presence of positive or negative emotions in a text document. In this paper we consider higher dimensional extensions of the sentiment concept, which represent a richer set of human emotions. Our approach goes beyond previous work in that our model contains a continuous manifold rather than a finite set of human emotions. We investigate the resulting model, compare it to psychological observations, and explore its predictive capabilities. Besides obtaining significant improvements over a baseline without manifold, we are also able to visualize different notions of positive sentiment in different domains.
We propose a new analytical approximation to the $\chi^2$ kernel that converges geometrically. The analytical approximation is derived with elementary methods and adapts to the input distribution for optimal convergence rate. Experiments show the new approximation leads to improved performance in image classification and semantic segmentation tasks using a random Fourier feature approximation of the $\exp-\chi^2$ kernel. Besides, out-of-core principal component analysis (PCA) methods are introduced to reduce the dimensionality of the approximation and achieve better performance at the expense of only an additional constant factor to the time complexity. Moreover, when PCA is performed jointly on the training and unlabeled testing data, further performance improvements can be obtained. Experiments conducted on the PASCAL VOC 2010 segmentation and the ImageNet ILSVRC 2010 datasets show statistically significant improvements over alternative approximation methods.
Sentiment analysis predicts the presence of positive or negative emotions in a text document. In this paper, we consider higher dimensional extensions of the sentiment concept, which represent a richer set of human emotions. Our approach goes beyond previous work in that our model contains a continuous manifold rather than a finite set of human emotions. We investigate the resulting model, compare it to psychological observations, and explore its predictive capabilities.
Approximations based on random Fourier features have recently emerged as an efficient and formally consistent methodology to design large-scale kernel machines. By expressing the kernel as a Fourier expansion, features are generated based on a finite set of random basis projections, sampled from the Fourier transform of the kernel, with inner products that are Monte Carlo approximations of the original kernel. Based on the observation that different kernel-induced Fourier sampling distributions correspond to different kernel parameters, we show that an optimization process in the Fourier domain can be used to identify the different frequency bands that are useful for prediction on training data. Moreover, the application of group Lasso to random feature vectors corresponding to a linear combination of multiple kernels, leads to efficient and scalable reformulations of the standard multiple kernel learning model \cite{Varma09}. In this paper we develop the linear Fourier approximation methodology for both single and multiple gradient-based kernel learning and show that it produces fast and accurate predictors on a complex dataset such as the Visual Object Challenge 2011 (VOC2011).