Guillermo Sapiro

University of Minnesota

GODIVA: Generating Open-DomaIn Videos from nAtural Descriptions

Apr 30, 2021

Chenfei Wu, Lun Huang, Qianxi Zhang, Binyang Li, Lei Ji, Fan Yang, Guillermo Sapiro, Nan Duan

Abstract: Generating videos from text is a challenging task due to its high computational requirements for training and infinite possible answers for evaluation. Existing works typically experiment on simple or small datasets, where the generalization ability is quite limited. In this work, we propose GODIVA, an open-domain text-to-video pretrained model that can generate videos from text in an auto-regressive manner using a three-dimensional sparse attention mechanism. We pretrain our model on HowTo100M, a large-scale text-video dataset that contains more than 136 million text-video pairs. Experiments show that GODIVA can not only be fine-tuned on downstream video generation tasks, but also has good zero-shot capability on unseen texts. We also propose a new metric called Relative Matching (RM) to automatically evaluate the video generation quality. Several challenges are listed and discussed as future work.

Cirrus: A Long-range Bi-pattern LiDAR Dataset

Dec 05, 2020

Ze Wang, Sihao Ding, Ying Li, Jonas Fenn, Sohini Roychowdhury, Andreas Wallin, Lane Martin, Scott Ryvola, Guillermo Sapiro, Qiang Qiu

Abstract: In this paper, we introduce Cirrus, a new long-range bi-pattern LiDAR public dataset for autonomous driving tasks such as 3D object detection, critical to highway driving and timely decision making. Our platform is equipped with a high-resolution video camera and a pair of LiDAR sensors with a 250-meter effective range, which is significantly longer than that of existing public datasets. We record paired point clouds simultaneously using both Gaussian and uniform scanning patterns. Point density varies significantly across such a long range, and different scanning patterns further diversify object representation in LiDAR. In Cirrus, eight categories of objects are exhaustively annotated in the LiDAR point clouds for the entire effective range. To illustrate the kind of studies supported by this new dataset, we introduce LiDAR model adaptation across different ranges, scanning patterns, and sensor devices. Promising results show the great potential of this new dataset to the robotics and computer vision communities.

Using Text to Teach Image Retrieval

Nov 19, 2020

Haoyu Dong, Ze Wang, Qiang Qiu, Guillermo Sapiro

Abstract: Image retrieval relies heavily on the quality of the data modeling and the distance measurement in the feature space. Building on the concept of image manifold, we first propose to represent the feature space of images, learned via neural networks, as a graph. Neighborhoods in the feature space are now defined by the geodesic distance between images, represented as graph vertices or manifold samples. When only a limited number of images are available, this manifold is sparsely sampled, making the geodesic computation and the corresponding retrieval harder. To address this, we augment the manifold samples with geometrically aligned text, thereby using a plethora of sentences to teach us about images. In addition to extensive results on standard datasets illustrating the power of text to help in image retrieval, a new public dataset based on CLEVR is introduced to quantify the semantic similarity between visual data and text data. The experimental results show that the joint embedding manifold is a robust representation, allowing it to be a better basis to perform image retrieval given only an image and a textual instruction on the desired modifications over the image.
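The idea of measuring neighborhoods by geodesic rather than straight-line distance can be illustrated with a minimal sketch. It is not the authors' implementation: the k-nearest-neighbor graph construction, the use of Dijkstra's algorithm, the function names, and the toy 2D features are all assumptions chosen for illustration.

```python
import heapq
import math

def knn_graph(points, k):
    """Connect each point to its k nearest Euclidean neighbours (undirected edges)."""
    graph = {i: {} for i in range(len(points))}
    for i, p in enumerate(points):
        dists = sorted((math.dist(p, q), j) for j, q in enumerate(points) if j != i)
        for d, j in dists[:k]:
            graph[i][j] = d
            graph[j][i] = d
    return graph

def geodesic_distances(graph, source):
    """Shortest-path lengths from source (Dijkstra), used here as geodesic distances."""
    best = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > best.get(u, math.inf):
            continue  # stale heap entry
        for v, w in graph[u].items():
            nd = d + w
            if nd < best.get(v, math.inf):
                best[v] = nd
                heapq.heappush(heap, (nd, v))
    return best

# Hypothetical toy features lying along an L-shaped curve.
features = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (2.0, 1.0), (2.0, 2.0)]
graph = knn_graph(features, k=2)
geo = geodesic_distances(graph, source=0)
straight = math.dist(features[0], features[4])
print(geo[4], straight)
```

On this toy curve the two end points are closer in a straight line than along the graph, which is the gap a manifold-aware retrieval method is meant to respect. With few samples the graph may become disconnected or take long detours, which matches the sparsity problem the abstract describes.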

Minimax Pareto Fairness: A Multi Objective Perspective

Nov 03, 2020

Natalia Martinez, Martin Bertran, Guillermo Sapiro

Abstract: In this work we formulate and formally characterize group fairness as a multi-objective optimization problem, where each sensitive group risk is a separate objective. We propose a fairness criterion where a classifier achieves minimax risk and is Pareto-efficient w.r.t. all groups, avoiding unnecessary harm, and can lead to the best zero-gap model if policy dictates so. We provide a simple optimization algorithm compatible with deep neural networks to satisfy these constraints. Since our method does not require test-time access to sensitive attributes, it can be applied to reduce worst-case classification errors between outcomes in unbalanced classification problems. We test the proposed methodology on real case studies of predicting income, ICU patient mortality, skin lesion classification, and assessing credit risk, demonstrating how our framework compares favorably to other approaches.

* International Conference on Machine Learning, 2020
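The criterion in the abstract above (minimax risk among Pareto-efficient classifiers) can be shown on a toy case. The sketch below is not the paper's optimization algorithm: it only selects from a finite, hypothetical set of candidate classifiers whose per-group risks are made-up numbers, and all names are assumptions.

```python
def dominates(a, b):
    """True if risk vector a is no worse than b for every group and strictly better for one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def pareto_front(candidates):
    """Keep only candidates that no other candidate dominates."""
    return {name: risks for name, risks in candidates.items()
            if not any(dominates(other, risks) for other in candidates.values())}

def minimax_pareto(candidates):
    """Among Pareto-efficient candidates, pick the one with the smallest worst-group risk."""
    front = pareto_front(candidates)
    return min(front, key=lambda name: max(front[name]))

# Hypothetical per-group risks (group 1, group 2) for four candidate classifiers.
candidates = {
    "A": (0.10, 0.30),
    "B": (0.18, 0.22),
    "C": (0.22, 0.22),  # zero gap between groups, but dominated by B
    "D": (0.05, 0.40),
}
front = pareto_front(candidates)
choice = minimax_pareto(candidates)
print(sorted(front), choice)
```

Candidate C equalizes the two group risks only by making group 1 worse off than under B without helping group 2, which is the kind of unnecessary harm the criterion is meant to exclude.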

Instance based Generalization in Reinforcement Learning

Nov 02, 2020

Martin Bertran, Natalia Martinez, Mariano Phielipp, Guillermo Sapiro

Abstract: Agents trained via deep reinforcement learning (RL) routinely fail to generalize to unseen environments, even when these share the same underlying dynamics as the training levels. Understanding the generalization properties of RL is one of the challenges of modern machine learning. Towards this goal, we analyze policy learning in the context of Partially Observable Markov Decision Processes (POMDPs) and formalize the dynamics of training levels as instances. We prove that, independently of the exploration strategy, reusing instances introduces significant changes to the effective Markov dynamics the agent observes during training. Maximizing expected rewards impacts the learned belief state of the agent by inducing undesired instance-specific speedrunning policies instead of generalizable ones, which are suboptimal on the training set. We provide generalization bounds on the value gap between train and test environments based on the number of training instances, and use insights based on these to improve performance on unseen levels. We propose training a shared belief representation over an ensemble of specialized policies, from which we compute a consensus policy that is used for data collection, disallowing instance-specific exploitation. We experimentally validate our theory, observations, and the proposed computational solution over the CoinRun benchmark.

* Accepted at NeurIPS 2020

ACDC: Weight Sharing in Atom-Coefficient Decomposed Convolution

Sep 04, 2020

Ze Wang, Xiuyuan Cheng, Guillermo Sapiro, Qiang Qiu

Abstract: Convolutional Neural Networks (CNNs) are known to be significantly over-parametrized, and difficult to interpret, train and adapt. In this paper, we introduce a structural regularization across convolutional kernels in a CNN. In our approach, each convolution kernel is first decomposed as 2D dictionary atoms linearly combined by coefficients. The widely observed correlation and redundancy in a CNN hint at a common low-rank structure among the decomposed coefficients, which is here further supported by our empirical observations. We then explicitly regularize CNN kernels by enforcing decomposed coefficients to be shared across sub-structures, while leaving each sub-structure only its own dictionary atoms, a few hundred parameters typically, which leads to dramatic model reductions. We explore models with sharing across different sub-structures to cover a wide range of trade-offs between parameter reduction and expressiveness. Our proposed regularized network structures open the door to better interpreting, training and adapting deep models. We validate the flexibility and compatibility of our method by image classification experiments on multiple datasets and underlying network structures, and show that CNNs now maintain performance with dramatic reduction in parameters and computations, e.g., only 5% of the parameters are used in a ResNet-18 to achieve comparable performance. Further experiments on few-shot classification show that faster and more robust task adaptation is obtained in comparison with models with standard convolutions.
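A small sketch may help make the atom-coefficient decomposition concrete. It only illustrates the general idea of writing a kernel as a linear combination of 2D atoms and counting parameters. The layer sizes, the number of atoms `m`, and the function names are assumptions for illustration, not values or code from the paper.

```python
def combine_atoms(atoms, coeffs):
    """Build one k x k kernel as a linear combination of k x k dictionary atoms."""
    k = len(atoms[0])
    return [[sum(c * atom[i][j] for c, atom in zip(coeffs, atoms))
             for j in range(k)] for i in range(k)]

def standard_params(c_in, c_out, k):
    """Parameter count of a standard convolution layer (bias ignored)."""
    return c_in * c_out * k * k

def decomposed_params(c_in, c_out, k, m):
    """m atoms of size k x k, plus m coefficients per (input, output) channel pair."""
    return m * k * k + c_in * c_out * m

def atom_only_params(k, m):
    """What a layer keeps for itself when its coefficients are shared with other layers."""
    return m * k * k

# Two hypothetical 2 x 2 atoms combined with coefficients 2.0 and -1.0.
atoms = [
    [[1, 0], [0, 0]],
    [[0, 0], [0, 1]],
]
kernel = combine_atoms(atoms, coeffs=[2.0, -1.0])

# Assumed layer: 64 input channels, 64 output channels, 3 x 3 kernels, 6 atoms.
full = standard_params(64, 64, 3)
split = decomposed_params(64, 64, 3, m=6)
own = atom_only_params(3, m=6)
print(kernel, full, split, own)
```

In this assumed configuration the decomposition alone gives a modest saving, while sharing the coefficients across layers leaves each layer with only its atoms, which is where the large reductions described in the abstract would come from.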

Nested Learning For Multi-Granular Tasks

Jul 13, 2020

Raphaël Achddou, J. Matias di Martino, Guillermo Sapiro

Abstract: Standard deep neural networks (DNNs) are commonly trained in an end-to-end fashion for specific tasks such as object recognition, face identification, or character recognition, among many examples. This specificity often leads to overconfident models that generalize poorly to samples that are not from the original training distribution. Moreover, such standard DNNs cannot leverage information from heterogeneously annotated training data, where, for example, labels may be provided with different levels of granularity. Furthermore, DNNs do not simultaneously produce results with different levels of confidence for different levels of detail; they most commonly follow an all-or-nothing approach. To address these challenges, we introduce the concept of nested learning: how to obtain a hierarchical representation of the input such that a coarse label can be extracted first, and sequentially refine this representation, if the sample permits, to obtain successively refined predictions, all of them with the corresponding confidence. We explicitly enforce this behavior by creating a sequence of nested information bottlenecks. Looking at the problem of nested learning from an information theory perspective, we design a network topology with two important properties. First, a sequence of low-dimensional (nested) feature embeddings is enforced. Then we show how the explicit combination of nested outputs can improve both the robustness and the accuracy of finer predictions. Experimental results on Cifar-10, Cifar-100, MNIST, Fashion-MNIST, Dbpedia, and Plantvillage demonstrate that nested learning outperforms the same network trained in the standard end-to-end fashion.

Differential 3D Facial Recognition: Adding 3D to Your State-of-the-Art 2D Method

Apr 03, 2020

J. Matias Di Martino, Fernando Suzacq, Mauricio Delbracio, Qiang Qiu, Guillermo Sapiro

Abstract: Active illumination is a prominent complement to enhance 2D face recognition and make it more robust, e.g., to spoofing attacks and low-light conditions. In the present work we show that it is possible to adopt active illumination to enhance state-of-the-art 2D face recognition approaches with 3D features, while bypassing the complicated task of 3D reconstruction. The key idea is to project over the test face a high spatial frequency pattern, which allows us to simultaneously recover real 3D information plus a standard 2D facial image. Therefore, state-of-the-art 2D face recognition solutions can be transparently applied, while complementary 3D facial features are extracted from the high-frequency component of the input image. Experimental results on the ND-2006 dataset show that the proposed ideas can significantly boost face recognition performance and dramatically improve the robustness to spoofing attacks.

Fairness With Minimal Harm: A Pareto-Optimal Approach For Healthcare

Nov 16, 2019

Natalia Martinez, Martin Bertran, Guillermo Sapiro

Abstract: Common fairness definitions in machine learning focus on balancing notions of disparity and utility. In this work, we study fairness in the context of risk disparity among sub-populations. We are interested in learning models that minimize performance discrepancies across sensitive groups without causing unnecessary harm. This is relevant to high-stakes domains such as healthcare, where non-maleficence is a core principle. We formalize this objective using Pareto frontiers, and provide analysis, based on recent works in fairness, to exemplify scenarios where perfect fairness might not be feasible without doing unnecessary harm. We present a methodology for training neural networks that achieve our goal by dynamically re-balancing subgroup risks. We argue that even in domains where fairness at cost is required, finding a non-unnecessary-harm fairness model is the optimal initial step. We demonstrate this methodology on real case studies of predicting ICU patient mortality and classifying skin lesions from dermatoscopic images.

SalGaze: Personalizing Gaze Estimation Using Visual Saliency

Oct 23, 2019

Zhuoqing Chang, Matias Di Martino, Qiang Qiu, Steven Espinosa, Guillermo Sapiro

Abstract: Traditional gaze estimation methods typically require explicit user calibration to achieve high accuracy. This process is cumbersome, and recalibration is often required when there are changes in factors such as illumination and pose. To address this challenge, we introduce SalGaze, a framework that utilizes saliency information in the visual content to transparently adapt the gaze estimation algorithm to the user without explicit user calibration. We design an algorithm to transform a saliency map into a differentiable loss map that can be used for the optimization of CNN-based models. SalGaze is also able to greatly augment standard point calibration data with implicit video saliency calibration data using a unified framework. We show accuracy improvements of over 24% using our technique on existing methods.

* Accepted at ICCV 2019 Workshop
