Semi-supervised learning (SSL) based on deep neural networks (DNNs) has recently been proven effective. However, recent work [Oliver et al., 2018] shows that the performance of SSL could degrade substantially when the unlabeled set has out-of-distribution examples (OODs). In this work, we first study the key causes about the negative impact of OOD on SSL. We found that (1) OODs close to the decision boundary have a larger effect on the performance of existing SSL algorithms than the OODs far away from the decision boundary and (2) Batch Normalization (BN), a popular module in deep networks, could degrade the performance of a DNN for SSL substantially when the unlabeled set contains OODs. To address these causes, we proposed a novel unified robust SSL approach for many existing SSL algorithms in order to improve their robustness against OODs. In particular, we proposed a simple modification to batch normalization, called weighted batch normalization, capable of improving the robustness of BN against OODs. We developed two efficient hyperparameter optimization algorithms that have different tradeoffs in computational efficiency and accuracy. The first is meta-approximation and the second is implicit-differentiation based approximation. Both algorithms learn to reweight the unlabeled samples in order to improve the robustness of SSL against OODs. Extensive experiments on both synthetic and real-world datasets demonstrate that our proposed approach significantly improves the robustness of four representative SSL algorithms against OODs, in comparison with four state-of-the-art robust SSL approaches. We performed an ablation study to demonstrate which components of our approach are most important for its success.
Automatic video summarization is still an unsolved problem due to several challenges. We take steps towards making automatic video summarization more realistic by addressing them. Firstly, the currently available datasets either have very short videos or have few long videos of only a particular type. We introduce a new benchmarking dataset VISIOCITY which comprises of longer videos across six different categories with dense concept annotations capable of supporting different flavors of video summarization and can be used for other vision problems. Secondly, for long videos, human reference summaries are difficult to obtain. We present a novel recipe based on pareto optimality to automatically generate multiple reference summaries from indirect ground truth present in VISIOCITY. We show that these summaries are at par with human summaries. Thirdly, we demonstrate that in the presence of multiple ground truth summaries (due to the highly subjective nature of the task), learning from a single combined ground truth summary using a single loss function is not a good idea. We propose a simple recipe VISIOCITY-SUM to enhance an existing model using a combination of losses and demonstrate that it beats the current state of the art techniques when tested on VISIOCITY. We also show that a single measure to evaluate a summary, as is the current typical practice, falls short. We propose a framework for better quantitative assessment of summary quality which is closer to human judgment than a single measure, say F1. We report the performance of a few representative techniques of video summarization on VISIOCITY assessed using various measures and bring out the limitation of the techniques and/or the assessment mechanism in modeling human judgment and demonstrate the effectiveness of our evaluation framework in doing so.
The paradigm of data programming~\cite{bach2019snorkel} has shown a lot of promise in using weak supervision in the form of rules and labelling functions to learn in scenarios where labelled data is not available. Another approach which has shown a lot of promise is that of semi-supervised learning where we augment small amounts of labelled data with a large unlabelled dataset. In this work, we argue that by not using any labelled data, data programming based approaches can yield sub-optimal performance, particularly, in cases when the labelling functions are noisy. The first contribution of this work is to study a framework of joint learning which combines un-supervised consensus from labelling functions with semi-supervised learning and \emph{jointly learns a model} to efficiently use the rules/labelling functions along with semi-supervised loss functions on the feature space. Next, we also study a subset selection approach to \emph{select} the set of examples which can be used as the labelled set. We evaluate our techniques on synthetic data as well as four publicly available datasets and show improvement over state-of-the-art techniques\footnote{Source code of the paper at \url{https://github.com/ayushbits/Semi-Supervised-LFs-Subset-Selection}}.
Information-theoretic quantities like entropy and mutual information have found numerous uses in machine learning. It is well known that there is a strong connection between these entropic quantities and submodularity since entropy over a set of random variables is submodular. In this paper, we study combinatorial information measures that generalize independence, (conditional) entropy, (conditional) mutual information, and total correlation defined over sets of (not necessarily random) variables. These measures strictly generalize the corresponding entropic measures since they are all parameterized via submodular functions that themselves strictly generalize entropy. Critically, we show that, unlike entropic mutual information in general, the submodular mutual information is actually submodular in one argument, holding the other fixed, for a large class of submodular functions whose third-order partial derivatives satisfy a non-negativity property. This turns out to include a number of practically useful cases such as the facility location and set-cover functions. We study specific instantiations of the submodular information measures on these, as well as the probabilistic coverage, graph-cut, and saturated coverage functions, and see that they all have mathematically intuitive and practically useful expressions. Regarding applications, we connect the maximization of submodular (conditional) mutual information to problems such as mutual-information-based, query-based, and privacy-preserving summarization -- and we connect optimizing the multi-set submodular mutual information to clustering and robust partitioning.
Submodular Functions are a special class of set functions, which generalize several information-theoretic quantities such as entropy and mutual information [1]. Submodular functions have subgradients and subdifferentials [2] and admit polynomial-time algorithms for minimization, both of which are fundamental characteristics of convex functions. Submodular functions also show signs similar to concavity. Submodular function maximization, though NP-hard, admits constant-factor approximation guarantees, and concave functions composed with modular functions are submodular. In this paper, we try to provide a more complete picture of the relationship between submodularity with concavity. We characterize the super-differentials and polyhedra associated with upper bounds and provide optimality conditions for submodular maximization using the-super differentials. This paper is a concise and shorter version of our longer preprint [3].
Robust Optimization is becoming increasingly important in machine learning applications. This paper studies the problem of robust submodular minimization subject to combinatorial constraints. Constrained Submodular Minimization arises in several applications such as co-operative cuts in image segmentation, co-operative matchings in image correspondence, etc. Many of these models are defined over clusterings of data points (for example pixels in images), and it is important for these models to be robust to perturbations and uncertainty in the data. While several existing papers have studied robust submodular maximization, ours is the first work to study the minimization version under a broad range of combinatorial constraints including cardinality, knapsack, matroid as well as graph-based constraints such as cuts, paths, matchings, and trees. In each case, we provide scalable approximation algorithms and also study hardness bounds. Finally, we empirically demonstrate the utility of our algorithms on synthetic and real-world datasets.
In this paper, we shall study a unified framework of robust submodular optimization. We study this problem both from a minimization and maximization perspective (previous work has only focused on variants of robust submodular maximization). We do this under a broad range of combinatorial constraints including cardinality, knapsack, matroid as well as graph based constraints such as cuts, paths, matchings and trees. Furthermore, we also study robust submodular minimization and maximization under multiple submodular upper and lower bound constraints. We show that all these problems are motivated by important machine learning applications including robust data subset selection, robust co-operative cuts and robust co-operative matchings. In each case, we provide scalable approximation algorithms and also study hardness bounds. Finally, we empirically demonstrate the utility of our algorithms on real world applications.
We are motivated by large scale submodular optimization problems, where standard algorithms that treat the submodular functions in the \emph{value oracle model} do not scale. In this paper, we present a model called the \emph{precomputational complexity model}, along with a unifying memoization based framework, which looks at the specific form of the given submodular function. A key ingredient in this framework is the notion of a \emph{precomputed statistic}, which is maintained in the course of the algorithms. We show that we can easily integrate this idea into a large class of submodular optimization problems including constrained and unconstrained submodular maximization, minimization, difference of submodular optimization, optimization with submodular constraints and several other related optimization problems. Moreover, memoization can be integrated in both discrete and continuous relaxation flavors of algorithms for these problems. We demonstrate this idea for several commonly occurring submodular functions, and show how the precomputational model provides significant speedups compared to the value oracle model. Finally, we empirically demonstrate this for large scale machine learning problems of data subset selection and summarization.