With increasing data, techniques for finding smaller, yet effective subsets with specific characteristics become important. Motivated by this, we present PRISM, a rich class of Parameterized Submodular Information Measures, that can be used in applications where such targeted subsets are desired. We demonstrate the utility of PRISM in two such applications. First, we apply PRISM to improve a supervised model's performance at a given additional labeling cost by targeted subset selection (PRISM-TSS) where a subset of unlabeled points matching a target set are added to the training set. We show that PRISM-TSS generalizes and is connected to several existing approaches to targeted data subset selection. Second, we apply PRISM to a more nuanced targeted summarization (PRISM-TSUM) where data (e.g., image collections, text or videos) is summarized for quicker human consumption with additional user intent. PRISM-TSUM handles multiple flavors of targeted summarization such as query-focused, topic-irrelevant, privacy-preserving and update summarization in a unified way. We show that PRISM-TSUM also generalizes and unifies several existing past work on targeted summarization. Through extensive experiments on image classification and image-collection summarization we empirically verify the superiority of PRISM-TSS and PRISM-TSUM over the state-of-the-art.
The great success of modern machine learning models on large datasets is contingent on extensive computational resources with high financial and environmental costs. One way to address this is by extracting subsets that generalize on par with the full data. In this work, we propose a general framework, GRAD-MATCH, which finds subsets that closely match the gradient of the training or validation set. We find such subsets effectively using an orthogonal matching pursuit algorithm. We show rigorous theoretical and convergence guarantees of the proposed algorithm and, through our extensive experiments on real-world datasets, show the effectiveness of our proposed framework. We show that GRAD-MATCH significantly and consistently outperforms several recent data-selection algorithms and is Pareto-optimal with respect to the accuracy-efficiency trade-off. The code of GRADMATCH is available as a part of the CORDS toolkit: https://github.com/decile-team/cords.
Automatic video summarization is still an unsolved problem due to several challenges. The currently available datasets either have very short videos or have few long videos of only a particular type. We introduce a new benchmarking video dataset called VISIOCITY (VIdeo SummarIzatiOn based on Continuity, Intent and DiversiTY) which comprises of longer videos across six different categories with dense concept annotations capable of supporting different flavors of video summarization and other vision problems. For long videos, human reference summaries necessary for supervised video summarization techniques are difficult to obtain. We explore strategies to automatically generate multiple reference summaries from indirect ground truth present in VISIOCITY. We show that these summaries are at par with human summaries. We also present a study of different desired characteristics of a good summary and demonstrate how it is normal to have two good summaries with different characteristics. Thus we argue that evaluating a summary against one or more human summaries and using a single measure has its shortcomings. We propose an evaluation framework for better quantitative assessment of summary quality which is closer to human judgment. Lastly, we present insights into how a model can be enhanced to yield better summaries. Sepcifically, when multiple diverse ground truth summaries can exist, learning from them individually and using a combination of loss functions measuring different characteristics is better than learning from a single combined (oracle) ground truth summary using a single loss function. We demonstrate the effectiveness of doing so as compared to some of the representative state of the art techniques tested on VISIOCITY. We release VISIOCITY as a benchmarking dataset and invite researchers to test the effectiveness of their video summarization algorithms on VISIOCITY.
Large scale machine learning and deep models are extremely data-hungry. Unfortunately, obtaining large amounts of labeled data is expensive, and training state-of-the-art models (with hyperparameter tuning) requires significant computing resources and time. Secondly, real-world data is noisy and imbalanced. As a result, several recent papers try to make the training process more efficient and robust. However, most existing work either focuses on robustness or efficiency, but not both. In this work, we introduce Glister, a GeneraLIzation based data Subset selecTion for Efficient and Robust learning framework. We formulate Glister as a mixed discrete-continuous bi-level optimization problem to select a subset of the training data, which maximizes the log-likelihood on a held-out validation set. Next, we propose an iterative online algorithm Glister-Online, which performs data selection iteratively along with the parameter updates and can be applied to any loss-based learning algorithm. We then show that for a rich class of loss functions including cross-entropy, hinge-loss, squared-loss, and logistic-loss, the inner discrete data selection is an instance of (weakly) submodular optimization, and we analyze conditions for which Glister-Online reduces the validation loss and converges. Finally, we propose Glister-Active, an extension to batch active learning, and we empirically demonstrate the performance of Glister on a wide range of tasks including, (a) data selection to reduce training time, (b) robust learning under label noise and imbalance settings, and (c) batch-active learning with several deep and shallow models. We show that our framework improves upon state of the art both in efficiency and accuracy (in cases (a) and (c)) and is more efficient compared to other state-of-the-art robust learning algorithms in case (b).
Model-Agnostic Meta-Learning (MAML) is a popular gradient-based meta-learning framework that tries to find an optimal initialization to minimize the expected loss across all tasks during meta-training. However, it inherently assumes that the contribution of each instance/task to the meta-learner is equal. Therefore, it fails to address the problem of domain differences between base and novel classes in few-shot learning. In this work, we propose a novel and robust meta-learning algorithm, called RW-MAML, which learns to assign weights to training instances or tasks. We consider these weights to be hyper-parameters. Hence, we iteratively optimize the weights using a small set of validation tasks and an online approximation in a \emph{bi-bi-level} optimization framework, in contrast to the standard bi-level optimization in MAML. Therefore, we investigate a practical evaluation setting to demonstrate the scalability of our RW-MAML in two scenarios: (1) out-of-distribution tasks and (2) noisy labels in the meta-training stage. Extensive experiments on synthetic and real-world datasets demonstrate that our framework efficiently mitigates the effects of "unwanted" instances, showing that our proposed technique significantly outperforms state-of-the-art robust meta-learning methods.
Deep Models are increasingly becoming prevalent in summarization problems (e.g. document, video and images) due to their ability to learn complex feature interactions and representations. However, they do not model characteristics such as diversity, representation, and coverage, which are also very important for summarization tasks. On the other hand, submodular functions naturally model these characteristics because of their diminishing returns property. Most approaches for modelling and learning submodular functions rely on very simple models, such as weighted mixtures of submodular functions. Unfortunately, these models only learn the relative importance of the different submodular functions (such as diversity, representation or importance), but cannot learn more complex feature representations, which are often required for state-of-the-art performance. We propose Deep Submodular Networks (DSN), an end-to-end learning framework that facilitates the learning of more complex features and richer functions, crafted for better modelling of all aspects of summarization. The DSN framework can be used to learn features appropriate for summarization from scratch. We demonstrate the utility of DSNs on both generic and query focused image-collection summarization, and show significant improvement over the state-of-the-art. In particular, we show that DSNs outperform simple mixture models using off the shelf features. Secondly, we also show that just using four submodular functions in a DSN with end-to-end learning performs comparably to the state-of-the-art mixture model with a hand-crafted set of 594 components and outperforms other methods for image collection summarization.
Federated Learning (FL) is a decentralized machine learning protocol that allows a set of participating agents to collaboratively train a model without sharing their data. This makes FL particularly suitable for settings where data privacy is desired. However, it has been observed that the performance of FL is closely tied with the local data distributions of agents. Particularly, in settings where local data distributions vastly differ among agents, FL performs rather poorly with respect to the centralized training. To address this problem, we hypothesize the reasons behind the performance degradation, and develop some techniques to address these reasons accordingly. In this work, we identify four simple techniques that can improve the performance of trained models without incurring any additional communication overhead to FL, but rather, some light computation overhead either on the client, or the server-side. In our experimental analysis, combination of our techniques improved the validation accuracy of a model trained via FL by more than 12% with respect to our baseline. This is about 5% less than the accuracy of the model trained on centralized data.
We study submodular information measures as a rich framework for generic, query-focused, privacy sensitive, and update summarization tasks. While past work generally treats these problems differently ({\em e.g.}, different models are often used for generic and query-focused summarization), the submodular information measures allow us to study each of these problems via a unified approach. We first show that several previous query-focused and update summarization techniques have, unknowingly, used various instantiations of the aforesaid submodular information measures, providing evidence for the benefit and naturalness of these models. We then carefully study and demonstrate the modelling capabilities of the proposed functions in different settings and empirically verify our findings on both a synthetic dataset and an existing real-world image collection dataset (that has been extended by adding concept annotations to each image making it suitable for this task) and will be publicly released. We employ a max-margin framework to learn a mixture model built using the proposed instantiations of submodular information measures and demonstrate the effectiveness of our approach. While our experiments are in the context of image summarization, our framework is generic and can be easily extended to other summarization settings (e.g., videos or documents).