We study the problem of dataset distillation - creating a small set of synthetic examples capable of training a good model. In particular, we study the problem of label distillation - creating synthetic labels for a small set of real images, and show it to be more effective than the prior image-based approach to dataset distillation. Interestingly, label distillation can be applied across datasets, for example enabling learning Japanese character recognition by training only on synthetically labeled English letters. Methodologically, we introduce a more robust and flexible meta-learning algorithm for distillation, as well as an effective first-order strategy based on convex optimization layers. Distilling labels with our new algorithm leads to improved results over prior image-based distillation. More importantly, it leads to clear improvements in flexibility of the distilled dataset in terms of compatibility with off-the-shelf optimizers and diverse neural architectures.
In semi-supervised learning (SSL), a rule to predict labels $y$ for data $x$ is learned from labelled data $(x^l,y^l)$ and unlabelled samples $x^u$. Strong progress has been made by combining a variety of methods, some of which pertain to $p(x)$, e.g. data augmentation that generates artificial samples from true $x$; whilst others relate to model outputs $p(y|x)$, e.g. regularising predictions on unlabelled data to minimise entropy or induce mutual exclusivity. Focusing on the latter, we fill a gap in the standard text by introducing a unifying probabilistic model for discriminative semi-supervised learning, mirroring that for classical generative methods. We show that several SSL methods can be theoretically justified under our model as inducing approximate priors over predicted parameters of $p(y|x)$. For tasks where labels represent binary attributes, our model leads to a principled approach to neuro-symbolic SSL, bridging the divide between statistical learning and logical rules.
Learning low-dimensional latent state space dynamics models has been a powerful paradigm for enabling vision-based planning and learning for control. We introduce a latent dynamics learning framework that is uniquely designed to induce proportional controlability in the latent space, thus enabling the use of much simpler controllers than prior work. We show that our learned dynamics model enables proportional control from pixels, dramatically simplifies and accelerates behavioural cloning of vision-based controllers, and provides interpretable goal discovery when applied to imitation learning of switching controllers from demonstration.
A practical shortcoming of deep neural networks is their specialization to a single task and domain. While recent techniques in domain adaptation and multi-domain learning enable the learning of more domain-agnostic features, their success relies on the presence of domain labels, typically requiring manual annotation and careful curation of datasets. Here we focus on a less explored, but more realistic case: learning from data from multiple domains, without access to domain annotations. In this scenario, standard model training leads to the overfitting of large domains, while disregarding smaller ones. We address this limitation via dynamic residual adapters, an adaptive gating mechanism that helps account for latent domains, coupled with an augmentation strategy inspired by recent style transfer techniques. Our proposed approach is examined on image classification tasks containing multiple latent domains, and we showcase its ability to obtain robust performance across these. Dynamic residual adapters significantly outperform off-the-shelf networks with much larger capacity, and can be incorporated seamlessly with existing architectures in an end-to-end manner.
We consider the challenging problem of zero-shot video object segmentation (VOS). That is, segmenting and tracking multiple moving objects within a video fully automatically, without any manual initialization. We treat this as a grouping problem by exploiting object proposals and making a joint inference about grouping over both space and time. We propose a network architecture for tractably performing proposal selection and joint grouping. Crucially, we then show how to train this network with reinforcement learning so that it learns to perform the optimal non-myopic sequence of grouping decisions to segment the whole video. Unlike standard supervised techniques, this also enables us to directly optimize for the non-differentiable overlap-based metrics used to evaluate VOS. We show that the proposed method, which we call ALBA outperforms the previous stateof-the-art on three benchmarks: DAVIS 2017 [2], FBMS [20] and Youtube-VOS [27].
Robots are still limited to controlled conditions, that the robot designer knows with enough details to endow the robot with the appropriate models or behaviors. Learning algorithms add some flexibility with the ability to discover the appropriate behavior given either some demonstrations or a reward to guide its exploration with a reinforcement learning algorithm. Reinforcement learning algorithms rely on the definition of state and action spaces that define reachable behaviors. Their adaptation capability critically depends on the representations of these spaces: small and discrete spaces result in fast learning while large and continuous spaces are challenging and either require a long training period or prevent the robot from converging to an appropriate behavior. Beside the operational cycle of policy execution and the learning cycle, which works at a slower time scale to acquire new policies, we introduce the redescription cycle, a third cycle working at an even slower time scale to generate or adapt the required representations to the robot, its environment and the task. We introduce the challenges raised by this cycle and we present DREAM (Deferred Restructuring of Experience in Autonomous Machines), a developmental cognitive architecture to bootstrap this redescription process stage by stage, build new state representations with appropriate motivations, and transfer the acquired knowledge across domains or tasks or even across robots. We describe results obtained so far with this approach and end up with a discussion of the questions it raises in Neuroscience.
We propose a simple yet effective multiple kernel clustering algorithm, termed simple multiple kernel k-means (SimpleMKKM). It extends the widely used supervised kernel alignment criterion to multi-kernel clustering. Our criterion is given by an intractable minimization-maximization problem in the kernel coefficient and clustering partition matrix. To optimize it, we re-formulate the problem as a smooth minimization one, which can be solved efficiently using a reduced gradient descent algorithm. We theoretically analyze the performance of SimpleMKKM in terms of its clustering generalization error. Comprehensive experiments on 11 benchmark datasets demonstrate that SimpleMKKM outperforms state of the art multi-kernel clustering alternatives.
The field of meta-learning, or learning-to-learn, has seen a dramatic rise in interest in recent years. Contrary to conventional approaches to AI where a given task is solved from scratch using a fixed learning algorithm, meta-learning aims to improve the learning algorithm itself, given the experience of multiple learning episodes. This paradigm provides an opportunity to tackle many of the conventional challenges of deep learning, including data and computation bottlenecks, as well as the fundamental issue of generalization. In this survey we describe the contemporary meta-learning landscape. We first discuss definitions of meta-learning and position it with respect to related fields, such as transfer learning, multi-task learning, and hyperparameter optimization. We then propose a new taxonomy that provides a more comprehensive breakdown of the space of meta-learning methods today. We survey promising applications and successes of meta-learning including few-shot learning, reinforcement learning and architecture search. Finally, we discuss outstanding challenges and promising areas for future research.
Domain adaptation (DA) is the topical problem of adapting models from labelled source datasets so that they perform well on target datasets where only unlabelled or partially labelled data is available. Many methods have been proposed to address this problem through different ways to minimise the domain shift between source and target datasets. In this paper we take an orthogonal perspective and propose a framework to further enhance performance by meta-learning the initial conditions of existing DA algorithms. This is challenging compared to the more widely considered setting of few-shot meta-learning, due to the length of the computation graph involved. Therefore we propose an online shortest-path meta-learning framework that is both computationally tractable and practically effective for improving DA performance. We present variants for both multi-source unsupervised domain adaptation (MSDA), and semi-supervised domain adaptation (SSDA). Importantly, our approach is agnostic to the base adaptation algorithm, and can be applied to improve many techniques. Experimentally, we demonstrate improvements on classic (DANN) and recent (MCD and MME) techniques for MSDA and SSDA, and ultimately achieve state of the art results on several DA benchmarks including the largest scale DomainNet.
In this paper we propose a sequential learning framework for Domain Generalization (DG), the problem of training a model that is robust to domain shift by design. Various DG approaches have been proposed with different motivating intuitions, but they typically optimize for a single step of domain generalization -- training on one set of domains and generalizing to one other. Our sequential learning is inspired by the idea lifelong learning, where accumulated experience means that learning the $n^{th}$ thing becomes easier than the $1^{st}$ thing. In DG this means encountering a sequence of domains and at each step training to maximise performance on the next domain. The performance at domain $n$ then depends on the previous $n-1$ learning problems. Thus backpropagating through the sequence means optimizing performance not just for the next domain, but all following domains. Training on all such sequences of domains provides dramatically more `practice' for a base DG learner compared to existing approaches, thus improving performance on a true testing domain. This strategy can be instantiated for different base DG algorithms, but we focus on its application to the recently proposed Meta-Learning Domain generalization (MLDG). We show that for MLDG it leads to a simple to implement and fast algorithm that provides consistent performance improvement on a variety of DG benchmarks.