Chris Pal

ArK: Augmented Reality with Knowledge Interactive Emergent Ability

May 01, 2023
Qiuyuan Huang, Jae Sung Park, Abhinav Gupta, Paul Bennett, Ran Gong, Subhojit Som, Baolin Peng, Owais Khan Mohammed, Chris Pal, Yejin Choi, Jianfeng Gao

Despite the growing adoption of mixed reality and interactive AI agents, it remains challenging for these systems to generate high-quality 2D/3D scenes in unseen environments. The common practice requires deploying an AI agent to collect large amounts of data for model training for every new task. This process is costly, or even impossible, for many domains. In this study, we develop an infinite agent that learns to transfer knowledge memory from general foundation models (e.g., GPT-4, DALL-E) to novel domains or scenarios for scene understanding and generation in the physical or virtual world. The heart of our approach is an emerging mechanism, dubbed Augmented Reality with Knowledge Inference Interaction (ArK), which leverages knowledge memory to generate scenes in unseen physical-world and virtual-reality environments. The knowledge interactive emergent ability (Figure 1) is demonstrated in two forms: i) micro-actions of cross-modality, where multi-modality models collect a large amount of relevant knowledge-memory data for each interaction task (e.g., unseen scene understanding) from the physical reality; and ii) macro-behavior that is reality-agnostic, where mixed-reality environments improve interactions tailored to different characterized roles, target variables, collaborative information, and so on. We validate the effectiveness of ArK on scene generation and editing tasks. We show that our ArK approach, combined with large foundation models, significantly improves the quality of generated 2D/3D scenes compared to baselines, demonstrating the potential benefit of incorporating ArK in generative AI for applications such as metaverse and gaming simulation.
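
The knowledge-memory mechanism is described only at a high level above, so the following is a loose Python sketch of the retrieval idea: observations from an unseen environment are matched against a stored knowledge memory, and the retrieved facts are folded into a prompt for a generative foundation model. The bag-of-words embedding, class names, and prompt format are all illustrative stand-ins, not the ArK implementation.

```python
import numpy as np

VOCAB = ["kitchen", "office", "scene", "counter", "desk", "vr"]

def embed(text):
    # Toy bag-of-words vector; a real system would use a learned multimodal encoder.
    words = text.lower().split()
    return np.array([words.count(w) for w in VOCAB], dtype=float)

class KnowledgeMemory:
    def __init__(self):
        self.entries = []  # (embedding, knowledge text) pairs

    def add(self, text, knowledge):
        self.entries.append((embed(text), knowledge))

    def retrieve(self, query, k=1):
        q = embed(query)
        sims = [float(e @ q) / (np.linalg.norm(e) * np.linalg.norm(q) + 1e-8)
                for e, _ in self.entries]
        best = np.argsort(sims)[::-1][:k]  # most similar stored knowledge first
        return [self.entries[i][1] for i in best]

memory = KnowledgeMemory()
memory.add("kitchen scene", "kitchens contain counters, sinks and cabinets")
memory.add("office scene", "offices contain desks, chairs and monitors")

observation = "an unseen kitchen scene in a vr environment"
facts = memory.retrieve(observation, k=1)
prompt = f"Generate a 3D layout for: {observation}. Relevant knowledge: {facts[0]}"
print(prompt)  # this prompt would be sent to a generative foundation model
```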

A General Purpose Neural Architecture for Geospatial Systems

Nov 04, 2022
Nasim Rahaman, Martin Weiss, Frederik Träuble, Francesco Locatello, Alexandre Lacoste, Yoshua Bengio, Chris Pal, Li Erran Li, Bernhard Schölkopf

Geospatial Information Systems are used by researchers and Humanitarian Assistance and Disaster Response (HADR) practitioners to support a wide variety of important applications. However, collaboration between these actors is difficult due to the heterogeneous nature of geospatial data modalities (e.g., multi-spectral images of various resolutions, time series, weather data) and the diversity of tasks (e.g., regression of human activity indicators or detecting forest fires). In this work, we present a roadmap towards the construction of a general-purpose neural architecture (GPNA) with a geospatial inductive bias, pre-trained on large amounts of unlabelled earth observation data in a self-supervised manner. We envision how such a model may facilitate cooperation between members of the community. We show preliminary results on the first step of the roadmap, where we instantiate an architecture that can process a wide variety of geospatial data modalities and demonstrate that it can achieve competitive performance with domain-specific architectures on tasks relating to the U.N.'s Sustainable Development Goals.
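
As a rough illustration of the idea of processing heterogeneous geospatial modalities with one model, the PyTorch sketch below projects multi-spectral image patches and a weather time series into a shared token space and runs a single transformer encoder over them. It is a guess at the general shape, not the authors' GPNA; all shapes, band counts, and module names are assumptions.

```python
import torch
import torch.nn as nn

class SharedGeoEncoder(nn.Module):
    def __init__(self, d_model=64):
        super().__init__()
        self.image_proj = nn.Linear(13 * 8 * 8, d_model)    # 13-band 8x8 patches -> tokens
        self.weather_proj = nn.Linear(4, d_model)            # 4 weather variables per time step
        layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)
        self.head = nn.Linear(d_model, 1)                     # e.g. regression of an activity indicator

    def forward(self, patches, weather):
        # patches: (B, N, 13, 8, 8), weather: (B, T, 4)
        img_tokens = self.image_proj(patches.flatten(2))      # (B, N, d_model)
        wx_tokens = self.weather_proj(weather)                # (B, T, d_model)
        tokens = torch.cat([img_tokens, wx_tokens], dim=1)    # one shared token sequence
        encoded = self.encoder(tokens)
        return self.head(encoded.mean(dim=1))                 # pooled prediction

model = SharedGeoEncoder()
out = model(torch.randn(2, 6, 13, 8, 8), torch.randn(2, 10, 4))
print(out.shape)  # torch.Size([2, 1])
```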

* Presented at AI + HADR Workshop at NeurIPS 2022 

Neural Attentive Circuits

Oct 19, 2022
Nasim Rahaman, Martin Weiss, Francesco Locatello, Chris Pal, Yoshua Bengio, Bernhard Schölkopf, Li Erran Li, Nicolas Ballas

Recent work has seen the development of general purpose neural architectures that can be trained to perform tasks across diverse data modalities. General purpose models typically make few assumptions about the underlying data structure and are known to perform well in the large-data regime. At the same time, there has been growing interest in modular neural architectures that represent the data using sparsely interacting modules. These models can be more robust out-of-distribution, computationally efficient, and capable of sample-efficient adaptation to new data. However, they tend to make domain-specific assumptions about the data, and present challenges in how module behavior (i.e., parameterization) and connectivity (i.e., their layout) can be jointly learned. In this work, we introduce a general purpose, yet modular neural architecture called Neural Attentive Circuits (NACs) that jointly learns the parameterization and a sparse connectivity of neural modules without using domain knowledge. NACs are best understood as the combination of two systems that are jointly trained end-to-end: one that determines the module configuration and the other that executes it on an input. We demonstrate qualitatively that NACs learn diverse and meaningful module configurations on the NLVR2 dataset without additional supervision. Quantitatively, we show that by incorporating modularity in this way, NACs improve upon a strong non-modular baseline in terms of low-shot adaptation on the CIFAR and CUBs datasets by about 10%, and OOD robustness on Tiny ImageNet-R by about 2.5%. Further, we find that NACs can achieve an 8x speedup at inference time while losing less than 3% performance. Finally, we find NACs to yield competitive results on diverse data modalities spanning point-cloud classification, symbolic processing and text-classification from ASCII bytes, thereby confirming their general-purpose nature.
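
The following toy PyTorch sketch illustrates the two ingredients named above, learned module parameterization and learned sparse connectivity, by routing each input to a top-k subset of modules using learned routing keys. It is an illustration of the idea, not the NACs implementation, and all sizes are arbitrary.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ModularLayer(nn.Module):
    def __init__(self, d=32, n_modules=4, topk=2):
        super().__init__()
        self.modules_ = nn.ModuleList(
            nn.Sequential(nn.Linear(d, d), nn.ReLU(), nn.Linear(d, d))
            for _ in range(n_modules)
        )
        self.signatures = nn.Parameter(torch.randn(n_modules, d))  # learned routing keys
        self.topk = topk

    def forward(self, x):                        # x: (B, d)
        scores = x @ self.signatures.t()         # (B, n_modules) connectivity logits
        top = scores.topk(self.topk, dim=-1)     # keep only a sparse set of modules per input
        weights = F.softmax(top.values, dim=-1)
        out = torch.zeros_like(x)
        # Every module runs on the full batch here for clarity;
        # a real implementation would dispatch only the selected inputs.
        for j in range(self.topk):
            idx = top.indices[:, j]
            for m, module in enumerate(self.modules_):
                mask = (idx == m).float().unsqueeze(-1)
                out = out + mask * weights[:, j:j+1] * module(x)
        return out

layer = ModularLayer()
print(layer(torch.randn(8, 32)).shape)  # torch.Size([8, 32])
```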

* To appear at NeurIPS 2022 

Workflow Discovery from Dialogues in the Low Data Regime

May 24, 2022
Amine El Hattami, Stefania Raimondo, Issam Laradji, David Vazquez, Pau Rodriguez, Chris Pal

Text-based dialogues are now widely used to solve real-world problems. In cases where solution strategies are already known, they can sometimes be codified into workflows and used to guide humans or artificial agents through the task of helping clients. We are interested in the situation where a formal workflow may not yet exist, but we wish to discover the steps of the actions that have been taken to resolve problems. We examine a novel transformer-based approach for this situation and present experiments in which we summarize dialogues in the Action-Based Conversations Dataset (ABCD) with workflows. Since the ABCD dialogues were generated using known workflows to guide agents, we can evaluate our ability to extract such workflows using ground-truth sequences of action steps, organized as workflows. We propose and evaluate an approach that conditions models on the set of allowable action steps, and we show that this strategy improves workflow discovery (WD) performance. Our conditioning approach also improves zero-shot and few-shot WD performance when transferring learned models to entirely new domains (i.e., the MultiWOZ setting). Further, a modified variant of our architecture achieves state-of-the-art performance on the related but different problems of Action State Tracking (AST) and Cascading Dialogue Success (CDS) on ABCD.
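
A minimal example of the conditioning strategy, under the assumption that the model is a text-to-text summarizer (e.g. a T5-style encoder-decoder): the set of allowable action steps is appended to the dialogue text, and the target is the workflow serialized as a sequence of action steps. The exact prompt and serialization format here are invented for illustration and are not the paper's.

```python
def build_wd_example(dialogue_turns, allowable_actions, workflow):
    # Source: dialogue text followed by the allowed action vocabulary (the conditioning signal).
    source = (
        "Dialogue: " + " ".join(dialogue_turns)
        + " Possible actions: " + ", ".join(sorted(allowable_actions))
    )
    # Target: the workflow as a flat sequence of action(slot) steps.
    target = "; ".join(f"{action}({slot})" for action, slot in workflow)
    return source, target

src, tgt = build_wd_example(
    dialogue_turns=["Customer: I was charged twice.", "Agent: Let me pull up your account."],
    allowable_actions={"pull-up-account", "offer-refund", "verify-identity"},
    workflow=[("pull-up-account", "name"), ("offer-refund", "amount")],
)
print(src)
print(tgt)  # pull-up-account(name); offer-refund(amount)
```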

Learned Image Compression for Machine Perception

Nov 03, 2021
Felipe Codevilla, Jean Gabriel Simard, Ross Goroshin, Chris Pal

Recent work has shown that learned image compression strategies can outperform standard hand-crafted compression algorithms that have been developed over decades of intensive research on the rate-distortion trade-off. With growing applications of computer vision, high-quality image reconstruction from a compressible representation is often a secondary objective. Compression that ensures high accuracy on computer vision tasks such as image segmentation, classification, and detection therefore has the potential for significant impact across a wide variety of settings. In this work, we develop a framework that produces a compression format suitable for both human perception and machine perception. We show that representations can be learned that simultaneously optimize for compression and performance on core vision tasks. Our approach allows models to be trained directly from compressed representations, and this yields increased performance on new tasks and in low-shot learning settings. We present results that improve upon segmentation and detection performance compared to standard high-quality JPEGs, but with representations that are four to ten times smaller in terms of bits per pixel. Further, unlike naive compression methods, at a level ten times smaller than standard JPEGs, segmentation and detection models trained from our format suffer only minor degradation in performance.
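
A simplified sketch of this kind of joint objective: a compressed latent is trained for reconstruction quality (human perception) and for a downstream task head reading the latent directly (machine perception) at the same time. The L1 rate proxy, architecture, and loss weights below are placeholders for illustration; real learned codecs use an entropy model for the rate term.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

encoder = nn.Sequential(nn.Conv2d(3, 16, 4, stride=2, padding=1), nn.ReLU(),
                        nn.Conv2d(16, 8, 4, stride=2, padding=1))           # compressed code
decoder = nn.Sequential(nn.ConvTranspose2d(8, 16, 4, stride=2, padding=1), nn.ReLU(),
                        nn.ConvTranspose2d(16, 3, 4, stride=2, padding=1))  # reconstruction
task_head = nn.Conv2d(8, 5, 1)                  # e.g. a 5-class segmentation head on the code

images = torch.randn(2, 3, 64, 64)
labels = torch.randint(0, 5, (2, 16, 16))       # labels at the latent resolution (64 / 4)

code = encoder(images)                          # (2, 8, 16, 16)
rate_proxy = code.abs().mean()                  # crude stand-in for a learned entropy model
distortion = F.mse_loss(decoder(code), images)  # human-perception term
task_loss = F.cross_entropy(task_head(code), labels)  # machine-perception term
loss = rate_proxy + 1.0 * distortion + 1.0 * task_loss  # weights are arbitrary here
loss.backward()
print(float(loss))
```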

* 13 pages, 6 figures 

Learning Multi-Objective Curricula for Deep Reinforcement Learning

Oct 06, 2021
Jikun Kang, Miao Liu, Abhinav Gupta, Chris Pal, Xue Liu, Jie Fu

Various automatic curriculum learning (ACL) methods have been proposed to improve the sample efficiency and final performance of deep reinforcement learning (DRL). They are designed to control how a DRL agent collects data, which is inspired by how humans gradually adapt their learning processes to their capabilities. For example, ACL can be used for subgoal generation, reward shaping, environment generation, or initial state generation. However, prior work only considers curriculum learning following one of the aforementioned predefined paradigms. It is unclear which of these paradigms are complementary, and how their combination can be learned from interactions with the environment. Therefore, in this paper, we propose a unified automatic curriculum learning framework to create multi-objective but coherent curricula that are generated by a set of parametric curriculum modules. Each curriculum module is instantiated as a neural network and is responsible for generating a particular curriculum. In order to coordinate those potentially conflicting modules in a unified parameter space, we propose a multi-task hyper-net learning framework that uses a single hyper-net to parameterize all of those curriculum modules. In addition to existing hand-designed curriculum paradigms, we further design a flexible memory mechanism to learn an abstract curriculum, which may otherwise be difficult to design manually. We evaluate our method on a series of robotic manipulation tasks and demonstrate its superiority over other state-of-the-art ACL methods in terms of sample efficiency and final performance.
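
The "single hyper-net parameterizes all curriculum modules" idea can be illustrated with a toy PyTorch sketch: one shared hyper-network maps a per-module embedding to the weights of that curriculum module's small network, so the otherwise separate curricula are coupled through the shared generator. Sizes and module roles below are assumptions for illustration, not the paper's architecture.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CurriculumHyperNet(nn.Module):
    def __init__(self, n_modules=3, emb_dim=8, in_dim=10, out_dim=4):
        super().__init__()
        self.module_emb = nn.Embedding(n_modules, emb_dim)      # one embedding per curriculum module
        self.weight_gen = nn.Linear(emb_dim, in_dim * out_dim)  # shared hyper-net
        self.bias_gen = nn.Linear(emb_dim, out_dim)
        self.in_dim, self.out_dim = in_dim, out_dim

    def forward(self, module_id, state):
        e = self.module_emb(torch.tensor([module_id]))
        w = self.weight_gen(e).view(self.out_dim, self.in_dim)  # generated module weights
        b = self.bias_gen(e).view(self.out_dim)
        return F.linear(state, w, b)  # module output, e.g. a subgoal or a reward-shaping term

hyper = CurriculumHyperNet()
state = torch.randn(1, 10)
subgoal = hyper(0, state)        # module 0: subgoal generation
reward_shape = hyper(1, state)   # module 1: reward shaping
print(subgoal.shape, reward_shape.shape)  # torch.Size([1, 4]) torch.Size([1, 4])
```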

Accounting for Variance in Machine Learning Benchmarks

Mar 01, 2021
Xavier Bouthillier, Pierre Delaunay, Mirko Bronzi, Assya Trofimov, Brennan Nichyporuk, Justin Szeto, Naz Sepah, Edward Raff, Kanika Madan, Vikram Voleti, Samira Ebrahimi Kahou, Vincent Michalski, Dmitriy Serdyuk, Tal Arbel, Chris Pal, Gaël Varoquaux, Pascal Vincent

Strong empirical evidence that one machine-learning algorithm A outperforms another one B ideally calls for multiple trials optimizing the learning pipeline over sources of variation such as data sampling, data augmentation, parameter initialization, and hyperparameter choices. This is prohibitively expensive, and corners are cut to reach conclusions. We model the whole benchmarking process, revealing that variance due to data sampling, parameter initialization, and hyperparameter choice markedly impacts the results. We analyze the predominant comparison methods used today in light of this variance. We show the counter-intuitive result that adding more sources of variation to an imperfect estimator better approximates the ideal estimator, at a 51-fold reduction in compute cost. Building on these results, we study the error rate of detecting improvements on five different deep-learning tasks/architectures. This study leads us to propose recommendations for performance comparisons.
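
A small numerical illustration of the point (with made-up numbers, not the paper's data): if each benchmark run is affected by several sources of variation (data split, weight init, hyperparameter search seed), then comparing A and B on a single run is unreliable, whereas randomizing all sources across a handful of runs gives a better picture of whether A really beats B.

```python
import numpy as np

rng = np.random.default_rng(0)

def run_benchmark(true_mean, n_runs):
    # Each run re-draws every source of variation.
    split_noise = rng.normal(0, 1.0, n_runs)   # data sampling
    init_noise = rng.normal(0, 0.5, n_runs)    # parameter initialization
    hpo_noise = rng.normal(0, 0.8, n_runs)     # hyperparameter optimization seed
    return true_mean + split_noise + init_noise + hpo_noise

acc_a = run_benchmark(76.0, n_runs=10)   # algorithm A, true mean accuracy 76.0
acc_b = run_benchmark(75.5, n_runs=10)   # algorithm B, true mean accuracy 75.5

single_run_diff = acc_a[0] - acc_b[0]
multi_run_diff = acc_a.mean() - acc_b.mean()
print(f"single-run difference: {single_run_diff:+.2f}")
print(f"mean over 10 randomized runs: {multi_run_diff:+.2f} "
      f"(std of per-run diff: {np.std(acc_a - acc_b):.2f})")
```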

* Submitted to MLSys2021 

Predicting Infectiousness for Proactive Contact Tracing

Oct 23, 2020
Yoshua Bengio, Prateek Gupta, Tegan Maharaj, Nasim Rahaman, Martin Weiss, Tristan Deleu, Eilif Muller, Meng Qu, Victor Schmidt, Pierre-Luc St-Charles, Hannah Alsdurf, Olexa Bilaniuk, David Buckeridge, Gaétan Marceau Caron, Pierre-Luc Carrier, Joumana Ghosn, Satya Ortiz-Gagné, Chris Pal, Irina Rish, Bernhard Schölkopf, Abhinav Sharma, Jian Tang, Andrew Williams

The COVID-19 pandemic has spread rapidly worldwide, overwhelming manual contact tracing in many countries and resulting in widespread lockdowns for emergency containment. Large-scale digital contact tracing (DCT) has emerged as a potential solution to resume economic and social activity while minimizing spread of the virus. Various DCT methods have been proposed, each making trade-offs between privacy, mobility restrictions, and public health. The most common approach, binary contact tracing (BCT), models infection as a binary event, informed only by an individual's test results, with corresponding binary recommendations that either all or none of the individual's contacts quarantine. BCT ignores the inherent uncertainty in contacts and the infection process, which could be used to tailor messaging to high-risk individuals and prompt proactive testing or earlier warnings. It also does not make use of observations such as symptoms or pre-existing medical conditions, which could be used to make more accurate infectiousness predictions. In this paper, we use a recently proposed COVID-19 epidemiological simulator to develop and test methods that can be deployed to a smartphone to locally and proactively predict an individual's infectiousness (risk of infecting others) based on their contact history and other information, while respecting strong privacy constraints. Predictions are used to provide personalized recommendations to the individual via an app, as well as to send anonymized messages to the individual's contacts, who use this information to better predict their own infectiousness, an approach we call proactive contact tracing (PCT). We find that a deep-learning based PCT method improves over BCT for equivalent average mobility, suggesting that PCT could help enable safe re-opening and second-wave prevention.
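
A hypothetical sketch of the kind of on-device predictor described above: a small recurrent network reads a rolling window of daily features (received risk messages, symptoms, test results) stored locally and outputs a discrete risk level that drives recommendations. The feature layout, window length, and sizes are assumptions, not the paper's model.

```python
import torch
import torch.nn as nn

class RiskPredictor(nn.Module):
    def __init__(self, n_features=12, hidden=32, n_risk_levels=16):
        super().__init__()
        self.gru = nn.GRU(n_features, hidden, batch_first=True)
        self.head = nn.Linear(hidden, n_risk_levels)

    def forward(self, history):              # history: (B, days, n_features)
        _, h = self.gru(history)
        return self.head(h[-1])              # logits over discrete risk levels

model = RiskPredictor()
history = torch.randn(1, 14, 12)             # 14 days of locally stored features
risk_logits = model(history)
risk_level = int(risk_logits.argmax(dim=-1)) # would drive an in-app recommendation
print(risk_level)
```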

DuoRAT: Towards Simpler Text-to-SQL Models

Oct 21, 2020
Torsten Scholak, Raymond Li, Dzmitry Bahdanau, Harm de Vries, Chris Pal

Recent research has shown that neural text-to-SQL models can effectively translate natural language questions into corresponding SQL queries on unseen databases. Working mostly on the Spider dataset, researchers have been proposing increasingly sophisticated modelling approaches to the problem. Contrary to this trend, in this paper we identify the aspects in which text-to-SQL models can be simplified. We begin by building DuoRAT, a re-implementation of the state-of-the-art RAT-SQL model that, unlike RAT-SQL, uses only relation-aware or vanilla transformers as building blocks. We perform several ablation experiments using DuoRAT as the baseline model. Our experiments confirm the usefulness of some of the techniques and point out the redundancy of others, including structural SQL features and features that link the question with the schema.
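
A compact sketch of relation-aware self-attention, the building block referred to above: the attention logit between tokens i and j receives a learned bias that depends on a discrete relation type (e.g. column-belongs-to-table, question-word-matches-column-name). This is a simplified variant using a scalar bias per relation for illustration; RAT-style models add relation embeddings to keys and values, and this is not the DuoRAT code.

```python
import math
import torch
import torch.nn as nn

class RelationAwareAttention(nn.Module):
    def __init__(self, d=32, n_relations=8):
        super().__init__()
        self.q = nn.Linear(d, d)
        self.k = nn.Linear(d, d)
        self.v = nn.Linear(d, d)
        self.rel_bias = nn.Embedding(n_relations, 1)  # one scalar bias per relation type
        self.scale = math.sqrt(d)

    def forward(self, x, relations):                  # x: (B, N, d), relations: (B, N, N) int
        logits = self.q(x) @ self.k(x).transpose(-1, -2) / self.scale
        logits = logits + self.rel_bias(relations).squeeze(-1)  # relation-dependent bias
        return torch.softmax(logits, dim=-1) @ self.v(x)

attn = RelationAwareAttention()
x = torch.randn(2, 5, 32)                             # question + schema tokens
relations = torch.randint(0, 8, (2, 5, 5))            # relation type between each token pair
print(attn(x, relations).shape)                       # torch.Size([2, 5, 32])
```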

* Code is available at https://github.com/ElementAI/duorat 

Medical Imaging with Deep Learning: MIDL 2020 -- Short Paper Track

Jun 29, 2020
Tal Arbel, Ismail Ben Ayed, Marleen de Bruijne, Maxime Descoteaux, Herve Lombaert, Chris Pal

This compendium gathers all the accepted extended abstracts from the Third International Conference on Medical Imaging with Deep Learning (MIDL 2020), held in Montreal, Canada, 6-9 July 2020. Note that only accepted extended abstracts are listed here; the proceedings of the MIDL 2020 Full Paper Track are published in the Proceedings of Machine Learning Research (PMLR).

* Accepted extended abstracts can also be found at https://openreview.net/group?id=MIDL.io/2020/Conference#abstract-accept-papers 