"Topic": models, code, and papers

Rethinking Generalization of Neural Models: A Named Entity Recognition Case Study

Jan 12, 2020
Jinlan Fu, Pengfei Liu, Qi Zhang, Xuanjing Huang

While neural network-based models have achieved impressive performance on a large body of NLP tasks, the generalization behavior of different models remains poorly understood: Does this excellent performance imply a perfect generalization model, or are there still some limitations? In this paper, we take the NER task as a testbed to analyze the generalization behavior of existing models from different perspectives and characterize the differences in their generalization abilities through the lens of our proposed measures, which guide us toward better design of models and training methods. Experiments with in-depth analyses diagnose the bottleneck of existing neural NER models in terms of breakdown performance analysis, annotation errors, dataset bias, and category relationships, which suggest directions for improvement. We have released the datasets (ReCoNLL, PLONER) for future research at our project page. As a by-product of this paper, we have open-sourced a project that provides a comprehensive summary of recent NER papers and classifies them into different research topics.

* Accepted by AAAI 2020 

Exploring Context, Attention and Audio Features for Audio Visual Scene-Aware Dialog

Dec 20, 2019
Shachi H Kumar, Eda Okur, Saurav Sahay, Jonathan Huang, Lama Nachman

We are witnessing a confluence of vision, speech and dialog system technologies that are enabling IVAs to learn audio-visual groundings of utterances and have conversations with users about the objects, activities and events surrounding them. Recent progress in visual grounding techniques and audio understanding is enabling machines to understand shared semantic concepts and listen to the various sensory events in the environment. With audio and visual grounding methods, end-to-end multimodal SDS are trained to meaningfully communicate with us in natural language about the real dynamic audio-visual sensory world around us. In this work, we explore the role of 'topics' as the context of the conversation, along with multimodal attention, in such an end-to-end audio-visual scene-aware dialog system architecture. We also incorporate an end-to-end audio classification ConvNet, AclNet, into our models. We develop and test our approaches on the Audio Visual Scene-Aware Dialog (AVSD) dataset released as part of DSTC7. We present the analysis of our experiments and show that some of our model variations outperform the baseline system released for AVSD.

* Presented at the Visual Question Answering and Dialog Workshop, CVPR 2019, Long Beach, USA 

MetalGAN: Multi-Domain Label-Less Image Synthesis Using cGANs and Meta-Learning

Dec 05, 2019
Tomaso Fontanini, Eleonora Iotti, Luca Donati, Andrea Prati

Image synthesis is currently one of the most addressed image processing topics in the computer vision and deep learning fields of study. Researchers have tackled this problem by focusing their efforts on its several challenges, e.g. image quality and size, domain and pose changes, network architecture, and so on. Above all, producing images belonging to different domains by using a single architecture is a very relevant goal for image generation. In fact, a single multi-domain network would allow greater flexibility and robustness in the image synthesis task than other approaches. This paper proposes a novel architecture and a training algorithm, which are able to produce multi-domain outputs using a single network. A small portion of a dataset is intentionally used, and there are no hard-coded labels (or classes). This is achieved by combining a conditional Generative Adversarial Network (cGAN) for image generation and a Meta-Learning algorithm for domain switching; we call our approach MetalGAN. The approach has proved to be appropriate for solving the multi-domain problem and it is validated on facial attribute transfer, using the CelebA dataset.

On the Proof of Fixed-Point Convergence for Plug-and-Play ADMM

Oct 31, 2019
Ruturaj G. Gavaskar, Kunal N. Chaudhury

In most state-of-the-art image restoration methods, the sum of a data-fidelity and a regularization term is optimized using an iterative algorithm such as ADMM (alternating direction method of multipliers). In recent years, the possibility of using denoisers for regularization has been explored in several works. A popular approach is to formally replace the proximal operator within the ADMM framework with some powerful denoiser. However, since most state-of-the-art denoisers cannot be posed as a proximal operator, one cannot guarantee the convergence of these so-called plug-and-play (PnP) algorithms. In fact, the theoretical convergence of PnP algorithms is an active research topic. In this letter, we consider the result of Chan et al. (IEEE TCI, 2017), where fixed-point convergence of an ADMM-based PnP algorithm was established for a class of denoisers. We argue that the original proof is incomplete, since convergence is not analyzed for one of the three possible cases outlined in the paper. Moreover, we explain why the argument for the other cases does not apply in this case. We give a different analysis to fill this gap, which firmly establishes the original convergence theorem.
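The substitution described above, where a denoiser takes the place of the proximal operator inside ADMM, can be sketched in a few lines. This is a toy illustration only, not the algorithm or denoiser class analyzed in the letter: the data-fidelity term is assumed to be 0.5 * ||x - y||^2 so the x-update has a closed form, and `smooth_denoiser`, `pnp_admm`, `rho` and `num_iters` are hypothetical names chosen for this example.

```python
import numpy as np


def smooth_denoiser(z):
    """Stand-in denoiser (an assumption): 3-tap weighted moving average."""
    kernel = np.array([0.25, 0.5, 0.25])
    return np.convolve(z, kernel, mode="same")


def pnp_admm(y, denoiser, rho=1.0, num_iters=50):
    """Toy plug-and-play ADMM for the data term 0.5 * ||x - y||^2.

    Returns the final iterate x and the residual ||x - v||.
    """
    x = y.copy()
    v = y.copy()            # splitting variable
    u = np.zeros_like(y)    # scaled dual variable
    for _ in range(num_iters):
        # x-update: closed-form minimizer of the quadratic data term
        # plus the penalty (rho / 2) * ||x - (v - u)||^2.
        x = (y + rho * (v - u)) / (1.0 + rho)
        # v-update: the proximal operator of the regularizer is
        # replaced by the plugged-in denoiser.
        v = denoiser(x + u)
        # Dual update.
        u = u + x - v
    return x, float(np.linalg.norm(x - v))


# Usage: restore a noisy 1-D sine wave.
rng = np.random.default_rng(0)
clean = np.sin(np.linspace(0.0, 2.0 * np.pi, 200))
noisy = clean + 0.3 * rng.standard_normal(200)
restored, residual = pnp_admm(noisy, smooth_denoiser)
```

The residual ||x - v|| is one simple quantity to monitor when checking whether the iterates settle at a fixed point. With a general denoiser, whether they do is exactly the question the abstract raises.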

* Accepted in IEEE Signal Processing Letters 

A Closer Look At Feature Space Data Augmentation For Few-Shot Intent Classification

Oct 09, 2019
Varun Kumar, Hadrien Glaude, Cyprien de Lichy, William Campbell

New conversation topics and functionalities are constantly being added to conversational AI agents like Amazon Alexa and Apple Siri. As data collection and annotation are not scalable and are often costly, only a handful of examples for the new functionalities are available, which results in poor generalization performance. We formulate it as a Few-Shot Integration (FSI) problem where a few examples are used to introduce a new intent. In this paper, we study six feature space data augmentation methods to improve classification performance in the FSI setting in combination with both supervised and unsupervised representation learning methods such as BERT. Through realistic experiments on two public conversational datasets, SNIPS and the Facebook Dialog corpus, we show that data augmentation in feature space provides an effective way to improve intent classification performance in the few-shot setting beyond traditional transfer learning approaches. In particular, we show that (a) upsampling in latent space is a competitive baseline for feature space augmentation and (b) adding the difference between two examples to a new example is a simple yet effective data augmentation method.
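The two findings (a) and (b) can be illustrated with a minimal sketch on pre-computed feature vectors. This is not the authors' implementation: the function names `upsample` and `delta_augment` are hypothetical, the features are random placeholders rather than real encoder outputs, and drawing the two "difference" examples from an existing, data-rich intent is an assumption made for this example.

```python
import numpy as np


def upsample(features, target_count, rng):
    """(a) Upsampling in latent space: resample few-shot features with replacement."""
    idx = rng.integers(0, len(features), size=target_count)
    return features[idx]


def delta_augment(few_shot, base, num_new, rng):
    """(b) Add the difference between two examples to a few-shot example.

    Assumption: the pair (base[i], base[j]) comes from an existing intent
    that has plenty of data; the paper's exact sampling scheme may differ.
    """
    i = rng.integers(0, len(base), size=num_new)
    j = rng.integers(0, len(base), size=num_new)
    k = rng.integers(0, len(few_shot), size=num_new)
    return few_shot[k] + (base[i] - base[j])


# Usage with placeholder 16-dimensional features.
rng = np.random.default_rng(0)
base_feats = rng.standard_normal((100, 16))       # existing intent
new_feats = rng.standard_normal((5, 16)) + 3.0    # new intent, 5 examples
upsampled = upsample(new_feats, 20, rng)
augmented = delta_augment(new_feats, base_feats, 20, rng)
```

Either array could then be appended to the few-shot training set before fitting a classifier on top of the frozen features.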

* Accepted at Deep Learning for low-resource NLP workshop @ EMNLP 2019 

Improved Patient Classification with Language Model Pretraining Over Clinical Notes

Oct 02, 2019
Jonas Kemp, Alvin Rajkomar, Andrew M. Dai

Clinical notes in electronic health records contain highly heterogeneous writing styles, including non-standard terminology or abbreviations. Using these notes in predictive modeling has traditionally required preprocessing (e.g. taking frequent terms or topic modeling) that removes much of the richness of the source data. We propose a pretrained hierarchical recurrent neural network model that parses minimally processed clinical notes in an intuitive fashion, and show that it improves performance for multiple classification tasks on the Medical Information Mart for Intensive Care III (MIMIC-III) dataset, improving top-5 recall to 89.7% (increase of 4.8%) for primary diagnosis classification and AUPRC to 35.2% (increase of 2.1%) for multilabel diagnosis classification compared to models that treat the notes as an unordered collection of terms, using no pretraining. We also apply an attribution technique to several examples to identify the words and the nearby context that the model uses to make its prediction, and show the importance of the words' context.

* Accepted at NeurIPS ML4H 2019, extended abstract track 

Comparison Study of Well-Known Inverted Pendulum Models for Balance Recovery in Humanoid Robot

Jun 05, 2019
Mohammadreza Kasaei, Nuno Lau, Artur Pereira

Bipedal robots are essentially unstable because of their complex kinematics as well as high dimensional state space dynamics; hence, control and generation of stable walking is a complex subject and still one of the active topics in the robotics community. Nowadays, there are many humanoids performing stable walking, but fewer show effective recovery when pushed. In this paper, we first review the most commonly used abstract dynamics models for a humanoid robot that are based on the inverted pendulum, and show how these models can be used to provide walking for a humanoid robot and how a hierarchical control structure can reduce the complexity of humanoid walking. Second, the reviewed models are compared with each other not only analytically but also through several numerical simulations of a push recovery scenario using MATLAB. These theoretical and simulation studies quantitatively compare the models' ability to regain balance. The results show that the enhanced version of the Linear Inverted Pendulum Plus Flywheel is the most capable dynamics model for regaining the stability of the robot, even in very challenging situations.

Fast Prototyping a Dialogue Comprehension System for Nurse-Patient Conversations on Symptom Monitoring

Apr 05, 2019
Zhengyuan Liu, Hazel Lim, Nur Farah Ain Binte Suhaimi, Shao Chuen Tong, Sharon Ong, Angela Ng, Sheldon Lee, Michael R. Macdonald, Savitha Ramasamy, Pavitra Krishnaswamy, Wai Leng Chow, Nancy F. Chen

Data for human-human spoken dialogues for research and development are currently very limited in quantity, variety, and sources; such data are even scarcer in healthcare. In this work, we investigate fast prototyping of a dialogue comprehension system by leveraging minimal nurse-to-patient conversations. We propose a framework inspired by nurse-initiated clinical symptom monitoring conversations to construct a simulated human-human dialogue dataset, embodying linguistic characteristics of spoken interactions like thinking aloud, self-contradiction, and topic drift. We then adopt an established bidirectional attention pointer network on this simulated dataset, achieving more than 80% F1 score on a held-out test set from real-world nurse-to-patient conversations. The ability to automatically comprehend conversations in the healthcare domain by exploiting only limited data has implications for improving clinical workflows through red flag symptom detection and triaging capabilities. We demonstrate the feasibility of efficient and effective extraction, retrieval and comprehension of symptom checking information discussed in multi-turn human-human spoken conversations.

* 8 pages. To appear in NAACL 2019 

Improving Dialogue State Tracking by Discerning the Relevant Context

Apr 04, 2019
Sanuj Sharma, Prafulla Kumar Choubey, Ruihong Huang

A typical conversation comprises multiple turns between participants, who go back and forth between different topics. At each user turn, dialogue state tracking (DST) aims to estimate the user's goal by processing the current utterance. However, in many turns, users implicitly refer to the previous goal, necessitating the use of relevant dialogue history. Nonetheless, distinguishing relevant history is challenging, and the popular method of using dialogue recency for that purpose is inefficient. We therefore propose a novel framework for DST that identifies relevant historical context by referring to the past utterances where a particular slot-value changes, and uses that together with a weighted system utterance to identify the relevant context. Specifically, we use the current user utterance and the most recent system utterance to determine the relevance of a system utterance. Empirical analyses show that our method improves joint goal accuracy by 2.75% and 2.36% on the WoZ 2.0 and MultiWoZ 2.0 restaurant domain datasets, respectively, over the previous state-of-the-art GLAD model.

* NAACL 2019 

Feature-Critic Networks for Heterogeneous Domain Generalization

Jan 31, 2019
Yiying Li, Yongxin Yang, Wei Zhou, Timothy M. Hospedales

The well-known domain shift issue causes model performance to degrade when a model is deployed to a new target domain whose statistics differ from those of the training data. Domain adaptation techniques alleviate this, but need some instances from the target domain to drive adaptation. Domain generalization is the recently topical problem of learning a model that generalizes to unseen domains out of the box, without accessing any target data. Various domain generalization approaches aim to train a domain-invariant feature extractor, typically by adding some manually designed losses. In this work, we propose a learning-to-learn approach, where the auxiliary loss that helps generalization is itself learned. This approach is conceptually simple and flexible, and leads to considerable improvement in robustness to domain shift. Beyond conventional domain generalization, we consider a more challenging setting of heterogeneous domain generalization, where the unseen domains do not share a label space with the seen ones, and the goal is to train a feature extractor that is useful off-the-shelf for novel data and novel categories. Experimental evaluation demonstrates that our method outperforms state-of-the-art solutions in both settings.
