Building models of natural language processing (NLP) is challenging in low-resource scenarios where only limited data are available. Optimization-based meta-learning algorithms achieve promising results in low-resource scenarios by adapting a well-generalized model initialization to handle new tasks. Nonetheless, these approaches suffer from the memorization overfitting issue, where the model tends to memorize the meta-training tasks while ignoring support sets when adapting to new tasks. To address this issue, we propose a memory imitation meta-learning (MemIML) method that enhances the model's reliance on support sets for task adaptation. Specifically, we introduce a task-specific memory module to store support set information and construct an imitation module to force query sets to imitate the behaviors of some representative support-set samples stored in the memory. A theoretical analysis is provided to prove the effectiveness of our method, and empirical results also demonstrate that our method outperforms competitive baselines on both text classification and generation tasks.
Embedding real-world networks presents challenges because it is not clear how to identify their latent geometries. Embedding some disassortative networks, such as scale-free networks, to the Euclidean space has been shown to incur distortions. Embedding scale-free networks to hyperbolic spaces offer an exciting alternative but incurs distortions when embedding assortative networks with latent geometries not hyperbolic. We propose an inductive model that leverages both the expressiveness of GCNs and trivial bundle to learn inductive node representations for networks with or without node features. A trivial bundle is a simple case of fiber bundles,a space that is globally a product space of its base space and fiber. The coordinates of base space and those of fiber can be used to express the assortative and disassortative factors in generating edges. Therefore, the model has the ability to learn embeddings that can express those factors. In practice, it reduces errors for link prediction and node classification when compared to the Euclidean and hyperbolic GCNs.
Personalized conversation models (PCMs) generate responses according to speaker preferences. Existing personalized conversation tasks typically require models to extract speaker preferences from user descriptions or their conversation histories, which are scarce for newcomers and inactive users. In this paper, we propose a few-shot personalized conversation task with an auxiliary social network. The task requires models to generate personalized responses for a speaker given a few conversations from the speaker and a social network. Existing methods are mainly designed to incorporate descriptions or conversation histories. Those methods can hardly model speakers with so few conversations or connections between speakers. To better cater for newcomers with few resources, we propose a personalized conversation model (PCM) that learns to adapt to new speakers as well as enabling new speakers to learn from resource-rich speakers. Particularly, based on a meta-learning based PCM, we propose a task aggregator (TA) to collect other speakers' information from the social network. The TA provides prior knowledge of the new speaker in its meta-learning. Experimental results show our methods outperform all baselines in appropriateness, diversity, and consistency with speakers.
Model-Agnostic Meta-Learning (MAML), a model-agnostic meta-learning method, is successfully employed in NLP applications including few-shot text classification and multi-domain low-resource language generation. Many impacting factors, including data quantity, similarity among tasks, and the balance between general language model and task-specific adaptation, can affect the performance of MAML in NLP, but few works have thoroughly studied them. In this paper, we conduct an empirical study to investigate these impacting factors and conclude when MAML works the best based on the experimental results.
Neural conversation models are known to generate appropriate but non-informative responses in general. A scenario where informativeness can be significantly enhanced is Conversing by Reading (CbR), where conversations take place with respect to a given external document. In previous work, the external document is utilized by (1) creating a context-aware document memory that integrates information from the document and the conversational context, and then (2) generating responses referring to the memory. In this paper, we propose to create the document memory with some anticipated responses in mind. This is achieved using a teacher-student framework. The teacher is given the external document, the context, and the ground-truth response, and learns how to build a response-aware document memory from three sources of information. The student learns to construct a response-anticipated document memory from the first two sources, and the teacher's insight on memory creation. Empirical results show that our model outperforms the previous state-of-the-art for the CbR task.
Mortality prediction of diverse rare diseases using electronic health record (EHR) data is a crucial task for intelligent healthcare. However, data insufficiency and the clinical diversity of rare diseases make it hard for directly training deep learning models on individual disease data or all the data from different diseases. Mortality prediction for these patients with different diseases can be viewed as a multi-task learning problem with insufficient data and large task number. But the tasks with little training data also make it hard to train task-specific modules in multi-task learning models. To address the challenges of data insufficiency and task diversity, we propose an initialization-sharing multi-task learning method (Ada-Sit) which learns the parameter initialization for fast adaptation to dynamically measured similar tasks. We use Ada-Sit to train long short-term memory networks (LSTM) based prediction models on longitudinal EHR data. And experimental results demonstrate that the proposed model is effective for mortality prediction of diverse rare diseases.
* 10 pages, 3 Figures, submitted to AMIA Annual Symposium
Personalized conversation systems have received increasing attention recently. Existing personalized conversation models tend to employ the meta-learning framework that first finds the initial parameters, then fine-tunes on a few personal utterances. However, fine-tuning can only make slight changes to the initial parameters, resulting in similar language models for different users. In this paper, we propose to customize a conversation model with unique network structures for each user. Concretely, we introduce a private network to the language model, whose structure will evolve during training to better capture the unique characteristics of the user. The private network is only trained on the corpora of the corresponding user, and similar users can share partial private structure for data reuse purpose. Experiment results show that our algorithm excels all the baselines in terms of personality, quality, and diversity measurement.
Open-domain human-computer conversation has attracted much attention in the field of NLP. Contrary to rule- or template-based domain-specific dialog systems, open-domain conversation usually requires data-driven approaches, which can be roughly divided into two categories: retrieval-based and generation-based systems. Retrieval systems search a user-issued utterance (called a query) in a large database, and return a reply that best matches the query. Generative approaches, typically based on recurrent neural networks (RNNs), can synthesize new replies, but they suffer from the problem of generating short, meaningless utterances. In this paper, we propose a novel ensemble of retrieval-based and generation-based dialog systems in the open domain. In our approach, the retrieved candidate, in addition to the original query, is fed to an RNN-based reply generator, so that the neural model is aware of more information. The generated reply is then fed back as a new candidate for post-reranking. Experimental results show that such ensemble outperforms each single part of it by a large margin.
Using neural networks to generate replies in human-computer dialogue systems is attracting increasing attention over the past few years. However, the performance is not satisfactory: the neural network tends to generate safe, universally relevant replies which carry little meaning. In this paper, we propose a content-introducing approach to neural network-based generative dialogue systems. We first use pointwise mutual information (PMI) to predict a noun as a keyword, reflecting the main gist of the reply. We then propose seq2BF, a "sequence to backward and forward sequences" model, which generates a reply containing the given keyword. Experimental results show that our approach significantly outperforms traditional sequence-to-sequence models in terms of human evaluation and the entropy measure, and that the predicted keyword can appear at an appropriate position in the reply.