Medical coding translates professionally written medical reports into standardized codes, which is an essential part of medical information systems and health insurance reimbursement. Manual coding by trained human coders is time-consuming and error-prone. Thus, automated coding algorithms have been developed, building especially on the recent advances in machine learning and deep neural networks. To solve the challenges of encoding lengthy and noisy clinical documents and capturing code associations, we propose a multitask recalibrated aggregation network. In particular, multitask learning shares information across different coding schemes and captures the dependencies between different medical codes. Feature recalibration and aggregation in shared modules enhance representation learning for lengthy notes. Experiments with a real-world MIMIC-III dataset show significantly improved predictive performance.
Using deep latent variable models in causal inference has attracted considerable interest recently, but an essential open question is their identifiability. While they have yielded promising results and theory exists on the identifiability of some simple model formulations, we also know that causal effects cannot be identified in general with latent variables. We investigate this gap between theory and empirical results with theoretical considerations and extensive experiments under multiple synthetic and real-world data sets, using the causal effect variational autoencoder (CEVAE) as a case study. While CEVAE seems to work reliably under some simple scenarios, it does not identify the correct causal effect with a misspecified latent variable or a complex data distribution, as opposed to the original goals of the model. Our results show that the question of identifiability cannot be disregarded, and we argue that more attention should be paid to it in future work.
Unsupervised pretraining is an integral part of many natural language processing systems, and transfer learning with language models has achieved remarkable results in many downstream tasks. In the clinical application of medical code assignment, diagnosis and procedure codes are inferred from lengthy clinical notes such as hospital discharge summaries. However, it is not clear if pretrained models are useful for medical code prediction without further architecture engineering. This paper conducts a comprehensive quantitative analysis of various contextualized language models' performance, pretrained in different domains, for medical code assignment from clinical notes. We propose a hierarchical fine-tuning architecture to capture interactions between distant words and adopt label-wise attention to exploit label information. Contrary to current trends, we demonstrate that a carefully trained classical CNN outperforms attention-based models on a MIMIC-III subset with frequent codes. Our empirical findings suggest directions for improving the medical code assignment application.
Medical code assignment from clinical text is a fundamental task in clinical information system management. As medical notes are typically lengthy and the medical coding system's code space is large, this task is a long-standing challenge. Recent work applies deep neural network models to encode the medical notes and assign medical codes to clinical documents. However, these methods are still ineffective as they do not fully encode and capture the lengthy and rich semantic information of medical notes nor explicitly exploit the interactions between the notes and codes. We propose a novel method, gated convolutional neural networks, and a note-code interaction (GatedCNN-NCI), for automatic medical code assignment to overcome these challenges. Our methods capture the rich semantic information of the lengthy clinical text for better representation by utilizing embedding injection and gated information propagation in the medical note encoding module. With a novel note-code interaction design and a graph message passing mechanism, we explicitly capture the underlying dependency between notes and codes, enabling effective code prediction. A weight sharing scheme is further designed to decrease the number of trainable parameters. Empirical experiments on real-world clinical datasets show that our proposed model outperforms state-of-the-art models in most cases, and our model size is on par with light-weighted baselines.
Medical code assignment, which predicts medical codes from clinical texts, is a fundamental task of intelligent medical information systems. The emergence of deep models in natural language processing has boosted the development of automatic assignment methods. However, recent advanced neural architectures with flat convolutions or multi-channel feature concatenation ignore the sequential causal constraint within a text sequence and may not learn meaningful clinical text representations, especially for lengthy clinical notes with long-term sequential dependency. This paper proposes a Dilated Convolutional Attention Network (DCAN), integrating dilated convolutions, residual connections, and label attention, for medical code assignment. It adopts dilated convolutions to capture complex medical patterns with a receptive field which increases exponentially with dilation size. Experiments on a real-world clinical dataset empirically show that our model improves the state of the art.
Encoding domain knowledge into the prior over the high-dimensional weight space is challenging in Bayesian neural networks. Two types of domain knowledge are commonly available in scientific applications: 1. feature sparsity (number of relevant features); 2. signal-to-noise ratio, quantified, for instance, as the proportion of variance explained (PVE). We show both types of domain knowledge can be encoded into the widely used Gaussian scale mixture priors with Automatic Relevance Determination. Specifically, we propose a new joint prior over the local (i.e., feature-specific) scale parameters to encode the knowledge about feature sparsity, and an algorithm to determine the global scale parameter (shared by all features) according to the PVE. Empirically, we show that the proposed informative prior improves prediction accuracy on publicly available datasets and in a genetics application.
Human knowledge provides a formal understanding of the world. Knowledge graphs that represent structural relations between entities have become an increasingly popular research direction towards cognition and human-level intelligence. In this survey, we provide a comprehensive review on knowledge graph covering overall research topics about 1) knowledge graph representation learning, 2) knowledge acquisition and completion, 3) temporal knowledge graph, and 4) knowledge-aware applications, and summarize recent breakthroughs and perspective directions to facilitate future research. We propose a full-view categorization and new taxonomies on these topics. Knowledge graph embedding is organized from four aspects of representation space, scoring function, encoding models and auxiliary information. For knowledge acquisition, especially knowledge graph completion, embedding methods, path inference and logical rule reasoning are reviewed. We further explore several emerging topics including meta relational learning, commonsense reasoning, and temporal knowledge graphs. To facilitate future research on knowledge graphs, we also provide a curated collection of datasets and open-source libraries on different tasks. In the end, we have a thorough outlook on several promising research directions.
Surrogate models such as Gaussian processes (GP) have been proposed to accelerate approximate Bayesian computation (ABC) when the statistical model of interest is expensive-to-simulate. In one such promising framework the discrepancy between simulated and observed data is modelled with a GP. So far principled strategies have been proposed only for sequential selection of the simulation locations. To address this limitation, we develop Bayesian optimal design strategies to parallellise the expensive simulations. Current surrogate-based ABC methods also produce only a point estimate of the ABC posterior while there can be substantial additional uncertainty due to the limited budget of simulations. We also address the problem of quantifying the uncertainty of ABC posterior and discuss the connections between our resulting framework called Bayesian ABC, Bayesian quadrature (BQ) and Bayesian optimisation (BO). Experiments with several toy and real-world simulation models demonstrate advantages of the proposed techniques.
Estimating the effect of a treatment on a given outcome, conditioned on a vector of covariates, is central in many applications. However, learning the impact of a treatment on a continuous temporal response, when the covariates suffer extensively from measurement error and even the timing of the treatments is uncertain, has not been addressed. We introduce a novel data-driven method that can estimate treatment-response trajectories in this challenging scenario. We model personalized treatment-response curves as a combination of parametric response functions, hierarchically sharing information across individuals, and a sparse Gaussian process for the baseline trend. Importantly, our model considers measurement error not only in treatment covariates, but also in treatment times, a problem which arises in practice for example when treatment information is based on self-reporting. In a challenging and timely problem of estimating the impact of diet on continuous blood glucose measurements, our model leads to significant improvements in estimation accuracy and prediction.