Ricardo Henao

Improving Event Time Prediction by Learning to Partition the Event Time Space

Oct 24, 2023
Jimmy Hickey, Ricardo Henao, Daniel Wojdyla, Michael Pencina, Matthew M. Engelhard

Recently developed survival analysis methods improve upon existing approaches by predicting the probability of event occurrence in each of a number of pre-specified (discrete) time intervals. By avoiding strong parametric assumptions on the event density, this approach tends to improve prediction performance, particularly when data are plentiful. However, in clinical settings with limited available data, it is often preferable to judiciously partition the event time space into a limited number of intervals well suited to the prediction task at hand. In this work, we develop a method to learn from data a set of cut points defining such a partition. We show that in two simulated datasets, we are able to recover intervals that match the underlying generative model. We then demonstrate improved prediction performance on three real-world observational datasets, including a large, newly harmonized stroke risk prediction dataset. Finally, we argue that our approach facilitates clinical decision-making by suggesting time intervals that are most appropriate for each task, in the sense that they support more accurate risk prediction.
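Below is a minimal sketch of the core idea, assuming a PyTorch setup: interval widths are parameterized so the learned cut points stay ordered, and a soft (differentiable) bin assignment lets gradients reach the cut points. All names and the soft-assignment trick are illustrative, not the authors' released implementation.

```python
import torch
import torch.nn as nn

class LearnedPartitionSurvival(nn.Module):
    """Discrete-time survival model with learnable interval cut points."""

    def __init__(self, n_features, n_bins, t_max):
        super().__init__()
        self.raw_widths = nn.Parameter(torch.zeros(n_bins))  # unconstrained
        self.t_max = t_max
        self.net = nn.Sequential(
            nn.Linear(n_features, 32), nn.ReLU(), nn.Linear(32, n_bins))

    def cut_points(self):
        # Softmax keeps widths positive and summing to one, so their
        # cumulative sum gives ordered cut points spanning (0, t_max].
        widths = torch.softmax(self.raw_widths, dim=0) * self.t_max
        return torch.cumsum(widths, dim=0)

    def forward(self, x):
        # Predicted probability of the event falling in each interval.
        return torch.softmax(self.net(x), dim=-1)

def soft_bin_membership(event_time, cuts, temp=0.1):
    # Differentiable interval assignment of an observed event time, so
    # gradients can reach the cut points during training.
    upper = torch.sigmoid((cuts - event_time) / temp)
    lower = torch.cat([torch.zeros(1), upper[:-1]])
    return upper - lower
```

A training loss could then, for example, score the predicted interval probabilities against the soft membership of each observed event time.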

* 16 pages, 5 figures, 2 tables 

InfoPrompt: Information-Theoretic Soft Prompt Tuning for Natural Language Understanding

Jun 08, 2023
Junda Wu, Tong Yu, Rui Wang, Zhao Song, Ruiyi Zhang, Handong Zhao, Chaochao Lu, Shuai Li, Ricardo Henao

Soft prompt tuning achieves superior performance across a wide range of few-shot tasks. However, the performance of prompt tuning can be highly sensitive to the initialization of the prompts. We also empirically observe that conventional prompt tuning methods cannot encode and learn sufficient task-relevant information from prompt tokens. In this work, we develop an information-theoretic framework that formulates soft prompt tuning as maximizing the mutual information between prompts and other model parameters (or encoded representations). This novel view helps us to develop a more efficient, accurate and robust soft prompt tuning method, InfoPrompt. Within this framework, we develop two novel mutual-information-based loss functions that (i) discover proper prompt initializations for downstream tasks and learn sufficient task-relevant information from prompt tokens, and (ii) encourage the output representation of the pretrained language model to be more aware of the task-relevant information captured in the learnt prompt. Extensive experiments validate that InfoPrompt significantly accelerates the convergence of prompt tuning and outperforms traditional prompt tuning methods. Finally, we provide a formal theoretical result showing that a gradient-descent-type algorithm can be used to train our mutual information loss.
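As one plausible ingredient, the sketch below implements a standard InfoNCE-style lower bound on the mutual information between prompt representations and encoded outputs; minimizing the contrastive loss maximizes the bound, encouraging the two to share task-relevant information. This is a generic estimator under assumed paired representations, not necessarily the exact losses from the paper.

```python
import torch
import torch.nn.functional as F

def infonce_loss(prompt_repr, output_repr, temperature=0.1):
    """prompt_repr, output_repr: (batch, dim) paired representations."""
    p = F.normalize(prompt_repr, dim=-1)
    o = F.normalize(output_repr, dim=-1)
    logits = p @ o.t() / temperature              # (batch, batch) similarities
    labels = torch.arange(p.size(0), device=p.device)
    # Minimizing this cross-entropy maximizes an InfoNCE lower bound on
    # I(prompt; output): matched pairs sit on the diagonal.
    return F.cross_entropy(logits, labels)
```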


An Effective Meaningful Way to Evaluate Survival Models

Jun 01, 2023
Shi-ang Qi, Neeraj Kumar, Mahtab Farrokh, Weijie Sun, Li-Hao Kuan, Rajesh Ranganath, Ricardo Henao, Russell Greiner

One straightforward metric to evaluate a survival prediction model is the Mean Absolute Error (MAE) -- the average of the absolute difference between the time predicted by the model and the true event time, over all subjects. Unfortunately, computing it is challenging because, in practice, the test set includes (right) censored individuals, meaning we do not know when a censored individual actually experienced the event. In this paper, we explore various metrics to estimate MAE for survival datasets that include (many) censored individuals. Moreover, we introduce a novel and effective approach for generating realistic semi-synthetic survival datasets to facilitate the evaluation of metrics. Our findings, based on the analysis of the semi-synthetic datasets, reveal that our proposed metric (MAE using pseudo-observations) is able to rank models accurately based on their performance and often closely matches the true MAE; in particular, it is better than several alternative methods.
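A self-contained sketch of the pseudo-observation idea follows: jackknife pseudo-observations of the restricted mean survival time (from a Kaplan-Meier fit) stand in for the unknown event times of censored subjects. The surrogate-target rule at the end is one plausible reading of the metric, not the paper's exact specification.

```python
import numpy as np

def km_rmst(times, events, tau):
    """Restricted mean survival time: area under Kaplan-Meier up to tau."""
    order = np.argsort(times)
    t = np.asarray(times, dtype=float)[order]
    e = np.asarray(events, dtype=int)[order]
    n, s, rmst, prev = len(t), 1.0, 0.0, 0.0
    for i in range(n):
        rmst += s * (min(t[i], tau) - prev)
        prev = min(t[i], tau)
        if t[i] >= tau:
            return rmst
        if e[i]:  # survival curve drops only at event times
            s *= 1.0 - 1.0 / (n - i)
    return rmst + s * (tau - prev)

def pseudo_observations(times, events, tau):
    # Jackknife pseudo-observation for subject i:
    #   n * theta(all subjects) - (n - 1) * theta(all but subject i)
    times = np.asarray(times, dtype=float)
    events = np.asarray(events, dtype=int)
    n, full = len(times), km_rmst(times, events, tau)
    mask, out = np.ones(n, dtype=bool), np.empty(n)
    for i in range(n):
        mask[i] = False
        out[i] = n * full - (n - 1) * km_rmst(times[mask], events[mask], tau)
        mask[i] = True
    return out

def mae_po(pred, times, events, tau):
    # Surrogate target: observed time if uncensored, pseudo-observation
    # if censored (a plausible reading, not the paper's exact weighting).
    target = np.where(np.asarray(events) == 1, times,
                      pseudo_observations(times, events, tau))
    return float(np.mean(np.abs(np.asarray(pred) - target)))
```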

* Accepted to ICML 2023 

Open World Classification with Adaptive Negative Samples

Mar 09, 2023
Ke Bai, Guoyin Wang, Jiwei Li, Sunghyun Park, Sungjin Lee, Puyang Xu, Ricardo Henao, Lawrence Carin

Open world classification is a task in natural language processing with key practical relevance and impact. Since the open or unknown category data only manifests in the inference phase, finding a model with a suitable decision boundary that accommodates the identification of known classes and discrimination of the open category is challenging. The performance of existing models is limited by the lack of effective open category data during the training stage or the lack of a good mechanism to learn appropriate decision boundaries. We propose an approach based on adaptive negative samples (ANS), designed to generate effective synthetic open category samples during the training stage without requiring any prior knowledge or external datasets. Empirically, we find a significant advantage in using auxiliary one-versus-rest binary classifiers, which effectively utilize the generated negative samples and avoid the complex threshold-seeking stage of previous works. Extensive experiments on three benchmark datasets show that ANS achieves significant improvements over state-of-the-art methods.
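A hedged sketch of the two pieces the abstract describes, with an intentionally simple stand-in for the adaptive negative generation (the real procedure differs): synthetic negatives are derived from known-class features, and one-versus-rest heads learn to reject them as belonging to no known class.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def make_negatives(feats, eps=1.0):
    # Hypothetical stand-in for adaptive negative generation: push features
    # away from the batch mean to land outside known-class regions.
    direction = F.normalize(feats - feats.mean(0, keepdim=True), dim=-1)
    return feats + eps * direction

class OneVsRestHeads(nn.Module):
    def __init__(self, dim, n_known):
        super().__init__()
        self.heads = nn.Linear(dim, n_known)  # one binary logit per class

    def loss(self, feats, labels):
        neg = make_negatives(feats.detach())
        pos_logits, neg_logits = self.heads(feats), self.heads(neg)
        targets = F.one_hot(labels, pos_logits.size(1)).float()
        # Positives train their own head; negatives are rejected by all heads.
        return (F.binary_cross_entropy_with_logits(pos_logits, targets)
                + F.binary_cross_entropy_with_logits(
                    neg_logits, torch.zeros_like(neg_logits)))

    def predict(self, feats, threshold=0.5):
        probs = torch.sigmoid(self.heads(feats))
        conf, cls = probs.max(dim=-1)
        # -1 marks the open category: no head is confident enough.
        return torch.where(conf > threshold, cls, torch.full_like(cls, -1))
```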

* Accepted by EMNLP 2021 (Main Track, Long Paper) 

Toward Fairness in Text Generation via Mutual Information Minimization based on Importance Sampling

Feb 25, 2023
Rui Wang, Pengyu Cheng, Ricardo Henao

Pretrained language models (PLMs), such as GPT2, have achieved remarkable empirical performance in text generation tasks. However, because PLMs are pretrained on large-scale natural language corpora, their generated text may exhibit social bias against disadvantaged demographic groups. To improve the fairness of PLMs in text generation, we propose to minimize the mutual information between the semantics of the generated text sentences and their demographic polarity, i.e., the demographic group to which each sentence refers. In this way, the mention of a demographic group (e.g., male or female) is encouraged to be independent of how it is described in the generated text, thus effectively alleviating the social bias. Moreover, we propose to efficiently estimate an upper bound of the above mutual information via importance sampling, leveraging a natural language corpus. We also propose a distillation mechanism that preserves the language modeling ability of the PLMs after debiasing. Empirical results on real-world benchmarks demonstrate that the proposed method yields superior performance in terms of both fairness and language modeling ability.
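The sketch below shows a CLUB-style variational upper bound on the mutual information between sentence semantics and demographic polarity, with sample weights standing in for the paper's importance-sampling scheme; all names are illustrative, and the variational classifier q would be fit separately to predict polarity.

```python
import torch
import torch.nn as nn

class MIUpperBound(nn.Module):
    """CLUB-style variational upper bound on I(semantics; polarity)."""

    def __init__(self, dim):
        super().__init__()
        # Variational classifier q(d | s), fit separately by maximum
        # likelihood to predict polarity from sentence semantics.
        self.q = nn.Sequential(nn.Linear(dim, 64), nn.ReLU(), nn.Linear(64, 2))

    def forward(self, sem, polarity, weights):
        idx = torch.arange(len(polarity))
        logq = torch.log_softmax(self.q(sem), dim=-1)
        # Joint term minus marginal term (shuffled polarities), with
        # importance weights reweighting corpus samples.
        joint = (weights * logq[idx, polarity]).mean()
        shuffled = polarity[torch.randperm(len(polarity))]
        marginal = (weights * logq[idx, shuffled]).mean()
        return joint - marginal  # minimize to reduce the MI upper bound
```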


Neural Insights for Digital Marketing Content Design

Feb 02, 2023
Fanjie Kong, Yuan Li, Houssam Nassif, Tanner Fiez, Shreya Chakrabarti, Ricardo Henao

In digital marketing, experimenting with new website content is one of the key levers to improve customer engagement. However, creating successful marketing content is a manual and time-consuming process that lacks clear guiding principles. This paper seeks to close the loop between content creation and online experimentation by offering marketers AI-driven actionable insights based on historical data to improve their creative process. We present a neural-network-based system that scores and extracts insights from a marketing content design: a multimodal neural network predicts the attractiveness of marketing content, and a post-hoc attribution method generates actionable insights for marketers to improve their content in specific marketing locations. Our insights not only point out the advantages and drawbacks of given content, but also provide design recommendations based on historical data. We show that our scoring model and insights work well both quantitatively and qualitatively.
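As a toy illustration of the attribution step, the sketch below scores content with a stand-in network and attributes the score to input features via integrated gradients, one common post-hoc attribution method; the production system's multimodal architecture and attribution choice may differ.

```python
import torch
import torch.nn as nn

# Toy stand-in for the content scorer; the real model is multimodal.
scorer = nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 1))

def integrated_gradients(x, baseline=None, steps=64):
    """Attribute the predicted score to input features."""
    baseline = torch.zeros_like(x) if baseline is None else baseline
    total = torch.zeros_like(x)
    for a in torch.linspace(0.0, 1.0, steps):
        xi = (baseline + a * (x - baseline)).requires_grad_(True)
        scorer(xi).sum().backward()
        total += xi.grad  # accumulate gradients along the path
    # Per-feature contribution to the score relative to the baseline.
    return (x - baseline) * total / steps

attributions = integrated_gradients(torch.randn(16))
```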


Pushing the Efficiency Limit Using Structured Sparse Convolutions

Oct 23, 2022
Vinay Kumar Verma, Nikhil Mehta, Shijing Si, Ricardo Henao, Lawrence Carin

Weight pruning is among the most popular approaches for compressing deep convolutional neural networks. Recent work suggests that in a randomly initialized deep neural network, there exist sparse subnetworks that achieve performance comparable to the original network. Unfortunately, finding these subnetworks involves iterative stages of training and pruning, which can be computationally expensive. We propose Structured Sparse Convolution (SSC), which leverages the inherent structure in images to reduce the parameters in the convolutional filter. This leads to improved efficiency of convolutional architectures compared to existing methods that perform pruning at initialization. We show that SSC is a generalization of commonly used layers (depthwise, groupwise and pointwise convolution) in "efficient architectures." Extensive experiments on well-known CNN models and datasets show the effectiveness of the proposed method. Architectures based on SSC achieve state-of-the-art performance compared to baselines on CIFAR-10, CIFAR-100, Tiny-ImageNet, and ImageNet classification benchmarks.
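The sketch below illustrates, under assumed shapes, how structured sparsity masks over a dense convolution kernel recover familiar efficient layers as special cases; it is not the paper's exact parameterization of SSC.

```python
import torch
import torch.nn.functional as F

def masked_conv2d(x, weight, mask, **kw):
    # A dense kernel times a structured binary mask.
    return F.conv2d(x, weight * mask, **kw)

out_c, in_c, k = 8, 8, 3
weight = torch.randn(out_c, in_c, k, k)

# Pointwise (1x1) convolution: only the center tap survives.
pointwise = torch.zeros(out_c, in_c, k, k)
pointwise[:, :, k // 2, k // 2] = 1.0

# Depthwise convolution: each output channel sees one input channel.
depthwise = torch.zeros(out_c, in_c, k, k)
for c in range(out_c):
    depthwise[c, c % in_c] = 1.0

x = torch.randn(1, in_c, 16, 16)
y = masked_conv2d(x, weight, depthwise, padding=1)
```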

* Accepted at the IEEE Winter Conference on Applications of Computer Vision, WACV 2023 

Toward Sustainable Continual Learning: Detection and Knowledge Repurposing of Similar Tasks

Oct 11, 2022
Sijia Wang, Yoojin Choi, Junya Chen, Mostafa El-Khamy, Ricardo Henao

Most existing works on continual learning (CL) focus on overcoming the catastrophic forgetting (CF) problem, with dynamic models and replay methods performing exceptionally well. However, since current works tend to assume exclusivity or dissimilarity among learning tasks, these methods require constantly accumulating task-specific knowledge in memory for each task. This results in the eventual prohibitive expansion of the knowledge repository when learning from a long sequence of tasks. In this work, we introduce a paradigm where the continual learner gets a sequence of mixed similar and dissimilar tasks. We propose a new continual learning framework that uses a task similarity detection function, requiring no additional learning, to determine whether a specific past task is similar to the current one. We can then reuse previous task knowledge to slow down parameter expansion, ensuring that the CL system expands the knowledge repository sublinearly with the number of learned tasks. Our experiments show that the proposed framework performs competitively on widely used computer vision benchmarks such as CIFAR10, CIFAR100, and EMNIST.
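A minimal sketch of a learning-free similarity check, assuming feature mean/variance signatures per task as the detection statistic (an illustrative criterion; the paper's detection function may differ):

```python
import torch

def task_signature(feats):
    # Cheap, learning-free summary of a task's feature distribution.
    return feats.mean(0), feats.var(0)

def most_similar_task(new_feats, signatures, threshold=0.5):
    mu, var = task_signature(new_feats)
    best_id, best_d = None, float("inf")
    for task_id, (mu_t, var_t) in signatures.items():
        d = (torch.norm(mu - mu_t) + torch.norm(var - var_t)).item()
        if d < best_d:
            best_id, best_d = task_id, d
    # None signals a novel task: expand the model instead of reusing.
    return best_id if best_d < threshold else None
```

When no stored task falls below the threshold, the system would expand the model; otherwise it reuses the matched task's knowledge.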


Capturing Actionable Dynamics with Structured Latent Ordinary Differential Equations

Feb 25, 2022
Paidamoyo Chapfuwa, Sherri Rose, Lawrence Carin, Edward Meeds, Ricardo Henao

End-to-end learning of dynamical systems with black-box models, such as neural ordinary differential equations (ODEs), provides a flexible framework for learning dynamics from data without prescribing a mathematical model for the dynamics. Unfortunately, this flexibility comes at the cost of understanding the dynamical system, for which ODEs are used ubiquitously. Further, experimental data are collected under various conditions (inputs), such as treatments, or grouped in some way, such as by sub-population. Understanding the effects of these system inputs on system outputs is crucial for any meaningful model of a dynamical system. To that end, we propose a structured latent ODE model that explicitly captures system input variations within its latent representation. Building on a static latent variable specification, our model learns (independent) stochastic factors of variation for each input to the system, thus separating the effects of the system inputs in the latent space. This approach provides actionable modeling through the controlled generation of time-series data for novel input combinations (or perturbations). Additionally, we propose a flexible approach for quantifying uncertainties, leveraging a quantile regression formulation. Experimental results on challenging biological datasets show consistent improvements over competitive baselines in the controlled generation of observational data and prediction of biologically meaningful system inputs.
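A dependency-free sketch of the structural idea, assuming Euler integration and a single categorical system input whose embedding is held fixed along the trajectory (names illustrative, not the released code at the link below):

```python
import torch
import torch.nn as nn

class StructuredLatentODE(nn.Module):
    def __init__(self, z_dim, n_inputs, factor_dim):
        super().__init__()
        # One independent latent factor per system input condition.
        self.factors = nn.Embedding(n_inputs, factor_dim)
        self.dynamics = nn.Sequential(
            nn.Linear(z_dim + factor_dim, 64), nn.Tanh(),
            nn.Linear(64, z_dim))

    def forward(self, z0, input_id, n_steps=50, dt=0.1):
        f = self.factors(input_id)  # held fixed along the trajectory
        z, traj = z0, []
        for _ in range(n_steps):
            # Forward Euler step of dz/dt = g(z, f).
            z = z + dt * self.dynamics(torch.cat([z, f], dim=-1))
            traj.append(z)
        return torch.stack(traj, dim=1)  # (batch, n_steps, z_dim)
```

Swapping input_id at generation time then produces controlled trajectories for novel input settings.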

* Github code can be found at https://github.com/paidamoyo/structured_latent_ODEs 

Flexible Triggering Kernels for Hawkes Process Modeling

Feb 03, 2022
Yamac Alican Isik, Connor Davis, Paidamoyo Chapfuwa, Ricardo Henao

Recently proposed encoder-decoder structures for modeling Hawkes processes use transformer-inspired architectures, which encode the history of events via embeddings and self-attention mechanisms. These models deliver better prediction and goodness-of-fit than their RNN-based counterparts. However, they often have high computational and memory complexity and sometimes fail to adequately capture the triggering function of the underlying process. So motivated, we introduce an efficient and general encoding of the historical event sequence that replaces the complex (multilayered) attention structures with triggering kernels of the observed data. Noting the similarity between the triggering kernels of a point process and attention scores, we use a triggering kernel to replace the weights used to build history representations. Our estimate of the triggering function is equipped with a sigmoid gating mechanism that captures local-in-time triggering effects that are otherwise challenging to model with standard decaying-over-time kernels. Further, taking both event type representations and temporal embeddings as inputs, the model learns the underlying triggering type-time kernel parameters from pairs of event types. We present experiments on synthetic and real datasets widely used by competing models, and further include a COVID-19 dataset to illustrate a scenario where longitudinal covariates are available. Results show that the proposed model outperforms existing approaches while being more efficient in terms of computational complexity and yielding interpretable results via direct application of the newly introduced kernel.
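A minimal sketch of the kernel-as-attention idea under an assumed exponential-decay parameterization with a sigmoid gate (illustrative, not the paper's exact kernel):

```python
import torch
import torch.nn as nn

class TriggeringKernel(nn.Module):
    def __init__(self):
        super().__init__()
        self.decay = nn.Parameter(torch.tensor(1.0))
        self.gate_center = nn.Parameter(torch.tensor(1.0))
        self.gate_scale = nn.Parameter(torch.tensor(5.0))

    def forward(self, dt):
        # Exponential decay modulated by a sigmoid gate, so influence can
        # be local in time rather than strictly decaying from zero lag.
        gate = torch.sigmoid(self.gate_scale * (self.gate_center - dt))
        return gate * torch.exp(-torch.abs(self.decay) * dt)

def history_representation(event_emb, event_times, t_now, kernel):
    # Kernel values play the role of attention scores over past events.
    w = kernel(t_now - event_times)              # (n_events,)
    w = w / (w.sum() + 1e-8)
    return (w.unsqueeze(-1) * event_emb).sum(0)  # weighted history summary
```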
