Annotating data via crowdsourcing is time-consuming and expensive. Owing to these costs, dataset creators often have each annotator label only a small subset of the data. This leads to sparse datasets with examples that are marked by few annotators; if an annotator is not selected to label an example, their opinion regarding it is lost. This is especially concerning for subjective NLP datasets where there is no correct label: people may have different valid opinions. Thus, we propose using imputation methods to restore the opinions of all annotators for all examples, creating a dataset that does not leave out any annotator's view. We then train and prompt models with data from the imputed dataset (rather than the original sparse dataset) to make predictions about majority and individual annotations. Unfortunately, the imputed data provided by our baseline methods does not improve predictions. However, through our analysis of it, we develop a strong understanding of how different imputation methods impact the original data in order to inform future imputation techniques. We make all of our code and data publicly available.
In NLP annotation, it is common to have multiple annotators label the text and then obtain the ground truth labels based on the agreement of major annotators. However, annotators are individuals with different backgrounds, and minors' opinions should not be simply ignored. As annotation tasks become subjective and topics are controversial in modern NLP tasks, we need NLP systems that can represent people's diverse voices on subjective matters and predict the level of diversity. This paper examines whether the text of the task and annotators' demographic background information can be used to estimate the level of disagreement among annotators. Particularly, we extract disagreement labels from the annotators' voting histories in the five subjective datasets, and then fine-tune language models to predict annotators' disagreement. Our results show that knowing annotators' demographic information, like gender, ethnicity, and education level, helps predict disagreements. In order to distinguish the disagreement from the inherent controversy from text content and the disagreement in the annotators' different perspectives, we simulate everyone's voices with different combinations of annotators' artificial demographics and examine its variance of the finetuned disagreement predictor. Our paper aims to improve the annotation process for more efficient and inclusive NLP systems through a novel disagreement prediction mechanism. Our code and dataset are publicly available.
Dietary supplements (DS) have been widely used by consumers, but the information around the effectiveness and safety of DS is disparate or incomplete, making barriers to consumers to find information effectively. Conversational agent systems have been applied to the healthcare domain but there is no such a system to answer consumers regarding DS use, although widespread use of the dietary supplement. In this study, we develop the first conversational agent system for DS use.
There is growing evidence showing the significant role of social determinant of health (SDOH) on a wide variety of health outcomes. In the era of artificial intelligence (AI), electronic health records (EHRs) have been widely used to conduct observational studies. However, how to make the best of SDOH information from EHRs is yet to be studied. In this paper, we systematically reviewed recently published papers and provided a methodology review of AI methods using the SDOH information in EHR data. A total of 1250 articles were retrieved from the literature between 2010 and 2020, and 74 papers were included in this review after abstract and full-text screening. We summarized these papers in terms of general characteristics (including publication years, venues, countries etc.), SDOH types, disease areas, study outcomes, AI methods to extract SDOH from EHRs and AI methods using SDOH for healthcare outcomes. Finally, we conclude this paper with discussion on the current trends, challenges, and future directions on using SDOH from EHRs.