Ozlem Uzuner

MASON-NLP at eRisk 2023: Deep Learning-Based Detection of Depression Symptoms from Social Media Texts

Oct 17, 2023
Fardin Ahsan Sakib, Ahnaf Atef Choudhury, Ozlem Uzuner

Depression is a mental health disorder that has a profound impact on people's lives. Recent research suggests that signs of depression can be detected in the way individuals communicate, both in spoken words and written texts. Social media posts in particular are a rich and convenient text source that can be examined for depressive symptoms. The Beck Depression Inventory (BDI) Questionnaire, which is frequently used to gauge the severity of depression, is one instrument that can aid this study: because each BDI question is linked to a particular depressive symptom, analysis can be narrowed to exactly those symptoms. Not everyone with depression exhibits all symptoms at once, but rather a combination of them, so it is extremely useful to be able to determine whether a sentence or a piece of user-generated content is pertinent to a specific symptom. With this in mind, eRisk 2023 Task 1 was designed to do exactly that: assess the relevance of individual sentences to the depression symptoms outlined in the BDI questionnaire. This report describes how our team, MASON-NLP, participated in this subtask of identifying sentences related to different depression symptoms. We used a deep learning approach that incorporated MentalBERT, RoBERTa, and LSTM. Despite our efforts, the evaluation results were lower than expected, underscoring the challenges of ranking sentences from an extensive dataset about depression, a task that requires both appropriate methodological choices and significant computational resources. We anticipate that future iterations of this shared task will yield improved results as our understanding and techniques evolve.

* Working Notes of CLEF (2023): 18-21  
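
To make the task framing concrete, below is a minimal sketch of sentence-to-symptom relevance ranking with mean-pooled transformer embeddings. This is an illustration of the task, not the team's actual MentalBERT/RoBERTa/LSTM architecture; the checkpoint name, the pooling choice, and the cosine ranking are all assumptions.

```python
# Minimal sketch: rank candidate sentences by similarity to a BDI symptom
# description using mean-pooled transformer embeddings. Checkpoint name and
# cosine ranking are illustrative assumptions, not the MASON-NLP pipeline.
import torch
from transformers import AutoModel, AutoTokenizer

MODEL = "mental/mental-bert-base-uncased"  # assumed MentalBERT checkpoint
tokenizer = AutoTokenizer.from_pretrained(MODEL)
model = AutoModel.from_pretrained(MODEL)
model.eval()

def embed(texts):
    """Mean-pool last hidden states over non-padding tokens."""
    batch = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
    with torch.no_grad():
        hidden = model(**batch).last_hidden_state          # (B, T, H)
    mask = batch["attention_mask"].unsqueeze(-1)           # (B, T, 1)
    return (hidden * mask).sum(1) / mask.sum(1)            # (B, H)

symptom = "Sadness: I feel sad much of the time."          # a BDI-style item
candidates = [
    "I can't stop crying lately and nothing feels right.",
    "The new phone camera takes great pictures.",
]
scores = torch.nn.functional.cosine_similarity(embed([symptom]), embed(candidates))
for sent, score in sorted(zip(candidates, scores.tolist()), key=lambda x: -x[1]):
    print(f"{score:.3f}  {sent}")
```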

MasonNLP+ at SemEval-2023 Task 8: Extracting Medical Questions, Experiences and Claims from Social Media using Knowledge-Augmented Pre-trained Language Models

Apr 26, 2023
Giridhar Kaushik Ramachandran, Haritha Gangavarapu, Kevin Lybarger, Ozlem Uzuner

In online forums like Reddit, users share their experiences with medical conditions and treatments, including making claims, asking questions, and discussing the effects of treatments on their health. Systems that understand this information can help monitor the spread of misinformation and verify user claims. Task 8 of the 2023 International Workshop on Semantic Evaluation (SemEval-2023) focused on medical applications, specifically extracting patient-experience and medical-condition entities from user posts on social media. The Reddit Health Online Talk (RedHOT) corpus contains posts from medical-condition-related subreddits with annotations characterizing the patient experience and medical conditions. In Subtask-1, patient experience is characterized by personal experience, questions, and claims. In Subtask-2, medical conditions are characterized by population, intervention, and outcome. As part of the challenge, we proposed language-model-based extraction systems for the automatic extraction of patient experiences and medical-condition information, which ranked 3rd on both subtasks' leaderboards. In this work, we describe our approach and, in addition, explore the automatic extraction of this information using domain-specific language models and the inclusion of external knowledge.
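
A minimal sketch of the kind of token-classification setup these subtasks call for appears below: a pre-trained language model with a BIO tagging head over spans such as questions, claims, and personal experiences. The label set and base checkpoint are assumptions; a real system would be fine-tuned on the RedHOT annotations first.

```python
# Sketch of BIO token classification for patient-experience spans. The head
# here is randomly initialized, so predictions are meaningless until the
# model is fine-tuned on annotated data; labels/checkpoint are assumptions.
import torch
from transformers import AutoModelForTokenClassification, AutoTokenizer

labels = ["O", "B-QUESTION", "I-QUESTION", "B-CLAIM", "I-CLAIM",
          "B-EXPERIENCE", "I-EXPERIENCE"]
tok = AutoTokenizer.from_pretrained("roberta-base")
model = AutoModelForTokenClassification.from_pretrained(
    "roberta-base", num_labels=len(labels))

text = "Has anyone tried metformin for PCOS? It worked for me."
enc = tok(text, return_tensors="pt")
with torch.no_grad():
    pred = model(**enc).logits.argmax(-1)[0]

tokens = tok.convert_ids_to_tokens(enc["input_ids"][0].tolist())
print(list(zip(tokens, [labels[p] for p in pred.tolist()])))
```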

LeafAI: query generator for clinical cohort discovery rivaling a human programmer

Apr 13, 2023
Nicholas J Dobbins, Bin Han, Weipeng Zhou, Kristine Lan, H. Nina Kim, Robert Harrington, Ozlem Uzuner, Meliha Yetisgen

Objective: Identifying study-eligible patients within clinical databases is a critical step in clinical research. However, accurate query design typically requires extensive technical and biomedical expertise. We sought to create a system capable of generating data model-agnostic queries while also providing novel logical reasoning capabilities for complex clinical trial eligibility criteria. Materials and Methods: The task of query creation from eligibility criteria requires solving several text-processing problems, including named entity recognition and relation extraction, sequence-to-sequence transformation, normalization, and reasoning. We incorporated hybrid deep learning and rule-based modules for these, as well as a knowledge base of the Unified Medical Language System (UMLS) and linked ontologies. To enable data model-agnostic query creation, we introduce a novel method for tagging database schema elements using UMLS concepts. To evaluate our system, called LeafAI, we compared its capability to that of a human database programmer in identifying patients who had been enrolled in 8 clinical trials conducted at our institution. We measured performance by the number of actual enrolled patients matched by the generated queries. Results: LeafAI matched a mean of 43% of enrolled patients, with 27,225 eligible across the 8 clinical trials, compared to 27% matched and 14,587 eligible for the queries of a human database programmer. The human programmer spent 26 total hours crafting queries, compared to several minutes for LeafAI. Conclusions: Our work contributes a state-of-the-art data model-agnostic query generation system capable of conditional reasoning using a knowledge base. We demonstrate that LeafAI can rival a human programmer in finding patients eligible for clinical trials.
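
The schema-tagging idea can be illustrated with a small sketch: schema elements are annotated with UMLS concept identifiers (CUIs), so a criterion normalized to a CUI can be routed to the right table and column regardless of the underlying data model. The CUIs, table names, and SQL shape below are invented for illustration; LeafAI's actual NER, normalization, and reasoning modules are far richer.

```python
# Illustrative sketch of UMLS-tagged schema routing. All identifiers here
# (CUIs, table/column names, SQL layout) are assumptions for the example.
SCHEMA_TAGS = {
    "C0011849": ("condition_occurrence", "condition_concept_id"),  # diabetes mellitus
    "C0004057": ("drug_exposure", "drug_concept_id"),              # aspirin
}

def criterion_to_sql(cui: str, patient_table: str = "person") -> str:
    """Route a normalized criterion concept to its tagged schema element."""
    table, column = SCHEMA_TAGS[cui]
    return (f"SELECT DISTINCT p.person_id FROM {patient_table} p "
            f"JOIN {table} t ON t.person_id = p.person_id "
            f"WHERE t.{column} = '{cui}'")

# "Patients with diabetes mellitus" -> normalized to CUI C0011849
print(criterion_to_sql("C0011849"))
```

Because the query generator only consults the tag table, retargeting a new data model amounts to re-tagging its schema elements with UMLS concepts.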

Progress Note Understanding -- Assessment and Plan Reasoning: Overview of the 2022 N2C2 Track 3 Shared Task

Mar 14, 2023
Yanjun Gao, Dmitriy Dligach, Timothy Miller, Matthew M Churpek, Ozlem Uzuner, Majid Afshar

Daily progress notes are a common note type in the electronic health record (EHR) in which healthcare providers document the patient's daily progress and treatment plans. The EHR is designed to document all the care provided to patients, but it also enables note bloat: extraneous information that distracts from the diagnoses and treatment plans. Applications of natural language processing (NLP) to the EHR are a growing field, with the majority of methods focused on information extraction; few tasks use NLP methods for downstream diagnostic decision support. We introduced the 2022 National NLP Clinical Challenge (N2C2) Track 3: Progress Note Understanding - Assessment and Plan Reasoning as one step towards a new suite of such tasks. The track focuses on the most critical components of progress notes, the Assessment and Plan subsections, where health problems and diagnoses are documented. Its goal was to develop and evaluate NLP systems that automatically predict causal relations between the overall status of the patient contained in the Assessment section and each component of the Plan section, which contains the diagnoses and treatment plans, thereby identifying and prioritizing diagnoses as a first step in diagnostic decision support for long documents like daily progress notes. We present the results of the 2022 n2c2 Track 3 and provide a description of the data, evaluation, participation, and system performance.

* To appear in Journal of Biomedical Informatics 
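
The task reduces naturally to sequence-pair classification, sketched below: an Assessment passage paired with one Plan subsection, scored over a small relation label set. The label names and the clinical BERT checkpoint are assumptions drawn from the task description, and the untrained head would need fine-tuning on the track's data.

```python
# Sketch of Assessment/Plan relation prediction as sequence-pair
# classification. Label set and checkpoint are assumptions; the
# classification head below is untrained and purely illustrative.
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

labels = ["direct", "indirect", "neither", "not-relevant"]  # assumed label set
tok = AutoTokenizer.from_pretrained("emilyalsentzer/Bio_ClinicalBERT")
model = AutoModelForSequenceClassification.from_pretrained(
    "emilyalsentzer/Bio_ClinicalBERT", num_labels=len(labels))

assessment = "65yo M with CHF exacerbation, volume overloaded."
plan_item = "Continue IV furosemide, strict I/Os, daily weights."
enc = tok(assessment, plan_item, return_tensors="pt", truncation=True)
with torch.no_grad():
    probs = model(**enc).logits.softmax(-1)[0]
print(dict(zip(labels, probs.tolist())))
```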

The Leaf Clinical Trials Corpus: a new resource for query generation from clinical trial eligibility criteria

Jul 27, 2022
Nicholas J Dobbins, Tony Mullen, Ozlem Uzuner, Meliha Yetisgen

Identifying cohorts of patients based on eligibility criteria such as medical conditions, procedures, and medication use is critical to recruitment for clinical trials. Such criteria are often most naturally described in free text, using language familiar to clinicians and researchers. In order to identify potential participants at scale, these criteria must first be translated into queries on clinical databases, which can be labor-intensive and error-prone. Natural language processing (NLP) methods offer a potential means of performing such conversion into database queries automatically. However, they must first be trained and evaluated using corpora that capture clinical trial criteria in sufficient detail. In this paper, we introduce the Leaf Clinical Trials (LCT) corpus, a human-annotated corpus of over 1,000 clinical trial eligibility criteria descriptions with highly granular structured labels capturing a range of biomedical phenomena. We provide details of our schema, annotation process, corpus quality, and statistics. Additionally, we present baseline information extraction results on this corpus as benchmarks for future work.
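
To show what "highly granular structured labels" over criteria text can look like in practice, here is a small sketch of span-level annotations on one criterion. The label names and span layout are illustrative assumptions, not the LCT corpus's exact schema.

```python
# Illustrative representation of span annotations on an eligibility
# criterion. Label names are stand-ins, not the corpus's actual schema.
from dataclasses import dataclass

@dataclass
class Span:
    start: int   # character offset, inclusive
    end: int     # character offset, exclusive
    label: str   # e.g., Condition, Drug, Value, Temporality

criterion = "Type 2 diabetes diagnosed within the last 5 years"
spans = [
    Span(0, 15, "Condition"),      # "Type 2 diabetes"
    Span(26, 49, "Temporality"),   # "within the last 5 years"
]
for s in spans:
    print(f"{s.label:12s} -> {criterion[s.start:s.end]!r}")
```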

Online User Profiling to Detect Social Bots on Twitter

Mar 09, 2022
Maryam Heidari, James H Jr Jones, Ozlem Uzuner

Social media platforms can expose influential trends in many aspects of everyday life. However, the movements they represent can be contaminated by disinformation, and social bots are one of its significant sources, posing serious cyber threats to society and public opinion. This research aims to develop machine learning models that detect bots based on a user profile extracted from the user's tweets. An online user profile captures personal information such as age, gender, education, and personality; in this work, the profile is constructed from the user's online posts. This work's main contribution is three-fold. First, we aim to improve bot detection through machine learning models based on the personal information inferred from a user's online comments. When comparing two online posts, similarity of personal information makes it difficult to differentiate a bot from a human user; this research turns that similarity into an advantage for the new bot detection model, which creates user profiles from attributes such as age, personality, gender, and education inferred from users' online posts and uses them to detect social bots with high prediction accuracy. Second, we create a new public dataset that provides user profiles for more than 6,900 Twitter accounts in the Cresci 2017 dataset.
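
The profile-based idea is sketched below: classify accounts from inferred profile attributes rather than from raw text. The synthetic features, toy labels, and the random-forest classifier are stand-in assumptions; the paper derives the attributes from users' online posts.

```python
# Minimal sketch of profile-based bot detection. Features and labels are
# synthetic stand-ins; a real pipeline infers age, gender, education, and
# personality scores from each account's posts before classification.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
# Columns: age, gender, education, plus five personality trait scores
X = rng.random((500, 8))
y = rng.integers(0, 2, size=500)       # 1 = bot, 0 = human (toy labels)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
clf = RandomForestClassifier(random_state=0).fit(X_tr, y_tr)
print(f"toy held-out accuracy: {clf.score(X_te, y_te):.2f}")
```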

A Scoping Review of Publicly Available Language Tasks in Clinical Natural Language Processing

Dec 07, 2021
Yanjun Gao, Dmitriy Dligach, Leslie Christensen, Samuel Tesch, Ryan Laffin, Dongfang Xu, Timothy Miller, Ozlem Uzuner, Matthew M Churpek, Majid Afshar

Objective: To provide a scoping review of papers on clinical natural language processing (NLP) tasks that use publicly available electronic health record data from a cohort of patients. Materials and Methods: We searched six databases, including biomedical research and computer science literature databases. Two reviewers conducted a round of title/abstract screening followed by full-text screening. Our method followed the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidelines. Results: A total of 35 papers with 47 clinical NLP tasks published between 2007 and 2021 met the inclusion criteria. We categorized the tasks by the type of NLP problem, including named entity recognition, summarization, and other NLP tasks. Some tasks were introduced with clinical decision support applications in mind, such as substance abuse detection, phenotyping, and cohort selection for clinical trials. We summarized the tasks by publication and dataset information. Discussion: The breadth of clinical NLP tasks keeps growing as the field of NLP evolves with advancements in language systems. However, gaps exist in the divergent interests of the general-domain NLP community and the clinical informatics community, and in the generalizability of the data sources. We also identified issues in data selection and preparation, including the lack of time-sensitive data and the invalidity of problem size and evaluation. Conclusions: The existing clinical NLP tasks cover a wide range of topics, and the field will continue to grow and attract more attention from both the general-domain NLP and clinical informatics communities. We encourage future work to incorporate multidisciplinary collaboration, reporting transparency, and standardization in data preparation.

* Paper submitted to the Journal of the American Medical Informatics Association (JAMIA)

Extracting Radiological Findings With Normalized Anatomical Information Using a Span-Based BERT Relation Extraction Model

Aug 20, 2021
Kevin Lybarger, Aashka Damani, Martin Gunn, Ozlem Uzuner, Meliha Yetisgen

Medical imaging is critical to the diagnosis and treatment of numerous medical problems, including many forms of cancer. Medical imaging reports distill the findings and observations of radiologists, creating an unstructured textual representation of unstructured medical images. Large-scale use of this text-encoded information requires converting the unstructured text to a structured, semantic representation. We explore the extraction and normalization of anatomical information in radiology reports that is associated with radiological findings. We investigate this extraction and normalization task using a span-based relation extraction model that jointly extracts entities and relations using BERT. This work examines the factors that influence extraction and normalization performance, including the body part/organ system, frequency of occurrence, span length, and span diversity. It discusses approaches for improving performance and creating high-quality semantic representations of radiological phenomena.
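
Below is a minimal sketch of the span-pair pattern behind span-based relation extraction: build a representation for each candidate span from BERT hidden states, then score (finding, anatomy) pairs with a small head. The endpoint-concatenation representation, label set, and untrained scorer are simplified assumptions, not the paper's exact architecture.

```python
# Sketch of span-pair scoring for (finding, anatomy) relation candidates.
# Span representations use endpoint concatenation; the linear scorer is
# untrained and purely illustrative of the architecture's shape.
import torch
from transformers import AutoModel, AutoTokenizer

tok = AutoTokenizer.from_pretrained("bert-base-uncased")
bert = AutoModel.from_pretrained("bert-base-uncased")

text = "Small nodule in the left upper lobe."
enc = tok(text, return_tensors="pt")
with torch.no_grad():
    H = bert(**enc).last_hidden_state[0]               # (T, hidden)

def span_repr(char_start, char_end):
    """Endpoint concatenation [h_first; h_last] for a character span."""
    first = enc.char_to_token(char_start)
    last = enc.char_to_token(char_end - 1)
    return torch.cat([H[first], H[last]])

finding = span_repr(0, 12)     # "Small nodule"
anatomy = span_repr(20, 35)    # "left upper lobe"
pair = torch.cat([finding, anatomy])
scorer = torch.nn.Linear(pair.numel(), 2)  # {no-relation, located-at} (assumed)
print(scorer(pair).softmax(-1))
```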

Jointly Learning Clinical Entities and Relations with Contextual Language Models and Explicit Context

Feb 17, 2021
Paul Barry, Sam Henry, Meliha Yetisgen, Bridget McInnes, Ozlem Uzuner

We hypothesize that explicit integration of contextual information into a multi-task learning framework would emphasize the significance of context for boosting performance in jointly learning Named Entity Recognition (NER) and Relation Extraction (RE). Our work supports this hypothesis by segmenting entities from their surrounding context and by building contextual representations from each independent segment. The resulting relation representation allows for a joint NER/RE system that achieves near state-of-the-art (SOTA) performance on both NER and RE tasks while beating the SOTA RE system at end-to-end NER & RE with a 49.07 F1.
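
A minimal sketch of the segmentation idea follows: split a sentence into the two entity spans and the text around and between them, encode each segment independently, and concatenate the pieces into one relation representation. The segmentation, CLS pooling, and single-mention split are simplified assumptions rather than the paper's full design.

```python
# Sketch of an explicit-context relation representation: each segment
# (left context, entity 1, between, entity 2, right context) is encoded
# independently, then concatenated. Pooling/segmentation are assumptions.
import torch
from transformers import AutoModel, AutoTokenizer

tok = AutoTokenizer.from_pretrained("bert-base-uncased")
enc_model = AutoModel.from_pretrained("bert-base-uncased")

def encode(segment: str) -> torch.Tensor:
    """CLS embedding of one independently encoded segment."""
    if not segment.strip():
        return torch.zeros(enc_model.config.hidden_size)
    batch = tok(segment, return_tensors="pt")
    with torch.no_grad():
        return enc_model(**batch).last_hidden_state[0, 0]

sentence = "The patient was started on lisinopril for hypertension."
e1, e2 = "lisinopril", "hypertension"
left, rest = sentence.split(e1)
between, right = rest.split(e2)

rel = torch.cat([encode(left), encode(e1), encode(between),
                 encode(e2), encode(right)])
print(rel.shape)  # would feed a relation classification head in a full system
```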

Transfer Learning Approach for Arabic Offensive Language Detection System -- BERT-Based Model

Feb 09, 2021
Fatemah Husain, Ozlem Uzuner

Developing a system to detect online offensive language is very important to the health and security of online users. Studies have shown that cyberhate, online harassment, and other misuses of technology are on the rise, particularly during the global Coronavirus pandemic in 2020. According to the latest report by the Anti-Defamation League (ADL), 35% of online users reported online harassment related to their identity-based characteristics, a 3% increase over 2019. Applying advanced techniques from the Natural Language Processing (NLP) field to support the development of an online hate-free community is a critical task for social justice. Transfer learning enhances the performance of a classifier by allowing the transfer of knowledge from one domain or dataset to others that have not been seen before, thus helping the classifier generalize. In our study, we apply the principles of transfer learning across multiple Arabic offensive language datasets to compare the effects on system performance. This study investigates the effects of fine-tuning and training a Bidirectional Encoder Representations from Transformers (BERT) model on multiple Arabic offensive language datasets individually and testing it on the other datasets individually. Our experiment starts with a comparison among multiple BERT models to guide the selection of the main model used in our study. The study also investigates the effects of concatenating all datasets for fine-tuning and training the BERT model. Our results demonstrate the limited effects of transfer learning on classifier performance, particularly for highly dialectal comments.

* 2021 4th International Conference on Computer Applications & Information Security (ICCAIS) - Contemporary Computer Technologies and Applications 
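
The cross-dataset protocol can be sketched in a few lines: fine-tune a BERT classifier on one Arabic offensive-language dataset and evaluate it on another, unseen one. The checkpoint name, the two-example toy data, and the bare training loop are assumptions for illustration; the study compares several Arabic BERT variants on full datasets.

```python
# Sketch of cross-dataset transfer: fine-tune on a source offensive-language
# dataset, evaluate on an unseen target. Checkpoint and toy data are
# assumptions; real runs use full datasets and proper batching/epochs.
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

MODEL = "aubmindlab/bert-base-arabertv02"   # assumed Arabic BERT checkpoint
tok = AutoTokenizer.from_pretrained(MODEL)
model = AutoModelForSequenceClassification.from_pretrained(MODEL, num_labels=2)
optim = torch.optim.AdamW(model.parameters(), lr=2e-5)

train = [("تعليق مسيء هنا", 1), ("شكرا جزيلا على المساعدة", 0)]  # source data
test = [("كلام عادي جدا", 0)]                                    # unseen target

model.train()
for text, label in train:                   # one illustrative pass
    enc = tok(text, return_tensors="pt", truncation=True)
    loss = model(**enc, labels=torch.tensor([label])).loss
    loss.backward(); optim.step(); optim.zero_grad()

model.eval()
for text, label in test:                    # cross-dataset evaluation
    pred = model(**tok(text, return_tensors="pt")).logits.argmax(-1).item()
    print(text, "->", "offensive" if pred else "clean", f"(gold: {label})")
```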