Heather Whalley

GS-BrainText: A Multi-Site Brain Imaging Report Dataset from Generation Scotland for Clinical Natural Language Processing Development and Validation

Mar 27, 2026

Beatrice Alex, Claire Grover, Arlene Casey, Richard Tobin, Heather Whalley, William Whiteley

Abstract: We present GS-BrainText, a curated dataset of 8,511 brain radiology reports from the Generation Scotland cohort, of which 2,431 are annotated for 24 brain disease phenotypes. This multi-site dataset spans five Scottish NHS health boards and includes broad age representation (mean age 58, median age 53), making it uniquely valuable for developing and evaluating generalisable clinical natural language processing (NLP) algorithms and tools. Expert annotations were performed by a multidisciplinary clinical team using an annotation schema, with 10-100% double annotation per NHS health board and rigorous quality assurance. Benchmark evaluation using EdIE-R, an existing rule-based NLP system developed in conjunction with the annotation schema, revealed some performance variation across health boards (F1: 86.13-98.13), phenotypes (F1: 22.22-100) and age groups (F1: 87.01-98.13), highlighting critical challenges in generalisation of NLP tools. The GS-BrainText dataset addresses a significant gap in available UK clinical text resources and provides a valuable resource for the study of linguistic variation, diagnostic uncertainty expression and the impact of data characteristics on NLP system performance.

* 11 pages, 1 figure

Examining the Role of Mood Patterns in Predicting Self-Reported Depressive symptoms

Jun 14, 2020

Lucia Lushi Chen, Walid Magdy, Heather Whalley, Maria Wolters

Abstract: Depression is the leading cause of disability worldwide. Initial efforts to detect depression signals from social media posts have shown promising results. Given the high internal validity, results from such analyses are potentially beneficial to clinical judgment. The existing models for automatic detection of depressive symptoms learn proxy diagnostic signals from social media data, such as help-seeking behavior for mental health or medication names. However, in reality, individuals with depression typically experience depressed mood, loss of pleasure in nearly all activities, feelings of worthlessness or guilt, and diminished ability to think. Therefore, many of the proxy signals used in these models lack theoretical underpinnings for depressive symptoms. It is also reported that social media posts from many patients in the clinical setting do not contain these signals. To address this research gap, we propose to monitor a type of signal that is well-established as a class of symptoms in affective disorders -- mood. Mood is an experience of feeling that can last for hours, days, or even weeks. In this work, we attempt to enrich current technology for detecting symptoms of potential depression by constructing a 'mood profile' for social media users.

* Accepted at The Web Science Conference 2020

Named Entity Recognition for Electronic Health Records: A Comparison of Rule-based and Machine Learning Approaches

Mar 10, 2019

Philip John Gorinski, Honghan Wu, Claire Grover, Richard Tobin, Conn Talbot, Heather Whalley, Cathie Sudlow, William Whiteley, Beatrice Alex

Abstract: This work investigates multiple approaches to Named Entity Recognition (NER) for text in Electronic Health Record (EHR) data. In particular, we look into the application of (i) rule-based, (ii) deep learning and (iii) transfer learning systems for the task of NER on brain imaging reports with a focus on records from patients with stroke. We explore the strengths and weaknesses of each approach, develop rules and train on a common dataset, and evaluate each system's performance on common test sets of Scottish radiology reports from two sources (brain imaging reports in ESS -- Edinburgh Stroke Study data collected by NHS Lothian as well as radiology reports created in NHS Tayside). Our comparison shows that a hand-crafted system is the most accurate way to automatically label EHR, but machine learning approaches can provide a feasible alternative where resources for a manual system are not readily available.

* 8 pages, accepted at HealTAC 2019, Cardiff, 24-25/04/2019
