Albert M. Lai

Document-level Clinical Entity and Relation Extraction via Knowledge Base-Guided Generation

Jul 13, 2024

Kriti Bhattarai, Inez Y. Oh, Zachary B. Abrams, Albert M. Lai

Abstract: Generative pre-trained transformer (GPT) models have shown promise in clinical entity and relation extraction tasks because of their precise extraction and contextual understanding capability. In this work, we further leverage the Unified Medical Language System (UMLS) knowledge base to accurately identify medical concepts and improve clinical entity and relation extraction at the document level. Our framework selects UMLS concepts relevant to the text and combines them with prompts to guide language models in extracting entities. Our experiments demonstrate that this initial concept mapping and the inclusion of the mapped concepts in the prompts improve extraction results compared to few-shot extraction with generic language models that do not leverage UMLS. Further, our results show that this approach is more effective than the standard Retrieval Augmented Generation (RAG) technique, where retrieved data is compared with prompt embeddings to generate results. Overall, we find that integrating UMLS concepts with GPT models significantly improves entity and relation identification, outperforming the baseline and RAG models. By combining the precise concept mapping capability of knowledge-based approaches like UMLS with the contextual understanding capability of GPT, our method highlights the potential of these approaches in specialized domains like healthcare.

* Accepted at Association for Computational Linguistics BioNLP 2024

Jointly Embedding Entities and Text with Distant Supervision

Jul 09, 2018

Denis Newman-Griffis, Albert M. Lai, Eric Fosler-Lussier

Abstract: Learning representations for knowledge base entities and concepts is becoming increasingly important for NLP applications. However, recent entity embedding methods have relied on structured resources that are expensive to create for new domains and corpora. We present a distantly-supervised method for jointly learning embeddings of entities and text from an unannotated corpus, using only a list of mappings between entities and surface forms. We learn embeddings from open-domain and biomedical corpora, and compare against prior methods that rely on human-annotated text or large knowledge graph structure. Our embeddings capture entity similarity and relatedness better than prior work, both in existing biomedical datasets and a new Wikipedia-based dataset that we release to the community. Results on analogy completion and entity sense disambiguation indicate that entities and words capture complementary information that can be effectively combined for downstream use.

* 12 pages; Accepted to 3rd Workshop on Representation Learning for NLP (Repl4NLP 2018). Code at https://github.com/OSU-slatelab/JET

How essential are unstructured clinical narratives and information fusion to clinical trial recruitment?

Feb 13, 2015

Preethi Raghavan, James L. Chen, Eric Fosler-Lussier, Albert M. Lai

Abstract: Electronic health records capture patient information using structured controlled vocabularies and unstructured narrative text. While structured data typically encodes lab values, encounters, and medication lists, unstructured data captures the physician's interpretation of the patient's condition, prognosis, and response to therapeutic intervention. In this paper, we demonstrate that information extraction from unstructured clinical narratives is essential to most clinical applications. We perform an empirical study to validate this argument and show that structured data alone is insufficient for resolving eligibility criteria when recruiting patients into clinical trials for chronic lymphocytic leukemia (CLL) and prostate cancer. Unstructured data is essential to resolving 59% of the CLL trial criteria and 77% of the prostate cancer trial criteria. More specifically, for resolving eligibility criteria with temporal constraints, we show the need for temporal reasoning and information integration with medical events within and across unstructured clinical narratives and structured data.

* AMIA TBI 2014, 6 pages
