Byung-Hak Kim

CREFT: Sequential Multi-Agent LLM for Character Relation Extraction

May 30, 2025

Ye Eun Chun, Taeyoon Hwang, Seung-won Hwang, Byung-Hak Kim

Abstract:Understanding complex character relations is crucial for narrative analysis and efficient script evaluation, yet existing extraction methods often fail to handle long-form narratives with nuanced interactions. To address this challenge, we present CREFT, a novel sequential framework leveraging specialized Large Language Model (LLM) agents. First, CREFT builds a base character graph through knowledge distillation, then iteratively refines character composition, relation extraction, role identification, and group assignments. Experiments on a curated Korean drama dataset demonstrate that CREFT significantly outperforms single-agent LLM baselines in both accuracy and completeness. By systematically visualizing character networks, CREFT streamlines narrative comprehension and accelerates script review -- offering substantial benefits to the entertainment, publishing, and educational sectors.

NexusSum: Hierarchical LLM Agents for Long-Form Narrative Summarization

May 30, 2025

Hyuntak Kim, Byung-Hak Kim

Abstract:Summarizing long-form narratives--such as books, movies, and TV scripts--requires capturing intricate plotlines, character interactions, and thematic coherence, a task that remains challenging for existing LLMs. We introduce NexusSum, a multi-agent LLM framework for narrative summarization that processes long-form text through a structured, sequential pipeline--without requiring fine-tuning. Our approach introduces two key innovations: (1) Dialogue-to-Description Transformation: A narrative-specific preprocessing method that standardizes character dialogue and descriptive text into a unified format, improving coherence. (2) Hierarchical Multi-LLM Summarization: A structured summarization pipeline that optimizes chunk processing and controls output length for accurate, high-quality summaries. Our method establishes a new state-of-the-art in narrative summarization, achieving up to a 30.0% improvement in BERTScore (F1) across books, movies, and TV scripts. These results demonstrate the effectiveness of multi-agent LLMs in handling long-form content, offering a scalable approach for structured summarization in diverse storytelling domains.

* Accepted to the main track of ACL 2025
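
The hierarchical, length-controlled summarization idea in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the function names, the chunking by word count, and the `truncate_summarizer` placeholder (which stands in for an LLM call) are all assumptions made for illustration.

```python
from typing import Callable, List


def split_into_chunks(words: List[str], chunk_size: int) -> List[List[str]]:
    """Split a word list into consecutive chunks of at most chunk_size words."""
    return [words[i:i + chunk_size] for i in range(0, len(words), chunk_size)]


def truncate_summarizer(text: str, max_words: int) -> str:
    """Hypothetical stand-in for an LLM summarizer: keeps the first max_words words."""
    return " ".join(text.split()[:max_words])


def hierarchical_summarize(
    text: str,
    summarize: Callable[[str, int], str],
    chunk_size: int,
    per_chunk_words: int,
    target_words: int,
) -> str:
    """Summarize chunk by chunk, join the partial summaries, and repeat
    until the text fits within target_words."""
    if per_chunk_words >= chunk_size:
        raise ValueError("per_chunk_words must be smaller than chunk_size")
    current = text
    while len(current.split()) > target_words:
        chunks = split_into_chunks(current.split(), chunk_size)
        partial = [summarize(" ".join(chunk), per_chunk_words) for chunk in chunks]
        current = " ".join(partial)
        if len(chunks) == 1:
            # A single chunk cannot be reduced further by another round.
            break
    # Final pass enforces the requested output length.
    return summarize(current, target_words)


if __name__ == "__main__":
    script = " ".join(f"w{i}" for i in range(100))
    print(hierarchical_summarize(script, truncate_summarizer, 10, 5, 8))
```

Each round shrinks the text because every full chunk is reduced to fewer words than it contained, so the loop terminates for this placeholder summarizer; a real LLM summarizer would need its output length checked explicitly.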

Agent-as-Judge for Factual Summarization of Long Narratives

Jan 17, 2025

Yeonseok Jeong, Minsoo Kim, Seung-won Hwang, Byung-Hak Kim

Abstract:Large Language Models (LLMs) have demonstrated near-human performance in summarization tasks based on traditional metrics such as ROUGE and BERTScore. However, these metrics do not adequately capture critical aspects of summarization quality, such as factual accuracy, particularly for long narratives (>100K tokens). Recent advances, such as LLM-as-a-Judge, address the limitations of metrics based on lexical similarity but still exhibit factual inconsistencies, especially in understanding character relationships and states. In this work, we introduce NarrativeFactScore, a novel "Agent-as-a-Judge" framework for evaluating and refining summaries. By leveraging a Character Knowledge Graph (CKG) extracted from input and generated summaries, NarrativeFactScore assesses the factual consistency and provides actionable guidance for refinement, such as identifying missing or erroneous facts. We demonstrate the effectiveness of NarrativeFactScore through a detailed workflow illustration and extensive validation on widely adopted benchmarks, achieving superior performance compared to competitive methods. Our results highlight the potential of agent-driven evaluation systems to improve the factual reliability of LLM-generated summaries.
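
As a rough illustration of checking a summary against a character knowledge graph, the sketch below compares relation triples from the source with triples from the summary. The abstract does not specify how NarrativeFactScore computes its score, so the exact-match comparison, the `Triple` type, and the function name are assumptions, not the paper's method.

```python
from typing import Dict, List, Set, Tuple, Union

Triple = Tuple[str, str, str]  # (subject, relation, object), a hypothetical format


def check_summary_facts(
    source_facts: Set[Triple], summary_facts: Set[Triple]
) -> Dict[str, Union[float, List[Triple]]]:
    """Score a summary by the fraction of its facts found in the source graph,
    and list the facts that could guide a refinement step."""
    supported = summary_facts & source_facts
    unsupported = summary_facts - source_facts  # possibly erroneous facts
    missing = source_facts - summary_facts      # facts the summary omitted
    score = len(supported) / len(summary_facts) if summary_facts else 0.0
    return {
        "score": score,
        "unsupported": sorted(unsupported),
        "missing": sorted(missing),
    }


if __name__ == "__main__":
    source = {
        ("Anna", "sister_of", "Ben"),
        ("Ben", "works_for", "Clara"),
        ("Clara", "rival_of", "Dan"),
    }
    summary = {
        ("Anna", "sister_of", "Ben"),
        ("Ben", "rival_of", "Clara"),
    }
    print(check_summary_facts(source, summary))
```

The `unsupported` and `missing` lists correspond to the "erroneous" and "missing" facts the abstract mentions as refinement guidance; a practical system would need fuzzier matching than set intersection.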

Multimodal Representation Learning of Cardiovascular Magnetic Resonance Imaging

Apr 16, 2023

Jielin Qiu, Peide Huang, Makiya Nakashima, Jaehyun Lee, Jiacheng Zhu, Wilson Tang, Pohao Chen, Christopher Nguyen, Byung-Hak Kim, Debbie Kwon (+3 more)

Abstract:Self-supervised learning is crucial for clinical imaging applications, given the lack of explicit labels in healthcare. However, conventional approaches that rely on precise vision-language alignment are not always feasible in complex clinical imaging modalities, such as cardiac magnetic resonance (CMR). CMR provides a comprehensive visualization of cardiac anatomy, physiology, and microstructure, making it challenging to interpret. Additionally, CMR reports require synthesizing information from sequences of images and different views, resulting in potentially weak alignment between the study and diagnosis report pair. To overcome these challenges, we propose CMRformer, a multimodal learning framework to jointly learn sequences of CMR images and associated cardiologist's reports. Moreover, one of the major obstacles to improving CMR study is the lack of large, publicly available datasets. To bridge this gap, we collected a large CMR dataset, which consists of 13,787 studies from clinical cases. By utilizing our proposed CMRformer and our collected dataset, we achieved remarkable performance in real-world clinical tasks, such as CMR image retrieval and diagnosis report retrieval. Furthermore, the learned representations are evaluated to be practically helpful for downstream applications, such as disease classification. Our work could potentially expedite progress in the CMR study and lead to more accurate and effective diagnosis and treatment.

* 24 pages

RegCLR: A Self-Supervised Framework for Tabular Representation Learning in the Wild

Nov 02, 2022

Weiyao Wang, Byung-Hak Kim, Varun Ganapathi

Abstract:Recent advances in self-supervised learning (SSL) using large models to learn visual representations from natural images are rapidly closing the gap between the results produced by fully supervised learning and those produced by SSL on downstream vision tasks. Inspired by this advancement and primarily motivated by the emergence of tabular and structured document image applications, we investigate which self-supervised pretraining objectives, architectures, and fine-tuning strategies are most effective. To address these questions, we introduce RegCLR, a new self-supervised framework that combines contrastive and regularized methods and is compatible with the standard Vision Transformer architecture. Then, RegCLR is instantiated by integrating masked autoencoders as a representative example of a contrastive method and enhanced Barlow Twins as a representative example of a regularized method with configurable input image augmentations in both branches. Several real-world table recognition scenarios (e.g., extracting tables from document images), ranging from standard Word and LaTeX documents to even more challenging electronic health records (EHR) computer screen images, have been shown to benefit greatly from the representations learned from this new framework, with detection average-precision (AP) improving relatively by 4.8% for Table, 11.8% for Column, and 11.1% for GUI objects over a previous fully supervised baseline on real-world EHR screen images.

* To be presented at the 36th Conference on Neural Information Processing Systems, New Orleans, USA, on December 2, 2022, at the First Table Representation Learning (TRL) Workshop

Medical Codes Prediction from Clinical Notes: From Human Coders to Machines

Oct 30, 2022

Byung-Hak Kim

Abstract:Prediction of medical codes from clinical notes is a practical and essential need for every healthcare delivery organization within current medical systems. Automating annotation will save significant time and excessive effort that human coders spend today. However, the biggest challenge is directly identifying appropriate medical codes from several thousands of high-dimensional codes from unstructured free-text clinical notes. This complex medical codes prediction problem from clinical notes has received substantial interest in the NLP community, and several recent studies have shown the state-of-the-art code prediction results of full-fledged deep learning-based methods. This progress raises the fundamental question of how far automated machine learning systems are from human coders' working performance, as well as the important question of how well current explainability methods apply to advanced neural network models such as transformers. This is to predict correct codes and present references in clinical notes that support code prediction, as this level of explainability and accuracy of the prediction outcomes is critical to gaining trust from professional medical coders.

* The 11th Bay Area Machine Learning Symposium (BayLearn 2022), San Francisco, CA, October 20, 2022. arXiv admin note: substantial text overlap with arXiv:2210.15882 and arXiv:2107.10650

Can Current Explainability Help Provide References in Clinical Notes to Support Humans Annotate Medical Codes?

Oct 28, 2022

Byung-Hak Kim, Zhongfen Deng, Philip S. Yu, Varun Ganapathi

Abstract:The medical codes prediction problem from clinical notes has received substantial interest in the NLP community, and several recent studies have shown the state-of-the-art (SOTA) code prediction results of full-fledged deep learning-based methods. However, most previous SOTA works based on deep learning are still in early stages in terms of providing textual references and explanations of the predicted codes, despite the fact that this level of explainability of the prediction outcomes is critical to gaining trust from professional medical coders. This raises the important question of how well current explainability methods apply to advanced neural network models such as transformers to predict correct codes and present references in clinical notes that support code prediction. First, we present an explainable Read, Attend, and Code (xRAC) framework and assess two approaches, attention score-based xRAC-ATTN and model-agnostic knowledge-distillation-based xRAC-KD, through simplified but thorough human-grounded evaluations with SOTA transformer-based model, RAC. We find that the supporting evidence text highlighted by xRAC-ATTN is of higher quality than xRAC-KD whereas xRAC-KD has potential advantages in production deployment scenarios. More importantly, we show for the first time that, given the current state of explainability methodologies, using the SOTA medical codes prediction system still requires the expertise and competencies of professional coders, even though its prediction accuracy is superior to that of human coders. This, we believe, is a very meaningful step toward developing explainable and accurate machine learning systems for fully autonomous medical code prediction from clinical notes.

* To appear in Proceedings of the 13th International Workshop on Health Text Mining and Information Analysis (Louhi 2022), Virtual, December 7, 2022

Read, Attend, and Code: Pushing the Limits of Medical Codes Prediction from Clinical Notes by Machines

Jul 10, 2021

Byung-Hak Kim, Varun Ganapathi

Abstract:Prediction of medical codes from clinical notes is both a practical and essential need for every healthcare delivery organization within current medical systems. Automating annotation will save significant time and excessive effort spent by human coders today. However, the biggest challenge is directly identifying appropriate medical codes out of several thousands of high-dimensional codes from unstructured free-text clinical notes. In the past three years, with Convolutional Neural Networks (CNN) and Long Short-Term Memory (LSTM) networks, there have been vast improvements in tackling the most challenging benchmark of the MIMIC-III-full-label inpatient clinical notes dataset. This progress raises the fundamental question of how far automated machine learning (ML) systems are from human coders' working performance. We assessed the baseline of human coders' performance on the same subsampled testing set. We also present our Read, Attend, and Code (RAC) model for learning the medical code assignment mappings. By connecting convolved embeddings with self-attention and code-title guided attention modules, combined with sentence permutation-based data augmentations and stochastic weight averaging training, RAC establishes a new state of the art (SOTA), considerably outperforming the current best Macro-F1 by 18.7%, and reaches past the human-level coding baseline. This new milestone marks a meaningful step toward fully autonomous medical coding (AMC) in machines reaching parity with human coders' performance in medical code prediction.

* To appear in Proceedings of Machine Learning Research, Volume 149: Machine Learning for Healthcare Conference (MLHC), Virtual, August 6-7, 2021
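
Since the headline result is reported in Macro-F1, a short sketch of how macro- and micro-averaged F1 differ for multi-label code prediction may help. The toy labels and function names below are invented for illustration and do not come from the paper or from MIMIC-III.

```python
from typing import List, Set, Tuple


def f1_from_counts(tp: int, fp: int, fn: int) -> float:
    """F1 = 2*TP / (2*TP + FP + FN); defined as 0.0 when the denominator is 0."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def macro_micro_f1(
    gold: List[Set[str]], pred: List[Set[str]], labels: List[str]
) -> Tuple[float, float]:
    """Return (macro_f1, micro_f1) for multi-label predictions.

    Macro-F1 averages per-label F1 scores, so rare codes weigh as much as
    frequent ones; micro-F1 pools the counts over all labels first.
    """
    per_label = []
    total_tp = total_fp = total_fn = 0
    for label in labels:
        tp = sum(1 for g, p in zip(gold, pred) if label in g and label in p)
        fp = sum(1 for g, p in zip(gold, pred) if label not in g and label in p)
        fn = sum(1 for g, p in zip(gold, pred) if label in g and label not in p)
        per_label.append(f1_from_counts(tp, fp, fn))
        total_tp, total_fp, total_fn = total_tp + tp, total_fp + fp, total_fn + fn
    macro = sum(per_label) / len(per_label) if per_label else 0.0
    micro = f1_from_counts(total_tp, total_fp, total_fn)
    return macro, micro


if __name__ == "__main__":
    # A model that only ever predicts the frequent code "A".
    gold = [{"A"}, {"A"}, {"A", "B"}, {"C"}]
    pred = [{"A"}, {"A"}, {"A"}, {"A"}]
    print(macro_micro_f1(gold, pred, ["A", "B", "C"]))
```

In this toy case the micro score stays fairly high while the macro score drops sharply, because the two rare codes are never predicted. That sensitivity to rare labels is why macro-averaging is a demanding metric when there are thousands of codes.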

Deep Claim: Payer Response Prediction from Claims Data with Deep Learning

Jul 13, 2020

Byung-Hak Kim, Seshadri Sridharan, Andy Atwal, Varun Ganapathi

Abstract:Each year, almost 10% of claims are denied by payers (i.e., health insurance plans). With the cost to recover these denials and underpayments, predicting payer response (likelihood of payment) from claims data with a high degree of accuracy and precision is anticipated to improve healthcare staffs' performance productivity and drive better patient financial experience and satisfaction in the revenue cycle (Barkholz, 2017). However, constructing advanced predictive analytics models has been considered challenging in the last twenty years. That said, we propose a (low-level) context-dependent compact representation of patients' historical claim records by effectively learning complicated dependencies in the (high-level) claim inputs. Built on this new latent representation, we demonstrate that a deep learning-based framework, Deep Claim, can accurately predict various responses from multiple payers using 2,905,026 de-identified claims data from two US health systems. Deep Claim's improvements over carefully chosen baselines in predicting claim denials are most pronounced as 22.21% relative recall gain (at 95% precision) on Health System A, which implies Deep Claim can find 22.21% more denials than the best baseline system.

* To be presented at the Healthcare Systems, Population Health, and the Role of Health-Tech (HSYS) Workshop at the 37th International Conference on Machine Learning, Vienna, Austria, July 13-18, 2020
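
The "relative recall gain at 95% precision" metric quoted in the abstract can be made concrete with a small sketch. The scores and labels below are toy values invented for illustration, and the helper names are assumptions; none of this is the paper's evaluation code.

```python
from typing import List


def recall_at_precision(
    scores: List[float], labels: List[int], min_precision: float
) -> float:
    """Best recall over all score thresholds whose precision >= min_precision.

    labels use 1 for a denied claim (positive) and 0 otherwise.
    """
    total_pos = sum(labels)
    if total_pos == 0:
        return 0.0
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    best = 0.0
    tp = 0
    for k, (score, label) in enumerate(ranked, start=1):
        tp += label
        # Only evaluate where the score changes, so tied scores share a threshold.
        is_boundary = k == len(ranked) or ranked[k][0] != score
        if is_boundary and tp / k >= min_precision:
            best = max(best, tp / total_pos)
    return best


def relative_gain(new: float, base: float) -> float:
    """Relative improvement of new over base, e.g. 0.5 means 50% more."""
    return (new - base) / base


if __name__ == "__main__":
    labels = [1, 1, 1, 1, 1, 0, 0, 0]
    baseline_scores = [0.9, 0.8, 0.3, 0.2, 0.1, 0.7, 0.4, 0.05]
    model_scores = [0.9, 0.8, 0.7, 0.3, 0.2, 0.6, 0.4, 0.05]
    base = recall_at_precision(baseline_scores, labels, 0.95)
    new = recall_at_precision(model_scores, labels, 0.95)
    print(base, new, relative_gain(new, base))
```

In the toy data the baseline recovers 2 of 5 denials at the required precision and the second model recovers 3 of 5, a 50% relative gain. Read the same way, the abstract's 22.21% relative recall gain means 22.21% more denials found at the same precision level.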

LumièreNet: Lecture Video Synthesis from Audio

Jul 04, 2019

Byung-Hak Kim, Varun Ganapathi

Abstract:We present LumièreNet, a simple, modular, and fully deep-learning-based architecture that synthesizes high-quality, full-pose headshot lecture videos from an instructor's new audio narration of any length. Unlike prior works, LumièreNet is entirely composed of trainable neural network modules that learn mapping functions from audio to video through (intermediate) estimated pose-based compact and abstract latent codes. Our video demos are available at [22] and [23].
