Idan Tankel

INFORM-CT: INtegrating LLMs and VLMs FOR Incidental Findings Management in Abdominal CT

Dec 10, 2025

Idan Tankel, Nir Mazor, Rafi Brada, Christina LeBedis, Guy ben-Yosef

Abstract: Incidental findings in CT scans, though often benign, can have significant clinical implications and should be reported following established guidelines. Traditional manual inspection by radiologists is time-consuming and variable. This paper proposes a novel framework that leverages large language models (LLMs) and foundational vision-language models (VLMs) in a plan-and-execute agentic approach to improve the efficiency and precision of incidental findings detection, classification, and reporting for abdominal CT scans. Given medical guidelines for abdominal organs, the process of managing incidental findings is automated through a planner-executor framework. The planner, based on an LLM, generates Python scripts using predefined base functions, while the executor runs these scripts to perform the necessary checks and detections via VLMs, segmentation models, and image processing subroutines. We demonstrate the effectiveness of our approach through experiments on an abdominal CT benchmark for three organs, in a fully automatic end-to-end manner. Our results show that the proposed framework outperforms existing pure VLM-based approaches in terms of accuracy and efficiency.
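
The planner-executor split described in the abstract can be illustrated with a minimal sketch. Everything here is an assumption made for illustration and is not taken from the paper: the base functions `segment_organ` and `measure_lesion_mm`, the toy scan record, the stubbed `plan` function standing in for the LLM, and the size threshold, which is an arbitrary number and not a clinical value.

```python
# Hypothetical base functions. In a real system these would wrap
# segmentation models, VLMs, and image processing subroutines.
def segment_organ(scan: dict, organ: str) -> dict:
    """Return the stand-in region for an organ from a toy scan record."""
    return scan["organs"][organ]


def measure_lesion_mm(region: dict) -> float:
    """Return the stand-in lesion size in millimetres (0.0 if none)."""
    return float(region.get("lesion_mm", 0.0))


BASE_FUNCTIONS = {
    "segment_organ": segment_organ,
    "measure_lesion_mm": measure_lesion_mm,
}


def plan(guideline: str, organ: str) -> str:
    """Stub for the LLM planner: emit a script built only from base functions.

    A real planner would derive the script from the guideline text; this stub
    ignores it and returns a fixed template.
    """
    return (
        f"region = segment_organ(scan, {organ!r})\n"
        "size = measure_lesion_mm(region)\n"
        "result = 'follow-up' if size >= threshold_mm else 'no follow-up'\n"
    )


def execute(script: str, scan: dict, threshold_mm: float) -> str:
    """Executor: run the planned script with access to base functions only."""
    namespace = {
        "__builtins__": {},  # the script may call nothing but the base functions
        **BASE_FUNCTIONS,
        "scan": scan,
        "threshold_mm": threshold_mm,
    }
    exec(script, namespace)
    return namespace["result"]


# Toy usage with made-up data.
toy_scan = {"organs": {"kidney": {"lesion_mm": 12.0}, "liver": {}}}
kidney_result = execute(plan("toy guideline", "kidney"), toy_scan, threshold_mm=10.0)
liver_result = execute(plan("toy guideline", "liver"), toy_scan, threshold_mm=10.0)
```

Restricting the script to a fixed set of base functions keeps the generated code auditable, which is one plausible reason to prefer this design over asking a VLM for the final answer directly.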


Real Classification by Description: Extending CLIP's Limits of Part Attributes Recognition

Dec 18, 2024

Ethan Baron, Idan Tankel, Peter Tu, Guy Ben-Yosef

Abstract: In this study, we define and tackle zero-shot "real" classification by description, a novel task that evaluates the ability of Vision-Language Models (VLMs) like CLIP to classify objects based solely on descriptive attributes, excluding object class names. This approach highlights the current limitations of VLMs in understanding intricate object descriptions, pushing these models beyond mere object recognition. To facilitate this exploration, we introduce a new challenge and release description data for six popular fine-grained benchmarks, which omit object names to encourage genuine zero-shot learning within the research community. Additionally, we propose a method to enhance CLIP's attribute detection capabilities through targeted training using ImageNet21k's diverse object categories, paired with rich attribute descriptions generated by large language models. Furthermore, we introduce a modified CLIP architecture that leverages multiple resolutions to improve the detection of fine-grained part attributes. Through these efforts, we broaden the understanding of part-attribute recognition in CLIP, improving its performance in fine-grained classification tasks across six popular benchmarks, as well as on the PACO dataset, a widely used benchmark for object-attribute recognition. Code is available at: https://github.com/ethanbar11/grounding_ge_public.
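
The scoring step of classification by description can be sketched as follows, assuming image and text embeddings have already been produced by some CLIP-like encoder. The function name `classify_by_description`, the toy vectors, and the choice to average cosine similarity over each class's descriptions are assumptions for illustration; the abstract does not confirm that the paper aggregates scores this way.

```python
import numpy as np


def classify_by_description(image_emb, class_desc_embs):
    """Pick the class whose descriptions best match the image on average.

    image_emb: 1-D array, the image embedding.
    class_desc_embs: dict mapping an internal class id to a 2-D array with one
        row per description embedding. The ids are bookkeeping only; the
        descriptions themselves are assumed to omit the class name.
    """
    img = image_emb / np.linalg.norm(image_emb)
    scores = {}
    for class_id, embs in class_desc_embs.items():
        # Normalise each description embedding so the dot product is cosine similarity.
        embs = embs / np.linalg.norm(embs, axis=1, keepdims=True)
        scores[class_id] = float(np.mean(embs @ img))
    best = max(scores, key=scores.get)
    return best, scores


# Toy embeddings standing in for encoder outputs (made-up numbers).
image = np.array([1.0, 0.0, 0.0])
descriptions = {
    "class_a": np.array([[1.0, 0.0, 0.0], [0.9, 0.1, 0.0]]),
    "class_b": np.array([[0.0, 1.0, 0.0], [0.0, 0.9, 0.1]]),
}
predicted, class_scores = classify_by_description(image, descriptions)
```

Because no class name appears in any text prompt, a correct prediction in this setup has to come from matching attributes rather than from recognising the object name.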
