Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Zehui Liao

VIHD: Visual Intervention-based Hallucination Detection for Medical Visual Question Answering

May 20, 2026

Jiayi Chen, Benteng Ma, Zehui Liao, Winston Chong, Yasmeen George, Jianfei Cai

Abstract:While medical Multimodal Large Language Models (MLLMs) have shown promise in assisting diagnosis, they still frequently generate hallucinated responses that appear linguistically plausible but lack visual evidence. Such hallucinations pose risks to clinical decision-making and necessitate effective detection. Existing introspective detection methods primarily perform uncertainty estimation or logical verification by analyzing model responses conditioned on original or perturbed inputs. However, such external perturbations are often heuristic and context-agnostic, which overlooks the internal cross-modal dependency between generated tokens and related visual tokens during decoding. To address this issue, we propose VIHD, a Visual Intervention-based Hallucination Detection method that leverages targeted visual token masking to calibrate semantic entropy for more effective hallucination detection. VIHD locates visually dominant decoder layers via Visual Dependency Probing (VDP), executes Visual Intervention Decoding (VID) via token masking to calibrate the semantic distribution, and quantifies the resulting Calibrated Semantic Entropy (CSE) as a reliable hallucination signal. Extensive experiments on three medical VQA benchmarks with two medical MLLMs demonstrate that VIHD consistently outperforms state-of-the-art methods, underscoring the importance of fine-grained visual dependency for hallucination detection. The code will be available at https://github.com/Jiayi-Chen-AU/VIHD

* Early accepted by MICCAI 2026

Via

Access Paper or Ask Questions

V-Loop: Visual Logical Loop Verification for Hallucination Detection in Medical Visual Question Answering

Jan 26, 2026

Mengyuan Jin, Zehui Liao, Yong Xia

Abstract:Multimodal Large Language Models (MLLMs) have shown remarkable capability in assisting disease diagnosis in medical visual question answering (VQA). However, their outputs remain vulnerable to hallucinations (i.e., responses that contradict visual facts), posing significant risks in high-stakes medical scenarios. Recent introspective detection methods, particularly uncertainty-based approaches, offer computational efficiency but are fundamentally indirect, as they estimate predictive uncertainty for an image-question pair rather than verifying the factual correctness of a specific answer. To address this limitation, we propose Visual Logical Loop Verification (V-Loop), a training-free and plug-and-play framework for hallucination detection in medical VQA. V-Loop introduces a bidirectional reasoning process that forms a visually grounded logical loop to verify factual correctness. Given an input, the MLLM produces an answer for the primary input pair. V-Loop extracts semantic units from the primary QA pair, generates a verification question by conditioning on the answer unit to re-query the question unit, and enforces visual attention consistency to ensure answering both primary question and verification question rely on the same image evidence. If the verification answer matches the expected semantic content, the logical loop closes, indicating factual grounding; otherwise, the primary answer is flagged as hallucinated. Extensive experiments on multiple medical VQA benchmarks and MLLMs show that V-Loop consistently outperforms existing introspective methods, remains highly efficient, and further boosts uncertainty-based approaches when used in combination.

Via

Access Paper or Ask Questions

Vision-Amplified Semantic Entropy for Hallucination Detection in Medical Visual Question Answering

Mar 26, 2025

Zehui Liao, Shishuai Hu, Ke Zou, Huazhu Fu, Liangli Zhen, Yong Xia

Abstract:Multimodal large language models (MLLMs) have demonstrated significant potential in medical Visual Question Answering (VQA). Yet, they remain prone to hallucinations-incorrect responses that contradict input images, posing substantial risks in clinical decision-making. Detecting these hallucinations is essential for establishing trust in MLLMs among clinicians and patients, thereby enabling their real-world adoption. Current hallucination detection methods, especially semantic entropy (SE), have demonstrated promising hallucination detection capacity for LLMs. However, adapting SE to medical MLLMs by incorporating visual perturbations presents a dilemma. Weak perturbations preserve image content and ensure clinical validity, but may be overlooked by medical MLLMs, which tend to over rely on language priors. In contrast, strong perturbations can distort essential diagnostic features, compromising clinical interpretation. To address this issue, we propose Vision Amplified Semantic Entropy (VASE), which incorporates weak image transformations and amplifies the impact of visual input, to improve hallucination detection in medical VQA. We first estimate the semantic predictive distribution under weak visual transformations to preserve clinical validity, and then amplify visual influence by contrasting this distribution with that derived from a distorted image. The entropy of the resulting distribution is estimated as VASE. Experiments on two medical open-ended VQA datasets demonstrate that VASE consistently outperforms existing hallucination detection methods.

* 11 pages, 2 figures

Via

Access Paper or Ask Questions

Unleashing the Potential of Open-set Noisy Samples Against Label Noise for Medical Image Classification

Jun 18, 2024

Zehui Liao, Shishuai Hu, Yong Xia

Figure 1 for Unleashing the Potential of Open-set Noisy Samples Against Label Noise for Medical Image Classification

Figure 2 for Unleashing the Potential of Open-set Noisy Samples Against Label Noise for Medical Image Classification

Figure 3 for Unleashing the Potential of Open-set Noisy Samples Against Label Noise for Medical Image Classification

Figure 4 for Unleashing the Potential of Open-set Noisy Samples Against Label Noise for Medical Image Classification

Abstract:The challenge of addressing mixed closed-set and open-set label noise in medical image classification remains largely unexplored. Unlike natural image classification where there is a common practice of segregation and separate processing of closed-set and open-set noisy samples from clean ones, medical image classification faces difficulties due to high inter-class similarity which complicates the identification of open-set noisy samples. Moreover, prevailing methods do not leverage the full potential of open-set noisy samples for label noise mitigation, often leading to their exclusion or application of uniform soft labels. To address these issues, we propose an Extended Noise-robust Contrastive and Open-set Feature Augmentation (ENCOFA) framework. ENCOFA includes the Extended Noise-robust Supervised Contrastive (ENSC) Loss, which aids in distinguishing features across classes. The ENSC loss regards open-set noisy samples as an extended class and mitigates label noise by weighting contrastive pairs with label reliability. Furthermore, we develop an Open-set Feature Augmentation (OSFeatAug) module that enriches the features of open-set samples, utilizing the model's extra capacity to prevent overfitting to noisy data. We conducted experiments on a synthetic noisy dataset and a real-world noisy dataset. Our results indicate the superiority of ENCOFA and the effectiveness of leveraging the open-set noisy samples to combat label noise.

* 10 pages, 1 figure

Via

Access Paper or Ask Questions

Towards Clinician-Preferred Segmentation: Leveraging Human-in-the-Loop for Test Time Adaptation in Medical Image Segmentation

May 14, 2024

Shishuai Hu, Zehui Liao, Zeyou Liu, Yong Xia

Figure 1 for Towards Clinician-Preferred Segmentation: Leveraging Human-in-the-Loop for Test Time Adaptation in Medical Image Segmentation

Figure 2 for Towards Clinician-Preferred Segmentation: Leveraging Human-in-the-Loop for Test Time Adaptation in Medical Image Segmentation

Figure 3 for Towards Clinician-Preferred Segmentation: Leveraging Human-in-the-Loop for Test Time Adaptation in Medical Image Segmentation

Figure 4 for Towards Clinician-Preferred Segmentation: Leveraging Human-in-the-Loop for Test Time Adaptation in Medical Image Segmentation

Abstract:Deep learning-based medical image segmentation models often face performance degradation when deployed across various medical centers, largely due to the discrepancies in data distribution. Test Time Adaptation (TTA) methods, which adapt pre-trained models to test data, have been employed to mitigate such discrepancies. However, existing TTA methods primarily focus on manipulating Batch Normalization (BN) layers or employing prompt and adversarial learning, which may not effectively rectify the inconsistencies arising from divergent data distributions. In this paper, we propose a novel Human-in-the-loop TTA (HiTTA) framework that stands out in two significant ways. First, it capitalizes on the largely overlooked potential of clinician-corrected predictions, integrating these corrections into the TTA process to steer the model towards predictions that coincide more closely with clinical annotation preferences. Second, our framework conceives a divergence loss, designed specifically to diminish the prediction divergence instigated by domain disparities, through the careful calibration of BN parameters. Our HiTTA is distinguished by its dual-faceted capability to acclimatize to the distribution of test data whilst ensuring the model's predictions align with clinical expectations, thereby enhancing its relevance in a medical context. Extensive experiments on a public dataset underscore the superiority of our HiTTA over existing TTA methods, emphasizing the advantages of integrating human feedback and our divergence loss in enhancing the model's performance and adaptability across diverse medical centers.

Via

Access Paper or Ask Questions

URL: Combating Label Noise for Lung Nodule Malignancy Grading

Aug 17, 2023

Xianze Ai, Zehui Liao, Yong Xia

Figure 1 for URL: Combating Label Noise for Lung Nodule Malignancy Grading

Figure 2 for URL: Combating Label Noise for Lung Nodule Malignancy Grading

Figure 3 for URL: Combating Label Noise for Lung Nodule Malignancy Grading

Figure 4 for URL: Combating Label Noise for Lung Nodule Malignancy Grading

Abstract:Due to the complexity of annotation and inter-annotator variability, most lung nodule malignancy grading datasets contain label noise, which inevitably degrades the performance and generalizability of models. Although researchers adopt the label-noise-robust methods to handle label noise for lung nodule malignancy grading, they do not consider the inherent ordinal relation among classes of this task. To model the ordinal relation among classes to facilitate tackling label noise in this task, we propose a Unimodal-Regularized Label-noise-tolerant (URL) framework. Our URL contains two stages, the Supervised Contrastive Learning (SCL) stage and the Memory pseudo-labels generation and Unimodal regularization (MU) stage. In the SCL stage, we select reliable samples and adopt supervised contrastive learning to learn better representations. In the MU stage, we split samples with multiple annotations into multiple samples with a single annotation and shuffle them into different batches. To handle label noise, pseudo-labels are generated using the similarity between each sample and the central feature of each class, and temporal ensembling is used to obtain memory pseudo-labels that supervise the model training. To model the ordinal relation, we introduce unimodal regularization to keep the ordinal relation among classes in the predictions. Moreover, each lung nodule is characterized by three orthographic views. Experiments conducted on the LIDC-IDRI dataset indicate the superiority of our URL over other competing methods. Code is available at https://github.com/axz520/UR.

* 11 pages, accepted by DALI@MICCAI2023

Via

Access Paper or Ask Questions

Devil is in Channels: Contrastive Single Domain Generalization for Medical Image Segmentation

Jun 24, 2023

Shishuai Hu, Zehui Liao, Yong Xia

Figure 1 for Devil is in Channels: Contrastive Single Domain Generalization for Medical Image Segmentation

Figure 2 for Devil is in Channels: Contrastive Single Domain Generalization for Medical Image Segmentation

Figure 3 for Devil is in Channels: Contrastive Single Domain Generalization for Medical Image Segmentation

Figure 4 for Devil is in Channels: Contrastive Single Domain Generalization for Medical Image Segmentation

Abstract:Deep learning-based medical image segmentation models suffer from performance degradation when deployed to a new healthcare center. To address this issue, unsupervised domain adaptation and multi-source domain generalization methods have been proposed, which, however, are less favorable for clinical practice due to the cost of acquiring target-domain data and the privacy concerns associated with redistributing the data from multiple source domains. In this paper, we propose a \textbf{C}hannel-level \textbf{C}ontrastive \textbf{S}ingle \textbf{D}omain \textbf{G}eneralization (\textbf{C$^2$SDG}) model for medical image segmentation. In C$^2$SDG, the shallower features of each image and its style-augmented counterpart are extracted and used for contrastive training, resulting in the disentangled style representations and structure representations. The segmentation is performed based solely on the structure representations. Our method is novel in the contrastive perspective that enables channel-wise feature disentanglement using a single source domain. We evaluated C$^2$SDG against six SDG methods on a multi-domain joint optic cup and optic disc segmentation benchmark. Our results suggest the effectiveness of each module in C$^2$SDG and also indicate that C$^2$SDG outperforms the baseline and all competing methods with a large margin. The code will be available at \url{https://github.com/ShishuaiHu/CCSDG}.

* MICCAI 2023 Early Accept (Top 14%), 12 pages, 5 figures

Via

Access Paper or Ask Questions

Transformer-based Annotation Bias-aware Medical Image Segmentation

Jun 02, 2023

Zehui Liao, Yutong Xie, Shishuai Hu, Yong Xia

Figure 1 for Transformer-based Annotation Bias-aware Medical Image Segmentation

Figure 2 for Transformer-based Annotation Bias-aware Medical Image Segmentation

Figure 3 for Transformer-based Annotation Bias-aware Medical Image Segmentation

Figure 4 for Transformer-based Annotation Bias-aware Medical Image Segmentation

Abstract:Manual medical image segmentation is subjective and suffers from annotator-related bias, which can be mimicked or amplified by deep learning methods. Recently, researchers have suggested that such bias is the combination of the annotator preference and stochastic error, which are modeled by convolution blocks located after decoder and pixel-wise independent Gaussian distribution, respectively. It is unlikely that convolution blocks can effectively model the varying degrees of preference at the full resolution level. Additionally, the independent pixel-wise Gaussian distribution disregards pixel correlations, leading to a discontinuous boundary. This paper proposes a Transformer-based Annotation Bias-aware (TAB) medical image segmentation model, which tackles the annotator-related bias via modeling annotator preference and stochastic errors. TAB employs the Transformer with learnable queries to extract the different preference-focused features. This enables TAB to produce segmentation with various preferences simultaneously using a single segmentation head. Moreover, TAB takes the multivariant normal distribution assumption that models pixel correlations, and learns the annotation distribution to disentangle the stochastic error. We evaluated our TAB on an OD/OC segmentation benchmark annotated by six annotators. Our results suggest that TAB outperforms existing medical image segmentation models which take into account the annotator-related bias.

* 11 pages, 2 figures

Via

Access Paper or Ask Questions

Instance-specific Label Distribution Regularization for Learning with Label Noise

Dec 16, 2022

Zehui Liao, Shishuai Hu, Yutong Xie, Yong Xia

Figure 1 for Instance-specific Label Distribution Regularization for Learning with Label Noise

Figure 2 for Instance-specific Label Distribution Regularization for Learning with Label Noise

Figure 3 for Instance-specific Label Distribution Regularization for Learning with Label Noise

Figure 4 for Instance-specific Label Distribution Regularization for Learning with Label Noise

Abstract:Modeling noise transition matrix is a kind of promising method for learning with label noise. Based on the estimated noise transition matrix and the noisy posterior probabilities, the clean posterior probabilities, which are jointly called Label Distribution (LD) in this paper, can be calculated as the supervision. To reliably estimate the noise transition matrix, some methods assume that anchor points are available during training. Nonetheless, if anchor points are invalid, the noise transition matrix might be poorly learned, resulting in poor performance. Consequently, other methods treat reliable data points, extracted from training data, as pseudo anchor points. However, from a statistical point of view, the noise transition matrix can be inferred from data with noisy labels under the clean-label-domination assumption. Therefore, we aim to estimate the noise transition matrix without (pseudo) anchor points. There is evidence showing that samples are more likely to be mislabeled as other similar class labels, which means the mislabeling probability is highly correlated with the inter-class correlation. Inspired by this observation, we propose an instance-specific Label Distribution Regularization (LDR), in which the instance-specific LD is estimated as the supervision, to prevent DCNNs from memorizing noisy labels. Specifically, we estimate the noisy posterior under the supervision of noisy labels, and approximate the batch-level noise transition matrix by estimating the inter-class correlation matrix with neither anchor points nor pseudo anchor points. Experimental results on two synthetic noisy datasets and two real-world noisy datasets demonstrate that our LDR outperforms existing methods.

Via

Access Paper or Ask Questions

ProSFDA: Prompt Learning based Source-free Domain Adaptation for Medical Image Segmentation

Nov 21, 2022

Shishuai Hu, Zehui Liao, Yong Xia

Figure 1 for ProSFDA: Prompt Learning based Source-free Domain Adaptation for Medical Image Segmentation

Figure 2 for ProSFDA: Prompt Learning based Source-free Domain Adaptation for Medical Image Segmentation

Figure 3 for ProSFDA: Prompt Learning based Source-free Domain Adaptation for Medical Image Segmentation

Figure 4 for ProSFDA: Prompt Learning based Source-free Domain Adaptation for Medical Image Segmentation

Abstract:The domain discrepancy existed between medical images acquired in different situations renders a major hurdle in deploying pre-trained medical image segmentation models for clinical use. Since it is less possible to distribute training data with the pre-trained model due to the huge data size and privacy concern, source-free unsupervised domain adaptation (SFDA) has recently been increasingly studied based on either pseudo labels or prior knowledge. However, the image features and probability maps used by pseudo label-based SFDA and the consistent prior assumption and the prior prediction network used by prior-guided SFDA may become less reliable when the domain discrepancy is large. In this paper, we propose a \textbf{Pro}mpt learning based \textbf{SFDA} (\textbf{ProSFDA}) method for medical image segmentation, which aims to improve the quality of domain adaption by minimizing explicitly the domain discrepancy. Specifically, in the prompt learning stage, we estimate source-domain images via adding a domain-aware prompt to target-domain images, then optimize the prompt via minimizing the statistic alignment loss, and thereby prompt the source model to generate reliable predictions on (altered) target-domain images. In the feature alignment stage, we also align the features of target-domain images and their styles-augmented counterparts to optimize the source model, and hence push the model to extract compact features. We evaluate our ProSFDA on two multi-domain medical image segmentation benchmarks. Our results indicate that the proposed ProSFDA outperforms substantially other SFDA methods and is even comparable to UDA methods. Code will be available at \url{https://github.com/ShishuaiHu/ProSFDA}.

Via

Access Paper or Ask Questions