Mong Li Lee

TRUST-VL: An Explainable News Assistant for General Multimodal Misinformation Detection

Sep 04, 2025

Zehong Yan, Peng Qi, Wynne Hsu, Mong Li Lee

Abstract:Multimodal misinformation, encompassing textual, visual, and cross-modal distortions, poses an increasing societal threat that is amplified by generative AI. Existing methods typically focus on a single type of distortion and struggle to generalize to unseen scenarios. In this work, we observe that different distortion types share common reasoning capabilities while also requiring task-specific skills. We hypothesize that joint training across distortion types facilitates knowledge sharing and enhances the model's ability to generalize. To this end, we introduce TRUST-VL, a unified and explainable vision-language model for general multimodal misinformation detection. TRUST-VL incorporates a novel Question-Aware Visual Amplifier module, designed to extract task-specific visual features. To support training, we also construct TRUST-Instruct, a large-scale instruction dataset containing 198K samples featuring structured reasoning chains aligned with human fact-checking workflows. Extensive experiments on both in-domain and zero-shot benchmarks demonstrate that TRUST-VL achieves state-of-the-art performance, while also offering strong generalization and interpretability.

* EMNLP 2025; Project Homepage: https://yanzehong.github.io/trust-vl/

From Personas to Talks: Revisiting the Impact of Personas on LLM-Synthesized Emotional Support Conversations

Feb 17, 2025

Shenghan Wu, Yang Deng, Yimo Zhu, Wynne Hsu, Mong Li Lee

Abstract:The rapid advancement of Large Language Models (LLMs) has revolutionized the generation of emotional support conversations (ESC), offering scalable solutions with reduced costs and enhanced data privacy. This paper explores the role of personas in the creation of ESC by LLMs. Our research utilizes established psychological frameworks to measure and infuse persona traits into LLMs, which then generate dialogues in the emotional support scenario. We conduct extensive evaluations to understand the stability of persona traits in dialogues, examining shifts in traits post-generation and their impact on dialogue quality and strategy distribution. Experimental results reveal several notable findings: 1) LLMs can infer core persona traits, 2) subtle shifts in emotionality and extraversion occur, influencing the dialogue dynamics, and 3) the application of persona traits modifies the distribution of emotional support strategies, enhancing the relevance and empathetic quality of the responses. These findings highlight the potential of persona-driven LLMs in crafting more personalized, empathetic, and effective emotional support dialogues, which has significant implications for the future design of AI-driven emotional support systems.

Mitigating GenAI-powered Evidence Pollution for Out-of-Context Multimodal Misinformation Detection

Jan 24, 2025

Zehong Yan, Peng Qi, Wynne Hsu, Mong Li Lee

Abstract:While large generative artificial intelligence (GenAI) models have achieved significant success, they also raise growing concerns about online information security due to their potential misuse for generating deceptive content. Out-of-context (OOC) multimodal misinformation detection, which often retrieves Web evidence to identify the repurposing of images in false contexts, faces the issue of reasoning over GenAI-polluted evidence to derive accurate predictions. Existing works simulate GenAI-powered pollution at the claim level with stylistic rewriting to conceal linguistic cues, and ignore evidence-level pollution for such information-seeking applications. In this work, we investigate how polluted evidence affects the performance of existing OOC detectors, revealing a performance degradation of more than 9 percentage points. We propose two strategies, cross-modal evidence reranking and cross-modal claim-evidence reasoning, to address the challenges posed by polluted evidence. Extensive experiments on two benchmark datasets show that these strategies can effectively enhance the robustness of existing out-of-context detectors amidst polluted evidence.

* 12 pages, 11 figures

ChronoFact: Timeline-based Temporal Fact Verification

Oct 19, 2024

Anab Maulana Barik, Wynne Hsu, Mong Li Lee

Abstract:Automated fact verification plays an essential role in fostering trust in the digital space. Despite the growing interest, the verification of temporal facts has not received much attention in the community. Temporal fact verification brings new challenges where cues of the temporal information need to be extracted and temporal reasoning involving various temporal aspects of the text must be applied. In this work, we propose an end-to-end solution for temporal fact verification that considers the temporal information in claims to obtain relevant evidence sentences and harnesses the power of a large language model for temporal reasoning. Recognizing that temporal facts often involve events, we model these events in the claim and evidence sentences. We curate two temporal fact datasets to learn time-sensitive representations that encapsulate not only the semantic relationships among the events, but also their chronological proximity. This allows us to retrieve the top-k relevant evidence sentences and provide the context for a large language model to perform temporal reasoning and output whether a claim is supported or refuted by the retrieved evidence sentences. Experimental results demonstrate that the proposed approach significantly enhances the accuracy of temporal claim verification, thereby advancing the current state of the art in automated fact verification.
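
The top-k evidence retrieval step mentioned in the abstract can be illustrated with a minimal sketch. It assumes the claim and each evidence sentence have already been encoded as vectors, and it ranks evidence by plain cosine similarity. The function names `cosine` and `top_k_evidence` are hypothetical, and the time-sensitive representations learned in the paper are not reproduced here.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def top_k_evidence(claim_vec, evidence_vecs, k):
    """Return the indices of the k evidence vectors most similar to the claim.

    Hypothetical helper: the embeddings are assumed to come from some
    encoder; this sketch only shows the ranking step.
    """
    scored = [(cosine(claim_vec, vec), idx) for idx, vec in enumerate(evidence_vecs)]
    # Sort by similarity, highest first; ties keep their original order.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [idx for _, idx in scored[:k]]


if __name__ == "__main__":
    claim = [1.0, 0.0]
    evidence = [[0.0, 1.0], [1.0, 0.0], [1.0, 1.0]]
    print(top_k_evidence(claim, evidence, k=2))
```

In a full pipeline, the selected sentences would then be passed as context to a large language model, which is outside the scope of this sketch.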

Evidence-Based Temporal Fact Verification

Jul 21, 2024

Anab Maulana Barik, Wynne Hsu, Mong Li Lee

Cross-Domain Feature Augmentation for Domain Generalization

May 14, 2024

Yingnan Liu, Yingtian Zou, Rui Qiao, Fusheng Liu, Mong Li Lee, Wynne Hsu

Abstract:Domain generalization aims to develop models that are robust to distribution shifts. Existing methods focus on learning invariance across domains to enhance model robustness, and data augmentation has been widely used to learn invariant predictors, with most methods performing augmentation in the input space. However, augmentation in the input space has limited diversity, whereas augmentation in the feature space is more versatile and has shown promising results. Nonetheless, feature semantics is seldom considered and existing feature augmentation methods suffer from a limited variety of augmented features. We decompose features into class-generic, class-specific, domain-generic, and domain-specific components. We propose a cross-domain feature augmentation method named XDomainMix that enables us to increase sample diversity while emphasizing the learning of invariant representations to achieve domain generalization. Experiments on widely used benchmark datasets demonstrate that our proposed method is able to achieve state-of-the-art performance. Quantitative analysis indicates that our feature augmentation approach facilitates the learning of effective models that are invariant across different domains.

* Accepted to the 33rd International Joint Conference on Artificial Intelligence (IJCAI 2024); Code is available at https://github.com/NancyQuris/XDomainMix

SNIFFER: Multimodal Large Language Model for Explainable Out-of-Context Misinformation Detection

Mar 05, 2024

Peng Qi, Zehong Yan, Wynne Hsu, Mong Li Lee

Abstract:Misinformation is a prevalent societal issue due to its potential high risks. Out-of-context (OOC) misinformation, where authentic images are repurposed with false text, is one of the easiest and most effective ways to mislead audiences. Current methods focus on assessing image-text consistency but lack convincing explanations for their judgments, which is essential for debunking misinformation. While Multimodal Large Language Models (MLLMs) have rich knowledge and innate capability for visual reasoning and explanation generation, they still lack sophistication in understanding and discovering the subtle crossmodal differences. In this paper, we introduce SNIFFER, a novel multimodal large language model specifically engineered for OOC misinformation detection and explanation. SNIFFER employs two-stage instruction tuning on InstructBLIP. The first stage refines the model's concept alignment of generic objects with news-domain entities and the second stage leverages language-only GPT-4 generated OOC-specific instruction data to fine-tune the model's discriminatory powers. Enhanced by external tools and retrieval, SNIFFER not only detects inconsistencies between text and image but also utilizes external knowledge for contextual verification. Our experiments show that SNIFFER surpasses the original MLLM by over 40% and outperforms state-of-the-art methods in detection accuracy. SNIFFER also provides accurate and persuasive explanations as validated by quantitative and human evaluations.

* To appear in CVPR 2024

Leveraging Old Knowledge to Continually Learn New Classes in Medical Images

Mar 24, 2023

Evelyn Chee, Mong Li Lee, Wynne Hsu

Abstract:Class-incremental continual learning is a core step towards developing artificial intelligence systems that can continuously adapt to changes in the environment by learning new concepts without forgetting those previously learned. This is especially needed in the medical domain where continually learning from new incoming data is required to classify an expanded set of diseases. In this work, we focus on how old knowledge can be leveraged to learn new classes without catastrophic forgetting. We propose a framework that comprises two main components: (1) a dynamic architecture with expanding representations to preserve previously learned features and accommodate new features; and (2) a training procedure alternating between two objectives to balance the learning of new features while maintaining the model's performance on old classes. Experimental results on multiple medical datasets show that our solution is able to achieve superior performance over state-of-the-art baselines in terms of class accuracy and forgetting.

* Accepted to AAAI 2023

Distributional Shifts in Automated Diabetic Retinopathy Screening

Jul 25, 2021

Jay Nandy, Wynne Hsu, Mong Li Lee

Abstract:Deep learning-based models are developed to automatically detect if a retina image is 'referable' in diabetic retinopathy (DR) screening. However, their classification accuracy degrades as the input images distributionally shift from their training distribution. Further, even if the input is not a retina image, a standard DR classifier produces a highly confident prediction that the image is 'referable'. Our paper presents a Dirichlet Prior Network-based framework to address this issue. It utilizes an out-of-distribution (OOD) detector model and a DR classification model to improve generalizability by identifying OOD images. Experiments on real-world datasets indicate that the proposed framework can eliminate the unknown non-retina images and identify the distributionally shifted retina images for human intervention.
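
To illustrate the general idea of using Dirichlet-based uncertainty to flag out-of-distribution inputs, here is a minimal sketch. It assumes a commonly used Dirichlet Prior Network parameterization in which the concentration parameters are the exponentials of the network logits, and it treats a low total concentration as a sign of an OOD input. The function names, the class order (index 0 for non-referable, index 1 for referable), and the threshold are all hypothetical, and the paper's actual framework, which uses separate detector and classifier models, may differ.

```python
import math


def dirichlet_precision(logits):
    """Total concentration alpha_0 = sum(exp(logit)), assuming alpha_c = exp(logit_c)."""
    return sum(math.exp(z) for z in logits)


def screen(logits, threshold):
    """Route an image either to human review or to a DR prediction.

    Hypothetical rule: a low alpha_0 means the predicted Dirichlet is flat
    (little evidence for any class), so the input is flagged instead of
    classified. Index 0 = non-referable, index 1 = referable (assumed order).
    """
    alpha0 = dirichlet_precision(logits)
    if alpha0 < threshold:
        return "flag_for_review"
    # Expected class probabilities under the Dirichlet: alpha_c / alpha_0.
    probs = [math.exp(z) / alpha0 for z in logits]
    return "referable" if probs[1] >= probs[0] else "non_referable"


if __name__ == "__main__":
    print(screen([0.0, 0.0], threshold=5.0))  # flat evidence for both classes
    print(screen([0.0, 3.0], threshold=5.0))  # strong evidence for index 1
```

The threshold would need to be tuned on held-out data; the value used here is only a placeholder.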

* Accepted at IEEE ICIP 2021

Towards Fully Interpretable Deep Neural Networks: Are We There Yet?

Jun 24, 2021

Sandareka Wickramanayake, Wynne Hsu, Mong Li Lee

Abstract:Despite their remarkable performance, Deep Neural Networks (DNNs) behave as black boxes, hindering user trust in Artificial Intelligence (AI) systems. Research on opening black-box DNNs can be broadly categorized into post-hoc methods and inherently interpretable DNNs. While many surveys have been conducted on post-hoc interpretation methods, little effort is devoted to inherently interpretable DNNs. This paper provides a review of existing methods to develop DNNs with intrinsic interpretability, with a focus on Convolutional Neural Networks (CNNs). The aim is to understand the current progress towards fully interpretable DNNs that can cater to different interpretation requirements. Finally, we identify gaps in current work and suggest potential research directions.

* Presented at the ICML 2021 Workshop on Theoretic Foundation, Criticism, and Application Trend of Explainable AI
