Picture for Hao Fei

Hao Fei

MCM-DPO: Multifaceted Cross-Modal Direct Preference Optimization for Alt-text Generation

Add code
Oct 01, 2025
Viaarxiv icon

Enhancing Hyperbole and Metaphor Detection with Their Bidirectional Dynamic Interaction and Emotion Knowledge

Add code
Jun 18, 2025
Viaarxiv icon

DragNeXt: Rethinking Drag-Based Image Editing

Add code
Jun 09, 2025
Viaarxiv icon

Uncertainty-o: One Model-agnostic Framework for Unveiling Uncertainty in Large Multimodal Models

Add code
Jun 09, 2025
Figure 1 for Uncertainty-o: One Model-agnostic Framework for Unveiling Uncertainty in Large Multimodal Models
Figure 2 for Uncertainty-o: One Model-agnostic Framework for Unveiling Uncertainty in Large Multimodal Models
Figure 3 for Uncertainty-o: One Model-agnostic Framework for Unveiling Uncertainty in Large Multimodal Models
Figure 4 for Uncertainty-o: One Model-agnostic Framework for Unveiling Uncertainty in Large Multimodal Models
Viaarxiv icon

On the Adaptive Psychological Persuasion of Large Language Models

Add code
Jun 07, 2025
Viaarxiv icon

Mixed-R1: Unified Reward Perspective For Reasoning Capability in Multimodal Large Language Models

Add code
May 30, 2025
Figure 1 for Mixed-R1: Unified Reward Perspective For Reasoning Capability in Multimodal Large Language Models
Figure 2 for Mixed-R1: Unified Reward Perspective For Reasoning Capability in Multimodal Large Language Models
Figure 3 for Mixed-R1: Unified Reward Perspective For Reasoning Capability in Multimodal Large Language Models
Figure 4 for Mixed-R1: Unified Reward Perspective For Reasoning Capability in Multimodal Large Language Models
Viaarxiv icon

SMAP: Self-supervised Motion Adaptation for Physically Plausible Humanoid Whole-body Control

Add code
May 26, 2025
Viaarxiv icon

CCHall: A Novel Benchmark for Joint Cross-Lingual and Cross-Modal Hallucinations Detection in Large Language Models

Add code
May 25, 2025
Figure 1 for CCHall: A Novel Benchmark for Joint Cross-Lingual and Cross-Modal Hallucinations Detection in Large Language Models
Figure 2 for CCHall: A Novel Benchmark for Joint Cross-Lingual and Cross-Modal Hallucinations Detection in Large Language Models
Figure 3 for CCHall: A Novel Benchmark for Joint Cross-Lingual and Cross-Modal Hallucinations Detection in Large Language Models
Figure 4 for CCHall: A Novel Benchmark for Joint Cross-Lingual and Cross-Modal Hallucinations Detection in Large Language Models
Viaarxiv icon

So-Fake: Benchmarking and Explaining Social Media Image Forgery Detection

Add code
May 24, 2025
Figure 1 for So-Fake: Benchmarking and Explaining Social Media Image Forgery Detection
Figure 2 for So-Fake: Benchmarking and Explaining Social Media Image Forgery Detection
Figure 3 for So-Fake: Benchmarking and Explaining Social Media Image Forgery Detection
Figure 4 for So-Fake: Benchmarking and Explaining Social Media Image Forgery Detection
Viaarxiv icon

Visual Thoughts: A Unified Perspective of Understanding Multimodal Chain-of-Thought

Add code
May 21, 2025
Viaarxiv icon