Hao Fei

JavisGPT: A Unified Multi-modal LLM for Sounding-Video Comprehension and Generation

Dec 28, 2025
Via arXiv

Event Extraction in Large Language Model

Dec 22, 2025
Via arXiv

Training LLMs with LogicReward for Faithful and Rigorous Reasoning

Dec 20, 2025
Via arXiv

UniVA: Universal Video Agent towards Open-Source Next-Generation Video Generalist

Nov 11, 2025
Via arXiv

MCM-DPO: Multifaceted Cross-Modal Direct Preference Optimization for Alt-text Generation

Oct 01, 2025
Via arXiv

Enhancing Hyperbole and Metaphor Detection with Their Bidirectional Dynamic Interaction and Emotion Knowledge

Jun 18, 2025
Via arXiv

Uncertainty-o: One Model-agnostic Framework for Unveiling Uncertainty in Large Multimodal Models

Jun 09, 2025
Via arXiv

DragNeXt: Rethinking Drag-Based Image Editing

Jun 09, 2025
Via arXiv

On the Adaptive Psychological Persuasion of Large Language Models

Jun 07, 2025
Via arXiv

Mixed-R1: Unified Reward Perspective For Reasoning Capability in Multimodal Large Language Models

May 30, 2025
Via arXiv