Haozhe Zhao

LongCat-Next: Lexicalizing Modalities as Discrete Tokens

Mar 29, 2026

Less Data, Faster Convergence: Goal-Driven Data Optimization for Multimodal Instruction Tuning

Mar 12, 2026

FaithLens: Detecting and Explaining Faithfulness Hallucination

Dec 23, 2025

MMGR: Multi-Modal Generative Reasoning

Dec 17, 2025

Teaching Large Language Models to Maintain Contextual Faithfulness via Synthetic Tasks and Reinforcement Learning

May 22, 2025

Multimodal Representation Alignment for Image Generation: Text-Image Interleaved Control Is Easier Than You Think

Feb 27, 2025

Aligning Large Language Models to Follow Instructions and Hallucinate Less via Effective Data Filtering

Feb 11, 2025

LongViTU: Instruction Tuning for Long-Form Video Understanding

Jan 09, 2025

Next Token Prediction Towards Multimodal Intelligence: A Comprehensive Survey

Dec 30, 2024

Looking Beyond Text: Reducing Language bias in Large Vision-Language Models via Multimodal Dual-Attention and Soft-Image Guidance

Nov 21, 2024