Yuexian Zou

MolSculpt: Sculpting 3D Molecular Geometries from Chemical Syntax
Dec 09, 2025

IC-Custom: Diverse Image Customization via In-Context Learning
Jul 02, 2025

Not All Tokens and Heads Are Equally Important: Dual-Level Attention Intervention for Hallucination Mitigation
Jun 14, 2025

BlobCtrl: A Unified and Flexible Framework for Element-level Image Generation and Editing
Mar 17, 2025

ATRI: Mitigating Multilingual Audio Text Retrieval Inconsistencies by Reducing Data Distribution Errors
Feb 22, 2025

Do we really have to filter out random noise in pre-training data for language models?
Feb 10, 2025

VARGPT: Unified Understanding and Generation in a Visual Autoregressive Multimodal Large Language Model
Jan 21, 2025

Advancing General Multimodal Capability of Vision-language Models with Pyramid-descent Visual Position Encoding
Jan 19, 2025

VASparse: Towards Efficient Visual Hallucination Mitigation for Large Vision-Language Model via Visual-Aware Sparsification
Jan 11, 2025

CAR: Controllable Autoregressive Modeling for Visual Generation
Oct 07, 2024