Zhihong Zhu

MMErroR: A Benchmark for Erroneous Reasoning in Vision-Language Models

Jan 06, 2026
Via arXiv

UPETrack: Unidirectional Position Estimation for Tracking Occluded Deformable Linear Objects

Dec 10, 2025
Via arXiv

Partitioner Guided Modal Learning Framework

Jul 15, 2025
Via arXiv

Turb-L1: Achieving Long-term Turbulence Tracing By Tackling Spectral Bias

May 25, 2025
Via arXiv

CellVerse: Do Large Language Models Really Understand Cell Biology?

May 09, 2025
Via arXiv

Enhancing Image Generation Fidelity via Progressive Prompts

Jan 13, 2025
Via arXiv

VASparse: Towards Efficient Visual Hallucination Mitigation for Large Vision-Language Model via Visual-Aware Sparsification

Jan 11, 2025
Via arXiv

DisPose: Disentangling Pose Guidance for Controllable Human Image Animation

Dec 13, 2024
Via arXiv

DiffATR: Diffusion-based Generative Modeling for Audio-Text Retrieval

Sep 16, 2024
Via arXiv

Audio-text Retrieval with Transformer-based Hierarchical Alignment and Disentangled Cross-modal Representation

Sep 14, 2024
Via arXiv