Yiwu Zhong

Rethinking Chain-of-Thought Reasoning for Videos

Dec 10, 2025

Webly-Supervised Image Manipulation Localization via Category-Aware Auto-Annotation

Aug 28, 2025

PAVE: Patching and Adapting Video Large Language Models

Mar 25, 2025

AIM: Adaptive Inference of Multi-Modal LLMs via Token Merging and Pruning

Dec 04, 2024

Omni-IML: Towards Unified Image Manipulation Localization

Nov 22, 2024

TemporalBench: Benchmarking Fine-grained Temporal Understanding for Multimodal Video Models

Oct 15, 2024

Enhancing Temporal Modeling of Video LLMs via Time Gating

Oct 08, 2024

Generalized Tampered Scene Text Detection in the era of Generative AI

Jul 31, 2024

Beyond Embeddings: The Promise of Visual Table in Multi-Modal Models

Mar 27, 2024

Towards Learning a Generalist Model for Embodied Navigation

Dec 06, 2023