Yu Zhou

National Laboratory of Pattern Recognition, Institute of Automation, CAS, Beijing, China; Fanyu AI Laboratory, Zhongke Fanyu Technology Co., Ltd, Beijing, China

When Semantics Mislead Vision: Mitigating Large Multimodal Models Hallucinations in Scene Text Spotting and Understanding

Jun 05, 2025
Via arXiv

Beyond Cropped Regions: New Benchmark and Corresponding Baseline for Chinese Scene Text Retrieval in Diverse Layouts

Jun 05, 2025
Via arXiv

VidText: Towards Comprehensive Evaluation for Video Text Understanding

May 28, 2025
Via arXiv

CROP: Contextual Region-Oriented Visual Token Pruning

May 27, 2025
Via arXiv

RoBiS: Robust Binary Segmentation for High-Resolution Industrial Images

May 27, 2025
Via arXiv

The Role of Video Generation in Enhancing Data-Limited Action Understanding

May 26, 2025
Via arXiv

CODE-DITING: A Reasoning-Based Metric for Functional Alignment in Code Evaluation

May 26, 2025
Via arXiv

An Empirical Study on Configuring In-Context Learning Demonstrations for Unleashing MLLMs' Sentimental Perception Capability

May 22, 2025
Via arXiv

The Devil is in Fine-tuning and Long-tailed Problems: A New Benchmark for Scene Text Detection

May 21, 2025
Via arXiv

FedRE: Robust and Effective Federated Learning with Privacy Preference

May 08, 2025
Via arXiv