Yu Zhou

National Laboratory of Pattern Recognition, Institute of Automation, CAS, Beijing, China; Fanyu AI Laboratory, Zhongke Fanyu Technology Co., Ltd, Beijing, China

A Correction for the Paper "Symplectic geometry mode decomposition and its application to rotating machinery compound fault diagnosis"

Aug 29, 2025
Via arXiv

PathMR: Multimodal Visual Reasoning for Interpretable Pathology Diagnosis

Aug 28, 2025
Via arXiv

TADoc: Robust Time-Aware Document Image Dewarping

Aug 09, 2025
Via arXiv

Gather and Trace: Rethinking Video TextVQA from an Instance-oriented Perspective

Aug 06, 2025
Via arXiv

Uni-DocDiff: A Unified Document Restoration Model Based on Diffusion

Aug 06, 2025
Via arXiv

Step-Audio 2 Technical Report

Jul 24, 2025
Via arXiv

Single-to-mix Modality Alignment with Multimodal Large Language Model for Document Image Machine Translation

Jul 10, 2025
Via arXiv

The Four Color Theorem for Cell Instance Segmentation

Jun 11, 2025
Via arXiv

Step-Audio-AQAA: a Fully End-to-End Expressive Large Audio Language Model

Jun 10, 2025
Via arXiv

When Semantics Mislead Vision: Mitigating Large Multimodal Models Hallucinations in Scene Text Spotting and Understanding

Jun 05, 2025
Via arXiv