Yue Zhang

Renmin University of China

PP-OCRv6: From 1.5M to 34.5M Parameters, Surpassing Billion-Scale VLMs on OCR Tasks

Jun 11, 2026 (via arXiv)

AutoMine Solution for AV2 2026 Scenario Mining Challenge

Jun 10, 2026 (via arXiv)

LASA: A Weak Supervision Method for Open-Vocabulary Scene Sketch Semantic Segmentation

Jun 10, 2026 (via arXiv)

EviProp: Seeded Relevance Diffusion on Chunk-Page Graphs for Long Multimodal Document Retrieval

Jun 08, 2026 (via arXiv)

UNIVID: Unified Vision-Language Model for Video Moderation

Jun 04, 2026 (via arXiv)

Beyond Absolute Scores: Relative Edit-induced Difference for Generalizable Image Aesthetic Assessment

Jun 04, 2026 (via arXiv)

PaddleOCR-VL-1.6: Expanding the Frontier of Document Parsing with Under-Optimized Region Refinement and Progressive Post-Training

Jun 02, 2026 (via arXiv)

Unified Video-Action Joint Denoising for Dexterous Action and Data Generation

Jun 02, 2026 (via arXiv)

Seeing Isn't Knowing: Do VLMs Know When Not to Answer Spatial Questions (and Why)?

May 28, 2026 (via arXiv)

Active Evidence-Seeking and Diagnostic Reasoning in Large Language Models for Clinical Decision Support

May 21, 2026 (via arXiv)