Yan Shu

StyleTextGen: Style-Conditioned Multilingual Scene Text Generation

May 14, 2026
Via arXiv

Qwen-Image-VAE-2.0 Technical Report

May 13, 2026
Via arXiv

Qwen-Image-2.0 Technical Report

May 11, 2026
Via arXiv

The Eleventh NTIRE 2026 Efficient Super-Resolution Challenge Report

Apr 03, 2026
Via arXiv

BigEarthNet.txt: A Large-Scale Multi-Sensor Image-Text Dataset and Benchmark for Earth Observation

Apr 01, 2026
Via arXiv

TerraScope: Pixel-Grounded Visual Reasoning for Earth Observation

Mar 19, 2026
Via arXiv

Video-BrowseComp: Benchmarking Agentic Video Research on Open Web

Dec 28, 2025
Via arXiv

Fleming-R1: Toward Expert-Level Medical Reasoning via Reinforcement Learning

Sep 18, 2025
Via arXiv

Video-XL-2: Towards Very Long-Video Understanding Through Task-Aware KV Sparsification

Jun 24, 2025
Via arXiv

When Semantics Mislead Vision: Mitigating Large Multimodal Models Hallucinations in Scene Text Spotting and Understanding

Jun 05, 2025
Via arXiv