Pei Fu

UniTranslator: A Unified Multi-modal Framework for End-to-end In-Image Machine Translation

Jun 23, 2026
Via arXiv

ELVA: Exploring Ranking-Driven Universal Multimodal Retrieval

Jun 18, 2026
Via arXiv

Teaching the Way, Not the Answer: Privileged Tutoring Distillation for Multimodal Policy Optimization

Jun 05, 2026
Via arXiv

Beyond Binary: Reframing GUI Critique as Continuous Semantic Alignment

May 14, 2026
Via arXiv

Doc-V*: Coarse-to-Fine Interactive Visual Reasoning for Multi-Page Document VQA

Apr 15, 2026
Via arXiv

Q-Mask: Query-driven Causal Masks for Text Anchoring in OCR-Oriented Vision-Language Models

Mar 31, 2026
Via arXiv

IMTBench: A Multi-Scenario Cross-Modal Collaborative Evaluation Benchmark for In-Image Machine Translation

Mar 11, 2026
Via arXiv

EMO-R3: Reflective Reinforcement Learning for Emotional Reasoning in Multimodal Large Language Models

Feb 27, 2026
Via arXiv

PositionOCR: Augmenting Positional Awareness in Multi-Modal Models via Hybrid Specialist Integration

Feb 22, 2026
Via arXiv

GAIA: A Data Flywheel System for Training GUI Test-Time Scaling Critic Models

Jan 26, 2026
Via arXiv