Xiaofu Chen

Wuhan University

VLMs Need Words: Vision Language Models Ignore Visual Detail In Favor of Semantic Anchors

Apr 02, 2026
Via arXiv

QuarkAudio Technical Report

Dec 23, 2025
Via arXiv

SPECS: Specificity-Enhanced CLIP-Score for Long Image Caption Evaluation

Sep 04, 2025
Via arXiv

V-Cloak: Intelligibility-, Naturalness- & Timbre-Preserving Real-Time Voice Anonymization

Oct 27, 2022
Via arXiv