Kai Yu

Sherman

X-Voice: Enabling Everyone to Speak 30 Languages via Zero-Shot Cross-Lingual Voice Cloning

May 07, 2026

FaithfulFaces: Pose-Faithful Facial Identity Preservation for Text-to-Video Generation

May 06, 2026

RAS: a Reliability Oriented Metric for Automatic Speech Recognition

Apr 28, 2026

Diagnosing CFG Interpretation in LLMs

Apr 22, 2026

Interactive ASR: Towards Human-Like Interaction and Semantic Coherence Evaluation for Agentic Speech Recognition

Apr 14, 2026

X-VC: Zero-shot Streaming Voice Conversion in Codec Space

Apr 14, 2026

TASU2: Controllable CTC Simulation for Alignment and Low-Resource Adaptation of Speech LLMs

Apr 09, 2026

Does Pass Rate Tell the Whole Story? Evaluating Design Constraint Compliance in LLM-based Issue Resolution

Apr 07, 2026

PRIME: Prototype-Driven Multimodal Pretraining for Cancer Prognosis with Missing Modalities

Apr 05, 2026

CharTool: Tool-Integrated Visual Reasoning for Chart Understanding

Apr 03, 2026