Kai Yu

Sherman

IEA: Amateur-Friendly Conversational Image Editing Agent via Three Stages of Multitask Alignment

Jun 06, 2026, via arXiv

dots.tts Technical Report

Jun 05, 2026, via arXiv

VISA: A Visual Information Strengthened Audio-Reasoning System for the Interspeech 2026 ARC Agent Track

Jun 05, 2026, via arXiv

MMAE: A Massive Multitask Audio Editing Benchmark

Jun 05, 2026, via arXiv

Read What You Hear: Reference-Free Hypotheses Evaluation with Acoustic Discrepancy

Jun 03, 2026, via arXiv

WavTTS: Towards High-Quality Zero-Shot TTS via Direct Raw Waveform Modeling

Jun 02, 2026, via arXiv

ProductWebGen: Benchmarking Multimodal Product Webpage Generation

May 31, 2026, via arXiv

OpenSTBench: Beyond Semantic Evaluation for Speech Translation

May 29, 2026, via arXiv

A Unified and Reproducible Experimentation Framework for Speech Understanding

May 29, 2026, via arXiv

DeepSurvey: Enhancing Analytical Depth and Citation Reliability in Automated Survey Generation

May 28, 2026, via arXiv