Kai Yu

Sherman

DeepSurvey: Enhancing Analytical Depth and Citation Reliability in Automated Survey Generation

May 28, 2026
Via arXiv

HoliTok: A Continuous Holistic Tokenization with Robust Dual Capabilities of Speech Generation and Understanding

May 28, 2026

Towards Human-Like Interactive Speech Recognition With Agentic Correction and Semantic Evaluation

May 28, 2026

Audio-Mind: An Auditable Agentic Framework for Audio Understanding

May 27, 2026

Good to Go: The LOOP Skill Engine That Hits 99% Success and Slashes Token Usage by 99% via One-Shot Recording and Deterministic Replay

May 14, 2026

Artificial Intelligence-Assistant Cardiotocography: Unified Model for Signal Reconstruction, Fetal Heart Rate Analysis, and Variability Assessment

May 14, 2026

No Action Without a NOD: A Heterogeneous Multi-Agent Architecture for Reliable Service Agents

May 12, 2026

X-Voice: Enabling Everyone to Speak 30 Languages via Zero-Shot Cross-Lingual Voice Cloning

May 07, 2026

FaithfulFaces: Pose-Faithful Facial Identity Preservation for Text-to-Video Generation

May 06, 2026

RAS: a Reliability Oriented Metric for Automatic Speech Recognition

Apr 28, 2026