Kai Yu

Sherman

UltraVoice: Scaling Fine-Grained Style-Controlled Speech Conversations for Spoken Dialogue Models

Oct 26, 2025

MS-BART: Unified Modeling of Mass Spectra and Molecules for Structure Elucidation

Oct 23, 2025

DiSRouter: Distributed Self-Routing for LLM Selections

Oct 22, 2025

Omni-Captioner: Data Pipeline, Models, and Benchmark for Omni Detailed Perception

Oct 14, 2025

Bitrate-Controlled Diffusion for Disentangling Motion and Content in Video

Sep 10, 2025

POSE: Phased One-Step Adversarial Equilibrium for Video Diffusion Models

Aug 28, 2025

MOSA: Mixtures of Simple Adapters Outperform Monolithic Approaches in LLM-based Multilingual ASR

Aug 26, 2025

Joint decoding method for controllable contextual speech recognition based on Speech LLM

Aug 12, 2025

ChemDFM-R: An Chemical Reasoner LLM Enhanced with Atomized Chemical Knowledge

Jul 30, 2025

Reasoning-Driven Retrosynthesis Prediction with Large Language Models via Reinforcement Learning

Jul 23, 2025