Kai Yu

Sherman

Unlocking Temporal Flexibility: Neural Speech Codec with Variable Frame Rate

May 22, 2025
Via arXiv

Communication-Efficient Diffusion Denoising Parallelization via Reuse-then-Predict Mechanism

May 20, 2025
Via arXiv

MMAR: A Challenging Benchmark for Deep Reasoning in Speech, Audio, Music, and Their Mix

May 19, 2025
Via arXiv

An integrated language-vision foundation model for conversational diagnostics and triaging in primary eye care

May 13, 2025
Via arXiv

Enhancing Speech-to-Speech Dialogue Modeling with End-to-End Retrieval-Augmented Generation

Apr 27, 2025
Via arXiv

Pseudo-Autoregressive Neural Codec Language Models for Efficient Zero-Shot Text-to-Speech Synthesis

Apr 14, 2025
Via arXiv

Neuronal Activation States as Sample Embeddings for Data Selection in Task-Specific Instruction Tuning

Mar 19, 2025
Via arXiv

Alignment for Efficient Tool Calling of Large Language Models

Mar 09, 2025
Via arXiv

Delusions of Large Language Models

Mar 09, 2025
Via arXiv

Enhancing LLM Reliability via Explicit Knowledge Boundary Modeling

Mar 04, 2025
Via arXiv