Kai Yu

Sherman

NeuSym-RAG: Hybrid Neural Symbolic Retrieval with Multiview Structuring for PDF Question Answering

May 26, 2025

ProgRM: Build Better GUI Agents with Progress Rewards

May 23, 2025

Unlocking Temporal Flexibility: Neural Speech Codec with Variable Frame Rate

May 22, 2025

Communication-Efficient Diffusion Denoising Parallelization via Reuse-then-Predict Mechanism

May 20, 2025

MMAR: A Challenging Benchmark for Deep Reasoning in Speech, Audio, Music, and Their Mix

May 19, 2025

An integrated language-vision foundation model for conversational diagnostics and triaging in primary eye care

May 13, 2025

Enhancing Speech-to-Speech Dialogue Modeling with End-to-End Retrieval-Augmented Generation

Apr 27, 2025

Pseudo-Autoregressive Neural Codec Language Models for Efficient Zero-Shot Text-to-Speech Synthesis

Apr 14, 2025

Neuronal Activation States as Sample Embeddings for Data Selection in Task-Specific Instruction Tuning

Mar 19, 2025

Delusions of Large Language Models

Mar 09, 2025