Kai Yu

Sherman

Why Do Speech Language Models Fail to Generate Semantically Coherent Outputs? A Modality Evolving Perspective

Dec 22, 2024
Via arXiv

SLAM-Omni: Timbre-Controllable Voice Interaction System with Single-Stage Training

Dec 20, 2024
Via arXiv

NTC-KWS: Noise-aware CTC for Robust Keyword Spotting

Dec 17, 2024
Via arXiv

Streaming Keyword Spotting Boosted by Cross-layer Discrimination Consistency

Dec 17, 2024
Via arXiv

VQTalker: Towards Multilingual Talking Avatars through Facial Motion Tokenization

Dec 13, 2024
Via arXiv

Reducing Tool Hallucination via Reliability Alignment

Dec 05, 2024
Via arXiv

Compressing KV Cache for Long-Context LLM Inference with Inter-Layer Attention Similarity

Dec 03, 2024
Via arXiv

Unified Pathological Speech Analysis with Prompt Tuning

Nov 05, 2024
Via arXiv

Fast and High-Quality Auto-Regressive Speech Synthesis via Speculative Decoding

Oct 29, 2024
Via arXiv

A Survey on Speech Large Language Models

Oct 24, 2024
Via arXiv