Yiwen Shao

Auden-Voice: General-Purpose Voice Encoder for Speech and Language Understanding

Nov 19, 2025

TTA: Transcribe, Translate and Alignment for Cross-lingual Speech Representation

Nov 18, 2025

Taming the Chaos: Coordinated Autoscaling for Heterogeneous and Disaggregated LLM Inference

Aug 27, 2025

DualSpeechLM: Towards Unified Speech Understanding and Generation via Dual Speech Token Modeling with Large Language Models

Aug 12, 2025

Efficient Scaling for LLM-based ASR

Aug 06, 2025

Advancing Multi-talker ASR Performance with Large Language Models

Aug 30, 2024

Multi-Channel Multi-Speaker ASR Using Target Speaker's Solo Segment

Jun 17, 2024

RIR-SF: Room Impulse Response Based Spatial Feature for Multi-channel Multi-talker ASR

Oct 31, 2023

UniX-Encoder: A Universal $X$-Channel Speech Encoder for Ad-Hoc Microphone Array Speech Processing

Oct 25, 2023

Challenges and Insights: Exploring 3D Spatial Features and Complex Networks on the MISP Dataset

Oct 05, 2023