
Shiliang Zhang

emotion2vec: Self-Supervised Pre-Training for Speech Emotion Representation
Dec 23, 2023

Advancing VAD Systems Based on Multi-Task Learning with Improved Model Structures
Dec 19, 2023

Hourglass-AVSR: Down-Up Sampling-based Computational Efficiency Model for Audio-Visual Speech Recognition
Dec 14, 2023

Qwen-Audio: Advancing Universal Audio Understanding via Unified Large-Scale Audio-Language Models
Nov 14, 2023

Loss Masking Is Not Needed in Decoder-only Transformer for Discrete-token Based ASR
Nov 08, 2023

LauraGPT: Listen, Attend, Understand, and Regenerate Audio with GPT
Oct 11, 2023

SA-Paraformer: Non-autoregressive End-to-End Speaker-Attributed ASR
Oct 07, 2023

Pink: Unveiling the Power of Referential Comprehension for Multi-modal LLMs
Oct 01, 2023

Exploring RWKV for Memory Efficient and Low Latency Streaming ASR
Sep 26, 2023

The second multi-channel multi-party meeting transcription challenge (M2MeT 2.0): A benchmark for speaker-attributed ASR
Sep 24, 2023