Shiliang Zhang

Leveraging Speech PTM, Text LLM, and Emotional TTS for Speech Emotion Recognition

Sep 19, 2023
Via arXiv

Improving Speaker Diarization using Semantic Information: Joint Pairwise Constraints Propagation

Sep 19, 2023
Via arXiv

Incorporating Class-based Language Model for Named Entity Recognition in Factorized Neural Transducer

Sep 14, 2023
Via arXiv

FunCodec: A Fundamental, Reproducible and Integrable Open-source Toolkit for Neural Speech Codec

Sep 14, 2023
Via arXiv

SlideSpeech: A Large-Scale Slide-Enriched Audio-Visual Corpus

Sep 12, 2023
Via arXiv

SeACo-Paraformer: A Non-Autoregressive ASR System with Flexible and Effective Hotword Customization Ability

Aug 16, 2023
Via arXiv

MixBCT: Towards Self-Adapting Backward-Compatible Training

Aug 14, 2023
Via arXiv

Rethinking the visual cues in audio-visual speaker extraction

Jun 05, 2023
Via arXiv

Speech and noise dual-stream spectrogram refine network with speech distortion loss for robust speech recognition

May 30, 2023
Via arXiv

Accurate and Reliable Confidence Estimation Based on Non-Autoregressive End-to-End Speech Recognition System

May 25, 2023
Via arXiv