Guangzhi Sun

Parameter Efficient Finetuning for Speech Emotion Recognition and Domain Adaptation

Feb 19, 2024
Nineli Lashkarashvili, Wen Wu, Guangzhi Sun, Philip C. Woodland

Speech-based Slot Filling using Large Language Models

Nov 13, 2023
Guangzhi Sun, Shutong Feng, Dongcheng Jiang, Chao Zhang, Milica Gašić, Philip C. Woodland

TorchAudio 2.1: Advancing speech recognition, self-supervised learning, and audio processing components for PyTorch

Oct 27, 2023
Jeff Hwang, Moto Hira, Caroline Chen, Xiaohui Zhang, Zhaoheng Ni, Guangzhi Sun, Pingchuan Ma, Ruizhe Huang, Vineel Pratap, Yuekai Zhang, Anurag Kumar, Chin-Yun Yu, Chuang Zhu, Chunxi Liu, Jacob Kahn, Mirco Ravanelli, Peng Sun, Shinji Watanabe, Yangyang Shi, Yumeng Tao, Robin Scheibler, Samuele Cornell, Sean Kim, Stavros Petridis

SALMONN: Towards Generic Hearing Abilities for Large Language Models

Oct 20, 2023
Changli Tang, Wenyi Yu, Guangzhi Sun, Xianzhao Chen, Tian Tan, Wei Li, Lu Lu, Zejun Ma, Chao Zhang

Fine-grained Audio-Visual Joint Representations for Multimodal Large Language Models

Oct 10, 2023
Guangzhi Sun, Wenyi Yu, Changli Tang, Xianzhao Chen, Tian Tan, Wei Li, Lu Lu, Zejun Ma, Chao Zhang

Conditional Diffusion Model for Target Speaker Extraction

Oct 07, 2023
Theodor Nguyen, Guangzhi Sun, Xianrui Zheng, Chao Zhang, Philip C. Woodland

Connecting Speech Encoder and Large Language Model for ASR

Sep 26, 2023
Wenyi Yu, Changli Tang, Guangzhi Sun, Xianzhao Chen, Tian Tan, Wei Li, Lu Lu, Zejun Ma, Chao Zhang

Affect Recognition in Conversations Using Large Language Models

Sep 22, 2023
Shutong Feng, Guangzhi Sun, Nurul Lubis, Chao Zhang, Milica Gašić

Enhancing Quantised End-to-End ASR Models via Personalisation

Sep 17, 2023
Qiuming Zhao, Guangzhi Sun, Chao Zhang, Mingxing Xu, Thomas Fang Zheng

Cross-Utterance Conditioned VAE for Speech Generation

Sep 08, 2023
Yang Li, Cheng Yu, Guangzhi Sun, Weiqin Zu, Zheng Tian, Ying Wen, Wei Pan, Chao Zhang, Jun Wang, Yang Yang, Fanglei Sun
