Alert button
Picture for Guangzhi Sun

Guangzhi Sun

Alert button

M$^3$AV: A Multimodal, Multigenre, and Multipurpose Audio-Visual Academic Lecture Dataset

Add code
Bookmark button
Alert button
Mar 21, 2024
Zhe Chen, Heyang Liu, Wenyi Yu, Guangzhi Sun, Hongcheng Liu, Ji Wu, Chao Zhang, Yu Wang, Yanfeng Wang

Figure 1 for M$^3$AV: A Multimodal, Multigenre, and Multipurpose Audio-Visual Academic Lecture Dataset
Figure 2 for M$^3$AV: A Multimodal, Multigenre, and Multipurpose Audio-Visual Academic Lecture Dataset
Figure 3 for M$^3$AV: A Multimodal, Multigenre, and Multipurpose Audio-Visual Academic Lecture Dataset
Figure 4 for M$^3$AV: A Multimodal, Multigenre, and Multipurpose Audio-Visual Academic Lecture Dataset
Viaarxiv icon

Large language models surpass human experts in predicting neuroscience results

Add code
Bookmark button
Alert button
Mar 14, 2024
Xiaoliang Luo, Akilles Rechardt, Guangzhi Sun, Kevin K. Nejad, Felipe Yáñez, Bati Yilmaz, Kangjoo Lee, Alexandra O. Cohen, Valentina Borghesani, Anton Pashkov, Daniele Marinazzo, Jonathan Nicholas, Alessandro Salatiello, Ilia Sucholutsky, Pasquale Minervini, Sepehr Razavi, Roberta Rocca, Elkhan Yusifov, Tereza Okalova, Nianlong Gu, Martin Ferianc, Mikail Khona, Kaustubh R. Patil, Pui-Shee Lee, Rui Mata, Nicholas E. Myers, Jennifer K Bizley, Sebastian Musslick, Isil Poyraz Bilgin, Guiomar Niso, Justin M. Ales, Michael Gaebler, N Apurva Ratan Murty, Leyla Loued-Khenissi, Anna Behler, Chloe M. Hall, Jessica Dafflon, Sherry Dongqi Bao, Bradley C. Love

Figure 1 for Large language models surpass human experts in predicting neuroscience results
Figure 2 for Large language models surpass human experts in predicting neuroscience results
Figure 3 for Large language models surpass human experts in predicting neuroscience results
Figure 4 for Large language models surpass human experts in predicting neuroscience results
Viaarxiv icon

Parameter Efficient Finetuning for Speech Emotion Recognition and Domain Adaptation

Add code
Bookmark button
Alert button
Feb 19, 2024
Nineli Lashkarashvili, Wen Wu, Guangzhi Sun, Philip C. Woodland

Viaarxiv icon

Speech-based Slot Filling using Large Language Models

Add code
Bookmark button
Alert button
Nov 13, 2023
Guangzhi Sun, Shutong Feng, Dongcheng Jiang, Chao Zhang, Milica Gašić, Philip C. Woodland

Figure 1 for Speech-based Slot Filling using Large Language Models
Figure 2 for Speech-based Slot Filling using Large Language Models
Figure 3 for Speech-based Slot Filling using Large Language Models
Figure 4 for Speech-based Slot Filling using Large Language Models
Viaarxiv icon

TorchAudio 2.1: Advancing speech recognition, self-supervised learning, and audio processing components for PyTorch

Add code
Bookmark button
Alert button
Oct 27, 2023
Jeff Hwang, Moto Hira, Caroline Chen, Xiaohui Zhang, Zhaoheng Ni, Guangzhi Sun, Pingchuan Ma, Ruizhe Huang, Vineel Pratap, Yuekai Zhang, Anurag Kumar, Chin-Yun Yu, Chuang Zhu, Chunxi Liu, Jacob Kahn, Mirco Ravanelli, Peng Sun, Shinji Watanabe, Yangyang Shi, Yumeng Tao, Robin Scheibler, Samuele Cornell, Sean Kim, Stavros Petridis

Figure 1 for TorchAudio 2.1: Advancing speech recognition, self-supervised learning, and audio processing components for PyTorch
Figure 2 for TorchAudio 2.1: Advancing speech recognition, self-supervised learning, and audio processing components for PyTorch
Figure 3 for TorchAudio 2.1: Advancing speech recognition, self-supervised learning, and audio processing components for PyTorch
Figure 4 for TorchAudio 2.1: Advancing speech recognition, self-supervised learning, and audio processing components for PyTorch
Viaarxiv icon

SALMONN: Towards Generic Hearing Abilities for Large Language Models

Add code
Bookmark button
Alert button
Oct 20, 2023
Changli Tang, Wenyi Yu, Guangzhi Sun, Xianzhao Chen, Tian Tan, Wei Li, Lu Lu, Zejun Ma, Chao Zhang

Viaarxiv icon

Fine-grained Audio-Visual Joint Representations for Multimodal Large Language Models

Add code
Bookmark button
Alert button
Oct 10, 2023
Guangzhi Sun, Wenyi Yu, Changli Tang, Xianzhao Chen, Tian Tan, Wei Li, Lu Lu, Zejun Ma, Chao Zhang

Viaarxiv icon

Conditional Diffusion Model for Target Speaker Extraction

Add code
Bookmark button
Alert button
Oct 07, 2023
Theodor Nguyen, Guangzhi Sun, Xianrui Zheng, Chao Zhang, Philip C Woodland

Figure 1 for Conditional Diffusion Model for Target Speaker Extraction
Figure 2 for Conditional Diffusion Model for Target Speaker Extraction
Figure 3 for Conditional Diffusion Model for Target Speaker Extraction
Figure 4 for Conditional Diffusion Model for Target Speaker Extraction
Viaarxiv icon

Connecting Speech Encoder and Large Language Model for ASR

Add code
Bookmark button
Alert button
Sep 26, 2023
Wenyi Yu, Changli Tang, Guangzhi Sun, Xianzhao Chen, Tian Tan, Wei Li, Lu Lu, Zejun Ma, Chao Zhang

Viaarxiv icon