Alert button
Picture for Puyuan Peng

Puyuan Peng

Alert button

VoiceCraft: Zero-Shot Speech Editing and Text-to-Speech in the Wild

Add code
Bookmark button
Alert button
Mar 25, 2024
Puyuan Peng, Po-Yao Huang, Daniel Li, Abdelrahman Mohamed, David Harwath

Viaarxiv icon

SpeechCLIP+: Self-supervised multi-task representation learning for speech via CLIP and speech-image data

Add code
Bookmark button
Alert button
Feb 10, 2024
Hsuan-Fu Wang, Yi-Jen Shih, Heng-Jui Chang, Layne Berry, Puyuan Peng, Hung-yi Lee, Hsin-Min Wang, David Harwath

Viaarxiv icon

Integrating Self-supervised Speech Model with Pseudo Word-level Targets from Visually-grounded Speech Model

Add code
Bookmark button
Alert button
Feb 08, 2024
Hung-Chieh Fang, Nai-Xuan Ye, Yi-Jen Shih, Puyuan Peng, Hsuan-Fu Wang, Layne Berry, Hung-yi Lee, David Harwath

Viaarxiv icon

BAT: Learning to Reason about Spatial Sounds with Large Language Models

Add code
Bookmark button
Alert button
Feb 02, 2024
Zhisheng Zheng, Puyuan Peng, Ziyang Ma, Xie Chen, Eunsol Choi, David Harwath

Viaarxiv icon

Audio-Visual Neural Syntax Acquisition

Add code
Bookmark button
Alert button
Oct 11, 2023
Cheng-I Jeff Lai, Freda Shi, Puyuan Peng, Yoon Kim, Kevin Gimpel, Shiyu Chang, Yung-Sung Chuang, Saurabhchand Bhati, David Cox, David Harwath, Yang Zhang, Karen Livescu, James Glass

Figure 1 for Audio-Visual Neural Syntax Acquisition
Figure 2 for Audio-Visual Neural Syntax Acquisition
Figure 3 for Audio-Visual Neural Syntax Acquisition
Figure 4 for Audio-Visual Neural Syntax Acquisition
Viaarxiv icon

AV-SUPERB: A Multi-Task Evaluation Benchmark for Audio-Visual Representation Models

Add code
Bookmark button
Alert button
Sep 19, 2023
Yuan Tseng, Layne Berry, Yi-Ting Chen, I-Hsiang Chiu, Hsuan-Hao Lin, Max Liu, Puyuan Peng, Yi-Jen Shih, Hung-Yu Wang, Haibin Wu, Po-Yao Huang, Chun-Mao Lai, Shang-Wen Li, David Harwath, Yu Tsao, Shinji Watanabe, Abdelrahman Mohamed, Chi-Luen Feng, Hung-yi Lee

Figure 1 for AV-SUPERB: A Multi-Task Evaluation Benchmark for Audio-Visual Representation Models
Figure 2 for AV-SUPERB: A Multi-Task Evaluation Benchmark for Audio-Visual Representation Models
Figure 3 for AV-SUPERB: A Multi-Task Evaluation Benchmark for Audio-Visual Representation Models
Viaarxiv icon

Style-transfer based Speech and Audio-visual Scene Understanding for Robot Action Sequence Acquisition from Videos

Add code
Bookmark button
Alert button
Jun 27, 2023
Chiori Hori, Puyuan Peng, David Harwath, Xinyu Liu, Kei Ota, Siddarth Jain, Radu Corcodel, Devesh Jha, Diego Romeres, Jonathan Le Roux

Figure 1 for Style-transfer based Speech and Audio-visual Scene Understanding for Robot Action Sequence Acquisition from Videos
Figure 2 for Style-transfer based Speech and Audio-visual Scene Understanding for Robot Action Sequence Acquisition from Videos
Figure 3 for Style-transfer based Speech and Audio-visual Scene Understanding for Robot Action Sequence Acquisition from Videos
Figure 4 for Style-transfer based Speech and Audio-visual Scene Understanding for Robot Action Sequence Acquisition from Videos
Viaarxiv icon

Syllable Discovery and Cross-Lingual Generalization in a Visually Grounded, Self-Supervised Speech Mode

Add code
Bookmark button
Alert button
May 19, 2023
Puyuan Peng, Shang-Wen Li, Okko Räsänen, Abdelrahman Mohamed, David Harwath

Figure 1 for Syllable Discovery and Cross-Lingual Generalization in a Visually Grounded, Self-Supervised Speech Mode
Figure 2 for Syllable Discovery and Cross-Lingual Generalization in a Visually Grounded, Self-Supervised Speech Mode
Figure 3 for Syllable Discovery and Cross-Lingual Generalization in a Visually Grounded, Self-Supervised Speech Mode
Figure 4 for Syllable Discovery and Cross-Lingual Generalization in a Visually Grounded, Self-Supervised Speech Mode
Viaarxiv icon

Prompting the Hidden Talent of Web-Scale Speech Models for Zero-Shot Task Generalization

Add code
Bookmark button
Alert button
May 18, 2023
Puyuan Peng, Brian Yan, Shinji Watanabe, David Harwath

Figure 1 for Prompting the Hidden Talent of Web-Scale Speech Models for Zero-Shot Task Generalization
Figure 2 for Prompting the Hidden Talent of Web-Scale Speech Models for Zero-Shot Task Generalization
Figure 3 for Prompting the Hidden Talent of Web-Scale Speech Models for Zero-Shot Task Generalization
Figure 4 for Prompting the Hidden Talent of Web-Scale Speech Models for Zero-Shot Task Generalization
Viaarxiv icon

Zero-shot Video Moment Retrieval With Off-the-Shelf Models

Add code
Bookmark button
Alert button
Nov 03, 2022
Anuj Diwan, Puyuan Peng, Raymond J. Mooney

Figure 1 for Zero-shot Video Moment Retrieval With Off-the-Shelf Models
Figure 2 for Zero-shot Video Moment Retrieval With Off-the-Shelf Models
Figure 3 for Zero-shot Video Moment Retrieval With Off-the-Shelf Models
Figure 4 for Zero-shot Video Moment Retrieval With Off-the-Shelf Models
Viaarxiv icon