Hung-yi Lee


T5lephone: Bridging Speech and Text Self-supervised Models for Spoken Language Understanding via Phoneme level T5

Nov 01, 2022
Chan-Jan Hsu, Ho-Lam Chung, Hung-yi Lee, Yu Tsao


Multimodal Transformer Distillation for Audio-Visual Synchronization

Oct 27, 2022
Xuanjun Chen, Haibin Wu, Chung-Che Wang, Hung-yi Lee, Jyh-Shing Roger Jang


Improving generalizability of distilled self-supervised speech processing models under distorted settings

Oct 20, 2022
Kuan-Po Huang, Yu-Kuan Fu, Tsu-Yuan Hsu, Fabian Ritter Gutierrez, Fan-Lin Wang, Liang-Hsuan Tseng, Yu Zhang, Hung-yi Lee


SUPERB @ SLT 2022: Challenge on Generalization and Efficiency of Self-Supervised Speech Representation Learning

Oct 16, 2022
Tzu-hsun Feng, Annie Dong, Ching-Feng Yeh, Shu-wen Yang, Tzu-Quan Lin, Jiatong Shi, Kai-Wei Chang, Zili Huang, Haibin Wu, Xuankai Chang, Shinji Watanabe, Abdelrahman Mohamed, Shang-Wen Li, Hung-yi Lee


On Compressing Sequences for Self-Supervised Speech Models

Oct 14, 2022
Yen Meng, Hsuan-Jui Chen, Jiatong Shi, Shinji Watanabe, Paola Garcia, Hung-yi Lee, Hao Tang


On the Utility of Self-supervised Models for Prosody-related Tasks

Oct 13, 2022
Guan-Ting Lin, Chi-Luen Feng, Wei-Ping Huang, Yuan Tseng, Tzu-Han Lin, Chen-An Li, Hung-yi Lee, Nigel G. Ward


Exploring Efficient-tuning Methods in Self-supervised Speech Models

Oct 10, 2022
Zih-Ching Chen, Chin-Lun Fu, Chih-Ying Liu, Shang-Wen Li, Hung-yi Lee


How Far Are We from Real Synonym Substitution Attacks?

Oct 06, 2022
Cheng-Han Chiang, Hung-yi Lee


Push-Pull: Characterizing the Adversarial Robustness for Audio-Visual Active Speaker Detection

Oct 03, 2022
Xuanjun Chen, Haibin Wu, Helen Meng, Hung-yi Lee, Jyh-Shing Roger Jang


SpeechCLIP: Integrating Speech with Pre-Trained Vision and Language Model

Oct 03, 2022
Yi-Jen Shih, Hsuan-Fu Wang, Heng-Jui Chang, Layne Berry, Hung-yi Lee, David Harwath
