Alert button
Picture for Yi-Jen Shih

Yi-Jen Shih

Alert button

SpeechCLIP+: Self-supervised multi-task representation learning for speech via CLIP and speech-image data

Feb 10, 2024
Hsuan-Fu Wang, Yi-Jen Shih, Heng-Jui Chang, Layne Berry, Puyuan Peng, Hung-yi Lee, Hsin-Min Wang, David Harwath

Viaarxiv icon

Integrating Self-supervised Speech Model with Pseudo Word-level Targets from Visually-grounded Speech Model

Feb 08, 2024
Hung-Chieh Fang, Nai-Xuan Ye, Yi-Jen Shih, Puyuan Peng, Hsuan-Fu Wang, Layne Berry, Hung-yi Lee, David Harwath

Viaarxiv icon

AV-SUPERB: A Multi-Task Evaluation Benchmark for Audio-Visual Representation Models

Sep 19, 2023
Yuan Tseng, Layne Berry, Yi-Ting Chen, I-Hsiang Chiu, Hsuan-Hao Lin, Max Liu, Puyuan Peng, Yi-Jen Shih, Hung-Yu Wang, Haibin Wu, Po-Yao Huang, Chun-Mao Lai, Shang-Wen Li, David Harwath, Yu Tsao, Shinji Watanabe, Abdelrahman Mohamed, Chi-Luen Feng, Hung-yi Lee

Figure 1 for AV-SUPERB: A Multi-Task Evaluation Benchmark for Audio-Visual Representation Models
Figure 2 for AV-SUPERB: A Multi-Task Evaluation Benchmark for Audio-Visual Representation Models
Figure 3 for AV-SUPERB: A Multi-Task Evaluation Benchmark for Audio-Visual Representation Models
Viaarxiv icon

M-SpeechCLIP: Leveraging Large-Scale, Pre-Trained Models for Multilingual Speech to Image Retrieval

Nov 02, 2022
Layne Berry, Yi-Jen Shih, Hsuan-Fu Wang, Heng-Jui Chang, Hung-yi Lee, David Harwath

Figure 1 for M-SpeechCLIP: Leveraging Large-Scale, Pre-Trained Models for Multilingual Speech to Image Retrieval
Figure 2 for M-SpeechCLIP: Leveraging Large-Scale, Pre-Trained Models for Multilingual Speech to Image Retrieval
Figure 3 for M-SpeechCLIP: Leveraging Large-Scale, Pre-Trained Models for Multilingual Speech to Image Retrieval
Figure 4 for M-SpeechCLIP: Leveraging Large-Scale, Pre-Trained Models for Multilingual Speech to Image Retrieval
Viaarxiv icon

SpeechCLIP: Integrating Speech with Pre-Trained Vision and Language Model

Oct 03, 2022
Yi-Jen Shih, Hsuan-Fu Wang, Heng-Jui Chang, Layne Berry, Hung-yi Lee, David Harwath

Figure 1 for SpeechCLIP: Integrating Speech with Pre-Trained Vision and Language Model
Figure 2 for SpeechCLIP: Integrating Speech with Pre-Trained Vision and Language Model
Figure 3 for SpeechCLIP: Integrating Speech with Pre-Trained Vision and Language Model
Figure 4 for SpeechCLIP: Integrating Speech with Pre-Trained Vision and Language Model
Viaarxiv icon

Theme Transformer: Symbolic Music Generation with Theme-Conditioned Transformer

Nov 07, 2021
Yi-Jen Shih, Shih-Lun Wu, Frank Zalkow, Meinard Müller, Yi-Hsuan Yang

Figure 1 for Theme Transformer: Symbolic Music Generation with Theme-Conditioned Transformer
Figure 2 for Theme Transformer: Symbolic Music Generation with Theme-Conditioned Transformer
Figure 3 for Theme Transformer: Symbolic Music Generation with Theme-Conditioned Transformer
Figure 4 for Theme Transformer: Symbolic Music Generation with Theme-Conditioned Transformer
Viaarxiv icon