Yi-Jen Shih

Interface Design for Self-Supervised Speech Models

Jun 18, 2024
Via arXiv

SpeechCLIP+: Self-supervised multi-task representation learning for speech via CLIP and speech-image data

Feb 10, 2024
Via arXiv

Integrating Self-supervised Speech Model with Pseudo Word-level Targets from Visually-grounded Speech Model

Feb 08, 2024
Via arXiv

AV-SUPERB: A Multi-Task Evaluation Benchmark for Audio-Visual Representation Models

Sep 19, 2023
Via arXiv

M-SpeechCLIP: Leveraging Large-Scale, Pre-Trained Models for Multilingual Speech to Image Retrieval

Nov 02, 2022
Via arXiv

SpeechCLIP: Integrating Speech with Pre-Trained Vision and Language Model

Oct 03, 2022
Via arXiv

Theme Transformer: Symbolic Music Generation with Theme-Conditioned Transformer

Nov 07, 2021
Via arXiv