Shao-Yen Tseng

LVLM-Intrepret: An Interpretability Tool for Large Vision-Language Models

Apr 03, 2024
Gabriela Ben Melech Stan, Raanan Yehezkel Rohekar, Yaniv Gurwicz, Matthew Lyle Olson, Anahita Bhiwandiwalla, Estelle Aflalo, Chenfei Wu, Nan Duan, Shao-Yen Tseng, Vasudev Lal

LLaVA-Gemma: Accelerating Multimodal Foundation Models with a Compact Language Model

Mar 29, 2024
Musashi Hinck, Matthew L. Olson, David Cobbley, Shao-Yen Tseng, Vasudev Lal

LDM3D-VR: Latent Diffusion Model for 3D VR

Nov 06, 2023
Gabriela Ben Melech Stan, Diana Wofk, Estelle Aflalo, Shao-Yen Tseng, Zhipeng Cai, Michael Paulitsch, Vasudev Lal

ManagerTower: Aggregating the Insights of Uni-Modal Experts for Vision-Language Representation Learning

May 31, 2023
Xiao Xu, Bei Li, Chenfei Wu, Shao-Yen Tseng, Anahita Bhiwandiwalla, Shachar Rosenman, Vasudev Lal, Wanxiang Che, Nan Duan

LDM3D: Latent Diffusion Model for 3D

May 21, 2023
Gabriela Ben Melech Stan, Diana Wofk, Scottie Fox, Alex Redden, Will Saxton, Jean Yu, Estelle Aflalo, Shao-Yen Tseng, Fabio Nonato, Matthias Muller, Vasudev Lal

Improving video retrieval using multilingual knowledge transfer

Aug 28, 2022
Avinash Madasu, Estelle Aflalo, Gabriela Ben Melech Stan, Shao-Yen Tseng, Gedas Bertasius, Vasudev Lal

VL-InterpreT: An Interactive Visualization Tool for Interpreting Vision-Language Transformers

Mar 30, 2022
Estelle Aflalo, Meng Du, Shao-Yen Tseng, Yongfei Liu, Chenfei Wu, Nan Duan, Vasudev Lal

CALM: Contrastive Aligned Audio-Language Multirate and Multimodal Representations

Feb 08, 2022
Vin Sachidananda, Shao-Yen Tseng, Erik Marchi, Sachin Kajarekar, Panayiotis Georgiou

Multimodal Embeddings from Language Models

Sep 10, 2019
Shao-Yen Tseng, Panayiotis Georgiou, Shrikanth Narayanan
