Shao-Yen Tseng

L-MAGIC: Language Model Assisted Generation of Images with Coherence

Jun 03, 2024

LVLM-Interpret: An Interpretability Tool for Large Vision-Language Models

Apr 03, 2024

LLaVA-Gemma: Accelerating Multimodal Foundation Models with a Compact Language Model

Mar 29, 2024

LDM3D-VR: Latent Diffusion Model for 3D VR

Nov 06, 2023

ManagerTower: Aggregating the Insights of Uni-Modal Experts for Vision-Language Representation Learning

May 31, 2023

LDM3D: Latent Diffusion Model for 3D

May 21, 2023

Improving video retrieval using multilingual knowledge transfer

Aug 28, 2022

VL-InterpreT: An Interactive Visualization Tool for Interpreting Vision-Language Transformers

Mar 30, 2022

CALM: Contrastive Aligned Audio-Language Multirate and Multimodal Representations

Feb 08, 2022

Multimodal Embeddings from Language Models

Sep 10, 2019