Zhecan Wang

Dataset Bias Mitigation in Multiple-Choice Visual Question Answering and Beyond

Oct 31, 2023
Zhecan Wang, Long Chen, Haoxuan You, Keyang Xu, Yicheng He, Wenhao Li, Noel Codella, Kai-Wei Chang, Shih-Fu Chang

Via arXiv

UniFine: A Unified and Fine-grained Approach for Zero-shot Vision-Language Understanding

Jul 03, 2023
Rui Sun, Zhecan Wang, Haoxuan You, Noel Codella, Kai-Wei Chang, Shih-Fu Chang

Via arXiv

IdealGPT: Iteratively Decomposing Vision and Language Reasoning via Large Language Models

May 24, 2023
Haoxuan You, Rui Sun, Zhecan Wang, Long Chen, Gengyu Wang, Hammad A. Ayyubi, Kai-Wei Chang, Shih-Fu Chang

Via arXiv

CoBIT: A Contrastive Bi-directional Image-Text Generation Model

Mar 23, 2023
Haoxuan You, Mandy Guo, Zhecan Wang, Kai-Wei Chang, Jason Baldridge, Jiahui Yu

Via arXiv

Find Someone Who: Visual Commonsense Understanding in Human-Centric Grounding

Dec 14, 2022
Haoxuan You, Rui Sun, Zhecan Wang, Kai-Wei Chang, Shih-Fu Chang

Via arXiv

Understanding ME? Multimodal Evaluation for Fine-grained Visual Commonsense

Nov 10, 2022
Zhecan Wang, Haoxuan You, Yicheng He, Wenhao Li, Kai-Wei Chang, Shih-Fu Chang

Via arXiv

Multimodal Adaptive Distillation for Leveraging Unimodal Encoders for Vision-Language Tasks

Apr 28, 2022
Zhecan Wang, Noel Codella, Yen-Chun Chen, Luowei Zhou, Xiyang Dai, Bin Xiao, Jianwei Yang, Haoxuan You, Kai-Wei Chang, Shih-Fu Chang, Lu Yuan

Via arXiv

CLIP-TD: CLIP Targeted Distillation for Vision-Language Tasks

Jan 15, 2022
Zhecan Wang, Noel Codella, Yen-Chun Chen, Luowei Zhou, Jianwei Yang, Xiyang Dai, Bin Xiao, Haoxuan You, Shih-Fu Chang, Lu Yuan

Via arXiv

SGEITL: Scene Graph Enhanced Image-Text Learning for Visual Commonsense Reasoning

Dec 16, 2021
Zhecan Wang, Haoxuan You, Liunian Harold Li, Alireza Zareian, Suji Park, Yiqing Liang, Kai-Wei Chang, Shih-Fu Chang

Via arXiv