Haoxuan You

Dataset Bias Mitigation in Multiple-Choice Visual Question Answering and Beyond

Oct 31, 2023
Zhecan Wang, Long Chen, Haoxuan You, Keyang Xu, Yicheng He, Wenhao Li, Noel Codella, Kai-Wei Chang, Shih-Fu Chang

Ferret: Refer and Ground Anything Anywhere at Any Granularity

Oct 11, 2023
Haoxuan You, Haotian Zhang, Zhe Gan, Xianzhi Du, Bowen Zhang, Zirui Wang, Liangliang Cao, Shih-Fu Chang, Yinfei Yang

UniFine: A Unified and Fine-grained Approach for Zero-shot Vision-Language Understanding

Jul 03, 2023
Rui Sun, Zhecan Wang, Haoxuan You, Noel Codella, Kai-Wei Chang, Shih-Fu Chang

IdealGPT: Iteratively Decomposing Vision and Language Reasoning via Large Language Models

May 24, 2023
Haoxuan You, Rui Sun, Zhecan Wang, Long Chen, Gengyu Wang, Hammad A. Ayyubi, Kai-Wei Chang, Shih-Fu Chang

CoBIT: A Contrastive Bi-directional Image-Text Generation Model

Mar 23, 2023
Haoxuan You, Mandy Guo, Zhecan Wang, Kai-Wei Chang, Jason Baldridge, Jiahui Yu

Find Someone Who: Visual Commonsense Understanding in Human-Centric Grounding

Dec 14, 2022
Haoxuan You, Rui Sun, Zhecan Wang, Kai-Wei Chang, Shih-Fu Chang

Understanding ME? Multimodal Evaluation for Fine-grained Visual Commonsense

Nov 10, 2022
Zhecan Wang, Haoxuan You, Yicheng He, Wenhao Li, Kai-Wei Chang, Shih-Fu Chang

Learning Visual Representation from Modality-Shared Contrastive Language-Image Pre-training

Jul 26, 2022
Haoxuan You, Luowei Zhou, Bin Xiao, Noel Codella, Yu Cheng, Ruochen Xu, Shih-Fu Chang, Lu Yuan

Multimodal Adaptive Distillation for Leveraging Unimodal Encoders for Vision-Language Tasks

Apr 28, 2022
Zhecan Wang, Noel Codella, Yen-Chun Chen, Luowei Zhou, Xiyang Dai, Bin Xiao, Jianwei Yang, Haoxuan You, Kai-Wei Chang, Shih-Fu Chang, Lu Yuan
