Shih-Fu Chang

Columbia University

Characterizing Video Question Answering with Sparsified Inputs

Nov 27, 2023

Dataset Bias Mitigation in Multiple-Choice Visual Question Answering and Beyond

Oct 31, 2023

Ferret: Refer and Ground Anything Anywhere at Any Granularity

Oct 11, 2023

UniFine: A Unified and Fine-grained Approach for Zero-shot Vision-Language Understanding

Jul 03, 2023

Learning from Children: Improving Image-Caption Pretraining via Curriculum

May 30, 2023

Enhanced Chart Understanding in Vision and Language Task via Cross-modal Pre-training on Plot Table Pairs

May 29, 2023

Non-Sequential Graph Script Induction via Multimedia Grounding

May 27, 2023

IdealGPT: Iteratively Decomposing Vision and Language Reasoning via Large Language Models

May 24, 2023

Language Models are Causal Knowledge Extractors for Zero-shot Video Question Answering

Apr 07, 2023

What, when, and where? -- Self-Supervised Spatio-Temporal Grounding in Untrimmed Multi-Action Videos from Narrated Instructions

Mar 29, 2023