Zhecan Wang

ENTER: Event Based Interpretable Reasoning for VideoQA

Jan 24, 2025
Via arXiv

PuzzleGPT: Emulating Human Puzzle-Solving Ability for Time and Location Prediction

Jan 24, 2025
Via arXiv

HaloQuest: A Visual Hallucination Dataset for Advancing Multimodal Reasoning

Jul 22, 2024
Via arXiv

Detecting Multimodal Situations with Insufficient Context and Abstaining from Baseless Predictions

May 23, 2024
Via arXiv

Dataset Bias Mitigation in Multiple-Choice Visual Question Answering and Beyond

Oct 31, 2023
Via arXiv

UniFine: A Unified and Fine-grained Approach for Zero-shot Vision-Language Understanding

Jul 03, 2023
Via arXiv

IdealGPT: Iteratively Decomposing Vision and Language Reasoning via Large Language Models

May 24, 2023
Via arXiv

CoBIT: A Contrastive Bi-directional Image-Text Generation Model

Mar 23, 2023
Via arXiv

Find Someone Who: Visual Commonsense Understanding in Human-Centric Grounding

Dec 14, 2022
Via arXiv

Understanding ME? Multimodal Evaluation for Fine-grained Visual Commonsense

Nov 10, 2022
Via arXiv