Shih-Fu Chang

Columbia University

ENTER: Event Based Interpretable Reasoning for VideoQA

Jan 24, 2025

PuzzleGPT: Emulating Human Puzzle-Solving Ability for Time and Location Prediction

Jan 24, 2025

WIDIn: Wording Image for Domain-Invariant Representation in Single-Source Domain Generalization

May 28, 2024

Detecting Multimodal Situations with Insufficient Context and Abstaining from Baseless Predictions

May 23, 2024

MoDE: CLIP Data Experts via Clustering

Apr 24, 2024

Ferret-v2: An Improved Baseline for Referring and Grounding with Large Language Models

Apr 11, 2024

From Pixels to Insights: A Survey on Automatic Chart Understanding in the Era of Large Foundation Models

Mar 25, 2024

SCHEMA: State CHangEs MAtter for Procedure Planning in Instructional Videos

Mar 03, 2024

Do LVLMs Understand Charts? Analyzing and Correcting Factual Errors in Chart Captioning

Dec 15, 2023

Video Summarization: Towards Entity-Aware Captions

Dec 01, 2023