Shih-Fu Chang

UniFine: A Unified and Fine-grained Approach for Zero-shot Vision-Language Understanding
Jul 03, 2023
Rui Sun, Zhecan Wang, Haoxuan You, Noel Codella, Kai-Wei Chang, Shih-Fu Chang

Learning from Children: Improving Image-Caption Pretraining via Curriculum
May 30, 2023
Hammad A. Ayyubi, Rahul Lokesh, Alireza Zareian, Bo Wu, Shih-Fu Chang

Enhanced Chart Understanding in Vision and Language Task via Cross-modal Pre-training on Plot Table Pairs
May 29, 2023
Mingyang Zhou, Yi R. Fung, Long Chen, Christopher Thomas, Heng Ji, Shih-Fu Chang

Non-Sequential Graph Script Induction via Multimedia Grounding
May 27, 2023
Yu Zhou, Sha Li, Manling Li, Xudong Lin, Shih-Fu Chang, Mohit Bansal, Heng Ji

IdealGPT: Iteratively Decomposing Vision and Language Reasoning via Large Language Models
May 24, 2023
Haoxuan You, Rui Sun, Zhecan Wang, Long Chen, Gengyu Wang, Hammad A. Ayyubi, Kai-Wei Chang, Shih-Fu Chang

Language Models are Causal Knowledge Extractors for Zero-shot Video Question Answering
Apr 07, 2023
Hung-Ting Su, Yulei Niu, Xudong Lin, Winston H. Hsu, Shih-Fu Chang

What, when, and where? -- Self-Supervised Spatio-Temporal Grounding in Untrimmed Multi-Action Videos from Narrated Instructions
Mar 29, 2023
Brian Chen, Nina Shvetsova, Andrew Rouditchenko, Daniel Kondermann, Samuel Thomas, Shih-Fu Chang, Rogerio Feris, James Glass, Hilde Kuehne

Supervised Masked Knowledge Distillation for Few-Shot Transformers
Mar 29, 2023
Han Lin, Guangxing Han, Jiawei Ma, Shiyuan Huang, Xudong Lin, Shih-Fu Chang

DiGeo: Discriminative Geometry-Aware Learning for Generalized Few-Shot Object Detection
Mar 16, 2023
Jiawei Ma, Yulei Niu, Jincheng Xu, Shiyuan Huang, Guangxing Han, Shih-Fu Chang

In Defense of Structural Symbolic Representation for Video Event-Relation Prediction
Jan 06, 2023
Andrew Lu, Xudong Lin, Yulei Niu, Shih-Fu Chang