Zhe Gan

K-LITE: Learning Transferable Visual Models with External Knowledge
Apr 20, 2022
Sheng Shen, Chunyuan Li, Xiaowei Hu, Yujia Xie, Jianwei Yang, Pengchuan Zhang, Anna Rohrbach, Zhe Gan, Lijuan Wang, Lu Yuan, Ce Liu, Kurt Keutzer, Trevor Darrell, Jianfeng Gao

Injecting Semantic Concepts into End-to-End Image Captioning
Dec 09, 2021
Zhiyuan Fang, Jianfeng Wang, Xiaowei Hu, Lin Liang, Zhe Gan, Lijuan Wang, Yezhou Yang, Zicheng Liu

MLP Architectures for Vision-and-Language Modeling: An Empirical Study
Dec 08, 2021
Yixin Nie, Linjie Li, Zhe Gan, Shuohang Wang, Chenguang Zhu, Michael Zeng, Zicheng Liu, Mohit Bansal, Lijuan Wang

SwinBERT: End-to-End Transformers with Sparse Attention for Video Captioning
Nov 25, 2021
Kevin Lin, Linjie Li, Chung-Ching Lin, Faisal Ahmed, Zhe Gan, Zicheng Liu, Yumao Lu, Lijuan Wang

An Empirical Study of Training End-to-End Vision-and-Language Transformers
Nov 25, 2021
Zi-Yi Dou, Yichong Xu, Zhe Gan, Jianfeng Wang, Shuohang Wang, Lijuan Wang, Chenguang Zhu, Pengchuan Zhang, Lu Yuan, Nanyun Peng, Zicheng Liu, Michael Zeng

VIOLET: End-to-End Video-Language Transformers with Masked Visual-token Modeling
Nov 24, 2021
Tsu-Jui Fu, Linjie Li, Zhe Gan, Kevin Lin, William Yang Wang, Lijuan Wang, Zicheng Liu

Scaling Up Vision-Language Pre-training for Image Captioning
Nov 24, 2021
Xiaowei Hu, Zhe Gan, Jianfeng Wang, Zhengyuan Yang, Zicheng Liu, Yumao Lu, Lijuan Wang

Crossing the Format Boundary of Text and Boxes: Towards Unified Vision-Language Modeling
Nov 23, 2021
Zhengyuan Yang, Zhe Gan, Jianfeng Wang, Xiaowei Hu, Faisal Ahmed, Zicheng Liu, Yumao Lu, Lijuan Wang

UFO: A UniFied TransfOrmer for Vision-Language Representation Learning
Nov 19, 2021
Jianfeng Wang, Xiaowei Hu, Zhe Gan, Zhengyuan Yang, Xiyang Dai, Zicheng Liu, Yumao Lu, Lijuan Wang

Adversarial GLUE: A Multi-Task Benchmark for Robustness Evaluation of Language Models
Nov 04, 2021
Boxin Wang, Chejian Xu, Shuohang Wang, Zhe Gan, Yu Cheng, Jianfeng Gao, Ahmed Hassan Awadallah, Bo Li
