Zhe Gan

ReCo: Region-Controlled Text-to-Image Generation

Nov 23, 2022
Zhengyuan Yang, Jianfeng Wang, Zhe Gan, Linjie Li, Kevin Lin, Chenfei Wu, Nan Duan, Zicheng Liu, Ce Liu, Michael Zeng, Lijuan Wang

Exploring Discrete Diffusion Models for Image Captioning

Nov 21, 2022
Zixin Zhu, Yixuan Wei, Jianfeng Wang, Zhe Gan, Zheng Zhang, Le Wang, Gang Hua, Lijuan Wang, Zicheng Liu, Han Hu

Non-Contrastive Learning Meets Language-Image Pre-Training

Oct 17, 2022
Jinghao Zhou, Li Dong, Zhe Gan, Lijuan Wang, Furu Wei

Vision-Language Pre-training: Basics, Recent Advances, and Future Trends

Oct 17, 2022
Zhe Gan, Linjie Li, Chunyuan Li, Lijuan Wang, Zicheng Liu, Jianfeng Gao

Prompting GPT-3 To Be Reliable

Oct 17, 2022
Chenglei Si, Zhe Gan, Zhengyuan Yang, Shuohang Wang, Jianfeng Wang, Jordan Boyd-Graber, Lijuan Wang

An Empirical Study of End-to-End Video-Language Transformers with Masked Visual Modeling

Sep 04, 2022
Tsu-Jui Fu, Linjie Li, Zhe Gan, Kevin Lin, William Yang Wang, Lijuan Wang, Zicheng Liu

NUWA-Infinity: Autoregressive over Autoregressive Generation for Infinite Visual Synthesis

Jul 20, 2022
Chenfei Wu, Jian Liang, Xiaowei Hu, Zhe Gan, Jianfeng Wang, Lijuan Wang, Zicheng Liu, Yuejian Fang, Nan Duan

Coarse-to-Fine Vision-Language Pre-training with Fusion in the Backbone

Jun 15, 2022
Zi-Yi Dou, Aishwarya Kamath, Zhe Gan, Pengchuan Zhang, Jianfeng Wang, Linjie Li, Zicheng Liu, Ce Liu, Yann LeCun, Nanyun Peng, Jianfeng Gao, Lijuan Wang

LAVENDER: Unifying Video-Language Understanding as Masked Language Modeling

Jun 14, 2022
Linjie Li, Zhe Gan, Kevin Lin, Chung-Ching Lin, Zicheng Liu, Ce Liu, Lijuan Wang

GIT: A Generative Image-to-text Transformer for Vision and Language

May 31, 2022
Jianfeng Wang, Zhengyuan Yang, Xiaowei Hu, Linjie Li, Kevin Lin, Zhe Gan, Zicheng Liu, Ce Liu, Lijuan Wang
