Zi-Yi Dou

ACQUIRED: A Dataset for Answering Counterfactual Questions In Real-Life Videos

Nov 02, 2023
Te-Lin Wu, Zi-Yi Dou, Qingyuan Hu, Yu Hou, Nischal Reddy Chandra, Marjorie Freedman, Ralph M. Weischedel, Nanyun Peng

DesCo: Learning Object Recognition with Rich Language Descriptions

Jun 24, 2023
Liunian Harold Li, Zi-Yi Dou, Nanyun Peng, Kai-Wei Chang

Gender Biases in Automatic Evaluation Metrics: A Case Study on Image Captioning

May 24, 2023
Haoyi Qiu, Zi-Yi Dou, Tianlu Wang, Asli Celikyilmaz, Nanyun Peng

Masked Path Modeling for Vision-and-Language Navigation

May 23, 2023
Zi-Yi Dou, Feng Gao, Nanyun Peng

Generalized Decoding for Pixel, Image, and Language

Dec 21, 2022
Xueyan Zou, Zi-Yi Dou, Jianwei Yang, Zhe Gan, Linjie Li, Chunyuan Li, Xiyang Dai, Harkirat Behl, Jianfeng Wang, Lu Yuan, Nanyun Peng, Lijuan Wang, Yong Jae Lee, Jianfeng Gao

Coarse-to-Fine Vision-Language Pre-training with Fusion in the Backbone

Jun 15, 2022
Zi-Yi Dou, Aishwarya Kamath, Zhe Gan, Pengchuan Zhang, Jianfeng Wang, Linjie Li, Zicheng Liu, Ce Liu, Yann LeCun, Nanyun Peng, Jianfeng Gao, Lijuan Wang

FOAM: A Follower-aware Speaker Model for Vision-and-Language Navigation

Jun 09, 2022
Zi-Yi Dou, Nanyun Peng

Zero-shot Commonsense Question Answering with Cloze Translation and Consistency Optimization

Jan 01, 2022
Zi-Yi Dou, Nanyun Peng

An Empirical Study of Training End-to-End Vision-and-Language Transformers

Nov 25, 2021
Zi-Yi Dou, Yichong Xu, Zhe Gan, Jianfeng Wang, Shuohang Wang, Lijuan Wang, Chenguang Zhu, Pengchuan Zhang, Lu Yuan, Nanyun Peng, Zicheng Liu, Michael Zeng
