Luowei Zhou

Visual Clues: Bridging Vision and Language Foundations for Image Paragraph Captioning

Jun 03, 2022
Yujia Xie, Luowei Zhou, Xiyang Dai, Lu Yuan, Nguyen Bach, Ce Liu, Michael Zeng

Language Models with Image Descriptors are Strong Few-Shot Video-Language Learners

May 29, 2022
Zhenhailong Wang, Manling Li, Ruochen Xu, Luowei Zhou, Jie Lei, Xudong Lin, Shuohang Wang, Ziyi Yang, Chenguang Zhu, Derek Hoiem, Shih-Fu Chang, Mohit Bansal, Heng Ji

Multimodal Adaptive Distillation for Leveraging Unimodal Encoders for Vision-Language Tasks

Apr 28, 2022
Zhecan Wang, Noel Codella, Yen-Chun Chen, Luowei Zhou, Xiyang Dai, Bin Xiao, Jianwei Yang, Haoxuan You, Kai-Wei Chang, Shih-Fu Chang, Lu Yuan

CLIP-TD: CLIP Targeted Distillation for Vision-Language Tasks

Jan 15, 2022
Zhecan Wang, Noel Codella, Yen-Chun Chen, Luowei Zhou, Jianwei Yang, Xiyang Dai, Bin Xiao, Haoxuan You, Shih-Fu Chang, Lu Yuan

CLIP-Event: Connecting Text and Images with Event Structures

Jan 13, 2022
Manling Li, Ruochen Xu, Shuohang Wang, Luowei Zhou, Xudong Lin, Chenguang Zhu, Michael Zeng, Heng Ji, Shih-Fu Chang

RegionCLIP: Region-based Language-Image Pretraining

Dec 16, 2021
Yiwu Zhong, Jianwei Yang, Pengchuan Zhang, Chunyuan Li, Noel Codella, Liunian Harold Li, Luowei Zhou, Xiyang Dai, Lu Yuan, Yin Li, Jianfeng Gao

BEVT: BERT Pretraining of Video Transformers

Dec 02, 2021
Rui Wang, Dongdong Chen, Zuxuan Wu, Yinpeng Chen, Xiyang Dai, Mengchen Liu, Yu-Gang Jiang, Luowei Zhou, Lu Yuan

Florence: A New Foundation Model for Computer Vision

Nov 22, 2021
Lu Yuan, Dongdong Chen, Yi-Ling Chen, Noel Codella, Xiyang Dai, Jianfeng Gao, Houdong Hu, Xuedong Huang, Boxin Li, Chunyuan Li, Ce Liu, Mengchen Liu, Zicheng Liu, Yumao Lu, Yu Shi, Lijuan Wang, Jianfeng Wang, Bin Xiao, Zhen Xiao, Jianwei Yang, Michael Zeng, Luowei Zhou, Pengchuan Zhang

VALUE: A Multi-Task Benchmark for Video-and-Language Understanding Evaluation

Jun 08, 2021
Linjie Li, Jie Lei, Zhe Gan, Licheng Yu, Yen-Chun Chen, Rohit Pillai, Yu Cheng, Luowei Zhou, Xin Eric Wang, William Yang Wang, Tamara Lee Berg, Mohit Bansal, Jingjing Liu, Lijuan Wang, Zicheng Liu
