Yumao Lu

Florence-2: Advancing a Unified Representation for a Variety of Vision Tasks

Nov 10, 2023
Bin Xiao, Haiping Wu, Weijian Xu, Xiyang Dai, Houdong Hu, Yumao Lu, Michael Zeng, Ce Liu, Lu Yuan

Via arXiv

MM-VID: Advancing Video Understanding with GPT-4V(ision)

Oct 30, 2023
Kevin Lin, Faisal Ahmed, Linjie Li, Chung-Ching Lin, Ehsan Azarnasab, Zhengyuan Yang, Jianfeng Wang, Lin Liang, Zicheng Liu, Yumao Lu, Ce Liu, Lijuan Wang

Via arXiv

SwinBERT: End-to-End Transformers with Sparse Attention for Video Captioning

Nov 25, 2021
Kevin Lin, Linjie Li, Chung-Ching Lin, Faisal Ahmed, Zhe Gan, Zicheng Liu, Yumao Lu, Lijuan Wang

Via arXiv

Scaling Up Vision-Language Pre-training for Image Captioning

Nov 24, 2021
Xiaowei Hu, Zhe Gan, Jianfeng Wang, Zhengyuan Yang, Zicheng Liu, Yumao Lu, Lijuan Wang

Via arXiv

Crossing the Format Boundary of Text and Boxes: Towards Unified Vision-Language Modeling

Nov 23, 2021
Zhengyuan Yang, Zhe Gan, Jianfeng Wang, Xiaowei Hu, Faisal Ahmed, Zicheng Liu, Yumao Lu, Lijuan Wang

Via arXiv

Florence: A New Foundation Model for Computer Vision

Nov 22, 2021
Lu Yuan, Dongdong Chen, Yi-Ling Chen, Noel Codella, Xiyang Dai, Jianfeng Gao, Houdong Hu, Xuedong Huang, Boxin Li, Chunyuan Li, Ce Liu, Mengchen Liu, Zicheng Liu, Yumao Lu, Yu Shi, Lijuan Wang, Jianfeng Wang, Bin Xiao, Zhen Xiao, Jianwei Yang, Michael Zeng, Luowei Zhou, Pengchuan Zhang

Via arXiv

UFO: A UniFied TransfOrmer for Vision-Language Representation Learning

Nov 19, 2021
Jianfeng Wang, Xiaowei Hu, Zhe Gan, Zhengyuan Yang, Xiyang Dai, Zicheng Liu, Yumao Lu, Lijuan Wang

Via arXiv

An Empirical Study of GPT-3 for Few-Shot Knowledge-Based VQA

Sep 10, 2021
Zhengyuan Yang, Zhe Gan, Jianfeng Wang, Xiaowei Hu, Yumao Lu, Zicheng Liu, Lijuan Wang

Via arXiv