Kevin Qinghong Lin

COSMO: COntrastive Streamlined MultimOdal Model with Interleaved Pre-Training
Jan 01, 2024
Alex Jinpeng Wang, Linjie Li, Kevin Qinghong Lin, Jianfeng Wang, Kevin Lin, Zhengyuan Yang, Lijuan Wang, Mike Zheng Shou

Bootstrapping SparseFormers from Vision Foundation Models
Dec 04, 2023
Ziteng Gao, Zhan Tong, Kevin Qinghong Lin, Joya Chen, Mike Zheng Shou

DiffusionVMR: Diffusion Model for Video Moment Retrieval
Aug 29, 2023
Henghao Zhao, Kevin Qinghong Lin, Rui Yan, Zechao Li

UniVTG: Towards Unified Video-Language Temporal Grounding
Aug 18, 2023
Kevin Qinghong Lin, Pengchuan Zhang, Joya Chen, Shraman Pramanick, Difei Gao, Alex Jinpeng Wang, Rui Yan, Mike Zheng Shou

EgoVLPv2: Egocentric Video-Language Pre-training with Fusion in the Backbone
Jul 11, 2023
Shraman Pramanick, Yale Song, Sayan Nag, Kevin Qinghong Lin, Hardik Shah, Mike Zheng Shou, Rama Chellappa, Pengchuan Zhang

AssistGPT: A General Multi-modal Assistant that can Plan, Execute, Inspect, and Learn
Jun 28, 2023
Difei Gao, Lei Ji, Luowei Zhou, Kevin Qinghong Lin, Joya Chen, Zihan Fan, Mike Zheng Shou

Too Large; Data Reduction for Vision-Language Pre-Training
Jun 01, 2023
Alex Jinpeng Wang, Kevin Qinghong Lin, David Junhao Zhang, Stan Weixian Lei, Mike Zheng Shou

VisorGPT: Learning Visual Prior via Generative Pre-Training
May 24, 2023
Jinheng Xie, Kai Ye, Yudong Li, Yuexiang Li, Kevin Qinghong Lin, Yefeng Zheng, Linlin Shen, Mike Zheng Shou
