Yaya Shi

Semantics-enhanced Cross-modal Masked Image Modeling for Vision-Language Pre-training

Mar 01, 2024
Haowei Liu, Yaya Shi, Haiyang Xu, Chunfeng Yuan, Qinghao Ye, Chenliang Li, Ming Yan, Ji Zhang, Fei Huang, Bing Li, Weiming Hu

Unifying Latent and Lexicon Representations for Effective Video-Text Retrieval

Feb 26, 2024
Haowei Liu, Yaya Shi, Haiyang Xu, Chunfeng Yuan, Qinghao Ye, Chenliang Li, Ming Yan, Ji Zhang, Fei Huang, Bing Li, Weiming Hu

mPLUG-PaperOwl: Scientific Diagram Analysis with the Multimodal Large Language Model

Nov 30, 2023
Anwen Hu, Yaya Shi, Haiyang Xu, Jiabo Ye, Qinghao Ye, Ming Yan, Chenliang Li, Qi Qian, Ji Zhang, Fei Huang

Youku-mPLUG: A 10 Million Large-scale Chinese Video-Language Dataset for Pre-training and Benchmarks

Jun 07, 2023
Haiyang Xu, Qinghao Ye, Xuan Wu, Ming Yan, Yuan Miao, Jiabo Ye, Guohai Xu, Anwen Hu, Yaya Shi, Guangwei Xu, Chenliang Li, Qi Qian, Maofei Que, Ji Zhang, Xiao Zeng, Fei Huang

mPLUG-Owl: Modularization Empowers Large Language Models with Multimodality

Apr 27, 2023
Qinghao Ye, Haiyang Xu, Guohai Xu, Jiabo Ye, Ming Yan, Yiyang Zhou, Junyang Wang, Anwen Hu, Pengcheng Shi, Yaya Shi, Chenliang Li, Yuanhong Xu, Hehong Chen, Junfeng Tian, Qian Qi, Ji Zhang, Fei Huang

mPLUG-2: A Modularized Multi-modal Foundation Model Across Text, Image and Video

Feb 01, 2023
Haiyang Xu, Qinghao Ye, Ming Yan, Yaya Shi, Jiabo Ye, Yuanhong Xu, Chenliang Li, Bin Bi, Qi Qian, Wei Wang, Guohai Xu, Ji Zhang, Songfang Huang, Fei Huang, Jingren Zhou

EMScore: Evaluating Video Captioning via Coarse-Grained and Fine-Grained Embedding Matching

Nov 17, 2021
Yaya Shi, Xu Yang, Haiyang Xu, Chunfeng Yuan, Bing Li, Weiming Hu, Zheng-Jun Zha

A Simple and Strong Baseline for Universal Targeted Attacks on Siamese Visual Tracking

May 06, 2021
Zhenbang Li, Yaya Shi, Jin Gao, Shaoru Wang, Bing Li, Pengpeng Liang, Weiming Hu

Object Relational Graph with Teacher-Recommended Learning for Video Captioning

Feb 26, 2020
Ziqi Zhang, Yaya Shi, Chunfeng Yuan, Bing Li, Peijin Wang, Weiming Hu, Zhengjun Zha

VATEX Captioning Challenge 2019: Multi-modal Information Fusion and Multi-stage Training Strategy for Video Captioning

Oct 13, 2019
Ziqi Zhang, Yaya Shi, Jiutong Wei, Chunfeng Yuan, Bing Li, Weiming Hu
