Yehao Li

Wave-ViT: Unifying Wavelet and Transformers for Visual Representation Learning

Jul 11, 2022

Comprehending and Ordering Semantics for Image Captioning

Jun 14, 2022

Silver-Bullet-3D at ManiSkill 2021: Learning-from-Demonstrations and Heuristic Rule-based Methods for Object Manipulation

Jun 13, 2022

Uni-EDEN: Universal Encoder-Decoder Network by Multi-Granular Vision-Language Pre-training

Jan 11, 2022

CoCo-BERT: Improving Video-Language Pre-training with Contrastive Cross-modal Matching and Denoising

Dec 14, 2021

X-modaler: A Versatile and High-performance Codebase for Cross-modal Analytics

Aug 18, 2021

Contextual Transformer Networks for Visual Recognition

Jul 26, 2021

Scheduled Sampling in Vision-Language Pretraining with Decoupled Encoder-Decoder Network

Jan 27, 2021

Pre-training for Video Captioning Challenge 2020 Summary

Jul 27, 2020

Auto-captions on GIF: A Large-scale Video-sentence Dataset for Vision-language Pre-training

Jul 05, 2020