Shizhe Chen

INRIA

Think Global, Act Local: Dual-scale Graph Transformer for Vision-and-Language Navigation

Feb 23, 2022

History Aware Multimodal Transformer for Vision-and-Language Navigation

Oct 25, 2021

Product-oriented Machine Translation with Cross-modal Cross-lingual Pre-training

Aug 25, 2021

Airbert: In-domain Pretraining for Vision-and-Language Navigation

Aug 20, 2021

Elaborative Rehearsal for Zero-shot Action Recognition

Aug 18, 2021

Question-controlled Text-aware Image Captioning

Aug 04, 2021

ICECAP: Information Concentrated Entity-aware Image Captioning

Aug 04, 2021

Team RUC_AIM3 Technical Report at ActivityNet 2021: Entities Object Localization

Jun 11, 2021

Towards Diverse Paragraph Captioning for Untrimmed Videos

May 30, 2021

WenLan: Bridging Vision and Language by Large-Scale Multi-Modal Pre-Training

Mar 19, 2021