Zhe Gan

An Empirical Study of Training End-to-End Vision-and-Language Transformers

Nov 25, 2021

VIOLET: End-to-End Video-Language Transformers with Masked Visual-token Modeling

Nov 24, 2021

Scaling Up Vision-Language Pre-training for Image Captioning

Nov 24, 2021

Crossing the Format Boundary of Text and Boxes: Towards Unified Vision-Language Modeling

Nov 23, 2021

UFO: A UniFied TransfOrmer for Vision-Language Representation Learning

Nov 19, 2021

Adversarial GLUE: A Multi-Task Benchmark for Robustness Evaluation of Language Models

Nov 04, 2021

An Empirical Study of GPT-3 for Few-Shot Knowledge-Based VQA

Sep 10, 2021

Simpler, Faster, Stronger: Breaking The log-K Curse On Contrastive Learners With FlatNCE

Jul 02, 2021

Chasing Sparsity in Vision Transformers: An End-to-End Exploration

Jun 09, 2021

VALUE: A Multi-Task Benchmark for Video-and-Language Understanding Evaluation

Jun 08, 2021