Omer Levy

Improving Transformer Models by Reordering their Sublayers
Nov 10, 2019

Blockwise Self-Attention for Long Document Understanding
Nov 07, 2019

Generalization through Memorization: Nearest Neighbor Language Models
Nov 01, 2019

BART: Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension
Oct 29, 2019

Structural Language Models for Any-Code Generation
Sep 30, 2019

BERT for Coreference Resolution: Baselines and Analysis
Sep 01, 2019

SpanBERT: Improving Pre-training by Representing and Predicting Spans
Jul 31, 2019

RoBERTa: A Robustly Optimized BERT Pretraining Approach
Jul 26, 2019

What Does BERT Look At? An Analysis of BERT's Attention
Jun 11, 2019

Are Sixteen Heads Really Better than One?
May 25, 2019