Alexander I. Rudnicky

Attention Alignment and Flexible Positional Embeddings Improve Transformer Length Extrapolation

Nov 15, 2023

Advancing Regular Language Reasoning in Linear Recurrent Neural Networks

Sep 14, 2023

Structured Dialogue Discourse Parsing

Jun 26, 2023

Latent Positional Information is in the Self-Attention Variance of Transformer Language Models Without Positional Embeddings

May 23, 2023

Transformer Working Memory Enables Regular Language Reasoning and Natural Language Length Extrapolation

May 05, 2023

Receptive Field Alignment Enables Transformer Length Extrapolation

Dec 20, 2022

Training Discrete Deep Generative Models via Gapped Straight-Through Estimator

Jun 15, 2022

KERPLE: Kernelized Relative Positional Embedding for Length Extrapolation

May 20, 2022

Zero-Shot Dialogue Disentanglement by Self-Supervised Entangled Response Selection

Oct 25, 2021

Learning Conversational Systems that Interleave Task and Non-Task Content

Mar 01, 2017