Brian DuSell

Language Models over Canonical Byte-Pair Encodings

Jun 09, 2025
Via arXiv

Information Locality as an Inductive Bias for Neural Language Models

Jun 05, 2025
Via arXiv

From Language Models over Tokens to Language Models over Characters

Dec 04, 2024
Via arXiv

Training Neural Networks as Recognizers of Formal Languages

Nov 11, 2024
Via arXiv

On the Proper Treatment of Tokenization in Psycholinguistics

Oct 03, 2024
Via arXiv

The Foundations of Tokenization: Statistical and Computational Concerns

Jul 16, 2024
Via arXiv

PILA: A Historical-Linguistic Dataset of Proto-Italic and Latin

Apr 25, 2024
Via arXiv

Stack Attention: Improving the Ability of Transformers to Model Hierarchical Patterns

Oct 03, 2023
Via arXiv

Nondeterministic Stacks in Neural Networks

Apr 25, 2023
Via arXiv

Algorithms for Weighted Pushdown Automata

Oct 19, 2022
Via arXiv