Aleksey Komissarov

SampoNLP: A Self-Referential Toolkit for Morphological Analysis of Subword Tokenizers

Jan 08, 2026

Compressed code: the hidden effects of quantization and distillation on programming tokens

Jan 05, 2026

When repeats drive the vocabulary: a Byte-Pair Encoding analysis of T2T primate genomes

May 13, 2025

Qtok: A Comprehensive Framework for Evaluating Multilingual Tokenizer Quality in Large Language Models

Oct 16, 2024