Tatsuya Hiraoka

Spelling-out is not Straightforward: LLMs' Capability of Tokenization from Token to Characters

Jun 12, 2025

Bit-level BPE: Below the byte boundary

Jun 09, 2025

Do LLMs Need to Think in One Language? Correlation between Latent Language and Task Performance

May 27, 2025

Investigating Neurons and Heads in Transformer-based LLMs for Typographical Errors

Feb 27, 2025

Number Representations in LLMs: A Computational Parallel to Human Perception

Feb 22, 2025

Repetition Neurons: How Do Language Models Produce Repetitions?

Oct 17, 2024

The Geometry of Numerical Reasoning: Language Models Compare Numeric Properties in Linear Subspaces

Oct 17, 2024

SubRegWeigh: Effective and Efficient Annotation Weighing with Subword Regularization

Sep 10, 2024

LLM-jp: A Cross-organizational Project for the Research and Development of Fully Open Japanese LLMs

Jul 04, 2024

An Analysis of BPE Vocabulary Trimming in Neural Machine Translation

Mar 30, 2024