Kris Cao

One Tokenizer To Rule Them All: Emergent Language Plasticity via Multilingual Tokenizers

Jun 12, 2025
Via arXiv

Command A: An Enterprise-Ready Large Language Model

Apr 01, 2025
Via arXiv

Unpacking Tokenization: Evaluating Text Compression and its Correlation with Model Performance

Mar 10, 2024
Via arXiv

Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context

Mar 08, 2024
Via arXiv

Gemini: A Family of Highly Capable Multimodal Models

Dec 19, 2023
Via arXiv

What is the best recipe for character-level encoder-only modelling?

May 09, 2023
Via arXiv

Towards Coherent and Consistent Use of Entities in Narrative Generation

Feb 03, 2022
Via arXiv

Control Prefixes for Text Generation

Oct 15, 2021
Via arXiv

You should evaluate your language model on marginal likelihood over tokenisations

Sep 21, 2021
Via arXiv

Pitfalls of Static Language Modelling

Feb 03, 2021
Via arXiv