Tom Gunter

Reusing Pre-Training Data at Test Time is a Compute Multiplier
Nov 06, 2025

Can External Validation Tools Improve Annotation Quality for LLM-as-a-Judge?
Jul 22, 2025

Language Models Improve When Pretraining Data Matches Target Tasks
Jul 16, 2025

Datasets, Documents, and Repetitions: The Practicalities of Unequal Data Quality
Mar 10, 2025

Apple Intelligence Foundation Language Models
Jul 29, 2024

Large Language Model-guided Document Selection
Jun 07, 2024

Revisiting MoE and Dense Speed-Accuracy Comparisons for LLM Training
May 23, 2024

MM1: Methods, Analysis & Insights from Multimodal LLM Pre-training
Mar 22, 2024

Mobile V-MoEs: Scaling Down Vision Transformers via Sparse Mixture-of-Experts
Sep 08, 2023

STAIR: Learning Sparse Text and Image Representation in Grounded Tokens
Feb 08, 2023