Lewis Tunstall

How Can We Synthesize High-Quality Pretraining Data? A Systematic Study of Prompt Design, Generator Model, and Source Data

Apr 15, 2026

QED-Nano: Teaching a Tiny Model to Prove Hard Theorems

Apr 06, 2026

Kimina-Prover Preview: Towards Large Formal Reasoning Models with Reinforcement Learning

Apr 15, 2025

SmolVLM: Redefining small and efficient multimodal models

Apr 07, 2025

Optimizing Test-Time Compute via Meta Reinforcement Fine-Tuning

Mar 10, 2025

SmolLM2: When Smol Goes Big -- Data-Centric Training of a Small Language Model

Feb 04, 2025

The N+ Implementation Details of RLHF with PPO: A Case Study on TL;DR Summarization

Mar 24, 2024

Zephyr: Direct Distillation of LM Alignment

Oct 25, 2023

AfroDigits: A Community-Driven Spoken Digit Dataset for African Languages

Apr 04, 2023

Evaluate & Evaluation on the Hub: Better Best Practices for Data and Model Measurements

Oct 06, 2022