Greg Leppert

Institutional Books 1.0: A 242B token dataset from Harvard Library's collections, refined for accuracy and usability

Jun 10, 2025
Via arXiv

Towards Best Practices for Open Datasets for LLM Training

Jan 14, 2025
Via arXiv

SEAL: Systematic Error Analysis for Value ALignment

Aug 16, 2024
Via arXiv