Sewoong Oh

Recycling the Web: A Method to Enhance Pre-training Data Quality and Quantity for Language Models

Jun 05, 2025
Via arXiv

OpenThoughts: Data Recipes for Reasoning Models

Jun 05, 2025
Via arXiv

Foundation model for mass spectrometry proteomics

May 19, 2025
Via arXiv

A False Sense of Privacy: Evaluating Textual Data Sanitization Beyond Surface-level Privacy Leakage

Apr 28, 2025
Via arXiv

Open Deep Search: Democratizing Search with Open-source Reasoning Agents

Mar 26, 2025
Via arXiv

SuperBPE: Space Travel for Language Models

Mar 17, 2025
Via arXiv

S4S: Solving for a Diffusion Model Solver

Feb 24, 2025
Via arXiv

Economics of Sourcing Human Data

Feb 11, 2025
Via arXiv

Scalable Fingerprinting of Large Language Models

Feb 11, 2025
Via arXiv

OML: Open, Monetizable, and Loyal AI

Nov 01, 2024
Via arXiv