Pang Wei Koh

FlexOlmo: Open Language Models for Flexible Data Use

Jul 09, 2025
Via arXiv

The Delta Learning Hypothesis: Preference Tuning on Weak Data can Yield Strong Gains

Jul 08, 2025
Via arXiv

Frustratingly Simple Retrieval Improves Challenging, Reasoning-Intensive Benchmarks

Jul 02, 2025
Via arXiv

Spurious Rewards: Rethinking Training Signals in RLVR

Jun 12, 2025
Via arXiv

Precise Information Control in Long-Form Text Generation

Jun 06, 2025
Via arXiv

ReasonIR: Training Retrievers for Reasoning Tasks

Apr 29, 2025
Via arXiv

A False Sense of Privacy: Evaluating Textual Data Sanitization Beyond Surface-level Privacy Leakage

Apr 28, 2025
Via arXiv

ParaPO: Aligning Language Models to Reduce Verbatim Reproduction of Pre-training Data

Apr 20, 2025
Via arXiv

DataDecide: How to Predict Best Pretraining Data with Small Experiments

Apr 15, 2025
Via arXiv

OLMoTrace: Tracing Language Model Outputs Back to Trillions of Training Tokens

Apr 09, 2025
Via arXiv