Pang Wei Koh

The Delta Learning Hypothesis: Preference Tuning on Weak Data can Yield Strong Gains

Jul 08, 2025

Frustratingly Simple Retrieval Improves Challenging, Reasoning-Intensive Benchmarks

Jul 02, 2025

Spurious Rewards: Rethinking Training Signals in RLVR

Jun 12, 2025

Precise Information Control in Long-Form Text Generation

Jun 06, 2025

ReasonIR: Training Retrievers for Reasoning Tasks

Apr 29, 2025

A False Sense of Privacy: Evaluating Textual Data Sanitization Beyond Surface-level Privacy Leakage

Apr 28, 2025

ParaPO: Aligning Language Models to Reduce Verbatim Reproduction of Pre-training Data

Apr 20, 2025

DataDecide: How to Predict Best Pretraining Data with Small Experiments

Apr 15, 2025

OLMoTrace: Tracing Language Model Outputs Back to Trillions of Training Tokens

Apr 09, 2025

EvalTree: Profiling Language Model Weaknesses via Hierarchical Capability Trees

Mar 11, 2025