Pang Wei Koh

Olmo 3

Dec 15, 2025

RLVE: Scaling Up Reinforcement Learning for Language Models with Adaptive Verifiable Environments

Nov 10, 2025

FlexOlmo: Open Language Models for Flexible Data Use

Jul 09, 2025

The Delta Learning Hypothesis: Preference Tuning on Weak Data can Yield Strong Gains

Jul 08, 2025

Frustratingly Simple Retrieval Improves Challenging, Reasoning-Intensive Benchmarks

Jul 02, 2025

Spurious Rewards: Rethinking Training Signals in RLVR

Jun 12, 2025

Precise Information Control in Long-Form Text Generation

Jun 06, 2025

ReasonIR: Training Retrievers for Reasoning Tasks

Apr 29, 2025

A False Sense of Privacy: Evaluating Textual Data Sanitization Beyond Surface-level Privacy Leakage

Apr 28, 2025

ParaPO: Aligning Language Models to Reduce Verbatim Reproduction of Pre-training Data

Apr 20, 2025