Shuyue Stella Li

Olmo 3

Dec 15, 2025

RLVE: Scaling Up Reinforcement Learning for Language Models with Adaptive Verifiable Environments

Nov 10, 2025

PrefPalette: Personalized Preference Modeling with Latent Attributes

Jul 17, 2025

Spurious Rewards: Rethinking Training Signals in RLVR

Jun 12, 2025

Precise Information Control in Long-Form Text Generation

Jun 06, 2025

BehaviorSFT: Behavioral Token Conditioning for Clinical Agents Across the Proactivity Spectrum

May 27, 2025

BLAB: Brutally Long Audio Bench

May 05, 2025

A False Sense of Privacy: Evaluating Textual Data Sanitization Beyond Surface-level Privacy Leakage

Apr 28, 2025

Aligning LLMs to Ask Good Questions: A Case Study in Clinical Reasoning

Feb 20, 2025

CulturalBench: a Robust, Diverse and Challenging Benchmark on Measuring the (Lack of) Cultural Knowledge of LLMs

Oct 03, 2024