Simon Shaolei Du

Chasing Moving Targets with Online Self-Play Reinforcement Learning for Safer Language Models

Jun 09, 2025
Via arXiv

Highlighting What Matters: Promptable Embeddings for Attribute-Focused Image Retrieval

May 21, 2025
Via arXiv

Reinforcement Learning for Reasoning in Large Language Models with One Training Example

Apr 29, 2025
Via arXiv

LoRe: Personalizing LLMs via Low-Rank Reward Modeling

Apr 20, 2025
Via arXiv

SHARP: Accelerating Language Model Inference by SHaring Adjacent layers with Recovery Parameters

Feb 11, 2025
Via arXiv

Is Your World Simulator a Good Story Presenter? A Consecutive Events-Based Benchmark for Future Long Video Generation

Dec 17, 2024
Via arXiv

Hybrid Preference Optimization for Alignment: Provably Faster Convergence Rates by Combining Offline Preferences with Online Exploration

Dec 13, 2024
Via arXiv

On Erroneous Agreements of CLIP Image Embeddings

Nov 07, 2024
Via arXiv

Cost-Effective Proxy Reward Model Construction with On-Policy and Active Learning

Jul 02, 2024
Via arXiv

CLIPLoss and Norm-Based Data Selection Methods for Multimodal Contrastive Learning

May 29, 2024
Via arXiv