Simon Shaolei Du

Global Convergence of Four-Layer Matrix Factorization under Random Initialization

Nov 19, 2025
Via arXiv

RLVE: Scaling Up Reinforcement Learning for Language Models with Adaptive Verifiable Environments

Nov 10, 2025
Via arXiv

Policy-Based Trajectory Clustering in Offline Reinforcement Learning

Jun 12, 2025
Via arXiv

Spurious Rewards: Rethinking Training Signals in RLVR

Jun 12, 2025
Via arXiv

Chasing Moving Targets with Online Self-Play Reinforcement Learning for Safer Language Models

Jun 09, 2025
Via arXiv

Highlighting What Matters: Promptable Embeddings for Attribute-Focused Image Retrieval

May 21, 2025
Via arXiv

Reinforcement Learning for Reasoning in Large Language Models with One Training Example

Apr 29, 2025
Via arXiv

LoRe: Personalizing LLMs via Low-Rank Reward Modeling

Apr 20, 2025
Via arXiv

SHARP: Accelerating Language Model Inference by SHaring Adjacent layers with Recovery Parameters

Feb 11, 2025
Via arXiv

Is Your World Simulator a Good Story Presenter? A Consecutive Events-Based Benchmark for Future Long Video Generation

Dec 17, 2024
Via arXiv