Taiwei Shi

Video-Based Reward Modeling for Computer-Use Agents
Mar 10, 2026

Experiential Reinforcement Learning
Feb 15, 2026

One Model, All Roles: Multi-Turn, Multi-Agent Self-Play Reinforcement Learning for Conversational Social Intelligence
Feb 03, 2026

CoAct-1: Computer-using Agents with Coding as Actions
Aug 05, 2025

STEER-BENCH: A Benchmark for Evaluating the Steerability of Large Language Models
May 27, 2025

The Hallucination Tax of Reinforcement Finetuning
May 20, 2025

Efficient Reinforcement Finetuning via Adaptive Curriculum Learning
Apr 07, 2025

Discovering Knowledge Deficiencies of Language Models on Massive Knowledge Base
Mar 30, 2025

Detecting and Filtering Unsafe Training Data via Data Attribution
Feb 17, 2025

WildFeedback: Aligning LLMs With In-situ User Interactions And Feedback
Aug 28, 2024