Bosi Wen

IF-RewardBench: Benchmarking Judge Models for Instruction-Following Evaluation

Mar 05, 2026
Via arXiv

RLAR: An Agentic Reward System for Multi-task Reinforcement Learning on Large Language Models

Feb 28, 2026
Via arXiv

RAVEL: Reasoning Agents for Validating and Evaluating LLM Text Synthesis

Feb 28, 2026
Via arXiv

TraceSIR: A Multi-Agent Framework for Structured Analysis and Reporting of Agentic Execution Traces

Feb 28, 2026
Via arXiv

GLM-5: from Vibe Coding to Agentic Engineering

Feb 17, 2026
Via arXiv

RLMR: Reinforcement Learning with Mixed Rewards for Creative Writing

Aug 28, 2025
Via arXiv

GLM-4.5: Agentic, Reasoning, and Coding (ARC) Foundation Models

Aug 08, 2025
Via arXiv

HPSS: Heuristic Prompting Strategy Search for LLM Evaluators

Feb 18, 2025
Via arXiv

CharacterBench: Benchmarking Character Customization of Large Language Models

Dec 16, 2024
Via arXiv

Benchmarking Complex Instruction-Following with Multiple Constraints Composition

Jul 04, 2024
Via arXiv