Rui Miao

Off-Policy Evaluation for Missingness-Aware Policies in MDPs with Rewards Missing Not at Random

Jun 18, 2026
Via arXiv

SAIGuard: Communication-State Simulation for Proactive Defense of LLM Multi-Agent Systems

Jun 10, 2026
Via arXiv

Detecting Unfaithful Chain-of-Thought via Circuit-Guided Internal-External Discrepancy

May 25, 2026
Via arXiv

OPPO: Bayesian Value Recursion for Token-Level Credit Assignment in LLM Reasoning

May 21, 2026
Via arXiv

Are Rationales Necessary and Sufficient? Tuning LLMs for Explainable Misinformation Detection

May 19, 2026
Via arXiv

Precision Physical Activity Prescription via Reinforcement Learning for Functional Actions

May 19, 2026
Via arXiv

Backtracking When It Strays: Mitigating Dual Exposure Biases in LLM Reasoning Distillation

May 19, 2026
Via arXiv

TOPPO: Rethinking PPO for Multi-Task Reinforcement Learning with Critic Balancing

May 12, 2026
Via arXiv

On the Step Length Confounding in LLM Reasoning Data Selection

Apr 08, 2026
Via arXiv

ARISE: Agent Reasoning with Intrinsic Skill Evolution in Hierarchical Reinforcement Learning

Mar 17, 2026
Via arXiv