Xiyang Hu

Ben

Counterfactual Trace Auditing of LLM Agent Skills

May 12, 2026

When Simulation Lies: A Sim-to-Real Benchmark and Domain-Randomized RL Recipe for Tool-Use Agents

May 12, 2026

FORTIS: Benchmarking Over-Privilege in Agent Skills

May 09, 2026

Do Vision Language Models Understand Human Engagement in Games?

Mar 19, 2026

Fairness or Fluency? An Investigation into Language Bias of Pairwise LLM-as-a-Judge

Jan 20, 2026

Multimodal Generative Engine Optimization: Rank Manipulation for Vision-Language Model Rankers

Jan 18, 2026

Value-Action Alignment in Large Language Models under Privacy-Prosocial Conflict

Jan 07, 2026

A Personalized Conversational Benchmark: Towards Simulating Personalized Conversations

May 20, 2025

AD-AGENT: A Multi-agent Framework for End-to-end Anomaly Detection

May 19, 2025

Graph Synthetic Out-of-Distribution Exposure with Large Language Models

Apr 29, 2025