Jen-tse Huang

HUMANLLM: Benchmarking and Reinforcing LLM Anthropomorphism via Human Cognitive Patterns

Jan 15, 2026
Via arXiv

Knowing But Not Doing: Convergent Morality and Divergent Action in LLMs

Jan 12, 2026
Via arXiv

Probing Multimodal Large Language Models on Cognitive Biases in Chinese Short-Video Misinformation

Jan 10, 2026
Via arXiv

Curing Miracle Steps in LLM Mathematical Reasoning with Rubric Rewards

Oct 09, 2025
Via arXiv

FAIRGAMER: Evaluating Biases in the Application of Large Language Models to Video Games

Aug 25, 2025
Via arXiv

Towards Evaluating Proactive Risk Awareness of Multimodal Language Models

May 23, 2025
Via arXiv

A Survey on the Safety and Security Threats of Computer-Using Agents: JARVIS or Ultron?

May 16, 2025
Via arXiv

A Comprehensive Survey in LLM(-Agent) Full Stack Safety: Data, Training and Deployment

Apr 22, 2025
Via arXiv

SOTOPIA-S4: a user-friendly system for flexible, customizable, and large-scale social simulation

Apr 19, 2025
Via arXiv

CODECRASH: Stress Testing LLM Reasoning under Structural and Semantic Perturbations

Apr 19, 2025
Via arXiv