Jen-tse Huang

SWE-Chain: Benchmarking Coding Agents on Chained Release-Level Package Upgrades

May 14, 2026
Via arXiv

How to Interpret Agent Behavior

May 13, 2026
Via arXiv

The Chameleon's Limit: Investigating Persona Collapse and Homogenization in Large Language Models

Apr 27, 2026
Via arXiv

What do Language Models Learn and When? The Implicit Curriculum Hypothesis

Apr 09, 2026
Via arXiv

HUMANLLM: Benchmarking and Reinforcing LLM Anthropomorphism via Human Cognitive Patterns

Jan 15, 2026
Via arXiv

Knowing But Not Doing: Convergent Morality and Divergent Action in LLMs

Jan 12, 2026
Via arXiv

Probing Multimodal Large Language Models on Cognitive Biases in Chinese Short-Video Misinformation

Jan 10, 2026
Via arXiv

Curing Miracle Steps in LLM Mathematical Reasoning with Rubric Rewards

Oct 09, 2025
Via arXiv

FAIRGAMER: Evaluating Biases in the Application of Large Language Models to Video Games

Aug 25, 2025
Via arXiv

Towards Evaluating Proactive Risk Awareness of Multimodal Language Models

May 23, 2025
Via arXiv