Yutao Hou

Unveiling Over-Memorization in Finetuning LLMs for Reasoning Tasks

Aug 06, 2025

Automatic Robustness Stress Testing of LLMs as Mathematical Problem Solvers

Jun 05, 2025

Compound-QA: A Benchmark for Evaluating LLMs on Compound Questions

Nov 15, 2024