Mutsumi Nakamura

How Can Input Reformulation Improve Tool Usage Accuracy in a Complex Dynamic Environment? A Study on $τ$-bench

Aug 28, 2025
Via arXiv

VL-GLUE: A Suite of Fundamental yet Challenging Visuo-Linguistic Reasoning Tasks

Oct 17, 2024
Via arXiv

Step-by-Step Reasoning to Solve Grid Puzzles: Where do LLMs Falter?

Jul 20, 2024
Via arXiv

Multi-LogiEval: Towards Evaluating Multi-Step Logical Reasoning Ability of Large Language Models

Jun 24, 2024
Via arXiv

Towards Systematic Evaluation of Logical Reasoning Ability of Large Language Models

Apr 23, 2024
Via arXiv