Yo Nakawake

The NazoNazo Benchmark: A Cost-Effective and Extensible Test of Insight-Based Reasoning in LLMs

Sep 18, 2025

Masaharu Mizumoto, Dat Nguyen, Zhiheng Han, Jiyuan Fang, Heyuan Guan, Xingfu Li, Naoya Shiraishi, Xuyang Tian, Yo Nakawake, Le Minh Nguyen

Abstract: Benchmark saturation and contamination undermine confidence in LLM evaluation. We present Nazonazo, a cost-effective and extensible benchmark built from Japanese children's riddles to test insight-based reasoning. Items are short (mostly one sentence), require no specialized domain knowledge, and can be generated at scale, enabling rapid refresh of blind sets when leakage is suspected. We evaluate 38 frontier models and 126 adults on 120 riddles. No model except GPT-5 performs comparably to humans, who achieve a mean accuracy of 52.9%. A model comparison on an extended set of 201 items shows that reasoning models significantly outperform their non-reasoning peers, while model size shows no reliable association with accuracy. Beyond aggregate accuracy, an informal candidate-tracking analysis of thought logs reveals many cases of verification failure: models often produce the correct solution among their intermediate candidates yet fail to select it as the final answer, which we illustrate with representative examples observed in multiple models. Nazonazo thus offers a cost-effective, scalable, and easily renewable benchmark format that addresses the current evaluation crisis and also points to a recurrent metacognitive weakness, providing clear targets for future control and calibration methods.

Systematic quantitative analyses reveal the folk-zoological knowledge embedded in folktales

Jul 09, 2019

Yo Nakawake, Kosuke Sato

Abstract: Cultural learning is a uniquely human capacity essential for a wide range of adaptations. Researchers have argued that folktales serve a pedagogical function by transmitting essential information about the environment. The most important knowledge for foraging and pastoral societies is folk-zoological knowledge, such as predator-prey relationships among wild animals, or between wild and domesticated animals. Here, we analysed the descriptions of 382 animal folktales listed in a worldwide tale-type index (the Aarne-Thompson-Uther type index) using natural language processing and descriptive statistics. Our analyses suggested, first, that predator-prey relationships frequently appeared in co-occurring animal pairs within a folktale (e.g., cat and mouse, or wolf and pig), and second, that the motif of 'deception', describing antagonistic behaviour among animals, appeared relatively more frequently in the 'wild and domestic animals' and 'wild animals' categories than in other types. Furthermore, the 'deception' motif appeared more frequently in pairs corresponding to predator-prey relationships. These results are consistent with the hypothesis that the combination of animal characters and the events in stories represents relationships in the real world. The present study demonstrated that combining quantitative methods with qualitative data broadens our understanding of the evolutionary aspects of human cultures.

* This document is a preprint and may change during peer review. Please contact the authors for further information.