Hyoshin Kim

Benchmarking LLM Causal Reasoning with Scientifically Validated Relationships

Oct 08, 2025

Donggyu Lee, Sungwon Park, Yerin Hwang, Hyunwoo Oh, Hyoshin Kim, Jungwon Kim, Meeyoung Cha, Sangyoon Park, Jihee Kim

Abstract: Causal reasoning is fundamental for Large Language Models (LLMs) to understand genuine cause-and-effect relationships beyond pattern matching. Existing benchmarks suffer from critical limitations such as reliance on synthetic data and narrow domain coverage. We introduce a novel benchmark constructed from causally identified relationships extracted from top-tier economics and finance journals, drawing on rigorous methodologies including instrumental variables, difference-in-differences, and regression discontinuity designs. Our benchmark comprises 40,379 evaluation items covering five task types across domains such as health, environment, technology, law, and culture. Experimental results on eight state-of-the-art LLMs reveal substantial limitations, with the best model achieving only 57.6% accuracy. Moreover, model scale does not consistently translate to superior performance, and even advanced reasoning models struggle with fundamental causal relationship identification. These findings underscore a critical gap between current LLM capabilities and the demands of reliable causal reasoning in high-stakes applications.

GeoReg: Weight-Constrained Few-Shot Regression for Socio-Economic Estimation using LLM

Jul 17, 2025

Kyeongjin Ahn, Sungwon Han, Seungeon Lee, Donghyun Ahn, Hyoshin Kim, Jungwon Kim, Jihee Kim, Sangyoon Park, Meeyoung Cha

Abstract: Socio-economic indicators such as regional GDP, population, and education levels are crucial to shaping policy decisions and fostering sustainable development. This research introduces GeoReg, a regression model that integrates diverse data sources, including satellite imagery and web-based geospatial information, to estimate these indicators even for data-scarce regions such as developing countries. Our approach leverages the prior knowledge of a large language model (LLM) to address the scarcity of labeled data, with the LLM functioning as a data engineer by extracting informative features to enable effective estimation in few-shot settings. Specifically, our model obtains contextual relationships between data features and the target indicator, categorizing their correlations as positive, negative, mixed, or irrelevant. These features are then fed into the linear estimator with tailored weight constraints for each category. To capture nonlinear patterns, the model also identifies meaningful feature interactions and integrates them, along with nonlinear transformations. Experiments across three countries at different stages of development demonstrate that our model outperforms baselines in estimating socio-economic indicators, even for low-income countries with limited data availability.
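
The weight-constraint idea in the abstract can be illustrated with a small sketch. This is not the authors' implementation: it assumes the four categories map to simple box constraints on a linear model's weights (positive: weight at least 0, negative: weight at most 0, mixed: unconstrained, irrelevant: fixed at 0) and fits them with projected gradient descent. The names `BOUNDS` and `fit_constrained`, the learning rate, and the toy data are all hypothetical.

```python
# Illustrative sketch only: sign-constrained linear regression fitted by
# projected gradient descent. The category-to-bound mapping is an assumption.

INF = float("inf")

# Hypothetical mapping from a feature's category to (lower, upper) weight bounds.
BOUNDS = {
    "positive": (0.0, INF),    # weight may not go below zero
    "negative": (-INF, 0.0),   # weight may not go above zero
    "mixed": (-INF, INF),      # no constraint
    "irrelevant": (0.0, 0.0),  # weight pinned to zero
}


def fit_constrained(X, y, categories, lr=0.05, steps=5000):
    """Minimise mean squared error subject to per-feature weight bounds."""
    n, d = len(X), len(X[0])
    w = [0.0] * d
    b = 0.0
    for _ in range(steps):
        # Residuals of the current linear prediction.
        residuals = [
            sum(wj * xj for wj, xj in zip(w, row)) + b - yi
            for row, yi in zip(X, y)
        ]
        # Gradients of the mean squared error.
        grad_w = [
            2.0 / n * sum(r * row[j] for r, row in zip(residuals, X))
            for j in range(d)
        ]
        grad_b = 2.0 / n * sum(residuals)
        b -= lr * grad_b
        for j in range(d):
            lo, hi = BOUNDS[categories[j]]
            # Gradient step, then projection back into the allowed interval.
            w[j] = min(max(w[j] - lr * grad_w[j], lo), hi)
    return w, b


# Toy data (made up): y = 2*x0 - 1*x1 + 0.5, and x2 plays no role.
X = [[1, 0, 3], [0, 1, 1], [1, 1, 2], [2, 1, 0], [0, 2, 1], [2, 0, 2]]
y = [2.0 * r[0] - 1.0 * r[1] + 0.5 for r in X]

weights, intercept = fit_constrained(X, y, ["positive", "negative", "irrelevant"])
```

With only a handful of labeled regions, constraints of this kind shrink the space of admissible models, which is one plausible reason such priors help in few-shot settings.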

* 15 pages, 13 figures, 7 tables
