Yibai Li

Nancy

Developing the PsyCogMetrics AI Lab to Evaluate Large Language Models and Advance Cognitive Science -- A Three-Cycle Action Design Science Study

Mar 13, 2026

Zhiye Jin, Yibai Li, K. D. Joshi, Xuefei Deng, Xiaobing Li

Abstract: This study presents the development of the PsyCogMetrics AI Lab (psycogmetrics.ai), an integrated, cloud-based platform that operationalizes psychometric and cognitive-science methodologies for Large Language Model (LLM) evaluation. Framed as a three-cycle Action Design Science study, the Relevance Cycle identifies key limitations in current evaluation methods and unfulfilled stakeholder needs. The Rigor Cycle draws on kernel theories such as Popperian falsifiability, Classical Test Theory, and Cognitive Load Theory to derive deductive design objectives. The Design Cycle operationalizes these objectives through nested Build-Intervene-Evaluate loops. The study contributes a novel IT artifact, a validated design for LLM evaluation, benefiting research at the intersection of AI, psychology, cognitive science, and the social and behavioral sciences.

* Proceedings of the 59th Hawaii International Conference on System Sciences (HICSS), January 2026, pp. 6952-6961
* 10 pages. Prepared: April 2025; submitted: June 15, 2025; accepted: August 2025.

Counterweights and Complementarities: The Convergence of AI and Blockchain Powering a Decentralized Future

Mar 11, 2026

Yibai Li, Zhiye Jin, Xiaobing Li, K. D. Joshi, Xuefei Deng

Abstract: This editorial addresses the critical intersection of artificial intelligence (AI) and blockchain technologies, highlighting their contrasting tendencies toward centralization and decentralization, respectively. While AI, particularly with the rise of large language models (LLMs), exhibits a strong centralizing force due to data and resource monopolization by large corporations, blockchain offers a counterbalancing mechanism through its inherent decentralization, transparency, and security. The editorial argues that these technologies are not mutually exclusive but possess complementary strengths. Blockchain can mitigate AI's centralizing risks by enabling decentralized data management, computation, and governance, promoting greater inclusivity, transparency, and user privacy. Conversely, AI can enhance blockchain's efficiency and security through automated smart contract management, content curation, and threat detection. The core argument calls for the development of "decentralized intelligence" (DI), an interdisciplinary research area focused on creating intelligent systems that function without centralized control.

* ACM SIGMIS Database: the DATABASE for Advances in Information Systems, Vol. 56, No. 2, pp. 6-12, April 2025
* 7 pages, Editorial

AI Psychometrics: Evaluating the Psychological Reasoning of Large Language Models with Psychometric Validities

Mar 11, 2026

Yibai Li, Xiaolin Lin, Zhenghui Sha, Zhiye Jin, Xiaobing Li

Abstract: The immense number of parameters and deep neural networks make large language models (LLMs) rival the complexity of human brains, which also makes them opaque "black box" systems that are challenging to evaluate and interpret. AI Psychometrics is an emerging field that aims to tackle these challenges by applying psychometric methodologies to evaluate and interpret the psychological traits and processes of artificial intelligence (AI) systems. This paper investigates the application of AI Psychometrics to evaluate the psychological reasoning and overall psychometric validity of four prominent LLMs: GPT-3.5, GPT-4, LLaMA-2, and LLaMA-3. Using the Technology Acceptance Model (TAM), we examined convergent, discriminant, predictive, and external validity across these models. Our findings reveal that the responses from all these models generally met all validity criteria. Moreover, higher-performing models like GPT-4 and LLaMA-3 consistently demonstrated superior psychometric validity compared to their predecessors, GPT-3.5 and LLaMA-2. These results help to establish the validity of applying AI Psychometrics to evaluate and interpret large language models.

* Proc. 58th Hawaii International Conference on System Sciences (HICSS), 2025, pp. 5189-5197
* Accepted for publication in the Proceedings of the 58th Hawaii International Conference on System Sciences (HICSS), 2025
