Aravind S

MINDGAMES: A Live Arena for Evaluating Social and Strategic Reasoning in Multi-Agent LLMs

May 28, 2026

Kevin Wang, Anna Thöni, Benjamin Kempinski, Bobby Cheng, Jianzhu Yao, Benjamin Finch, Leon Guertler, Viraj Nadkarni, Yihan Jiang, Aliaksei Korshuk(+43 more)

Abstract: Large language models (LLMs) are increasingly deployed as interactive agents, yet their capacity for social and strategic reasoning over extended interaction remains poorly understood. Existing evaluations rely on static vignettes or single-game benchmarks that cannot capture the sustained, multi-faceted reasoning that real-world multi-agent settings demand. We introduce Mindgames, a multi-game arena and evaluation platform for LLM agents that operationalizes complementary reasoning demands relevant to "theory of mind": belief attribution under hidden information, opponent modeling through repeated strategic interaction, cooperative inference under knowledge asymmetries, and sustained deception in social deduction. Built on TextArena, Mindgames provides a unified interaction interface, TrueSkill-based rating, and full trajectory logging across four game environments. We instantiate Mindgames through a 2025 competition cycle hosted at a major AI conference, which assessed 944 submitted agents from 76 teams across four games: Colonel Blotto, Iterated Prisoner's Dilemma, Codenames, and Secret Mafia. Our analysis surfaces both agent-level and evaluation-level limitations: brittle rule adherence remains a major bottleneck, top-performing systems repeatedly rely on explicit structural scaffolding, and leaderboard validity differs sharply across environments. In particular, failure-heavy environments can reward robustness to opponent errors as much as strategic ability, with Secret Mafia exhibiting a pronounced error-survival confound in this cycle. We release a dataset of 29,571 multi-agent games with turn-level observations, actions, and rewards, together with MG-Ref, a deterministic offline tournament protocol that scores new agents against a frozen reference pool of top-ranked, low-error Stage II submissions under the same error-attribution lens used in this analysis.
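The abstract names TrueSkill-based rating but does not state the exact update rule or parameters Mindgames uses. As a rough illustration only, the sketch below implements the standard two-player, no-draw TrueSkill update from the original TrueSkill paper (Herbrich et al.), using that paper's default parameters (mu = 25, sigma = 25/3, beta = sigma/2, tau = sigma/100). Those parameter values, the function names `update_win` and `conservative_score`, and the mu - 3*sigma ranking score are assumptions, not the Mindgames configuration. Team games such as Codenames and multi-player games such as Secret Mafia would need the general factor-graph form of TrueSkill rather than this 1-vs-1 case.

```python
import math

# Assumed defaults from the original TrueSkill paper, NOT Mindgames' settings.
MU0 = 25.0
SIGMA0 = MU0 / 3.0
BETA = SIGMA0 / 2.0    # performance noise per game
TAU = SIGMA0 / 100.0   # dynamics noise added before each game


def _pdf(x: float) -> float:
    """Standard normal density."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def _cdf(x: float) -> float:
    """Standard normal cumulative distribution."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def update_win(winner, loser, beta=BETA, tau=TAU):
    """Return updated (mu, sigma) pairs after `winner` beats `loser`.

    Hypothetical helper covering the two-player case with zero draw margin.
    """
    mu_w, sigma_w = winner
    mu_l, sigma_l = loser
    # Dynamics step: inflate each player's variance slightly before the game.
    var_w = sigma_w ** 2 + tau ** 2
    var_l = sigma_l ** 2 + tau ** 2
    c = math.sqrt(2.0 * beta ** 2 + var_w + var_l)
    t = (mu_w - mu_l) / c
    v = _pdf(t) / _cdf(t)  # mean correction: larger when the result is an upset
    w = v * (v + t)        # variance correction, strictly between 0 and 1
    new_winner = (mu_w + var_w / c * v,
                  math.sqrt(var_w * (1.0 - var_w / c ** 2 * w)))
    new_loser = (mu_l - var_l / c * v,
                 math.sqrt(var_l * (1.0 - var_l / c ** 2 * w)))
    return new_winner, new_loser


def conservative_score(rating, k=3.0):
    """Leaderboard-style score mu - k * sigma (k = 3 is a common choice)."""
    mu, sigma = rating
    return mu - k * sigma


if __name__ == "__main__":
    a, b = update_win((MU0, SIGMA0), (MU0, SIGMA0))
    print("winner:", a, "loser:", b)
    print("scores:", conservative_score(a), conservative_score(b))
```

Ranking by mu - 3*sigma rather than by mu keeps an agent with only a few games (and therefore a large sigma) from overtaking agents with well-established ratings; whether the Mindgames leaderboard ranks this way is not stated in the abstract.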

Autonomous UAV for Building Monitoring, Detection and Localisation of Faults

Nov 13, 2021

Suhas Thalanki, T Vijay Prashant, Harshith Kumar M B, Shayak Bhadraray, Aravind S, Srikrishna BR, Sameer Dhole

Abstract: Structural collapse of buildings is frequently reported, and the presence of undetected faults has proved damaging to buildings, resulting in accidents. It is therefore essential to continuously monitor buildings for faults, especially where human access is restricted. With UAVs (Unmanned Aerial Vehicles) emerging in the field of computer vision, monitoring a building and detecting such faults becomes feasible. This paper puts forth a novel approach in which an automated UAV traverses the area around a target building, detects potential faults in the building, and localizes them. Given the dimensions of the building, a path around it is generated. Images captured by the UAV's onboard camera are passed through a neural network system to confirm the presence of faults. Once a fault is detected, the UAV maneuvers itself to the position where the crack was detected. The simulation is carried out with ROS (Robot Operating System) in the AirSim environment, which initializes ROS wrappers and provides an integrated interface between ROS and AirSim; the UAV is simulated within this environment.

* Submitted, ICRA 2022
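The abstract says a path around the building is generated from its dimensions but does not describe how. A minimal sketch, assuming a box-shaped building with an axis-aligned rectangular footprint, is to fly closed rectangular loops at a fixed stand-off distance from the walls and stack them at regular altitude steps up to the roof. The helper name `perimeter_waypoints`, its parameters, and the layered-loop strategy are all hypothetical, not the authors' method.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Waypoint:
    x: float  # metres along the building's length axis
    y: float  # metres along the building's width axis
    z: float  # altitude in metres, positive up (assumed frame)


def perimeter_waypoints(length: float, width: float, height: float,
                        offset: float, layer_spacing: float) -> List[Waypoint]:
    """Hypothetical helper: stacked rectangular inspection loops.

    The footprint is the rectangle [0, length] x [0, width]. Each loop stays
    `offset` metres outside the walls, and loops are placed every
    `layer_spacing` metres from `layer_spacing` up to at most `height`.
    """
    if min(length, width, height, offset, layer_spacing) <= 0:
        raise ValueError("all dimensions must be positive")
    # Corners of the offset rectangle, visited counter-clockwise.
    corners = [(-offset, -offset),
               (length + offset, -offset),
               (length + offset, width + offset),
               (-offset, width + offset)]
    loop = corners + corners[:1]  # close the loop at the starting corner
    path: List[Waypoint] = []
    n_layers = int(height // layer_spacing)
    for i in range(1, n_layers + 1):
        z = i * layer_spacing
        # Each layer starts where the previous one ended, so the UAV only climbs.
        path.extend(Waypoint(x, y, z) for x, y in loop)
    return path


if __name__ == "__main__":
    for wp in perimeter_waypoints(20.0, 10.0, 9.0, 3.0, 3.0):
        print(wp)
```

Before sending such waypoints to a simulator, frame conventions would need converting; AirSim, for example, uses a NED frame in which altitude is negative z.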
