Abstract:While many studies of Large Language Model (LLM) reasoning capabilities emphasize mathematical or technical tasks, few address reasoning about social concepts: the abstract ideas shaping social norms, culture, and institutions. This understudied capability is essential for modern models acting as social agents, yet no systematic evaluation methodology targets it. We introduce SCRuB (Social Concept Reasoning under Rubric-Based Evaluation), a framework designed for this setting of task indeterminacy. Our goal is to measure the degree to which a model reasons about social concepts with the depth and critical rigor of a human expert. SCRuB proceeds in three phases: prompt construction from established sources, response generation by experts and models, and comparative evaluation using a five-dimensional critical thinking rubric. To enable generalization of the pipeline, we introduce a Panel of Disciplinary Perspectives ensemble validated against independent expert judges. We release SCRuBEval (n=4,711 evaluation prompts) and SCRuBAnnotations (300 expert-authored responses and 150 expert comparative judgments from 45 PhD-level scholars). Our results show that frontier models consistently outperform human experts across all five rubric dimensions. Across 1,170 pairwise comparisons, expert judges ranked a model response first in 80.8% of judgments and preferred model responses overall 74.4% of the time. Ultimately, this study provides the first expert-grounded demonstration of evaluation saturation for social concept reasoning: the single-turn exam-style format has reached its ceiling for models and humans alike.
Abstract:Often overlooked as a component of urban development, accessibility infrastructure is undeniably crucial in daily life. Accessibility ramps are one of the most common types of accessibility infrastructure, and serve to benefit not only people with mobile impairments but also able-bodied third parties. While the necessity of accessibility ramps is acknowledged, actual implementation fails in light of the limits of manpower required for the design stage. In response, we present an algorithm capable of the automatic generation of a feasible accessibility ramp based on a 3D model of the relevant environment. Through the manual specification of initial and terminal points within a 3D model, the algorithm uses AI search algorithms to determine the optimal pathway connecting these points. Essential components in devising a wheelchair-accessible ramp are encoded within the process, as evaluated by the algorithm, including but not limited to elevation differentials, spatial constraints, and gradient specifications. From this, the algorithm then generates the pathway to be expanded into a full-scale, usable model of a ramp, which then can be easily exported and transformed through inter-software exchanges. Though some human input is still required following the generation stage, the minimising of human resources provides significant boosts of efficiency in the design process thus lowering the threshold for the incorporation of accessibility features in future urban design.