Abstract:This paper investigates the logical reasoning capabilities of large language models (LLMs). For a precisely defined yet tractable formulation, we choose the conceptually simple but technically complex task of constructing proofs in Boolean logic. A trained LLM receives as input a set of assumptions and a goal, and produces as output a proof that formally derives the goal from the assumptions. Incorrect proofs are caught by an automated proof checker. A critical obstacle for training is the scarcity of real-world proofs. We propose an efficient, randomized procedure for synthesizing valid proofs and introduce Template Transformation, a data augmentation technique that enhances the model's ability to handle complex logical expressions. The central evaluation question is whether an LLM has indeed learned to reason. We propose tests to measure the reasoning ability of a black-box LLM. By these measures, experiments demonstrate strong reasoning capabilities for assertions with short proofs, which decline with proof complexity. Notably, template transformation improves accuracy even for smaller models, suggesting its effectiveness across model scales.
Abstract:This paper describes Resh, a new, statically typed, interpreted programming language and associated runtime for orchestrating multirobot systems. The main features of Resh are: (1) It offloads much of the tedious work of programming such systems away from the programmer and into the language runtime; (2) It is based on a small set of temporal and locational operators; and (3) It is not restricted to specific robot types or tasks. The Resh runtime consists of three engines that collaborate to run a Resh program using the available robots in their current environment. This paper describes both Resh and its runtime and gives examples of its use.