Reid Simmons

InterPReT: Interactive Policy Restructuring and Training Enable Effective Imitation Learning from Laypersons

Feb 04, 2026

Feiyu Gavin Zhu, Jean Oh, Reid Simmons

Abstract:Imitation learning has shown success in many tasks by learning from expert demonstrations. However, most existing work relies on large-scale demonstrations from technical professionals and close monitoring of the training process. Both requirements are challenging for laypersons who want to teach an agent new skills. To lower the barrier of teaching AI agents, we propose Interactive Policy Restructuring and Training (InterPReT), which takes user instructions to continually update the policy structure and optimize its parameters to fit user demonstrations. This enables end-users to interactively give instructions and demonstrations, monitor the agent's performance, and review the agent's decision-making strategies. A user study (N=34) on teaching an AI agent to drive in a racing game confirms that our approach yields more robust policies without impairing system usability, compared to a generic imitation learning baseline, when a layperson is responsible for both giving demonstrations and determining when to stop. This shows that our method is more suitable for end-users without much technical background in machine learning to train a dependable policy.

* Proceedings of the 21st ACM/IEEE International Conference on Human-Robot Interaction

Older Adults' Preferences for Feedback Cadence from an Exercise Coach Robot

Jan 13, 2026

Roshni Kaushik, Reid Simmons

Abstract:People can respond to feedback and guidance in different ways, and it is important for robots to personalize their interactions and utilize verbal and nonverbal communication cues. We aim to understand how older adults respond to different cadences of verbal and nonverbal feedback of a robot exercise coach. We conducted an online study of older adults, where participants evaluated videos of the robot giving feedback at different cadences for each modality. The results indicate that changing the cadence of one modality affects the perception of both it and the other modality. We can use the results from this study to better design the frequency of the robot coach's feedback during an exercise session with this population.

* Nonarchival submission to RO-MAN 2024 - poster session

Proceedings of 1st Workshop on Advancing Artificial Intelligence through Theory of Mind

Apr 28, 2025

Mouad Abrini, Omri Abend, Dina Acklin, Henny Admoni, Gregor Aichinger, Nitay Alon, Zahra Ashktorab, Ashish Atreja, Moises Auron, Alexander Aufreiter (+98 more)

Abstract:This volume includes a selection of papers presented at the Workshop on Advancing Artificial Intelligence through Theory of Mind, held at AAAI 2025 in Philadelphia, US, on 3 March 2025. The purpose of this volume is to provide an open-access, curated anthology for the ToM and AI research community.

* workshop proceedings

Second-order Theory of Mind for Human Teachers and Robot Learners

Mar 17, 2025

Patrick Callaghan, Reid Simmons, Henny Admoni

Abstract:Confusing or otherwise unhelpful learner feedback creates or perpetuates erroneous beliefs that the teacher and learner have of each other, thereby increasing the cognitive burden placed upon the human teacher. For example, the robot's feedback might cause the human to misunderstand what the learner knows about the learning objective or how the learner learns. At the same time -- and in addition to the learning objective -- the learner might misunderstand how the teacher perceives the learner's task knowledge and learning processes. To ease the teaching burden, the learner should provide feedback that accounts for these misunderstandings and elicits efficient teaching from the human. This work endows an AI learner with a Second-order Theory of Mind that models perceived rationality as a source for the erroneous beliefs a teacher and learner may have of one another. It also explores how a learner can ease the teaching burden and improve teacher efficacy if it selects feedback which accounts for its model of the teacher's beliefs about the learner and its learning objective.

Bi-Directional Mental Model Reconciliation for Human-Robot Interaction with Large Language Models

Mar 10, 2025

Nina Moorman, Michelle Zhao, Matthew B. Luebbers, Sanne Van Waveren, Reid Simmons, Henny Admoni, Sonia Chernova, Matthew Gombolay

Abstract:In human-robot interactions, human and robot agents maintain internal mental models of their environment, their shared task, and each other. The accuracy of these representations depends on each agent's ability to perform theory of mind, i.e. to understand the knowledge, preferences, and intentions of their teammate. When mental models diverge to the extent that it affects task execution, reconciliation becomes necessary to prevent the degradation of interaction. We propose a framework for bi-directional mental model reconciliation, leveraging large language models to facilitate alignment through semi-structured natural language dialogue. Our framework relaxes the assumption of prior model reconciliation work that either the human or robot agent begins with a correct model for the other agent to align to. Through our framework, both humans and robots are able to identify and communicate missing task-relevant context during interaction, iteratively progressing toward a shared mental model.

* Advancing Artificial Intelligence through Theory of Mind Workshop at AAAI 2025

Sample-Efficient Behavior Cloning Using General Domain Knowledge

Jan 27, 2025

Feiyu Zhu, Jean Oh, Reid Simmons

Abstract:Behavior cloning has shown success in many sequential decision-making tasks by learning from expert demonstrations, yet it can be very sample-inefficient and fail to generalize to unseen scenarios. One approach to these problems is to introduce general domain knowledge, such that the policy can focus on the essential features and may generalize to unseen states by applying that knowledge. Although this knowledge is easy to acquire from experts, it is hard to combine with learning from individual examples due to the lack of semantic structure in neural networks and the time-consuming nature of feature engineering. To enable learning from both general knowledge and specific demonstration trajectories, we use a large language model's coding capability to instantiate a policy structure based on expert domain knowledge expressed in natural language and tune the parameters in the policy with demonstrations. We name this approach the Knowledge Informed Model (KIM) as the structure reflects the semantics of expert knowledge. In our experiments with lunar lander and car racing tasks, our approach learns to solve the tasks with as few as 5 demonstrations and is robust to action noise, outperforming the baseline model without domain knowledge. This indicates that with the help of large language models, we can incorporate domain knowledge into the structure of the policy, increasing sample efficiency for behavior cloning.

Conformalized Interactive Imitation Learning: Handling Expert Shift and Intermittent Feedback

Oct 11, 2024

Michelle Zhao, Reid Simmons, Henny Admoni, Aaditya Ramdas, Andrea Bajcsy

Abstract:In interactive imitation learning (IL), uncertainty quantification offers a way for the learner (i.e. robot) to contend with distribution shifts encountered during deployment by actively seeking additional feedback from an expert (i.e. human) online. Prior works use mechanisms like ensemble disagreement or Monte Carlo dropout to quantify when black-box IL policies are uncertain; however, these approaches can lead to overconfident estimates when faced with deployment-time distribution shifts. Instead, we contend that we need uncertainty quantification algorithms that can leverage the expert human feedback received during deployment time to adapt the robot's uncertainty online. To tackle this, we draw upon online conformal prediction, a distribution-free method for constructing prediction intervals online given a stream of ground-truth labels. Human labels, however, are intermittent in the interactive IL setting. Thus, from the conformal prediction side, we introduce a novel uncertainty quantification algorithm called intermittent quantile tracking (IQT) that leverages a probabilistic model of intermittent labels, maintains asymptotic coverage guarantees, and empirically achieves desired coverage levels. From the interactive IL side, we develop ConformalDAgger, a new approach wherein the robot uses prediction intervals calibrated by IQT as a reliable measure of deployment-time uncertainty to actively query for more expert feedback. We compare ConformalDAgger to prior uncertainty-aware DAgger methods in scenarios where the distribution shift is (and isn't) present because of changes in the expert's policy. We find that in simulated and hardware deployments on a 7DOF robotic manipulator, ConformalDAgger detects high uncertainty when the expert shifts and increases the number of interventions compared to baselines, allowing the robot to more quickly learn the new behavior.
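
The idea of tracking a quantile online when labels arrive only intermittently can be illustrated with a small sketch. This is not the paper's IQT implementation: it assumes a plain quantile-tracking update that is reweighted by a known, constant probability of observing a label, and the function name `intermittent_quantile_tracking` and all of its parameters are hypothetical.

```python
import numpy as np

def intermittent_quantile_tracking(scores, observed, obs_prob, alpha=0.1, lr=0.1, q0=0.0):
    """Track an interval radius q online when labels are only sometimes observed.

    scores   : nonconformity scores, e.g. |expert action - predicted action|
    observed : boolean flags, True when the expert label was actually received
    obs_prob : assumed (known, constant) probability of receiving a label
    alpha    : target miscoverage rate
    lr       : step size of the tracking update
    Returns the radius that was in use at each time step.
    """
    q = q0
    history = []
    for score, obs in zip(scores, observed):
        history.append(q)                      # radius used before seeing this label
        if obs:
            err = 1.0 if score > q else 0.0    # 1 when the interval missed the label
            # Inverse-probability weighting compensates for the skipped steps.
            q += lr * (err - alpha) / obs_prob
    return np.array(history)

# Illustrative run on synthetic scores with labels observed half of the time.
rng = np.random.default_rng(0)
scores = np.abs(rng.normal(size=5000))
observed = rng.random(5000) < 0.5
radii = intermittent_quantile_tracking(scores, observed, obs_prob=0.5)
```

A miss widens the radius by a large step and a hit shrinks it by a small one, so the long-run miss rate on the observed steps is pushed toward `alpha`.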

Conformalized Teleoperation: Confidently Mapping Human Inputs to High-Dimensional Robot Actions

Jun 11, 2024

Michelle Zhao, Reid Simmons, Henny Admoni, Andrea Bajcsy

Abstract:Assistive robotic arms often have more degrees-of-freedom than a human teleoperator can control with a low-dimensional input, like a joystick. To overcome this challenge, existing approaches use data-driven methods to learn a mapping from low-dimensional human inputs to high-dimensional robot actions. However, determining if such a black-box mapping can confidently infer a user's intended high-dimensional action from low-dimensional inputs remains an open problem. Our key idea is to adapt the assistive map at training time to additionally estimate high-dimensional action quantiles, and then calibrate these quantiles via rigorous uncertainty quantification methods. Specifically, we leverage adaptive conformal prediction which adjusts the intervals over time, reducing the uncertainty bounds when the mapping is performant and increasing the bounds when the mapping consistently mis-predicts. Furthermore, we propose an uncertainty-interval-based mechanism for detecting high-uncertainty user inputs and robot states. We evaluate the efficacy of our proposed approach in a 2D assistive navigation task and two 7DOF Kinova Jaco tasks involving assistive cup grasping and goal reaching. Our findings demonstrate that conformalized assistive teleoperation manages to detect (but not differentiate between) high uncertainty induced by diverse preferences and induced by low-precision trajectories in the mapping's training dataset. On the whole, we see this work as a key step towards enabling robots to quantify their own uncertainty and proactively seek intervention when needed.
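
The interval adjustment described in the abstract (tighter bounds while the mapping predicts well, wider bounds after repeated misses) can be sketched with one common form of adaptive conformal prediction. This is an illustration under assumptions, not the authors' code: it uses scalar nonconformity scores in place of the learned action quantiles, and the function name `adaptive_conformal` and its parameters are hypothetical.

```python
import numpy as np

def adaptive_conformal(scores, target_alpha=0.1, gamma=0.02, warmup=20):
    """Adapt the interval radius online from a stream of nonconformity scores.

    scores       : nonconformity scores, e.g. |intended action - predicted action|
    target_alpha : desired long-run miscoverage rate
    gamma        : step size for adapting the working miscoverage level
    warmup       : number of initial scores used only as calibration history
    Returns (radii, errors) for time steps warmup .. len(scores) - 1.
    """
    scores = np.asarray(scores, dtype=float)
    alpha_t = target_alpha
    radii, errors = [], []
    for t in range(warmup, len(scores)):
        # Radius = empirical (1 - alpha_t) quantile of the scores seen so far.
        level = float(np.clip(1.0 - alpha_t, 0.0, 1.0))
        radius = float(np.quantile(scores[:t], level))
        err = 1.0 if scores[t] > radius else 0.0
        # A miss lowers alpha_t (wider intervals); a hit raises it slightly.
        alpha_t += gamma * (target_alpha - err)
        radii.append(radius)
        errors.append(err)
    return np.array(radii), np.array(errors)

# Illustrative run: the prediction error grows threefold halfway through.
rng = np.random.default_rng(1)
stream = np.concatenate([np.abs(rng.normal(0, 1, 200)), np.abs(rng.normal(0, 3, 200))])
radii, errors = adaptive_conformal(stream)
```

In this synthetic run the radius grows after the shift, which is the kind of signal an uncertainty-interval-based detector could threshold.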

Understanding Robot Minds: Leveraging Machine Teaching for Transparent Human-Robot Collaboration Across Diverse Groups

Apr 23, 2024

Suresh Kumaar Jayaraman, Reid Simmons, Aaron Steinfeld, Henny Admoni

Abstract:In this work, we aim to improve transparency and efficacy in human-robot collaboration by developing machine teaching algorithms suitable for groups with varied learning capabilities. While previous work focused on approaches tailored to teaching individuals, our method teaches teams with various compositions of diverse learners using team belief representations to address personalization challenges within groups. We investigate various group teaching strategies, such as focusing on individual beliefs or the group's collective beliefs, and assess their impact on learning robot policies for different team compositions. Our findings reveal that team belief strategies yield less variation in learning duration and better accommodate diverse teams compared to individual belief strategies, suggesting their suitability in mixed-proficiency settings with limited resources. Conversely, individual belief strategies provide a more uniform knowledge level and are particularly effective for homogeneously inexperienced groups. Our study indicates that a teaching strategy's efficacy is significantly influenced by team composition and learner proficiency, highlighting the importance of assessing learner proficiency in real time and adapting teaching approaches accordingly for optimal teaching outcomes.

Bootstrapping Cognitive Agents with a Large Language Model

Feb 25, 2024

Feiyu Zhu, Reid Simmons

Abstract:Large language models contain noisy general knowledge of the world, yet are hard to train or fine-tune. On the other hand, cognitive architectures have excellent interpretability and are flexible to update but require a lot of manual work to instantiate. In this work, we combine the best of both worlds by bootstrapping a cognitive-based model with the noisy knowledge encoded in large language models. Through an embodied agent doing kitchen tasks, we show that our proposed framework yields better efficiency compared to an agent based entirely on large language models. Our experiments indicate that large language models are a good source of information for cognitive architectures, and the cognitive architecture in turn can verify and update the knowledge of large language models to a specific domain.
