Abstract: Handheld data collection systems, such as the Universal Manipulation Interface (UMI), enable scalable data collection across diverse environments but capture only observed actions rather than the desired actions executed by a robot controller. In contrast, teleoperation captures desired actions directly but is prohibitively time-consuming to collect. We revisit this trade-off through the lens of action validity across task phases. We observe that handheld trajectories provide valid supervision in tolerant, free-space phases but lack dynamic feasibility in contact-sensitive phases, where tracking observed trajectories at high stiffness produces large, unsafe contact forces. We study the interaction between these two supervision types for contact-rich manipulation and find that training policies on handheld data combined with a small number of targeted teleoperated demonstrations provides an efficient hybrid strategy. Specifically, rather than teleoperating the entire task, we collect partial teleoperated demonstrations only for the task segments where base handheld policies fail. However, naively mixing handheld and teleoperated phase-specific data yields worse performance than training on handheld data alone. To address this mismatch between observed and desired supervision, we propose Bi-modal Routing for Imitation Data via Gated Experts (BRIDGE), a mixture of diffusion policy experts that routes between specialist task-phase heads conditioned on the current robot state. Notably, our approach enables task-phase-specific use of desired actions during contact-sensitive segments and improves success rates over handheld-only baselines by up to 36.7% across three contact-rich manipulation tasks.
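
The routing idea in this abstract (a gate that selects between phase-specialist heads based on the current robot state) can be illustrated with a small sketch. This is not the BRIDGE implementation: the diffusion policy experts are replaced by placeholder functions, and the state layout, gate weights, and names such as `route_action`, `GATE_W`, and `EXPERTS` are hypothetical choices made only for illustration.

```python
import math
from typing import Callable, List, Sequence, Tuple

Expert = Callable[[Sequence[float]], List[float]]


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def gate_weights(state, gate_w, gate_b) -> List[float]:
    """Linear gate: one logit per expert, computed from the robot state."""
    logits = [sum(w * s for w, s in zip(row, state)) + b
              for row, b in zip(gate_w, gate_b)]
    return softmax(logits)


def route_action(state, experts: Sequence[Expert], gate_w, gate_b,
                 hard: bool = True) -> Tuple[List[float], List[float]]:
    """Return (action, gate weights). Hard routing picks the top expert;
    soft routing blends expert outputs by gate weight."""
    weights = gate_weights(state, gate_w, gate_b)
    actions = [expert(state) for expert in experts]
    if hard:
        best = max(range(len(weights)), key=weights.__getitem__)
        return actions[best], weights
    dim = len(actions[0])
    blended = [sum(w * a[i] for w, a in zip(weights, actions))
               for i in range(dim)]
    return blended, weights


# Hypothetical setup: state = [gripper_height, contact_force].
# Placeholder experts stand in for the phase-specialist policy heads.
def free_space_expert(state):   # assumed: trained on handheld data
    return [1.0, 0.0]


def contact_expert(state):      # assumed: trained on teleoperated data
    return [0.0, 0.1]


EXPERTS = [free_space_expert, contact_expert]
GATE_W = [[0.0, -4.0], [0.0, 4.0]]  # illustrative weights, not learned
GATE_B = [1.0, -1.0]
```

With these illustrative weights, a state with no contact force routes to the free-space placeholder, while a state with high contact force routes to the contact placeholder. In a trained system the gate and the experts would be learned rather than hand-set.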
Abstract: Recognising intent in collaborative human-robot tasks can improve team performance and human perception of robots. Intent can differ from the observed outcome in the presence of mistakes, which are likely in physically dynamic tasks. We created a dataset of 1227 throws of a ball at a target from 10 participants and observed that 47% of throws were mistakes, with 16% completely missing the target. Our research leverages facial images capturing the person's reaction to the outcome of a throw to predict whether the throw was a mistake and then to determine the actual intent of the throw. The approach we propose for outcome prediction performs 38% better than the two-stream architecture used previously for this task on front-on videos. In addition, we propose a 1-D CNN model which is used in conjunction with priors learned from the frequency of mistakes to provide an end-to-end pipeline for outcome and intent recognition in this throwing task.
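
One way a mistake prediction could be combined with frequency-based priors to estimate intent is sketched below. The abstract does not specify how the priors enter the pipeline, so this mixture rule is an assumption, and the zone labels, the count table, and the function name `intent_posterior` are hypothetical.

```python
from typing import Dict


def intent_posterior(landed: str, p_mistake: float,
                     mistake_counts: Dict[str, Dict[str, int]]) -> Dict[str, float]:
    """Estimate a distribution over the intended zone.

    landed:         zone the ball actually hit (assumed to be a known zone)
    p_mistake:      probability that the throw was a mistake, e.g. produced
                    by a classifier over the thrower's facial reaction
    mistake_counts: mistake_counts[intended][landed] = number of mistaken
                    throws aimed at `intended` that hit `landed`
                    (hypothetical table built from training data)
    """
    zones = list(mistake_counts)
    if landed not in zones or len(zones) < 2:
        raise ValueError("landed must be one of at least two known zones")

    # Prior over the intended zone given a mistake that hit `landed`.
    raw = {z: 0 if z == landed else mistake_counts[z].get(landed, 0)
           for z in zones}
    total = sum(raw.values())

    posterior = {}
    for z in zones:
        if total > 0:
            prior = raw[z] / total
        else:
            # No recorded mistakes for this zone: fall back to uniform.
            prior = 0.0 if z == landed else 1.0 / (len(zones) - 1)
        no_mistake = 1.0 if z == landed else 0.0
        posterior[z] = p_mistake * prior + (1.0 - p_mistake) * no_mistake
    return posterior


# Hypothetical counts for three target zones.
COUNTS = {
    "A": {"B": 6, "C": 1},
    "B": {"A": 3, "C": 2},
    "C": {"A": 1, "B": 2},
}
```

For example, `intent_posterior("B", 0.8, COUNTS)` gives zone A most of the probability mass, because in this made-up table most mistaken throws that hit B were aimed at A.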



Abstract: This paper describes our recent effort to use virtual reality to simulate threatening emergency evacuation scenarios in which a robot guides a person to an exit. Our prior work has demonstrated that people will follow a robot's guidance during an emergency evacuation, even when the robot is faulty. Yet because physical, in-person emergency evacuation experiments are difficult and costly to conduct, and because we would like to evaluate many different factors, we are motivated to develop a system that immerses people in the simulation environment to encourage genuine subject reactions. We are working to complete experiments verifying the validity of our approach.


Abstract: This paper describes current progress on developing an ethical architecture for robots that are designed to follow human ethical decision-making processes. We surveyed both regular adults (folks) and ethics experts (experts) on what they consider to be ethical behavior in two specific scenarios: pill-sorting with an older adult and game playing with a child. A key goal of the surveys is to better understand human ethical decision-making. In the first survey, folk responses were based on the subject's ethical choices ("folk morality"); in the second survey, expert responses were based on the expert's application of different formal ethical frameworks to each scenario. We observed that most of the formal ethical frameworks we included in the survey (Utilitarianism, Kantian Ethics, Ethics of Care and Virtue Ethics) and "folk morality" were conservative toward deception in the high-risk task with an older adult when both the adult and the child had significant performance deficiencies.