Abstract: The presentation of a robot's capability and identity directly influences a human collaborator's perception of, and implicit trust in, the robot. Unlike humans, a physical robot can simultaneously present different identities that reside in and control different parts of its body. This paper presents a novel study investigating how users perceive a robot whose control domains (head and gripper) are presented as independent robots. We conducted a mixed-design study in which participants experienced one of three presentations: a single robot, two agents with shared full control (co-embodiment), or two agents with control split across the robot's control domains (split-embodiment). Participants completed three distinct tasks -- a mundane data-entry task in which the robot provides motivational support, an individual sorting task with isolated robot failures, and a collaborative arrangement task in which the robot causes a failure that directly affects the participant. Participants perceived the robot as residing in the different control domains and were able to attribute robot failures to the different identities. This work signals how future robots can leverage different embodiment configurations to obtain the benefits of multiple robots within a single body.




Abstract: When assisting people in daily tasks, robots need to accurately interpret visual cues and respond effectively in diverse safety-critical situations, such as sharp objects on the floor. In this context, we present M-CoDAL, a multimodal dialogue system specifically designed for embodied agents to better understand and communicate in safety-critical situations. The system leverages discourse coherence relations to enhance its contextual understanding and communication abilities. To train this system, we introduce a novel clustering-based active learning mechanism that uses an external Large Language Model (LLM) to identify informative instances. Our approach is evaluated on a newly created multimodal dataset comprising 1K safety violations extracted from 2K Reddit images; these violations are annotated using a Large Multimodal Model (LMM) and verified by human annotators. Results on this dataset demonstrate that our approach improves the resolution of safety situations, user sentiment, and the safety of the conversation. Next, we deploy our dialogue system on a Hello Robot Stretch robot and conduct a within-subject user study with real-world participants. In the study, participants role-play two safety scenarios of different severity with the robot and receive interventions from our model and from a baseline system powered by OpenAI's ChatGPT. The study results corroborate and extend the findings from the automated evaluation, showing that our proposed system is more persuasive and competent in a real-world embodied-agent setting.
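The abstract only names the clustering-based active learning mechanism, so the following is a minimal sketch of what such a selection step could look like, not the paper's implementation. It assumes precomputed embeddings for an unlabeled pool and uses a hypothetical `llm_informativeness` hook (stubbed here) in place of the external LLM query.

```python
# Sketch: clustering-based active-learning selection (illustrative only).
# Assumptions: embeddings are precomputed feature vectors for the unlabeled
# pool; llm_informativeness is a hypothetical stand-in for an external LLM
# that rates how informative an instance would be to label.

import numpy as np
from sklearn.cluster import KMeans


def llm_informativeness(text: str) -> float:
    """Hypothetical hook: in a real system this would query an external LLM.
    Stubbed with a trivial heuristic so the sketch runs standalone."""
    return float(len(text))


def select_informative(texts, embeddings, n_clusters=10, per_cluster=1):
    """Cluster the unlabeled pool, then take the top-scoring instance(s)
    from each cluster so the selected batch stays diverse."""
    labels = KMeans(n_clusters=n_clusters, n_init=10).fit_predict(embeddings)
    selected = []
    for c in range(n_clusters):
        idx = np.flatnonzero(labels == c)
        ranked = sorted(idx, key=lambda i: llm_informativeness(texts[i]),
                        reverse=True)
        selected.extend(ranked[:per_cluster])
    return selected
```

The diversity comes from clustering, while the informativeness score decides which member of each cluster is worth annotating; any scoring function, LLM-based or otherwise, can be dropped into the hook.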




Abstract: With more robots being deployed in the world, users will likely interact with multiple robots sequentially when receiving services. In this paper, we describe an exploratory field study in which unsuspecting participants experienced a ``person transfer'' -- a scenario in which they first interacted with one stationary robot before a second, mobile robot joined to complete the interaction. In our 7-hour study spanning 4 days, we recorded 18 instances of person transfer involving 40+ individuals. We also interviewed 11 participants after the interaction to further understand their experience. From the recorded video and interview data, we extracted insights about in-the-field sequential human-robot interaction, such as mobile-robot handovers, trust in person transfer, and the importance of the robots' positions. Our findings expose pitfalls and present important factors to consider when designing sequential human-robot interactions.