Bilge Mutlu

Making the Invisible Visible: Understanding the Mismatch Between Organizational Goals and Worker Experiences in AI Adoption

May 04, 2026

Christine P. Lee, Min Kyung Lee, Bilge Mutlu

Abstract: While AI is often introduced into organizations to drive innovation and efficiency, many adoption efforts fail as workers resist and struggle to integrate these systems. These failures point to a deeper issue: workers, the very people expected to collaborate with AI, are often invisible in decisions about how AI is designed and used. Drawing on interviews with professionals who interact with AI systems daily in healthcare, finance, and management, we examine the disconnect between organizational expectations and worker experiences. We identify key barriers, including poor usability and interoperability, misaligned expectations, limited control, and insufficient communication. These challenges highlight a gap between how organizations implement AI and the evolving worker needs, tasks, and workflows that it fails to support. We argue that successful adoption requires recognizing workers as central to AI integration and propose adaptation strategies at the individual, task, and organizational levels to better align AI systems with real-world practices.

U-Define: Designing User Workflows for Hard and Soft Constraints in LLM-Based Planning

May 04, 2026

Christine P Lee, Xinyu Jessica Wang, Aws Albarghouthi, David Porfirio, Bilge Mutlu

Abstract: LLMs are increasingly used for end-user task planning, yet their black-box nature limits users' ability to ensure reliability and control. While recent systems incorporate verification techniques, it remains unclear how users can effectively apply such rigid constraints to represent intent or adapt to real-world variability. For example, prior work finds that hard-only constraints are too rigid, and numeric flexibility weights confuse users. We investigate how interaction workflows can better support users in applying constraints to guide LLM-generated plans, examining whether abstracting strictness into high-level types (i.e., hard and soft) paired with distinct verification mechanisms helps users more reliably express and align intent. We present U-Define, a system that lets users define constraints in natural language and categorize them as either hard rules that must not be violated or soft preferences that allow flexibility. U-Define verifies these types through complementary methods: formal model checking for hard constraints and LLM-as-judge evaluation for soft ones. Through a technical evaluation and user studies with general and expert participants, we find that user-defined constraint types improve perceived usefulness, performance, and satisfaction while maintaining usability. These findings provide insights for designing flexible yet reliable constraint-based workflows.

Designing Robots to Support Parent-Child Connections: Opportunities Through Robot-Mediated Communication

Apr 27, 2026

Michael F Xu, Bengisu Cagiltay, Yaxin Hu, Anjun Zhu, Bilge Mutlu

Abstract: The sense of family connectedness may support positive outcomes including individual well-being, resilience, and healthy family functioning. However, as technologies advance, they often replace human-human interactions instead of nurturing them. In this work, we investigate how robot-facilitated communication tools might instead create new opportunities for family connection. We conducted two studies with families with children aged 5-12. We first explored the design space through in-home technology probe sessions with six families. These probes inspired us to explore two key interaction design dimensions: the robot's behavior strategy (passive, reactive, proactive) and the mode of communication (synchronous, asynchronous). We then conducted a laboratory study with 20 families to examine how the two dimensions shaped parent-child interaction and connection. Our findings characterize how parents and children appropriated robot-mediated exchanges, the tensions they experienced around initiative, timing, and privacy, and the opportunities they envisioned for supporting everyday connectedness.

* Proceedings of the 25th Interaction Design and Children Conference (IDC '26)

Supporting Family-School Partnerships with Robot-Facilitated Home-Based Activities

Apr 27, 2026

Michael F Xu, Qiyao Yang, Heather Kirkorian, Bilge Mutlu

Abstract: Family-school partnerships (FSP) are critical to children's development, yet families often face barriers such as time constraints, fragmented communication, and limited opportunities for meaningful engagement. As a step toward facilitating broader family-school partnerships, we explore a novel approach that integrates a social robot into family settings, specifically supporting home-based activities. Through interviews and co-design sessions, we designed and developed a robotic system informed by both parents and children, that supported, among other interactions, family communication about school topics. We evaluated the robot in a week-long, in-home study with 10 families. Our findings show how families integrated the robot into daily life, how parental facilitation styles shaped use, and how families perceived both the helpfulness and challenges of the robot. We contribute empirical insights, a modular system, and design implications for family- and child-robot interactions. We discuss ethical and privacy considerations, and broaden the design space for technologies supporting family-school partnerships.

* Proceedings of the 25th Interaction Design and Children Conference (IDC '26)

The Long-Horizon Task Mirage? Diagnosing Where and Why Agentic Systems Break

Apr 13, 2026

Xinyu Jessica Wang, Haoyue Bai, Yiyou Sun, Haorui Wang, Shuibai Zhang, Wenjie Hu, Mya Schroder, Bilge Mutlu, Dawn Song, Robert D Nowak

Abstract: Large language model (LLM) agents perform strongly on short- and mid-horizon tasks, but often break down on long-horizon tasks that require extended, interdependent action sequences. Despite rapid progress in agentic systems, these long-horizon failures remain poorly characterized, hindering principled diagnosis and comparison across domains. To address this gap, we introduce HORIZON, an initial cross-domain diagnostic benchmark for systematically constructing tasks and analyzing long-horizon failure behaviors in LLM-based agents. Using HORIZON, we evaluate state-of-the-art (SOTA) agents from multiple model families (GPT-5 variants and Claude models), collecting 3100+ trajectories across four representative agentic domains to study horizon-dependent degradation patterns. We further propose a trajectory-grounded LLM-as-a-Judge pipeline for scalable and reproducible failure attribution, and validate it with human annotation on trajectories, achieving strong agreement (inter-annotator κ=0.61; human-judge κ=0.84). Our findings offer an initial methodological step toward systematic, cross-domain analysis of long-horizon agent failures and offer practical guidance for building more reliable long-horizon agents. We release our project website, the HORIZON Leaderboard, at https://xwang2775.github.io/horizon-leaderboard/ and welcome contributions from the community.

Designing Robots for Families: In-Situ Prototyping for Contextual Reminders on Family Routines

Feb 26, 2026

Michael F. Xu, Enhui Zhao, Yawen Zhang, Joseph E. Michaelis, Sarah Sebo, Bilge Mutlu

Abstract: Robots are increasingly entering the daily lives of families, yet their successful integration into domestic life remains a challenge. We explore family routines as a critical entry point for understanding how robots might find a sustainable role in everyday family settings. Together with each of the ten families, we co-designed robot interactions and behaviors, and a plan for the robot to support their chosen routines, accounting for contextual factors such as timing, participants, locations, and the activities in the environment. We then designed, prototyped, and deployed a mobile social robot as a four-day, in-home user study. Families welcomed the robot's reminders, with parents especially appreciating the offloading of some reminding tasks. At the same time, interviews revealed tensions around timing, authority, and family dynamics, highlighting the complexity of integrating robots into households beyond the immediate task of reminders. Based on these insights, we offer design implications for robot-facilitated contextual reminders and discuss broader considerations for designing robots for family settings.

* Proceedings of the 21st ACM/IEEE International Conference on Human Robot Interaction (HRI 2026)

"It's like a pet...but my pet doesn't collect data about me": Multi-person Households' Privacy Design Preferences for Household Robots

Feb 19, 2026

Jennica Li, Shirley Zhang, Dakota Sullivan, Bengisu Cagiltay, Heather Kirkorian, Bilge Mutlu, Kassem Fawaz

Abstract: Household robots boasting mobility, more sophisticated sensors, and powerful processing models have become increasingly prevalent in the commercial market. However, these features may expose users to unwanted privacy risks, including unsolicited data collection and unauthorized data sharing. While security and privacy researchers thus far have explored people's privacy concerns around household robots, literature investigating people's preferred privacy designs and mitigation strategies is still limited. Additionally, the existing literature has not yet accounted for multi-user perspectives on privacy design and household robots. We aimed to fill this gap by conducting in-person participatory design sessions with 15 households to explore how they would design a privacy-aware household robot based on their concerns and expectations. We found that participants did not trust that robots, or their respective manufacturers, would respect the data privacy of household members or operate in a multi-user ecosystem without jeopardizing users' personal data. Based on these concerns, they generated designs that gave them authority over their data, contained accessible controls and notification systems, and could be customized and tailored to suit the needs and preferences of each user over time. We synthesize our findings into actionable design recommendations for robot manufacturers and developers.

* 13 pages (main body), 2 figures

Elements of Robot Morphology: Supporting Designers in Robot Form Exploration

Feb 09, 2026

Amy Koike, Ge Guo, Xinning He, Callie Y. Kim, Dakota Sullivan, Bilge Mutlu

Abstract: Robot morphology, the form, shape, and structure of robots, is a key design space in human-robot interaction (HRI), shaping how robots function, express themselves, and interact with people. Yet, despite its importance, little is known about how design frameworks can guide systematic form exploration. To address this gap, we introduce Elements of Robot Morphology, a framework that identifies five fundamental elements: perception, articulation, end effectors, locomotion, and structure. Derived from an analysis of existing robots, the framework supports structured exploration of diverse robot forms. To operationalize the framework, we developed Morphology Exploration Blocks (MEB), a set of tangible blocks that enable hands-on, collaborative experimentation with robot morphologies. We evaluate the framework and toolkit through a case study and design workshops, showing how they support analysis, ideation, reflection, and collaborative robot design.

* 10 pages, 5 figures, Proceedings of the 21st ACM/IEEE International Conference on Human-Robot Interaction (HRI '26)

Robot-Assisted Group Tours for Blind People

Feb 04, 2026

Yaxin Hu, Masaki Kuribayashi, Allan Wang, Seita Kayukawa, Daisuke Sato, Bilge Mutlu, Hironobu Takagi, Chieko Asakawa

Abstract: Group interactions are essential to social functioning, yet effective engagement relies on the ability to recognize and interpret visual cues, making such engagement a significant challenge for blind people. In this paper, we investigate how a mobile robot can support group interactions for blind people. We used the scenario of a guided tour with mixed-visual groups involving blind and sighted visitors. Based on insights from an interview study with blind people (n=5) and museum experts (n=5), we designed and prototyped a robotic system that supported blind visitors to join group tours. We conducted a field study in a science museum where each blind participant (n=8) joined a group tour with one guide and two sighted participants (n=8). Findings indicated users' sense of safety from the robot's navigational support, concerns in the group participation, and preferences for obtaining environmental information. We present design implications for future robotic systems to support blind people's mixed-visual group participation.

* In Proceedings of ACM CHI 2026 conference on Human Factors in Computing Systems

MAP: Multi-user Personalization with Collaborative LLM-powered Agents

Mar 17, 2025

Christine Lee, Jihye Choi, Bilge Mutlu

Abstract: The widespread adoption of Large Language Models (LLMs) and LLM-powered agents in multi-user settings underscores the need for reliable, usable methods to accommodate diverse preferences and resolve conflicting directives. Drawing on conflict resolution theory, we introduce a user-centered workflow for multi-user personalization comprising three stages: Reflection, Analysis, and Feedback. We then present MAP, a Multi-Agent system for multi-user Personalization, to operationalize this workflow. By delegating subtasks to specialized agents, MAP (1) retrieves and reflects on relevant user information, while enhancing reliability through agent-to-agent interactions, (2) provides detailed analysis for improved transparency and usability, and (3) integrates user feedback to iteratively refine results. Our user study findings (n=12) highlight MAP's effectiveness and usability for conflict resolution while emphasizing the importance of user involvement in resolution verification and failure management. This work highlights the potential of multi-agent systems to implement user-centered, multi-user personalization workflows and concludes by offering insights for personalization in multi-user contexts.

* In Extended Abstracts of the CHI Conference on Human Factors in Computing Systems (CHI EA '25), April 26-May 1, 2025, Yokohama, Japan
