Sonia Chernova

ConSOR: A Context-Aware Semantic Object Rearrangement Framework for Partially Arranged Scenes

Sep 30, 2023
Kartik Ramachandruni, Max Zuo, Sonia Chernova

Object rearrangement is the problem of enabling a robot to identify the correct object placement in a complex environment. Prior work on object rearrangement has explored a diverse set of techniques for following user instructions to achieve some desired goal state. Logical predicates, images of the goal scene, and natural language descriptions have all been used to instruct a robot in how to arrange objects. In this work, we argue that burdening the user with specifying goal scenes is not necessary in partially-arranged environments, such as common household settings. Instead, we show that contextual cues from partially arranged scenes (i.e., the placement of some number of pre-arranged objects in the environment) provide sufficient context to enable robots to perform object rearrangement without any explicit user goal specification. We introduce ConSOR, a Context-aware Semantic Object Rearrangement framework that utilizes contextual cues from a partially arranged initial state of the environment to complete the arrangement of new objects, without explicit goal specification from the user. We demonstrate that ConSOR strongly outperforms two baselines in generalizing to novel object arrangements and unseen object categories. The code and data can be found at https://github.com/kartikvrama/consor.

* Accepted to IROS 2023 
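As a rough, hedged illustration of the core idea (not the ConSOR model, which learns its representation from data), the sketch below shows how a partially arranged scene alone can suggest where a new object belongs: score each container by the semantic similarity between the new object and the objects already placed there. The toy embeddings and container names are assumptions for illustration only.

```python
# Minimal sketch of context-driven placement without a goal specification.
# The embeddings below are made-up stand-ins for a learned semantic encoder.
import numpy as np

EMBED = {
    "mug":    np.array([0.9, 0.1, 0.0]),
    "cup":    np.array([0.8, 0.2, 0.0]),
    "fork":   np.array([0.1, 0.9, 0.1]),
    "spoon":  np.array([0.0, 1.0, 0.1]),
    "pencil": np.array([0.0, 0.1, 0.9]),
}

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-9))

def propose_placement(new_object, partial_scene):
    """Score each container by how similar the new object is to the objects
    already placed there; return the best-scoring container."""
    scores = {}
    for container, placed in partial_scene.items():
        context = np.mean([EMBED[o] for o in placed], axis=0)
        scores[container] = cosine(EMBED[new_object], context)
    return max(scores, key=scores.get), scores

# Partially arranged scene: some objects are already put away.
scene = {"shelf": ["mug", "cup"], "drawer": ["fork", "spoon"]}
print(propose_placement("pencil", scene))
```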

State2Explanation: Concept-Based Explanations to Benefit Agent Learning and User Understanding

Sep 21, 2023
Devleena Das, Sonia Chernova, Been Kim

As more complex AI systems are used by non-AI experts to complete daily tasks, there is an increasing effort to develop methods that produce explanations of AI decision making that are understandable by non-AI experts. Toward this effort, leveraging higher-level concepts and producing concept-based explanations has become a popular approach. Most concept-based explanations have been developed for classification techniques, and we posit that the few existing methods for sequential decision making are limited in scope. In this work, we first contribute a set of desiderata for defining "concepts" in sequential decision making settings. Additionally, inspired by the Protégé Effect, which states that explaining knowledge often reinforces one's own learning, we explore the utility of concept-based explanations providing a dual benefit: to the RL agent, by improving its learning rate, and to the end-user, by improving their understanding of the agent's decision making. To this end, we contribute a unified framework, State2Explanation (S2E), that learns a joint embedding model between state-action pairs and concept-based explanations, and leverages the learned model to both (1) inform reward shaping during an agent's training, and (2) provide explanations to end-users at deployment for improved task performance. Our experimental validations, in Connect 4 and Lunar Lander, demonstrate the success of S2E in providing this dual benefit, successfully informing reward shaping and improving agent learning rate, as well as significantly improving end-user task performance at deployment time.

* Accepted to NeurIPS 2023 
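The following is a hedged sketch of the reward-shaping half of this idea, under assumptions: the two embedding functions below are random stand-ins for the learned joint embedding model described above, and the bonus simply rewards state-action pairs whose embedding aligns with a desirable concept. It is not the S2E implementation.

```python
# Sketch: use similarity between a (state, action) embedding and a concept
# embedding as a shaping bonus on top of the environment reward.
import numpy as np

def embed_state_action(state, action):
    # Stand-in for the learned joint embedding network (assumption).
    rng = np.random.default_rng(hash((state, action)) % (2**32))
    return rng.normal(size=8)

def embed_concept(concept):
    # Stand-in for the concept/explanation encoder (assumption).
    rng = np.random.default_rng(hash(concept) % (2**32))
    return rng.normal(size=8)

def shaped_reward(env_reward, state, action, target_concept, beta=0.1):
    """Add a small bonus when the (state, action) pair aligns with a
    high-level concept such as "block an opponent's winning column"."""
    sa = embed_state_action(state, action)
    c = embed_concept(target_concept)
    similarity = sa @ c / (np.linalg.norm(sa) * np.linalg.norm(c))
    return env_reward + beta * similarity

print(shaped_reward(0.0, "board_17", "drop_col_3", "block opponent win"))
```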

Predicting Routine Object Usage for Proactive Robot Assistance

Sep 12, 2023
Maithili Patel, Aswin Prakash, Sonia Chernova

Proactivity in robot assistance refers to the robot's ability to anticipate user needs and perform assistive actions without explicit requests. This requires understanding user routines, predicting consistent activities, and actively seeking information to predict inconsistent behaviors. We propose SLaTe-PRO (Sequential Latent Temporal model for Predicting Routine Object usage), which improves upon the prior state of the art by combining object and user action information, and conditioning object usage predictions on past history. Additionally, we find some human behavior to be inherently stochastic and lacking in contextual cues that the robot can use for proactive assistance. To address such cases, we introduce an interactive query mechanism that allows the robot to ask the user about their intended activities and object use to improve its predictions. We evaluate our approach on longitudinal data from three households, spanning 24 activity classes. SLaTe-PRO achieves an F1 score of 0.57 without queries and 0.60 with user queries, compared to 0.43 for prior work. We additionally present a case study with a fully autonomous household robot.
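Below is a minimal, assumed sketch of the query-when-uncertain pattern described above: if the predicted distribution over an object's next use is too uncertain (high entropy), ask the user; otherwise act proactively. It deliberately omits the sequential latent temporal model itself, and the threshold and distributions are made up.

```python
# Sketch of an interactive query mechanism gated on prediction uncertainty.
import numpy as np

def entropy(p):
    p = np.asarray(p, dtype=float)
    p = p / p.sum()
    return float(-(p * np.log(p + 1e-12)).sum())

def maybe_query_user(object_name, predicted_dist, threshold=1.0):
    """predicted_dist maps candidate activities to probabilities."""
    if entropy(list(predicted_dist.values())) > threshold:
        # High uncertainty: fall back to asking the user.
        return f"Query: will you need the {object_name} soon, and for what?"
    best = max(predicted_dist, key=predicted_dist.get)
    return f"Proactively stage the {object_name} for '{best}'."

print(maybe_query_user("coffee mug", {"breakfast": 0.85, "work": 0.10, "unused": 0.05}))
print(maybe_query_user("notebook", {"breakfast": 0.34, "work": 0.33, "unused": 0.33}))
```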

IndoorSim-to-OutdoorReal: Learning to Navigate Outdoors without any Outdoor Experience

May 10, 2023
Joanne Truong, April Zitkovich, Sonia Chernova, Dhruv Batra, Tingnan Zhang, Jie Tan, Wenhao Yu

We present IndoorSim-to-OutdoorReal (I2O), an end-to-end learned visual navigation approach that is trained solely in simulated short-range indoor environments and demonstrates zero-shot sim-to-real transfer to the outdoors for long-range navigation on the Spot robot. Our method uses zero real-world experience (indoor or outdoor) and does not require the simulator to model predominantly outdoor phenomena (sloped ground, sidewalks, etc.). The key to I2O transfer is providing the robot with additional context about the environment (e.g., a satellite map or a rough map sketched by a human) to guide the robot's navigation in the real world. The provided context-maps do not need to be accurate or complete -- real-world obstacles (e.g., trees, bushes, pedestrians) are not drawn on the map, and openings are not aligned with where they are in the real world. Crucially, these inaccurate context-maps provide a hint to the robot about a route to take to the goal. We find that our method, which leverages Context-Maps, is able to successfully navigate hundreds of meters in novel environments, avoiding novel obstacles on its path, to a distant goal without a single collision or human intervention. In comparison, policies without the additional context fail completely. Lastly, we test the robustness of the Context-Map policy by adding varying degrees of noise to the map in simulation. We find that the Context-Map policy is surprisingly robust to noise in the provided context-map. In the presence of significantly inaccurate maps (corrupted with 50% noise, or entirely blank maps), the policy gracefully regresses to the behavior of a policy with no context. Videos are available at https://www.joannetruong.com/projects/i2o.html
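As a small illustration of what a "context-map" input might look like in practice (assumed details, not the paper's pipeline), the sketch below crops a coarse, possibly hand-drawn overhead map around the robot's position; the policy itself, which would consume this crop alongside camera observations, is not shown.

```python
# Sketch: egocentric crop of a coarse, inaccurate overhead map used as a hint.
import numpy as np

def crop_context_map(context_map, robot_rc, window=32):
    """Return a (window x window) crop of a coarse overhead map.
    context_map: 2D array (e.g., hand-sketched route = 1, unknown = 0).
    robot_rc: (row, col) of the robot in map coordinates."""
    pad = window // 2
    padded = np.pad(context_map, pad, constant_values=0)
    r, c = robot_rc[0] + pad, robot_rc[1] + pad
    return padded[r - pad:r + pad, c - pad:c + pad]

# A rough, hand-drawn route on a 100x100 map; obstacles are deliberately absent.
sketch = np.zeros((100, 100))
sketch[50, 10:90] = 1.0          # the hinted route toward the goal
crop = crop_context_map(sketch, robot_rc=(50, 20))
print(crop.shape, crop.sum())    # the crop would be stacked with camera input
```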

Proactive Robot Assistance via Spatio-Temporal Object Modeling

Nov 28, 2022
Maithili Patel, Sonia Chernova

Proactive robot assistance enables a robot to anticipate and provide for a user's needs without being explicitly asked. We formulate proactive assistance as the problem of the robot anticipating temporal patterns of object movements associated with everyday user routines, and proactively assisting the user by placing objects to adapt the environment to their needs. We introduce a generative graph neural network to learn a unified spatio-temporal predictive model of object dynamics from temporal sequences of object arrangements. We additionally contribute the Household Object Movements from Everyday Routines (HOMER) dataset, which tracks household objects associated with human activities of daily living across 50+ days for five simulated households. Our model outperforms the leading baseline in predicting object movement, correctly predicting the locations of 11.1% more objects and incorrectly predicting the locations of 11.5% fewer of the objects used by the human user.
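To make the kind of spatio-temporal regularity the model is meant to capture concrete, here is a deliberately simple, assumed sketch (not the paper's generative graph neural network): predict where an object will be at a given hour from the most frequent location observed at that hour in past routines. The observation data is invented for illustration.

```python
# Toy sketch: frequency-based prediction of an object's location by hour.
from collections import Counter

history = [
    # (day, hour, object, location) observations from daily routines (made up).
    (1, 8, "mug", "kitchen_counter"), (1, 9, "mug", "office_desk"),
    (2, 8, "mug", "kitchen_counter"), (2, 9, "mug", "office_desk"),
    (3, 8, "mug", "kitchen_counter"), (3, 9, "mug", "sink"),
]

def predict_location(obj, hour, observations):
    """Most frequent historical location of obj at the given hour."""
    counts = Counter(loc for _, h, o, loc in observations if o == obj and h == hour)
    return counts.most_common(1)[0][0] if counts else None

print(predict_location("mug", 9, history))   # -> office_desk
```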

StructDiffusion: Object-Centric Diffusion for Semantic Rearrangement of Novel Objects

Nov 08, 2022
Weiyu Liu, Tucker Hermans, Sonia Chernova, Chris Paxton

Robots operating in human environments must be able to rearrange objects into semantically-meaningful configurations, even if these objects are previously unseen. In this work, we focus on the problem of building physically-valid structures without step-by-step instructions. We propose StructDiffusion, which combines a diffusion model and an object-centric transformer to construct structures out of a single RGB-D image based on high-level language goals, such as "set the table." Our method shows how diffusion models can be used for complex multi-step 3D planning tasks. StructDiffusion improves success rate on assembling physically-valid structures out of unseen objects by on average 16% over an existing multi-modal transformer model, while allowing us to use one multi-task model to produce a wider range of different structures. We show experiments on held-out objects in both simulation and on real-world rearrangement tasks. For videos and additional results, check out our website: http://weiyuliu.com/StructDiffusion/.
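For readers unfamiliar with how a diffusion model can be used for rearrangement planning, the sketch below shows a generic DDPM-style reverse (denoising) step over a set of object pose parameters. It is only loosely in the spirit of the approach above: the noise predictor here is a placeholder where StructDiffusion uses an object-centric transformer conditioned on segmented point clouds and the language goal, and the schedule values are arbitrary.

```python
# Hedged sketch: one reverse-diffusion step over an (N objects x 6) pose array.
import numpy as np

rng = np.random.default_rng(0)
T = 50
betas = np.linspace(1e-4, 0.02, T)
alphas = 1.0 - betas
alpha_bars = np.cumprod(alphas)

def predict_noise(poses, t, goal):
    # Placeholder for a learned, goal-conditioned noise predictor (assumption).
    return np.zeros_like(poses)

def reverse_step(poses, t, goal):
    """Standard DDPM posterior-mean update plus noise (except at t=0)."""
    eps = predict_noise(poses, t, goal)
    coef = betas[t] / np.sqrt(1.0 - alpha_bars[t])
    mean = (poses - coef * eps) / np.sqrt(alphas[t])
    noise = rng.normal(size=poses.shape) if t > 0 else 0.0
    return mean + np.sqrt(betas[t]) * noise

poses = rng.normal(size=(4, 6))          # start from pure noise
for t in reversed(range(T)):
    poses = reverse_step(poses, t, goal="set the table")
print(poses.shape)
```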

D-ITAGS: A Dynamic Interleaved Approach to Resilient Task Allocation, Scheduling, and Motion Planning

Sep 27, 2022
Glen Neville, Sonia Chernova, Harish Ravichandar

Complex, multi-objective missions require the coordination of heterogeneous robots at multiple inter-connected levels, such as coalition formation, scheduling, and motion planning. This challenge is exacerbated by dynamic changes, such as sensor and actuator failures, communication loss, and unexpected delays. We introduce Dynamic Iterative Task Allocation Graph Search (D-ITAGS) to simultaneously address coalition formation, scheduling, and motion planning in dynamic settings involving heterogeneous teams. D-ITAGS achieves resilience via two key characteristics: i) interleaved execution, and ii) targeted repair. Interleaved execution enables an effective search for solutions at each layer while avoiding incompatibility with other layers. Targeted repair identifies and repairs parts of the existing solution impacted by a given disruption, while conserving the rest. In addition to algorithmic contributions, we provide theoretical insights into the inherent trade-off between time and resource optimality in these settings and derive meaningful bounds on schedule suboptimality. Our experiments reveal that i) D-ITAGS is significantly faster than recomputation from scratch in dynamic settings, with little to no loss in solution quality, and ii) the theoretical suboptimality bounds consistently hold in practice.
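The sketch below illustrates only the "targeted repair" intuition under strong simplifying assumptions: keep assignments untouched by a disruption and greedily re-assign just the affected tasks. The real D-ITAGS interleaves allocation with scheduling and motion planning; none of that machinery appears here, and the task and robot names are invented.

```python
# Sketch: repair only the part of a task allocation broken by a robot failure.
def targeted_repair(allocation, capabilities, failed_robot):
    """allocation: task -> robot; capabilities: robot -> set of feasible tasks."""
    repaired = dict(allocation)
    affected = [t for t, r in allocation.items() if r == failed_robot]
    for task in affected:
        candidates = [r for r, caps in capabilities.items()
                      if r != failed_robot and task in caps]
        if not candidates:
            raise ValueError(f"No surviving robot can perform {task}")
        # Greedy choice: least-loaded capable robot (a simplification).
        load = {r: sum(1 for rr in repaired.values() if rr == r) for r in candidates}
        repaired[task] = min(candidates, key=load.get)
    return repaired

alloc = {"scan": "r1", "lift": "r2", "weld": "r2"}
caps = {"r1": {"scan", "lift"}, "r2": {"lift", "weld"}, "r3": {"weld", "scan"}}
print(targeted_repair(alloc, caps, failed_robot="r2"))  # only r2's tasks move
```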

Rethinking Sim2Real: Lower Fidelity Simulation Leads to Higher Sim2Real Transfer in Navigation

Jul 21, 2022
Joanne Truong, Max Rudolph, Naoki Yokoyama, Sonia Chernova, Dhruv Batra, Akshara Rai

If we want to train robots in simulation before deploying them in reality, it seems natural and almost self-evident to presume that reducing the sim2real gap involves creating simulators of increasing fidelity (since reality is what it is). We challenge this assumption and present a contrary hypothesis -- sim2real transfer of robots may be improved with lower (not higher) fidelity simulation. We conduct a systematic large-scale evaluation of this hypothesis on the problem of visual navigation -- in the real world, and on 2 different simulators (Habitat and iGibson) using 3 different robots (A1, AlienGo, Spot). Our results show that, contrary to expectation, adding fidelity does not help with learning; performance is poor due to slow simulation speed (preventing large-scale learning) and overfitting to inaccuracies in simulation physics. Instead, building simple models of the robot motion using real-world data can improve learning and generalization.
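As a minimal illustration of what "building simple models of the robot motion using real-world data" could mean (assumed data and dimensions, not the paper's model), the sketch below fits a linear map from velocity commands to observed base displacement with least squares and uses it as the simulator's dynamics in place of full rigid-body physics.

```python
# Sketch: a low-fidelity, data-driven motion model for a mobile base.
import numpy as np

rng = np.random.default_rng(1)
commands = rng.uniform(-1, 1, size=(200, 2))             # (v, w) commands
true_gain = np.array([[0.8, 0.0], [0.0, 0.6]])           # unknown real response
displacements = commands @ true_gain + rng.normal(0, 0.02, size=(200, 2))

# Least-squares fit of displacement = command @ G (the "simple model").
G, *_ = np.linalg.lstsq(commands, displacements, rcond=None)

def step(pose_xyt, command, dt=1.0):
    """Advance a planar (x, y, theta) pose using the fitted model."""
    dx, dtheta = (command @ G) * dt
    x, y, theta = pose_xyt
    return (x + dx * np.cos(theta), y + dx * np.sin(theta), theta + dtheta)

print(step((0.0, 0.0, 0.0), np.array([0.5, 0.1])))
```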

Explainable Knowledge Graph Embedding: Inference Reconciliation for Knowledge Inferences Supporting Robot Actions

May 04, 2022
Angel Daruna, Devleena Das, Sonia Chernova

Learned knowledge graph representations supporting robots contain a wealth of domain knowledge that drives robot behavior. However, there does not exist an inference reconciliation framework that expresses how a knowledge graph representation affects a robot's sequential decision making. We use a pedagogical approach to explain the inferences of a learned, black-box knowledge graph representation, a knowledge graph embedding. Our interpretable model uses a decision tree classifier to locally approximate the predictions of the black-box model, and provides natural language explanations interpretable by non-experts. Results from our algorithmic evaluation affirm our model design choices, and the results of our user studies with non-experts support the need for the proposed inference reconciliation framework. Critically, results from our simulated robot evaluation indicate that our explanations enable non-experts to correct erratic robot behaviors arising from nonsensical beliefs within the black-box model.

* Submitted to IROS 2022 
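Below is a hedged sketch of the general pattern of approximating a black-box scorer with a decision tree over interpretable features, which is the mechanism named in the abstract; the scorer, features, and rule here are stand-ins, not the paper's knowledge graph embedding or its feature set.

```python
# Sketch: fit a shallow decision tree to a black-box scorer's outputs on
# interpretable features, then print the tree as a human-readable rule.
import numpy as np
from sklearn.tree import DecisionTreeClassifier, export_text

rng = np.random.default_rng(0)

def black_box_plausible(features):
    # Placeholder for the black-box triple scorer (assumption): plausible if
    # the object is a container and is in the same room as the subject.
    return int(features[0] == 1 and features[1] == 1)

feature_names = ["object_is_container", "same_room", "object_is_heavy"]
X = rng.integers(0, 2, size=(200, 3))                   # binary feature vectors
y = np.array([black_box_plausible(x) for x in X])       # black-box labels

tree = DecisionTreeClassifier(max_depth=2).fit(X, y)
print(export_text(tree, feature_names=feature_names))   # readable explanation
```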