Abstract: Object rearrangement is a key task for household robots, requiring personalization without explicit instructions, meaningful object placement in environments already occupied by other objects, and generalization to unseen objects and new environments. To facilitate research addressing these challenges, we introduce PARSEC, an object rearrangement benchmark for learning user organizational preferences from observed scene context in order to place objects in a partially arranged environment. PARSEC is built upon a novel dataset of 110K rearrangement examples crowdsourced from 72 users, spanning 93 object categories and 15 environments. We also propose ContextSortLM, an LLM-based rearrangement model that places objects in partially arranged environments by adapting to user preferences inferred from prior and current scene context, while accounting for multiple valid placements. We evaluate ContextSortLM and existing personalized rearrangement approaches on the PARSEC benchmark, and complement these findings with a crowdsourced evaluation in which 108 online raters ranked model predictions by alignment with user preferences. Our results indicate that personalized rearrangement models leveraging multiple sources of scene context outperform models relying on a single context source. Moreover, ContextSortLM outperforms other models at placing objects to replicate the target user's arrangement and ranks among the top two models in all three environment categories, as rated by online evaluators. Importantly, our evaluation highlights challenges in modeling environment semantics across different environment categories and provides recommendations for future work.
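To make the ContextSortLM pipeline concrete, below is a minimal Python sketch of one placement step, assuming a text-serialized scene: prior and current scene context are folded into a prompt, and the LLM's ranked reply is parsed into multiple candidate placements. The function names, prompt format, and stubbed LLM reply are illustrative assumptions, not the paper's actual prompts or implementation.

```python
"""Minimal sketch of an LLM-prompted placement step in the spirit of
ContextSortLM. All names and the prompt format are assumptions."""

from typing import Dict, List


def build_prompt(history: Dict[str, List[str]],
                 scene: Dict[str, List[str]],
                 new_object: str) -> str:
    """Serialize prior and current scene context into a text prompt."""
    lines = ["You are arranging a household environment for one user.",
             "Previously observed arrangement (prior context):"]
    for receptacle, objects in history.items():
        lines.append(f"  {receptacle}: {', '.join(objects) or '(empty)'}")
    lines.append("Current, partially arranged scene (current context):")
    for receptacle, objects in scene.items():
        lines.append(f"  {receptacle}: {', '.join(objects) or '(empty)'}")
    lines.append(
        f"Rank every receptacle as a placement for '{new_object}', "
        "best first, one per line. Several placements may be acceptable.")
    return "\n".join(lines)


def query_llm(prompt: str) -> str:
    """Placeholder for a real LLM call (e.g., a chat-completion API);
    stubbed with a canned reply so the sketch runs offline."""
    return "kitchen shelf\npantry\ncoffee table"


def rank_placements(history: Dict[str, List[str]],
                    scene: Dict[str, List[str]],
                    new_object: str) -> List[str]:
    """Return receptacles ordered by predicted user preference,
    keeping multiple valid placements rather than a single answer."""
    reply = query_llm(build_prompt(history, scene, new_object))
    valid = set(scene)
    # Keep only receptacles that exist in the scene, preserving LLM order.
    return [line.strip() for line in reply.splitlines()
            if line.strip() in valid]


if __name__ == "__main__":
    history = {"kitchen shelf": ["mug", "teapot"], "pantry": ["cereal"]}
    scene = {"kitchen shelf": ["mug"], "pantry": [], "coffee table": ["book"]}
    print(rank_placements(history, scene, "tea tin"))
```

Keeping the full ranked list, rather than committing to a single receptacle, is one way to account for the multiple valid placements the abstract describes.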
Abstract: Object rearrangement is the problem of enabling a robot to identify the correct placement of objects in a complex environment. Prior work on object rearrangement has explored a diverse set of techniques for following user instructions to achieve a desired goal state: logical predicates, images of the goal scene, and natural language descriptions have all been used to instruct a robot on how to arrange objects. In this work, we argue that burdening the user with specifying goal scenes is unnecessary in partially arranged environments, such as common household settings. Instead, we show that contextual cues from partially arranged scenes (i.e., the placement of some number of pre-arranged objects in the environment) provide sufficient context to enable robots to perform object rearrangement without any explicit user goal specification. We introduce ConSOR, a Context-aware Semantic Object Rearrangement framework that uses contextual cues from the partially arranged initial state of the environment to complete the arrangement of new objects, without explicit goal specification from the user. We demonstrate that ConSOR strongly outperforms two baselines in generalizing to novel object arrangements and unseen object categories. The code and data can be found at https://github.com/kartikvrama/consor.
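The core idea of arranging without any goal specification can be sketched as scoring each partially filled receptacle by how semantically similar its pre-arranged objects are to the new object, then placing the object in the best-scoring receptacle. In the toy Python sketch below, the hand-coded feature sets and Jaccard similarity are a stand-in for ConSOR's learned scene representation; all names and features are illustrative assumptions.

```python
"""Toy sketch of context-driven placement: the partially arranged
scene is the only 'instruction'. Feature sets are a hand-coded
stand-in for a learned semantic embedding."""

from typing import Dict, List

# Stand-in semantic features; a real system would learn embeddings.
FEATURES = {
    "fork": {"utensil", "metal", "kitchen"},
    "spoon": {"utensil", "metal", "kitchen"},
    "knife": {"utensil", "metal", "kitchen"},
    "novel": {"book", "paper", "living_room"},
    "magazine": {"book", "paper", "living_room"},
    "comic": {"book", "paper", "living_room"},
}


def similarity(a: str, b: str) -> float:
    """Jaccard similarity over feature sets (embedding stand-in)."""
    fa, fb = FEATURES.get(a, set()), FEATURES.get(b, set())
    return len(fa & fb) / len(fa | fb) if fa | fb else 0.0


def place(new_object: str, scene: Dict[str, List[str]]) -> str:
    """Pick the receptacle whose pre-arranged objects best match
    the new object, averaged over that receptacle's contents."""
    def score(objects: List[str]) -> float:
        if not objects:
            return 0.0
        return sum(similarity(new_object, o) for o in objects) / len(objects)
    return max(scene, key=lambda r: score(scene[r]))


if __name__ == "__main__":
    scene = {"drawer": ["fork", "spoon"], "shelf": ["novel", "magazine"]}
    print(place("knife", scene))  # -> drawer (groups with utensils)
    print(place("comic", scene))  # -> shelf (groups with reading material)
```

This nearest-context heuristic illustrates why pre-arranged objects alone can disambiguate placements; ConSOR's learned model plays the role of the similarity function here, generalizing it to unseen object categories.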