We focus on the problem of rearranging a set of objects with a team of car-like robot pushers built using off-the-shelf components. Maintaining control of pushed objects while avoiding collisions in a tight space demands highly coordinated motion that is challenging to execute on constrained hardware. Centralized replanning approaches become intractable even for small problem instances, whereas decentralized approaches often get stuck in deadlocks. Our key insight is that by carefully assigning pushing tasks to robots, we can reduce the complexity of the rearrangement task, enabling robust performance via scalable decentralized control. Based on this insight, we built PuSHR, a system that optimally assigns pushing tasks and trajectories to robots offline, and performs trajectory tracking via decentralized control online. Through an ablation study in simulation, we demonstrate that PuSHR dominates baselines ranging from fully decentralized to fully centralized in terms of success rate and time efficiency across challenging tasks with up to 4 robots. Hardware experiments demonstrate the transfer of our system to the real world and highlight its robustness to model inaccuracies. Our code can be found at https://github.com/prl-mushr/pushr, and videos from our experiments at https://youtu.be/DIWmZerF_O8.
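To make the offline assignment step above concrete, here is a minimal sketch in Python, assuming a simple Euclidean travel-distance cost solved with the Hungarian algorithm; PuSHR's actual pipeline jointly optimizes assignments and pushing trajectories, so the cost model and all names below are illustrative only.

import numpy as np
from scipy.optimize import linear_sum_assignment

# Hypothetical robot start positions and object push-start poses (2D).
robots = np.array([[0.0, 0.0], [1.0, 2.0], [3.0, 1.0]])
objects = np.array([[2.0, 2.0], [0.5, 1.5], [3.5, 0.5]])

# cost[i, j]: assumed cost for robot i to take on object j (straight-line distance).
cost = np.linalg.norm(robots[:, None, :] - objects[None, :, :], axis=-1)

# Hungarian algorithm: finds the assignment minimizing total cost.
rows, cols = linear_sum_assignment(cost)
for r, o in zip(rows, cols):
    print(f"robot {r} -> object {o} (cost {cost[r, o]:.2f})")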
We focus on robot navigation in crowded environments. To navigate safely and efficiently within crowds, robots need models for crowd motion prediction. Building such models is hard due to the high dimensionality of multiagent domains and the challenge of collecting or simulating interaction-rich crowd-robot demonstrations. While there has been important progress on models for offline pedestrian motion forecasting, transferring their performance to real robots is nontrivial due to close interaction settings and novelty effects on users. In this paper, we investigate the utility of a recent state-of-the-art motion prediction model (S-GAN) for crowd navigation tasks. We incorporate this model into a model predictive controller (MPC) and deploy it on a self-balancing robot, which we subject to a diverse range of crowd behaviors in the lab. We demonstrate that while S-GAN motion prediction accuracy transfers to the real world, its value is not reflected in navigation performance, measured with respect to safety and efficiency; in fact, the MPC performs indistinguishably even when using a simple constant-velocity prediction model, suggesting that substantial model improvements might be needed to yield significant gains for crowd navigation tasks. Footage from our experiments can be found at https://youtu.be/mzFiXg8KsZ0.
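For reference, the constant-velocity baseline mentioned above fits in a few lines; this is a minimal sketch assuming 2D pedestrian states and an illustrative horizon and timestep (the MPC's actual horizon and integration details are not specified here). The MPC would then penalize robot states that come close to any of the predicted positions.

import numpy as np

def cv_predict(positions, velocities, horizon=12, dt=0.4):
    """Roll each pedestrian forward under a constant-velocity assumption.

    positions, velocities: (N, 2) arrays of current 2D states.
    Returns a (horizon, N, 2) array of predicted positions.
    """
    steps = np.arange(1, horizon + 1)[:, None, None] * dt  # (horizon, 1, 1)
    return positions[None] + steps * velocities[None]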
During in-hand manipulation, robots must continuously estimate the pose of the object in order to generate appropriate control actions. The performance of pose estimation algorithms hinges on the robot's sensors being able to detect discriminative geometric object features, but previous sensing modalities are unable to make such measurements robustly. The robot's fingers can occlude the view of environment- or robot-mounted image sensors, and tactile sensors can only measure at the local areas of contact. Motivated by the robustness of fingertip-embedded proximity sensors to occlusion and their ability to measure beyond the local areas of contact, we present the first evaluation of proximity-sensor-based pose estimation for in-hand manipulation. We develop a novel two-fingered hand with fingertip-embedded optical time-of-flight proximity sensors as a testbed for pose estimation during planar in-hand manipulation. Here, the in-hand manipulation task consists of the robot moving a cylindrical object from one end of its workspace to the other. We demonstrate, with statistical significance, that proximity-sensor-based pose estimation via particle filtering during in-hand manipulation: a) exhibits 50% lower average pose error than a tactile-sensor-based baseline; b) empowers a model predictive controller to achieve 30% lower final positioning error compared to using tactile-sensor-based pose estimates.
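As a rough illustration of the particle-filtering step above, the following sketch performs a measurement update, assuming a Gaussian range model per time-of-flight sensor; the expected-range function, noise scale, and resampling threshold are placeholder assumptions, not the paper's implementation.

import numpy as np

def pf_update(particles, weights, ranges, expected_range, sigma=0.005):
    """particles: (N, 3) planar object poses (x, y, theta); ranges: measured
    distances, one per proximity sensor; expected_range(pose, k) predicts the
    distance sensor k would read if the object were at `pose` (assumed given,
    e.g. via ray casting against the cylinder model)."""
    for k, z in enumerate(ranges):
        pred = np.array([expected_range(p, k) for p in particles])
        weights = weights * np.exp(-0.5 * ((z - pred) / sigma) ** 2)
    weights = weights / weights.sum()
    # Resample when the effective sample size collapses.
    if 1.0 / np.sum(weights ** 2) < 0.5 * len(particles):
        idx = np.random.choice(len(particles), size=len(particles), p=weights)
        particles = particles[idx]
        weights = np.full(len(particles), 1.0 / len(particles))
    return particles, weights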
Benchmarks for robot manipulation are crucial to measuring progress in the field, yet there are few benchmarks that demonstrate critical manipulation skills, possess standardized metrics, and can be attempted by a wide array of robot platforms. To address this lack, we propose Rubik's cube manipulation as a benchmark to measure simultaneous performance of precise manipulation and sequential manipulation. The sub-structure of the Rubik's cube demands precise positioning of the robot's end effectors, while its highly reconfigurable nature enables tasks that require the robot to manage pose uncertainty throughout long sequences of actions. We present a protocol for quantitatively measuring both the accuracy and speed of Rubik's cube manipulation. The protocol can be attempted by any general-purpose manipulator and requires only a standard 3x3 Rubik's cube and a flat surface upon which the cube initially rests (e.g., a table). We demonstrate this protocol for two distinct baseline approaches on a PR2 robot. The first baseline provides a fundamental approach for pose-based Rubik's cube manipulation. The second demonstrates the benchmark's ability to quantify improved performance by the system, particularly that resulting from the integration of pre-touch sensing. To demonstrate the benchmark's applicability to other robot platforms and algorithmic approaches, we present the functional blocks required to enable the HERB robot to manipulate the Rubik's cube via push-grasping.
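As a hypothetical illustration of how such a protocol might report results, the sketch below scores a trial by accuracy (fraction of attempted quarter-turns that left the cube in the intended state) and speed (correct turns per minute); the paper's exact metric definitions may differ, and all names here are illustrative.

def score_trial(attempted_turns, correct_turns, elapsed_s):
    # Accuracy: how often an attempted face rotation succeeded.
    accuracy = correct_turns / attempted_turns if attempted_turns else 0.0
    # Speed: successful quarter-turns per minute of wall-clock time.
    speed = 60.0 * correct_turns / elapsed_s
    return {"accuracy": accuracy, "turns_per_min": speed}

print(score_trial(attempted_turns=20, correct_turns=18, elapsed_s=600))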
Efficient and reliable generation of global path plans is necessary for the safe execution and deployment of autonomous systems. In order to generate planning graphs that adequately resolve the topology of a given environment, many sampling-based motion planners resort to coarse, heuristically-driven strategies that often fail to generalize to new and varied surroundings. Further, many of these approaches are not designed to contend with partial observability. We posit that such uncertainty in environment geometry can, in fact, help \textit{drive} the sampling process in generating feasible and probabilistically safe planning graphs. We propose a method for Probabilistic Roadmaps that relies on particle-based Variational Inference to efficiently cover the posterior distribution over feasible regions in configuration space. Our approach, the Stein Variational Probabilistic Roadmap (SV-PRM), results in sample-efficient generation of planning graphs and large improvements over traditional sampling approaches. We demonstrate the approach on a variety of challenging planning problems, including real-world probabilistic occupancy maps and high-DOF manipulation problems common in robotics.
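For context, the particle-based Variational Inference at the core of SV-PRM follows the standard Stein variational gradient descent update (as in Liu and Wang, 2016), where $p$ would be the posterior over feasible configurations and $k$ a kernel between particles:

\[ x_i \leftarrow x_i + \epsilon\,\hat{\phi}^*(x_i), \qquad \hat{\phi}^*(x_i) = \frac{1}{n}\sum_{j=1}^{n}\Big[ k(x_j, x_i)\,\nabla_{x_j}\log p(x_j) + \nabla_{x_j} k(x_j, x_i) \Big]. \]

The first term drives particles toward high-probability (feasible) regions, while the second acts as a repulsive force that keeps roadmap samples spread out over the feasible space.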
Dexterous manipulation remains an open problem in robotics. To coordinate the efforts of the research community towards tackling this problem, we propose a shared benchmark. We designed and built robotic platforms that are hosted at MPI-IS and can be accessed remotely. Each platform consists of three robotic fingers that are capable of dexterous object manipulation. Users can control the platforms remotely by submitting code that is executed automatically, akin to a computational cluster. Using this setup, i) we host robotics competitions, where teams from anywhere in the world access our platforms to tackle challenging tasks, ii) we publish the datasets collected during these competitions (consisting of hundreds of robot hours), and iii) we give researchers access to these platforms for their own projects.
We focus on the problem of analyzing multiagent interactions in traffic domains. Understanding the space of behavior of real-world traffic may offer significant advantages for algorithmic design, data-driven methodologies, and benchmarking. However, the high dimensionality of the space and the stochasticity of human behavior may hinder the identification of important interaction patterns. Our key insight is that traffic environments feature significant geometric and temporal structure, leading to highly organized collective behaviors, often drawn from a small set of dominant modes. In this work, we propose a representation based on the formalism of topological braids that can summarize arbitrarily complex multiagent behavior into a compact object of dual geometric and symbolic nature, capturing critical events of interaction. This representation allows us to formally enumerate the space of outcomes in a traffic scene and characterize their complexity. We illustrate the value of the proposed representation in summarizing critical aspects of real-world traffic behavior through a case study on recent driving datasets. We show that despite the density of real-world traffic, observed behavior tends to follow highly organized patterns of low interaction. Our framework may be a valuable tool not only for evaluating the richness of driving datasets, but also for synthetically designing balanced training datasets or benchmarks.
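To give a flavor of how a braid word can be read off from trajectories, here is a minimal sketch, under the simplifying assumption that generators correspond to swaps in the agents' ordering along one spatial axis, with crossing signs taken from the other axis; the paper's construction is more careful, and all names below are illustrative.

import numpy as np

def braid_word(xs, ys):
    """xs, ys: (T, N) arrays of agent coordinates over T timesteps.
    Returns a list of signed generators: +i / -i for sigma_i / sigma_i^{-1},
    where sigma_i swaps the strands in positions i and i+1 (1-indexed)."""
    word = []
    order = np.argsort(xs[0])  # agent indices sorted by x-coordinate
    for t in range(1, len(xs)):
        new_order = np.argsort(xs[t])
        for i in range(len(order) - 1):
            # Detect an adjacent transposition between consecutive timesteps.
            if order[i] == new_order[i + 1] and order[i + 1] == new_order[i]:
                a, b = order[i], order[i + 1]
                sign = 1 if ys[t, a] > ys[t, b] else -1  # over- vs under-crossing
                word.append(sign * (i + 1))
        order = new_order
    return word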
We focus on the problem of planning safe and efficient motion for a ballbot (i.e., a dynamically balancing mobile robot) navigating in a crowded environment. The ballbot's design gives rise to human-readable motion which is valuable for crowd navigation. However, dynamic stabilization introduces kinematic constraints that severely limit the ability of the robot to execute aggressive maneuvers, complicating collision avoidance and respect for human personal space. Past works reduce the need for aggressive maneuvering by motivating anticipatory collision avoidance through the use of human motion prediction models. However, multiagent behavior prediction is hard due to the combinatorial structure of the space. Our key insight is that we can accomplish anticipatory multiagent collision avoidance without high-fidelity prediction models if we capture fundamental features of multiagent dynamics. To this end, we build a model predictive control architecture that employs a constant-velocity model of human motion prediction but monitors and proactively adapts to the unfolding homotopy class of crowd-robot dynamics by taking actions that maximize the pairwise winding numbers between the robot and each human agent. This results in robot motion that achieves statistically significantly higher clearances from the crowd compared to state-of-the-art baselines, while maintaining similar levels of efficiency, across a variety of challenging physical scenarios and crowd simulators.
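Concretely, the pairwise winding number between the robot trajectory $\xi_R$ and a human trajectory $\xi_h$ can be written as the accumulated rotation of the relative bearing (a standard definition; the paper's exact normalization may differ):

\[ w(\xi_R, \xi_h) = \frac{1}{2\pi}\sum_{t}\Delta\theta_t, \qquad \theta_t = \mathrm{atan2}\big(y_{h,t} - y_{R,t},\; x_{h,t} - x_{R,t}\big), \]

with each increment $\Delta\theta_t$ wrapped to $(-\pi, \pi]$. Larger $|w|$ indicates that the robot and the human have more decisively passed around one another, which is why maximizing it promotes unambiguous, anticipatory avoidance.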
For robots to operate in a three-dimensional world and interact with humans, learning the spatial relationships among objects in their surroundings is necessary. Reasoning about the state of the world requires inputs from many different sensory modalities, including vision ($V$) and haptics ($H$). We examine the problem of desk organization: learning how humans spatially position different objects on a planar surface according to organizational ``preference''. We model this problem by examining how humans position objects given multiple features received from the vision and haptic modalities. However, organizational habits vary greatly between people, both in structure and adherence. To deal with user organizational preferences, we add an additional modality, ``utility'' ($U$), which captures a particular human's perceived usefulness of a given object. Models were trained as either generalized (over many different people) or tailored (per person). We use two types of models: random forests, which focus on precise multi-task classification, and Markov logic networks, which provide easily interpretable insight into organizational habits. The models were applied to both synthetic data, which proved learnable when using fixed organizational constraints, and human-study data, on which the random forest achieved over 90% accuracy. Over all combinations of the $\{H, U, V\}$ modalities, $UV$ and $HUV$ were the most informative for organization. In a follow-up study, we gauged participants' preferences for desk organizations produced by a generalized random forest model vs. a random model. On average, participants rated the random forest organizations 4.15 on a 5-point Likert scale, compared to 1.84 for the random model.
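A minimal sketch of the modality-subset comparison above, assuming per-object feature vectors split into $H$, $U$, and $V$ blocks; the feature dimensions and the stand-in data are placeholder assumptions, not the study's dataset.

from itertools import combinations
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

# Assumed feature layout: haptic, utility, and vision blocks of a 12-d vector.
blocks = {"H": slice(0, 4), "U": slice(4, 5), "V": slice(5, 12)}
X = np.random.rand(500, 12)            # stand-in features
y = np.random.randint(0, 6, 500)       # stand-in placement labels

for r in range(1, 4):
    for combo in combinations("HUV", r):
        cols = np.concatenate([np.arange(12)[blocks[m]] for m in combo])
        acc = cross_val_score(RandomForestClassifier(n_estimators=200),
                              X[:, cols], y, cv=5).mean()
        print("".join(combo), f"accuracy: {acc:.2f}")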
Dexterous manipulation is a challenging and important problem in robotics. While data-driven methods are a promising approach, current benchmarks require simulation or extensive engineering support due to the sample inefficiency of popular methods. We present benchmarks for the TriFinger system, an open-source robotic platform for dexterous manipulation and the focus of the 2020 Real Robot Challenge. The benchmarked methods, which were successful in the challenge, can be generally described as structured policies, as they combine elements of classical robotics and modern policy optimization. This inclusion of inductive biases facilitates sample efficiency, interpretability, reliability, and high performance. The key aspects of this benchmarking effort are validation of the baselines across both simulation and the real system, a thorough ablation study over the core features of each solution, and a retrospective analysis of the challenge as a manipulation benchmark. The code and demo videos for this work can be found on our website (https://sites.google.com/view/benchmark-rrc).
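As an illustration of what a structured policy can look like in this sense, the sketch below combines a classical component (a simple proportional controller toward a fingertip target) with a learned residual that policy optimization can tune; gains, limits, and names are illustrative assumptions, not any particular challenge entry.

import numpy as np

class StructuredPolicy:
    def __init__(self, residual_net, kp=2.0, max_torque=0.3):
        self.residual_net = residual_net  # any callable: obs -> action-sized array
        self.kp, self.max_torque = kp, max_torque

    def act(self, fingertip_pos, target_pos, obs):
        # Classical inductive bias: proportional control toward the target.
        u_classical = self.kp * (target_pos - fingertip_pos)
        # Learned correction on top, produced by policy optimization.
        u_residual = self.residual_net(obs)
        return np.clip(u_classical + u_residual, -self.max_torque, self.max_torque)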