Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Daniela Rus

Learning autonomous driving from aerial imagery

Oct 18, 2024

Varun Murali, Guy Rosman, Sertac Karaman, Daniela Rus

Figure 1 for Learning autonomous driving from aerial imagery

Figure 2 for Learning autonomous driving from aerial imagery

Figure 3 for Learning autonomous driving from aerial imagery

Figure 4 for Learning autonomous driving from aerial imagery

Abstract:In this work, we consider the problem of learning end to end perception to control for ground vehicles solely from aerial imagery. Photogrammetric simulators allow the synthesis of novel views through the transformation of pre-generated assets into novel views.However, they have a large setup cost, require careful collection of data and often human effort to create usable simulators. We use a Neural Radiance Field (NeRF) as an intermediate representation to synthesize novel views from the point of view of a ground vehicle. These novel viewpoints can then be used for several downstream autonomous navigation applications. In this work, we demonstrate the utility of novel view synthesis though the application of training a policy for end to end learning from images and depth data. In a traditional real to sim to real framework, the collected data would be transformed into a visual simulator which could then be used to generate novel views. In contrast, using a NeRF allows a compact representation and the ability to optimize over the parameters of the visual simulator as more data is gathered in the environment. We demonstrate the efficacy of our method in a custom built mini-city environment through the deployment of imitation policies on robotic cars. We additionally consider the task of place localization and demonstrate that our method is able to relocalize the car in the real world.

* Presented at IROS 2024

Via

Access Paper or Ask Questions

Flex: End-to-End Text-Instructed Visual Navigation with Foundation Models

Oct 16, 2024

Makram Chahine, Alex Quach, Alaa Maalouf, Tsun-Hsuan Wang, Daniela Rus

Figure 1 for Flex: End-to-End Text-Instructed Visual Navigation with Foundation Models

Figure 2 for Flex: End-to-End Text-Instructed Visual Navigation with Foundation Models

Figure 3 for Flex: End-to-End Text-Instructed Visual Navigation with Foundation Models

Figure 4 for Flex: End-to-End Text-Instructed Visual Navigation with Foundation Models

Abstract:End-to-end learning directly maps sensory inputs to actions, creating highly integrated and efficient policies for complex robotics tasks. However, such models are tricky to efficiently train and often struggle to generalize beyond their training scenarios, limiting adaptability to new environments, tasks, and concepts. In this work, we investigate the minimal data requirements and architectural adaptations necessary to achieve robust closed-loop performance with vision-based control policies under unseen text instructions and visual distribution shifts. To this end, we design datasets with various levels of data representation richness, refine feature extraction protocols by leveraging multi-modal foundation model encoders, and assess the suitability of different policy network heads. Our findings are synthesized in Flex (Fly-lexically), a framework that uses pre-trained Vision Language Models (VLMs) as frozen patch-wise feature extractors, generating spatially aware embeddings that integrate semantic and visual information. These rich features form the basis for training highly robust downstream policies capable of generalizing across platforms, environments, and text-specified tasks. We demonstrate the effectiveness of this approach on quadrotor fly-to-target tasks, where agents trained via behavior cloning on a small simulated dataset successfully generalize to real-world scenes, handling diverse novel goals and command formulations.

Via

Access Paper or Ask Questions

Faster Algorithms for Growing Collision-Free Convex Polytopes in Robot Configuration Space

Oct 16, 2024

Peter Werner, Thomas Cohn, Rebecca H. Jiang, Tim Seyde, Max Simchowitz, Russ Tedrake, Daniela Rus

Abstract:We propose two novel algorithms for constructing convex collision-free polytopes in robot configuration space. Finding these polytopes enables the application of stronger motion-planning frameworks such as trajectory optimization with Graphs of Convex Sets [1] and is currently a major roadblock in the adoption of these approaches. In this paper, we build upon IRIS-NP (Iterative Regional Inflation by Semidefinite & Nonlinear Programming) [2] to significantly improve tunability, runtimes, and scaling to complex environments. IRIS-NP uses nonlinear programming paired with uniform random initialization to find configurations on the boundary of the free configuration space. Our key insight is that finding near-by configuration-space obstacles using sampling is inexpensive and greatly accelerates region generation. We propose two algorithms using such samples to either employ nonlinear programming more efficiently (IRIS-NP2 ) or circumvent it altogether using a massively-parallel zero-order optimization strategy (IRIS-ZO). We also propose a termination condition that controls the probability of exceeding a user-specified permissible fraction-in-collision, eliminating a significant source of tuning difficulty in IRIS-NP. We compare performance across eight robot environments, showing that IRIS-ZO achieves an order-of-magnitude speed advantage over IRIS-NP. IRISNP2, also significantly faster than IRIS-NP, builds larger polytopes using fewer hyperplanes, enabling faster downstream computation. Website: https://sites.google.com/view/fastiris

* 16 pages, 6 figures, accepted for publication in the proceedings of the International Symposium for Robotics Research 2024

Via

Access Paper or Ask Questions

Learning Object Properties Using Robot Proprioception via Differentiable Robot-Object Interaction

Oct 04, 2024

Peter Yichen Chen, Chao Liu, Pingchuan Ma, John Eastman, Daniela Rus, Dylan Randle, Yuri Ivanov, Wojciech Matusik

Abstract:Differentiable simulation has become a powerful tool for system identification. While prior work has focused on identifying robot properties using robot-specific data or object properties using object-specific data, our approach calibrates object properties by using information from the robot, without relying on data from the object itself. Specifically, we utilize robot joint encoder information, which is commonly available in standard robotic systems. Our key observation is that by analyzing the robot's reactions to manipulated objects, we can infer properties of those objects, such as inertia and softness. Leveraging this insight, we develop differentiable simulations of robot-object interactions to inversely identify the properties of the manipulated objects. Our approach relies solely on proprioception -- the robot's internal sensing capabilities -- and does not require external measurement tools or vision-based tracking systems. This general method is applicable to any articulated robot and requires only joint position information. We demonstrate the effectiveness of our method on a low-cost robotic platform, achieving accurate mass and elastic modulus estimations of manipulated objects with just a few seconds of computation on a laptop.

Via

Access Paper or Ask Questions

Oscillatory State-Space Models

Oct 04, 2024

T. Konstantin Rusch, Daniela Rus

Abstract:We propose Linear Oscillatory State-Space models (LinOSS) for efficiently learning on long sequences. Inspired by cortical dynamics of biological neural networks, we base our proposed LinOSS model on a system of forced harmonic oscillators. A stable discretization, integrated over time using fast associative parallel scans, yields the proposed state-space model. We prove that LinOSS produces stable dynamics only requiring nonnegative diagonal state matrix. This is in stark contrast to many previous state-space models relying heavily on restrictive parameterizations. Moreover, we rigorously show that LinOSS is universal, i.e., it can approximate any continuous and causal operator mapping between time-varying functions, to desired accuracy. In addition, we show that an implicit-explicit discretization of LinOSS perfectly conserves the symmetry of time reversibility of the underlying dynamics. Together, these properties enable efficient modeling of long-range interactions, while ensuring stable and accurate long-horizon forecasting. Finally, our empirical results, spanning a wide range of time-series tasks from mid-range to very long-range classification and regression, as well as long-horizon forecasting, demonstrate that our proposed LinOSS model consistently outperforms state-of-the-art sequence models. Notably, LinOSS outperforms Mamba by nearly 2x and LRU by 2.5x on a sequence modeling task with sequences of length 50k.

Via

Access Paper or Ask Questions

Improving Efficiency of Sampling-based Motion Planning via Message-Passing Monte Carlo

Oct 04, 2024

Makram Chahine, T. Konstantin Rusch, Zach J. Patterson, Daniela Rus

Abstract:Sampling-based motion planning methods, while effective in high-dimensional spaces, often suffer from inefficiencies due to irregular sampling distributions, leading to suboptimal exploration of the configuration space. In this paper, we propose an approach that enhances the efficiency of these methods by utilizing low-discrepancy distributions generated through Message-Passing Monte Carlo (MPMC). MPMC leverages Graph Neural Networks (GNNs) to generate point sets that uniformly cover the space, with uniformity assessed using the the $\cL_p$-discrepancy measure, which quantifies the irregularity of sample distributions. By improving the uniformity of the point sets, our approach significantly reduces computational overhead and the number of samples required for solving motion planning problems. Experimental results demonstrate that our method outperforms traditional sampling techniques in terms of planning efficiency.

Via

Access Paper or Ask Questions

Human Insights Driven Latent Space for Different Driving Perspectives: A Unified Encoder for Efficient Multi-Task Inference

Sep 16, 2024

Huy-Dung Nguyen, Anass Bairouk, Mirjana Maras, Wei Xiao, Tsun-Hsuan Wang, Patrick Chareyre, Ramin Hasani, Marc Blanchon, Daniela Rus

Figure 1 for Human Insights Driven Latent Space for Different Driving Perspectives: A Unified Encoder for Efficient Multi-Task Inference

Figure 2 for Human Insights Driven Latent Space for Different Driving Perspectives: A Unified Encoder for Efficient Multi-Task Inference

Figure 3 for Human Insights Driven Latent Space for Different Driving Perspectives: A Unified Encoder for Efficient Multi-Task Inference

Figure 4 for Human Insights Driven Latent Space for Different Driving Perspectives: A Unified Encoder for Efficient Multi-Task Inference

Abstract:Autonomous driving holds great potential to transform road safety and traffic efficiency by minimizing human error and reducing congestion. A key challenge in realizing this potential is the accurate estimation of steering angles, which is essential for effective vehicle navigation and control. Recent breakthroughs in deep learning have made it possible to estimate steering angles directly from raw camera inputs. However, the limited available navigation data can hinder optimal feature learning, impacting the system's performance in complex driving scenarios. In this paper, we propose a shared encoder trained on multiple computer vision tasks critical for urban navigation, such as depth, pose, and 3D scene flow estimation, as well as semantic, instance, panoptic, and motion segmentation. By incorporating diverse visual information used by humans during navigation, this unified encoder might enhance steering angle estimation. To achieve effective multi-task learning within a single encoder, we introduce a multi-scale feature network for pose estimation to improve depth learning. Additionally, we employ knowledge distillation from a multi-backbone model pretrained on these navigation tasks to stabilize training and boost performance. Our findings demonstrate that a shared backbone trained on diverse visual tasks is capable of providing overall perception capabilities. While our performance in steering angle estimation is comparable to existing methods, the integration of human-like perception through multi-task learning holds significant potential for advancing autonomous driving systems. More details and the pretrained model are available at https://hi-computervision.github.io/uni-encoder/.

Via

Access Paper or Ask Questions

Design and Control of Modular Soft-Rigid Hybrid Manipulators with Self-Contact

Aug 17, 2024

Zach J. Patterson, Emily Sologuren, Cosimo Della Santina, Daniela Rus

Figure 1 for Design and Control of Modular Soft-Rigid Hybrid Manipulators with Self-Contact

Figure 2 for Design and Control of Modular Soft-Rigid Hybrid Manipulators with Self-Contact

Figure 3 for Design and Control of Modular Soft-Rigid Hybrid Manipulators with Self-Contact

Figure 4 for Design and Control of Modular Soft-Rigid Hybrid Manipulators with Self-Contact

Abstract:Soft robotics focuses on designing robots with highly deformable materials, allowing them to adapt and operate safely and reliably in unstructured and variable environments. While soft robots offer increased compliance over rigid body robots, their payloads are limited, and they consume significant energy when operating against gravity in terrestrial environments. To address the carrying capacity limitation, we introduce a novel class of soft-rigid hybrid robot manipulators (SRH) that incorporates both soft continuum modules and rigid joints in a serial configuration. The SRH manipulators can seamlessly transition between being compliant and delicate to rigid and strong, achieving this through dynamic shape modulation and employing self-contact among rigid components to effectively form solid structures. We discuss the design and fabrication of SRH robots, and present a class of novel control algorithms for SRH systems. We propose a configuration space PD+ shape controller and a Cartesian impedance controller, both of which are provably stable, endowing the soft robot with the necessary low-level capabilities. We validate the controllers on SRH hardware and demonstrate the robot performing several tasks. Our results highlight the potential for the soft-rigid hybrid paradigm to produce robots that are both physically safe and effective at task performance.

* 23 pages, 7 figures

Via

Access Paper or Ask Questions

Unifying 3D Representation and Control of Diverse Robots with a Single Camera

Jul 11, 2024

Sizhe Lester Li, Annan Zhang, Boyuan Chen, Hanna Matusik, Chao Liu, Daniela Rus, Vincent Sitzmann

Figure 1 for Unifying 3D Representation and Control of Diverse Robots with a Single Camera

Figure 2 for Unifying 3D Representation and Control of Diverse Robots with a Single Camera

Figure 3 for Unifying 3D Representation and Control of Diverse Robots with a Single Camera

Figure 4 for Unifying 3D Representation and Control of Diverse Robots with a Single Camera

Abstract:Mirroring the complex structures and diverse functions of natural organisms is a long-standing challenge in robotics. Modern fabrication techniques have dramatically expanded feasible hardware, yet deploying these systems requires control software to translate desired motions into actuator commands. While conventional robots can easily be modeled as rigid links connected via joints, it remains an open challenge to model and control bio-inspired robots that are often multi-material or soft, lack sensing capabilities, and may change their material properties with use. Here, we introduce Neural Jacobian Fields, an architecture that autonomously learns to model and control robots from vision alone. Our approach makes no assumptions about the robot's materials, actuation, or sensing, requires only a single camera for control, and learns to control the robot without expert intervention by observing the execution of random commands. We demonstrate our method on a diverse set of robot manipulators, varying in actuation, materials, fabrication, and cost. Our approach achieves accurate closed-loop control and recovers the causal dynamic structure of each robot. By enabling robot control with a generic camera as the only sensor, we anticipate our work will dramatically broaden the design space of robotic systems and serve as a starting point for lowering the barrier to robotic automation.

* Project Page: https://sizhe-li.github.io/publication/neural_jacobian_field

Via

Access Paper or Ask Questions

Gaussian Splatting to Real World Flight Navigation Transfer with Liquid Networks

Jun 21, 2024

Alex Quach, Makram Chahine, Alexander Amini, Ramin Hasani, Daniela Rus

Figure 1 for Gaussian Splatting to Real World Flight Navigation Transfer with Liquid Networks

Figure 2 for Gaussian Splatting to Real World Flight Navigation Transfer with Liquid Networks

Figure 3 for Gaussian Splatting to Real World Flight Navigation Transfer with Liquid Networks

Figure 4 for Gaussian Splatting to Real World Flight Navigation Transfer with Liquid Networks

Abstract:Simulators are powerful tools for autonomous robot learning as they offer scalable data generation, flexible design, and optimization of trajectories. However, transferring behavior learned from simulation data into the real world proves to be difficult, usually mitigated with compute-heavy domain randomization methods or further model fine-tuning. We present a method to improve generalization and robustness to distribution shifts in sim-to-real visual quadrotor navigation tasks. To this end, we first build a simulator by integrating Gaussian Splatting with quadrotor flight dynamics, and then, train robust navigation policies using Liquid neural networks. In this way, we obtain a full-stack imitation learning protocol that combines advances in 3D Gaussian splatting radiance field rendering, crafty programming of expert demonstration training data, and the task understanding capabilities of Liquid networks. Through a series of quantitative flight tests, we demonstrate the robust transfer of navigation skills learned in a single simulation scene directly to the real world. We further show the ability to maintain performance beyond the training environment under drastic distribution and physical environment changes. Our learned Liquid policies, trained on single target manoeuvres curated from a photorealistic simulated indoor flight only, generalize to multi-step hikes onboard a real hardware platform outdoors.

Via

Access Paper or Ask Questions