The use of dual quaternions to model the kinematics and dynamics of robotic systems with suspended loads has not been explored so far. This paper introduces a novel dual quaternion-based control strategy for UAVs with cable-suspended loads, focusing on the sling load lifting and tracking problems. By exploiting the mathematical efficiency and compactness of dual quaternions, we obtain a unified representation of the dynamics and kinematics of the UAV and its suspended load, which facilitates load lifting and trajectory tracking. The accuracy, efficiency, and robustness of the proposed strategy are evaluated in simulation. The main contribution of this study is a control strategy that harnesses the benefits of dual quaternions for cargo UAVs. Our work also holds promise for inspiring future innovations in the control of under-actuated systems using dual quaternions.
Visual relationship detection aims to identify objects and their relationships in images. Prior methods approach this task by adding separate relationship modules or decoders to existing object detection architectures. This separation increases complexity and hinders end-to-end training, which limits performance. We propose a simple and highly efficient decoder-free architecture for open-vocabulary visual relationship detection. Our model consists of a Transformer-based image encoder that represents objects as tokens and models their relationships implicitly. To extract relationship information, we introduce an attention mechanism that selects object pairs likely to form a relationship. We provide a single-stage recipe to train this model on a mixture of object and relationship detection data. Our approach achieves state-of-the-art relationship detection performance on Visual Genome and on the large-vocabulary GQA benchmark at real-time inference speeds. We provide analyses of zero-shot performance, ablations, and real-world qualitative examples.
This work focuses on the agile transportation of liquids with robotic manipulators. In contrast to existing methods that are either computationally heavy, system/container specific or dependent on a singularity-prone pendulum model, we present a real-time slosh-free tracking technique. This method solely requires the reference trajectory and the robot's kinematic constraints to output kinematically feasible joint space commands. The crucial element underlying this approach consists of mimicking the end-effector's motion through a virtual quadrotor, which is inherently slosh-free and differentially flat, thereby allowing us to calculate a slosh-free reference orientation. A cascaded proportional-derivative (PD) controller transforms this slosh-free reference into task space acceleration commands, which, after solving a Quadratic Program (QP) based on Resolved Acceleration Control (RAC), are translated into a feasible joint configuration. The validity of the proposed approach is demonstrated by simulated and real-world experiments on a 7 DoF Franka Emika Panda robot. Code: https://github.com/jonarriza96/gsft Video: https://youtu.be/4kitqYVS9n8
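The virtual-quadrotor idea can be illustrated with the standard differential-flatness relation for quadrotors: the thrust axis is parallel to the commanded acceleration plus gravity compensation. The following is a minimal sketch of that relation only, not the authors' implementation (see their repository for that). It assumes a z-up world frame and g = 9.81 m/s^2, and the function names `slosh_free_axis` and `tilt_angle` are hypothetical.

```python
import math

G = 9.81  # gravitational acceleration [m/s^2]; assumes a z-up world frame


def slosh_free_axis(acc):
    """Unit vector along which the container axis would point.

    A quadrotor's thrust must cancel gravity and produce the desired
    acceleration, so its body z-axis is parallel to acc + G * e_z.
    """
    tx, ty, tz = acc[0], acc[1], acc[2] + G
    norm = math.sqrt(tx * tx + ty * ty + tz * tz)
    if norm < 1e-9:
        # Free fall: the thrust direction (and hence the tilt) is undefined.
        raise ValueError("orientation undefined for acc = -G * e_z")
    return (tx / norm, ty / norm, tz / norm)


def tilt_angle(acc):
    """Tilt of the container axis away from vertical, in radians."""
    return math.acos(slosh_free_axis(acc)[2])
```

For example, a purely horizontal acceleration equal to g in magnitude tilts the container by 45 degrees, while zero acceleration keeps it upright.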
While real-world problems are often challenging to analyze analytically, deep learning excels in modeling complex processes from data. Existing optimization frameworks like CasADi facilitate seamless usage of solvers but face challenges when integrating learned process models into numerical optimizations. To address this gap, we present the Learning for CasADi (L4CasADi) framework, enabling the seamless integration of PyTorch-learned models with CasADi for efficient and potentially hardware-accelerated numerical optimization. The applicability of L4CasADi is demonstrated with two tutorial examples: First, we optimize a fish's trajectory in a turbulent river for energy efficiency where the turbulent flow is represented by a PyTorch model. Second, we demonstrate how an implicit Neural Radiance Field environment representation can be easily leveraged for optimal control with L4CasADi. L4CasADi, along with examples and documentation, is available under MIT license at https://github.com/Tim-Salzmann/l4casadi
Anticipating the motion of all humans in dynamic environments such as homes and offices is critical to enable safe and effective robot navigation. Such spaces remain challenging as humans do not follow strict rules of motion and there are often multiple occluded entry points such as corners and doors that create opportunities for sudden encounters. In this work, we present a Transformer-based architecture to predict human future trajectories in human-centric environments from input features including human positions, head orientations, and 3D skeletal keypoints from onboard in-the-wild sensory information. The resulting model captures the inherent uncertainty for future human trajectory prediction and achieves state-of-the-art performance on common prediction benchmarks and a human tracking dataset captured from a mobile robot adapted for the prediction task. Furthermore, we identify new agents with limited historical data as a major contributor to error and demonstrate the complementary nature of 3D skeletal poses in reducing prediction error in such challenging scenarios.
This work focuses on pose-following, a variant of path-following in which the goal is to steer the system's position and attitude along a path with a moving frame attached to it. Full body motion control, while accounting for the additional freedom to self-regulate the progress along the path, is an appealing trade-off. Towards this end, we extend the well-established dual quaternion-based pose-tracking method into a pose-following control law. Specifically, we derive the equations of motion for the full pose error between the geometric reference and the rigid body in the form of a dual quaternion and dual twist. Subsequently, we formulate an almost globally asymptotically stable control law. The global attractivity of the presented approach is validated in a spatial example, while its benefits over pose-tracking are showcased through a planar case study.
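To make the notion of a dual quaternion pose error concrete, here is a minimal sketch of one common convention: a pose is stored as a real quaternion r (rotation) and a dual part 0.5 * t * r, with the translation t expressed in the world frame, and the error is the body pose seen from the reference frame. This only illustrates the representation, not the paper's control law; the (w, x, y, z) ordering and all function names are assumptions made for this example.

```python
def qmul(a, b):
    """Hamilton product of quaternions stored as (w, x, y, z)."""
    aw, ax, ay, az = a
    bw, bx, by, bz = b
    return (
        aw * bw - ax * bx - ay * by - az * bz,
        aw * bx + ax * bw + ay * bz - az * by,
        aw * by - ax * bz + ay * bw + az * bx,
        aw * bz + ax * by - ay * bx + az * bw,
    )


def qconj(q):
    w, x, y, z = q
    return (w, -x, -y, -z)


def dq_from_pose(rot, trans):
    """Unit dual quaternion (real, dual) with dual = 0.5 * t * r."""
    t = (0.0, trans[0], trans[1], trans[2])
    dual = tuple(0.5 * c for c in qmul(t, rot))
    return (rot, dual)


def dq_mul(a, b):
    """Dual quaternion product: (ar + eps*ad)(br + eps*bd), with eps^2 = 0."""
    (ar, ad), (br, bd) = a, b
    real = qmul(ar, br)
    dual = tuple(p + q for p, q in zip(qmul(ar, bd), qmul(ad, br)))
    return (real, dual)


def dq_conj(a):
    """Quaternion conjugate of both parts; the inverse of a unit dual quaternion."""
    return (qconj(a[0]), qconj(a[1]))


def dq_translation(a):
    """Recover the translation: t = 2 * dual * conj(real)."""
    t = qmul(a[1], qconj(a[0]))
    return tuple(2.0 * c for c in t[1:])


def pose_error(ref, body):
    """Pose of the body expressed in the reference frame: conj(ref) * body."""
    return dq_mul(dq_conj(ref), body)
```

With this convention, the translation of `pose_error(ref, body)` is the position offset between body and reference expressed in the reference frame, and its real part is the relative rotation.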
This paper focuses on spatial time-optimal motion planning, a generalization of the exact time-optimal path following problem that allows the system to plan within a predefined space. In contrast to state-of-the-art methods, we drop the assumption that a collision-free geometric reference is given. Instead, we present a two-stage motion planning method that solely relies on a goal location and a geometric representation of the environment to compute a time-optimal trajectory that is compliant with system dynamics and constraints. To do so, the proposed scheme first computes an obstacle-free Pythagorean Hodograph parametric spline, and second solves a spatially reformulated minimum-time optimization problem. The spline obtained in the first stage is not a geometric reference, but an extension of the environment representation, and thus, time-optimality of the solution is guaranteed. The efficacy of the proposed approach is benchmarked by a known planar example and validated in a more complex spatial system, illustrating its versatility and applicability.
This paper presents a two-stage prediction-based control scheme for embedding the environment's geometric properties into a collision-free Pythagorean Hodograph spline, and subsequently finding the optimal path within the parameterized free space. The ingredients of this approach are twofold: First, we present a novel spatial path parameterization applicable to arbitrary curves without prior assumptions on their adapted frame. Second, we identify the appropriateness of Pythagorean Hodograph curves for a compact and continuous definition of the path-parametric functions required by the presented spatial model. This dual-stage formulation results in a motion planning approach, where the geometric properties of the environment arise as states of the prediction model. Thus, the presented method is attractive for motion planning in dense environments. The efficacy of the approach is evaluated on an illustrative example.
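The defining property of a Pythagorean Hodograph (PH) curve is that its parametric speed is a polynomial in the curve parameter, which gives closed-form arc lengths. The sketch below shows this for the simplest planar case, where the hodograph is the square of a complex polynomial w(t) = w0 (1 - t) + w1 t. The works above use spatial PH splines; this planar complex form is a simplified illustration chosen for brevity, and the function names are hypothetical.

```python
def hodograph(w0, w1, t):
    """Derivative r'(t) of a planar PH cubic, as the complex number w(t)**2."""
    w = w0 * (1 - t) + w1 * t
    return w * w


def speed(w0, w1, t):
    """Parametric speed |r'(t)| = |w(t)|**2, a polynomial in t."""
    return abs(hodograph(w0, w1, t))


def arc_length_closed_form(w0, w1):
    """Exact integral of |w(t)|**2 over t in [0, 1]."""
    return (abs(w0) ** 2 + (w0 * w1.conjugate()).real + abs(w1) ** 2) / 3.0


def arc_length_numeric(w0, w1, n=2000):
    """Midpoint-rule approximation of the arc length, for comparison."""
    h = 1.0 / n
    return sum(speed(w0, w1, (k + 0.5) * h) for k in range(n)) * h
```

Comparing `arc_length_closed_form` with `arc_length_numeric` for any pair of complex coefficients shows that no numerical quadrature is needed to measure a PH curve.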
This paper reports on the development, execution, and open-sourcing of a new robotics course at MIT. The course is a modern take on "Visual Navigation for Autonomous Vehicles" (VNAV) and targets first-year graduate students and senior undergraduates with prior exposure to robotics. VNAV has the goal of preparing the students to perform research in robotics and vision-based navigation, with emphasis on drones and self-driving cars. The course spans the entire autonomous navigation pipeline; as such, it covers a broad set of topics, including geometric control and trajectory optimization, 2D and 3D computer vision, visual and visual-inertial odometry, place recognition, simultaneous localization and mapping, and geometric deep learning for perception. VNAV has three key features. First, it bridges traditional computer vision and robotics courses by exposing the challenges that are specific to embodied intelligence, e.g., limited computation and need for just-in-time and robust perception to close the loop over control and decision making. Second, it strikes a balance between depth and breadth by combining rigorous technical notes (including topics that are less explored in typical robotics courses, e.g., on-manifold optimization) with slides and videos showcasing the latest research results. Third, it provides a compelling approach to hands-on robotics education by leveraging a physical drone platform (mostly suitable for small residential courses) and a photo-realistic Unity-based simulator (open-source and scalable to large online courses). VNAV has been offered at MIT in the Falls of 2018-2021 and is now publicly available on MIT OpenCourseWare (OCW).
We present SMORS, the first Soft fully actuated MultirOtoR System for multimodal locomotion. Unlike conventional hexarotors, SMORS is equipped with three rigid and three continuously soft arms, with each arm hosting a propeller. We create a bridge between the fields of soft and aerial robotics by mechanically coupling the actuation of a fully actuated flying platform with the actuation of a soft robotic manipulator. Each rotor is slightly tilted, allowing for full actuation of the platform. The soft components combined with the platform's full actuation allow for a robust interaction, in the form of efficient multimodal locomotion. In this work, we present the dynamical model of the platform, derive a closed-loop control, and present simulation results that support the robustness of the platform during a jumping-flying maneuver. We demonstrate in simulations that our multimodal locomotion approach can be more energy-efficient than flight with a hexarotor.