Abstract:Computed Tomography (CT) image guidance enables accurate and safe minimally invasive treatment of diseases, including cancer and chronic pain, with needle-like tools via a percutaneous approach. Because of the limited accuracy of free-hand adjustment and patient physiological motion, the physician incrementally inserts and adjusts the needle, acquiring intermediate images along the way. Scanning frequency is limited to minimize ionizing radiation exposure for the patient and physician. Robots can provide high positional accuracy and compensate for physiological motion with fewer scans. To accomplish this, the robots must operate within the confined imaging bore while retaining sufficient dexterity to insert and manipulate the needle. This paper presents CRANE: CT Robotic Arm and Needle Emplacer, a CT-compatible robot with a design focused on system dexterity that enables physicians to manipulate and insert needles within the scanner bore as naturally as they would by hand. We define abstract and measurable clinically motivated metrics for in-bore dexterity applicable to general-purpose intra-bore image-guided needle placement robots, develop an automatic robot planning and control method for intra-bore needle manipulation and device setup, and demonstrate that the redundant linkage design provides dexterity across a range of human morphologies and meets the clinical requirements for targeting accuracy during an in-situ evaluation.
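The abstract does not spell out the clinical dexterity metrics, so the sketch below is only a rough, generic illustration of how an in-bore dexterity score could be evaluated: it samples configurations of a toy planar arm, keeps those that fit inside a hypothetical bore, and scores them with Yoshikawa manipulability. The link lengths and bore radius are invented and this is not CRANE's actual metric or kinematics.

```python
# Generic stand-in for an in-bore dexterity metric: Yoshikawa manipulability of
# a planar 3-link arm, evaluated only at configurations whose joints stay
# inside a circular "bore" of radius R_BORE.  All numbers are hypothetical.
import numpy as np

L = np.array([0.30, 0.25, 0.20])   # hypothetical link lengths [m]
R_BORE = 0.35                      # hypothetical bore radius [m]

def fk_points(q):
    """Base plus cumulative joint/tip positions of a planar serial chain."""
    angles = np.cumsum(q)
    steps = np.stack([L * np.cos(angles), L * np.sin(angles)], axis=1)
    return np.vstack([[0.0, 0.0], np.cumsum(steps, axis=0)])

def jacobian(q):
    """2xN positional Jacobian of the end effector."""
    pts = fk_points(q)
    tip = pts[-1]
    cols = [np.array([-(tip[1] - p[1]), tip[0] - p[0]]) for p in pts[:-1]]
    return np.stack(cols, axis=1)

def manipulability(q):
    J = jacobian(q)
    return np.sqrt(np.linalg.det(J @ J.T))   # Yoshikawa measure

rng = np.random.default_rng(0)
samples = rng.uniform(-np.pi, np.pi, size=(2000, 3))
inside = [q for q in samples
          if np.all(np.linalg.norm(fk_points(q), axis=1) <= R_BORE)]
scores = np.array([manipulability(q) for q in inside])
print(f"{len(inside)} in-bore samples, mean manipulability {scores.mean():.4f}")
```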
Abstract:Screw-based locomotion is a robust method of locomotion across a wide range of media, including water, sand, and gravel. A challenge with screws is the large number of design parameters that affect locomotion performance in varying environments. One crucial parameter is the angle of attack, also referred to as the lead angle. The angle of attack has a significant impact on the screw's performance, as it creates a trade-off between efficiency and forward velocity. This trend is consistent across various types of media. In this work, we present the Novel Actuating Screw Unit (NASU), the first screw-based propulsion design that enables dynamic reconfiguration of the angle of attack for optimized locomotion across multiple media. The design is inspired by the Kresling unit, a widespread mechanism in origami robotics: the angle of attack is adjusted with a linear actuator while the entire unit is spun on its axis as an Archimedean screw. NASU is integrated onto a mobile test-bed, and experiments are conducted in a large variety of media, including gravel, grass, and sand. Our experiments show that the proposed design is a promising direction for reconfigurable screws, allowing control to optimize for either efficiency or velocity.
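For intuition about the lead-angle trade-off described above, the sketch below uses idealized, no-slip Archimedean-screw kinematics: a larger lead angle advances the screw further per revolution but also sweeps a longer helical blade path per revolution, a rough proxy for the work done against the medium. The radius and spin rate are hypothetical and not taken from the NASU hardware, and real granular media add slip that this ignores.

```python
# Idealized, no-slip screw kinematics only; actual advance in granular media
# is reduced by slip.  Numbers below are hypothetical.
import math

def advance_per_rev(radius_m, lead_angle_deg):
    """Axial travel per revolution of an Archimedean screw (no slip)."""
    return 2.0 * math.pi * radius_m * math.tan(math.radians(lead_angle_deg))

def helix_length_per_rev(radius_m, lead_angle_deg):
    """Helical blade path length traced per revolution (rough drag proxy)."""
    pitch = advance_per_rev(radius_m, lead_angle_deg)
    return math.hypot(2.0 * math.pi * radius_m, pitch)

radius = 0.05          # hypothetical screw radius [m]
spin_hz = 2.0          # hypothetical spin rate [rev/s]
for alpha in (10, 20, 30, 40):
    v = advance_per_rev(radius, alpha) * spin_hz
    print(f"lead {alpha:2d} deg: {v:.3f} m/s ideal forward speed, "
          f"{helix_length_per_rev(radius, alpha):.3f} m of thread per rev")
```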
Abstract:Robot navigation within complex environments requires precise state estimation and localization to ensure robust and safe operation. For ambulating mobile robots like snake robots, traditional sensing methods require multiple embedded sensors or markers, leading to increased complexity, cost, and points of failure. Alternatively, deploying an external camera in the environment is easy, and marker-less state estimation of the robot from this camera's images is an ideal solution: both simple and cost-effective. However, the challenge lies in tracking the robot in large environments, where the camera may be moved without extrinsic calibration or may even be in motion (e.g., a drone following the robot). The scenario presents a complex challenge: single-image reconstruction of robot poses under noisy observations. In this paper, we address the problem of tracking ambulatory mobile robots from a single camera. The method combines differentiable rendering with the Kalman filter. This synergy allows simultaneous estimation of the robot's joint angles and pose while also providing state uncertainty that could later be used for robust control. We demonstrate the efficacy of our approach on a snake-like robot with both stationary and non-stationary (moving) cameras, validating its performance in structured and unstructured scenarios. The results show an average error of 0.05 m in localizing the robot's base position and 6 degrees in joint state estimation. We believe this novel technique opens up possibilities for enhanced robot mobility and navigation in future exploratory and search-and-rescue missions.
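The abstract does not give implementation details, so the following is only a minimal skeleton of how a Kalman-filter update could be wrapped around a render-and-compare measurement model: `render_features` stands in for a differentiable renderer mapping robot state (base pose plus joint angles) to image-space features, and its Jacobian is taken by finite differences where autodiff would normally be used. This is a sketch of the general technique, not the paper's estimator.

```python
# Minimal EKF skeleton in the spirit of "differentiable rendering + Kalman
# filter".  render_features is a toy stand-in for a differentiable renderer.
import numpy as np

def render_features(x):
    # Placeholder measurement model: maps state -> image-space feature vector.
    return np.array([np.sin(x[0]) + x[2], np.cos(x[1]) * x[2], x[0] - x[1]])

def numerical_jacobian(f, x, eps=1e-6):
    y0 = f(x)
    J = np.zeros((y0.size, x.size))
    for i in range(x.size):
        dx = np.zeros_like(x); dx[i] = eps
        J[:, i] = (f(x + dx) - y0) / eps
    return J

def ekf_update(x, P, z, R):
    """One EKF measurement update against rendered features."""
    H = numerical_jacobian(render_features, x)
    y = z - render_features(x)                    # innovation
    S = H @ P @ H.T + R                           # innovation covariance
    K = P @ H.T @ np.linalg.inv(S)                # Kalman gain
    x_new = x + K @ y
    P_new = (np.eye(x.size) - K @ H) @ P
    return x_new, P_new

x = np.array([0.1, -0.2, 0.5])                    # toy state: 2 joints + 1 dof
P = np.eye(3) * 0.1                               # state uncertainty
z = render_features(np.array([0.3, 0.0, 0.6]))    # simulated observation
x, P = ekf_update(x, P, z, np.eye(3) * 1e-3)
print("updated state:", np.round(x, 3))
```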
Abstract:Constrained robot motion planning is a ubiquitous need for robots interacting with everyday environments, but it is a notoriously difficult problem to solve. Many points drawn by a sampling-based planner must be rejected because they fall outside the constraint manifold, or require significant iterative effort to correct. Given this, few solutions exist that produce a constraint-satisfying trajectory in reasonable time and at low path cost. In this work, we present a transformer-based model for motion planning with task-space constraints for manipulation systems. Vector Quantized-Motion Planning Transformer (VQ-MPT) is a recent learning-based model that reduces the search space of sampling-based motion planners for unconstrained planning. We propose to adapt a pre-trained VQ-MPT model to reduce the search space for constrained planning without retraining or fine-tuning the model. We also propose to update the neural network output to move sampling regions closer to the constraint manifold. Our experiments show how VQ-MPT improves planning times and accuracy compared to traditional planners in simulated and real-world environments. Unlike previous learning methods, which require task-related data, our method uses pre-trained neural network models and requires no additional data for training or fine-tuning, making this a \textit{one-shot} process. We also tested our method on a physical Franka Panda robot with real-world sensor data, demonstrating the generalizability of our algorithm. We anticipate that this approach will be accessible and broadly useful for transferring learned neural planners to various robot-environment interaction scenarios.
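One standard way to realize "moving sampling regions closer to the constraint manifold" is to project sampled configurations onto the manifold with damped Newton steps on the constraint function. The sketch below shows only that projection primitive on a toy planar two-link constraint; it is not the paper's VQ-MPT adaptation itself, and the constraint is invented for illustration.

```python
# Projection of sampled configurations onto a constraint manifold c(q) = 0 via
# damped least-squares Newton steps.  Toy constraint: end effector of a planar
# 2-link arm (unit links) held on the plane y = 0.
import numpy as np

def constraint(q):
    return np.array([np.sin(q[0]) + np.sin(q[0] + q[1])])

def constraint_jacobian(q):
    return np.array([[np.cos(q[0]) + np.cos(q[0] + q[1]), np.cos(q[0] + q[1])]])

def project_to_manifold(q, tol=1e-6, max_iters=50, damping=1e-6):
    q = q.copy()
    for _ in range(max_iters):
        c = constraint(q)
        if np.linalg.norm(c) < tol:
            return q, True
        J = constraint_jacobian(q)
        # dq = -J^T (J J^T + lambda I)^-1 c
        dq = -J.T @ np.linalg.solve(J @ J.T + damping * np.eye(J.shape[0]), c)
        q = q + dq
    return q, False

rng = np.random.default_rng(1)
samples = rng.uniform(-np.pi, np.pi, size=(5, 2))   # e.g. drawn from learned sampling regions
for q in samples:
    q_proj, ok = project_to_manifold(q)
    print(f"{q.round(2)} -> {q_proj.round(3)} "
          f"(converged={ok}, |c|={abs(constraint(q_proj)[0]):.1e})")
```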
Abstract:There has been increasing awareness of the difficulties in reaching and extracting people from mass casualty scenarios, such as those arising from natural disasters. While platforms have been designed to reach casualties and even carry them out of harm's way, the challenge of repositioning a casualty from the configuration in which they are found to one suitable for extraction has not been explicitly explored. Furthermore, this planning problem must incorporate biomechanical safety considerations for the casualty. Thus, we present a first solution to biomechanically safe trajectory generation for repositioning limbs of unconscious human casualties. We formulate biomechanical safety as mathematical constraints, provide mechanical descriptions of the dynamics of the coupled robot-human system, and describe the planning and trajectory optimization process that accounts for this coupled, constrained system. Finally, we evaluate our approach over several variations of the problem and demonstrate it on a real robot and human subject. This work provides a crucial component of casualty extraction that can be used in conjunction with past and present work on robots and vision systems designed for search and rescue.
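As a scaled-down illustration of constraint-aware trajectory optimization (the paper's actual biomechanical constraints and coupled robot-human dynamics are not reproduced here), the toy problem below optimizes a single "elbow angle" trajectory between two poses under made-up range-of-motion and joint-speed limits using SLSQP.

```python
# Toy constrained trajectory optimization: smooth a 1-DoF joint trajectory
# from a found pose toward a target pose while enforcing hypothetical
# range-of-motion and per-step velocity limits.
import numpy as np
from scipy.optimize import minimize

N, DT = 20, 0.1
Q_START, Q_GOAL = 0.2, 1.4           # rad, hypothetical
Q_MIN, Q_MAX = 0.0, 2.4              # hypothetical range of motion [rad]
VEL_MAX = 1.0                        # hypothetical safe joint speed [rad/s]

def cost(q):
    # Effort (squared step sizes) plus a terminal penalty toward the goal.
    return np.sum(np.diff(q) ** 2) + 10.0 * (q[-1] - Q_GOAL) ** 2

def vel_constraint(q):
    # Must be >= 0 elementwise: VEL_MAX - |qdot|
    return VEL_MAX - np.abs(np.diff(q)) / DT

q0 = np.linspace(Q_START, Q_GOAL, N)
res = minimize(
    cost, q0, method="SLSQP",
    bounds=[(Q_MIN, Q_MAX)] * N,
    constraints=[{"type": "ineq", "fun": vel_constraint},
                 {"type": "eq", "fun": lambda q: q[0] - Q_START}],
)
print("feasible:", res.success,
      " max |qdot|:", np.max(np.abs(np.diff(res.x)) / DT))
```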
Abstract:Manipulation of tissue with surgical tools often results in large deformations that current tracking and reconstruction algorithms have not effectively addressed. A major source of tracking errors during large deformations is incorrect data association between observed sensor measurements and the previously tracked scene. To mitigate this issue, we present a surgical perception framework, SuPerPM, that leverages learning-based non-rigid point cloud matching for data association, thus accommodating larger deformations. Such learning models typically require training data with ground-truth point cloud correspondences, which are challenging or even impractical to collect in surgical environments. Thus, to tune the learning model, we gather endoscopic data of soft tissue being manipulated by a surgical robot and then establish correspondences between point clouds at different time points to serve as ground truth. This is achieved by employing a position-based dynamics (PBD) simulation to ensure that the correspondences adhere to physical constraints. The proposed framework is demonstrated on several challenging surgical datasets characterized by large deformations, achieving superior performance over state-of-the-art surgical scene tracking algorithms.
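The data-association idea can be illustrated independently of the learned matcher: given per-point descriptors (random stand-ins below, whereas SuPerPM would use its learned non-rigid matching features), observations are matched to the tracked scene with a mutual-nearest-neighbor check that rejects ambiguous associations. This is a generic illustration, not the paper's pipeline.

```python
# Descriptor-based data association with a mutual-nearest-neighbor check.
# Descriptors here are random stand-ins for learned matching features.
import numpy as np
from scipy.spatial.distance import cdist

rng = np.random.default_rng(0)
feat_scene = rng.normal(size=(100, 32))     # descriptors of tracked scene points
perm = rng.permutation(100)[:80]            # which scene points were observed
feat_obs = feat_scene[perm] + 0.05 * rng.normal(size=(80, 32))  # noisy observation

D = cdist(feat_obs, feat_scene)             # pairwise descriptor distances
nn_scene = D.argmin(axis=1)                 # best scene point per observation
nn_obs = D.argmin(axis=0)                   # best observation per scene point
mutual = nn_obs[nn_scene] == np.arange(len(feat_obs))   # mutual consistency

matches = [(i, nn_scene[i]) for i in range(len(feat_obs)) if mutual[i]]
correct = sum(perm[i] == j for i, j in matches)
print(f"{len(matches)} mutual matches, {correct} correct")
```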
Abstract:Towards flexible object-centric visual perception, we propose AnyOKP, a one-shot instance-aware object keypoint (OKP) extraction approach that leverages the powerful representation ability of a pretrained vision transformer (ViT) and can obtain keypoints on multiple object instances of arbitrary category after learning from a single support image. An off-the-shelf pretrained ViT is directly deployed for generalizable and transferable feature extraction, followed by training-free feature enhancement. Best-prototype pairs (BPPs) are searched for in the support and query images based on appearance similarity, yielding instance-unaware candidate keypoints. Then, the graph with all candidate keypoints as vertices is divided into sub-graphs according to the feature distributions on the graph edges. Finally, each sub-graph represents an object instance. AnyOKP is evaluated on real object images collected with the cameras of a robot arm, a mobile robot, and a surgical robot, demonstrating not only cross-category flexibility and instance awareness but also remarkable robustness to domain shift and viewpoint change.
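A toy version of the prototype-matching step is sketched below: support descriptors act as prototypes, and each query location keeps its most similar prototype if the cosine similarity clears a threshold, yielding candidate keypoints. Real AnyOKP descriptors come from the pretrained ViT; here random vectors stand in, the planted matches are synthetic, and the threshold is arbitrary.

```python
# Cosine-similarity matching of query descriptors against support prototypes.
# All descriptors are random stand-ins for ViT features.
import numpy as np

def cosine_similarity(A, B):
    A = A / np.linalg.norm(A, axis=1, keepdims=True)
    B = B / np.linalg.norm(B, axis=1, keepdims=True)
    return A @ B.T

rng = np.random.default_rng(3)
support_protos = rng.normal(size=(5, 64))    # one descriptor per support keypoint
query_feats = rng.normal(size=(200, 64))     # descriptors at query-image locations
query_feats[10:15] = support_protos + 0.1 * rng.normal(size=(5, 64))  # planted matches

S = cosine_similarity(query_feats, support_protos)
best_proto = S.argmax(axis=1)
best_score = S.max(axis=1)
THRESH = 0.8                                 # hypothetical acceptance threshold
candidates = np.flatnonzero(best_score > THRESH)
print("candidate keypoint locations:", candidates,
      "-> prototypes", best_proto[candidates])
```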
Abstract:The ability to envision future states is crucial to informed decision making while interacting with dynamic environments. With cameras providing a prevalent and information-rich sensing modality, the problem of predicting future states from image sequences has garnered a lot of attention. Current state-of-the-art methods typically train large parametric models for their predictions. Though often accurate, these models rely on the availability of large training datasets to converge to useful solutions. In this paper, we focus on the problem of predicting future images of an image sequence from very little training data. To approach this problem, we take a probabilistic approach to image prediction using non-parametric models. We generate probability distributions over sequentially predicted images and propagate uncertainty through time to generate a confidence metric for our predictions. Gaussian Processes are used for their data efficiency and ability to readily incorporate new training data online. We showcase our method by successfully predicting future frames of a smooth fluid simulation environment.
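As a scaled-down, scalar illustration of the probabilistic prediction idea (the paper operates on full image frames), the sketch below fits a Gaussian Process to a short smooth sequence and queries it past the observed data; the predictive standard deviation grows with the forecast horizon, which is the kind of confidence signal the abstract describes.

```python
# GP regression on a toy "pixel intensity over time" sequence standing in for
# an image sequence; predictions farther from the data carry larger std.
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

t = np.arange(30, dtype=float).reshape(-1, 1)
y = np.sin(0.3 * t.ravel()) + 0.02 * np.random.default_rng(0).normal(size=30)

# Small dataset and an explicit noise term: data-efficient, easy to refit
# online as new frames arrive.
gp = GaussianProcessRegressor(kernel=RBF(length_scale=5.0) + WhiteKernel(1e-4),
                              normalize_y=True)
gp.fit(t, y)

# Query future time steps: the predictive std grows with extrapolation,
# giving a per-prediction confidence metric.
t_future = np.arange(30, 36, dtype=float).reshape(-1, 1)
mu, std = gp.predict(t_future, return_std=True)
for tt, m, s in zip(t_future.ravel(), mu, std):
    print(f"t={tt:.0f}: predicted {m:.3f} +/- {s:.3f}")
```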
Abstract:Large offline learning-based models have enabled robots to successfully interact with objects for a wide variety of tasks. However, these models rely on fairly consistent, structured environments. For more unstructured environments, an online learning component is necessary to gather and estimate information about objects in the environment in order to interact with them successfully. Unfortunately, online learning methods like Bayesian non-parametric models struggle with changes in the environment, which are often the desired outcome of interaction-based tasks. We propose using an object-centric representation for interactive online learning. This representation is generated by transforming the robot's actions into the object's coordinate frame. We demonstrate how switching to this task-relevant space improves our ability to reason about the training data collected online, enabling scalable online learning of robot-object interactions. We showcase our method by successfully navigating a manipulator arm through an environment with multiple unknown objects without violating interaction-based constraints.
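The object-centric transform itself is simple to illustrate: an action recorded in the world frame (a contact point and a push direction in the 2D example below) is re-expressed in the object's frame so that data collected online stays comparable as the object moves. The SE(2) pose and numbers are made up for illustration and are not taken from the paper.

```python
# Re-expressing a world-frame action in the object's coordinate frame (2D).
import numpy as np

def make_se2(x, y, theta):
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s, x], [s, c, y], [0, 0, 1.0]])

def se2_inverse(T):
    R, p = T[:2, :2], T[:2, 2]
    Tinv = np.eye(3)
    Tinv[:2, :2] = R.T
    Tinv[:2, 2] = -R.T @ p
    return Tinv

T_world_obj = make_se2(1.0, 0.5, np.pi / 4)        # object pose in world frame
contact_world = np.array([1.2, 0.6, 1.0])          # homogeneous contact point
push_world = np.array([0.0, 1.0])                  # push direction in world frame

T_obj_world = se2_inverse(T_world_obj)
contact_obj = T_obj_world @ contact_world          # points transform with translation
push_obj = T_obj_world[:2, :2] @ push_world        # directions rotate only
print("contact in object frame:", contact_obj[:2].round(3))
print("push direction in object frame:", push_obj.round(3))
```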
Abstract:Without ground-truth supervision, self-supervised depth estimation can be trapped in a local minimum due to the gradient-locality issue of the photometric loss. In this paper, we present a framework to enhance depth estimation by leveraging semantic segmentation to guide the network out of the local minimum. Prior works have proposed sharing encoders between the two tasks or explicitly aligning them based on priors such as the consistency between edges in the depth and segmentation maps. Yet, these methods usually require ground truth or high-quality pseudo labels, which may not be easily accessible in real-world applications. In contrast, we investigate self-supervised depth estimation along with a segmentation branch that is supervised with noisy labels provided by models pre-trained on limited data. We extend parameter sharing from the encoder to the decoder and study the influence of different numbers of shared decoder parameters on model performance. We also propose using cross-task information to refine the current depth and segmentation predictions to generate pseudo depth and semantic labels for training. The advantages of the proposed method are demonstrated through extensive experiments on the KITTI benchmark and a downstream task for endoscopic tissue deformation tracking.
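A shape-level PyTorch sketch of the shared-decoder idea follows: one encoder and a configurable number of shared decoder blocks feed separate depth and segmentation heads. Layer sizes, the upsampling scheme, and the number of shared blocks are arbitrary and are not the paper's architecture.

```python
# Minimal two-headed network with a shared encoder and partially shared
# decoder; only the tensor shapes matter here, not the capacity.
import torch
import torch.nn as nn

class SharedDepthSeg(nn.Module):
    def __init__(self, num_classes=19, shared_decoder_blocks=2):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
        )
        shared = [nn.Sequential(nn.Conv2d(64, 64, 3, padding=1), nn.ReLU())
                  for _ in range(shared_decoder_blocks)]
        self.shared_decoder = nn.Sequential(*shared)
        self.up = nn.Upsample(scale_factor=4, mode="bilinear", align_corners=False)
        self.depth_head = nn.Conv2d(64, 1, 3, padding=1)        # sigmoid disparity
        self.seg_head = nn.Conv2d(64, num_classes, 3, padding=1)

    def forward(self, x):
        f = self.shared_decoder(self.encoder(x))
        f = self.up(f)
        return torch.sigmoid(self.depth_head(f)), self.seg_head(f)

model = SharedDepthSeg()
disp, seg_logits = model(torch.randn(2, 3, 64, 64))
print(disp.shape, seg_logits.shape)   # (2, 1, 64, 64) and (2, 19, 64, 64)
```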