We present a novel end-to-end algorithm (PoCo) for the indoor RGB-D place recognition task, aimed at identifying the most likely match for a given query frame within a reference database. The task presents inherent challenges attributed to the constrained field of view and limited range of perception sensors. We propose a new network architecture, which generalizes the recent Context of Clusters (CoCs) to extract global descriptors directly from the noisy point clouds through end-to-end learning. Moreover, we develop the architecture by integrating both color and geometric modalities into the point features to enhance the global descriptor representation. We conducted evaluations on public datasets ScanNet-PR and ARKit with 807 and 5047 scenarios, respectively. PoCo achieves SOTA performance: on ScanNet-PR, we achieve R@1 of 64.63%, a 5.7% improvement from the best-published result CGis (61.12%); on Arkit, we achieve R@1 of 45.12%, a 13.3% improvement from the best-published result CGis (39.82%). In addition, PoCo shows higher efficiency than CGis in inference time (1.75X-faster), and we demonstrate the effectiveness of PoCo in recognizing places within a real-world laboratory environment.
We propose VLM-Social-Nav, a novel Vision-Language Model (VLM) based navigation approach to compute a robot's trajectory in human-centered environments. Our goal is to make real-time decisions on robot actions that are socially compliant with human expectations. We utilize a perception model to detect important social entities and prompt a VLM to generate guidance for socially compliant robot behavior. VLM-Social-Nav uses a VLM-based scoring module that computes a cost term that ensures socially appropriate and effective robot actions generated by the underlying planner. Our overall approach reduces reliance on large datasets (for training) and enhances adaptability in decision-making. In practice, it results in improved socially compliant navigation in human-shared environments. We demonstrate and evaluate our system in four different real-world social navigation scenarios with a Turtlebot robot. We observe at least 36.37% improvement in average success rate and 20.00% improvement in average collision rate in the four social navigation scenarios. The user study score shows that VLM-Social-Nav generates the most socially compliant navigation behavior.
We present a novel end-to-end diffusion-based trajectory generation method, DTG, for mapless global navigation in challenging outdoor scenarios with occlusions and unstructured off-road features like grass, buildings, bushes, etc. Given a distant goal, our approach computes a trajectory that satisfies the following goals: (1) minimize the travel distance to the goal; (2) maximize the traversability by choosing paths that do not lie in undesirable areas. Specifically, we present a novel Conditional RNN(CRNN) for diffusion models to efficiently generate trajectories. Furthermore, we propose an adaptive training method that ensures that the diffusion model generates more traversable trajectories. We evaluate our methods in various outdoor scenes and compare the performance with other global navigation algorithms on a Husky robot. In practice, we observe at least a 15% improvement in traveling distance and around a 7% improvement in traversability.
In this paper, we highlight the critical issues of robustness and safety associated with integrating large language models (LLMs) and vision-language models (VLMs) into robotics applications. Recent works have focused on using LLMs and VLMs to improve the performance of robotics tasks, such as manipulation, navigation, etc. However, such integration can introduce significant vulnerabilities, in terms of their susceptibility to adversarial attacks due to the language models, potentially leading to catastrophic consequences. By examining recent works at the interface of LLMs/VLMs and robotics, we show that it is easy to manipulate or misguide the robot's actions, leading to safety hazards. We define and provide examples of several plausible adversarial attacks, and conduct experiments on three prominent robot frameworks integrated with a language model, including KnowNo VIMA, and Instruct2Act, to assess their susceptibility to these attacks. Our empirical findings reveal a striking vulnerability of LLM/VLM-robot integrated systems: simple adversarial attacks can significantly undermine the effectiveness of LLM/VLM-robot integrated systems. Specifically, our data demonstrate an average performance deterioration of 21.2% under prompt attacks and a more alarming 30.2% under perception attacks. These results underscore the critical need for robust countermeasures to ensure the safe and reliable deployment of the advanced LLM/VLM-based robotic systems.
Segmentation is a critical step in analyzing the developing human fetal brain. There have been vast improvements in automatic segmentation methods in the past several years, and the Fetal Brain Tissue Annotation (FeTA) Challenge 2021 helped to establish an excellent standard of fetal brain segmentation. However, FeTA 2021 was a single center study, and the generalizability of algorithms across different imaging centers remains unsolved, limiting real-world clinical applicability. The multi-center FeTA Challenge 2022 focuses on advancing the generalizability of fetal brain segmentation algorithms for magnetic resonance imaging (MRI). In FeTA 2022, the training dataset contained images and corresponding manually annotated multi-class labels from two imaging centers, and the testing data contained images from these two imaging centers as well as two additional unseen centers. The data from different centers varied in many aspects, including scanners used, imaging parameters, and fetal brain super-resolution algorithms applied. 16 teams participated in the challenge, and 17 algorithms were evaluated. Here, a detailed overview and analysis of the challenge results are provided, focusing on the generalizability of the submissions. Both in- and out of domain, the white matter and ventricles were segmented with the highest accuracy, while the most challenging structure remains the cerebral cortex due to anatomical complexity. The FeTA Challenge 2022 was able to successfully evaluate and advance generalizability of multi-class fetal brain tissue segmentation algorithms for MRI and it continues to benchmark new algorithms. The resulting new methods contribute to improving the analysis of brain development in utero.
In this paper, we propose a novel locally statistical variational active contour model based on I-divergence-TV denoising model, which hybrides geodesic active contour (GAC) model with active contours without edges (ACWE) model, and can be used to segment images corrupted by multiplicative gamma noise. By adding a diffusion term into the level set evolution (LSE) equation of the proposed model, we construct a reaction-diffusion (RD) equation, which can gradually regularize the level set function (LSF) to be piecewise constant in each segment domain and gain the stable solution. We further transform the proposed model into classic ROF model by adding a proximity term. Inspired by a fast denoising algorithm proposed by Jia-Zhao recently, we propose two fast fixed point algorithms to solve SAR image segmentation question. Experimental results for real SAR images show that the proposed image segmentation model can efficiently stop the contours at weak or blurred edges, and can automatically detect the exterior and interior boundaries of images with multiplicative gamma noise. The proposed FPRD1/FPRD2 models are about 1/2 (or less than) of the time required for the SBRD model based on the Split Bregman technique.
In this paper, we propose a novel variational active contour model based on Aubert-Aujol (AA) denoising model, which hybrides geodesic active contour (GAC) model with active contours without edges (ACWE) model and can be used to segment images corrupted by multiplicative gamma noise. We transform the proposed model into classic ROF model by adding a proximity term. Inspired by a fast denosing algorithm proposed by Jia-Zhao recently, we propose two fast fixed point algorithms to solve SAR image segmentation question. Experimental results for real SAR images show that the proposed image segmentation model can efficiently stop the contours at weak or blurred edges, and can automatically detect the exterior and interior boundaries of images with multiplicative gamma noise. The proposed fast fixed point algorithms are robustness to initialization contour, and can further reduce about 15% of the time needed for algorithm proposed by Goldstein-Osher.
In this paper, we propose a novel locally statistical active contour model (LACM) based on Aubert-Aujol (AA) denoising model and variational level set method, which can be used for SAR images segmentation with intensity inhomogeneity. Then we transform the proposed model into a global optimization model by using convex relaxation technique. Firstly, we apply the Split Bregman technique to transform the global optimization model into two alternating optimization processes of Shrink operator and Laplace operator, which is called SB_LACM model. Moreover, we propose two fast models to solve the global optimization model , which are more efficient than the SB_LACM model. The first model is: we add the proximal function to transform the global optimization model to a general ROF model[29], which can be solved by a fast denoising algorithm proposed by R.-Q.Jia, and H.Zhao; Thus we obtain a fast segmentation algorithm with global optimization solver that does not involve partial differential equations or difference equation, and only need simple difference computation. The second model is: we use a different splitting approach than one model to transform the global optimization model into a differentiable term and a general ROF model term, which can be solved by the same technique as the first model. Experiments using some challenging synthetic images and Envisat SAR images demonstrate the superiority of our proposed models with respect to the state-of-the-art models.