Robot perception is far from what humans are capable of. Humans do not only have a complex semantic scene understanding but also extract fine-grained intra-object properties for the salient ones. When humans look at plants, they naturally perceive the plant architecture with its individual leaves and branching system. In this work, we want to advance the granularity in plant understanding for agricultural precision robots. We develop a model to extract fine-grained phenotypic information, such as leaf-, stem-, and vein instances. The underlying dataset RumexLeaves is made publicly available and is the first of its kind with keypoint-guided polyline annotations leading along the line from the lowest stem point along the leaf basal to the leaf apex. Furthermore, we introduce an adapted metric POKS complying with the concept of keypoint-guided polylines. In our experimental evaluation, we provide baseline results for our newly introduced dataset while showcasing the benefits of POKS over OKS.
We explore the use of uncertainty estimation in the maritime domain, showing the efficacy on toy datasets (CIFAR10) and proving it on an in-house dataset, SHIPS. We present a method joining the intra-class uncertainty achieved using Monte Carlo Dropout, with recent discoveries in the field of outlier detection, to gain more holistic uncertainty measures. We explore the relationship between the introduced uncertainty measures and examine how well they work on CIFAR10 and in a real-life setting. Our work improves the FPR95 by 8% compared to the current highest-performing work when the models are trained without out-of-distribution data. We increase the performance by 77% compared to a vanilla implementation of the Wide ResNet. We release the SHIPS dataset and show the effectiveness of our method by improving the FPR95 by 44.2% with respect to the baseline. Our approach is model agnostic, easy to implement, and often does not require model retraining.
We present a lightweight encoder-decoder archi- tecture for monocular depth estimation, specifically designed for embedded platforms. Our main contribution is the Guided Upsampling Block (GUB) for building the decoder of our model. Motivated by the concept of guided image filtering, GUB relies on the image to guide the decoder on upsampling the feature representation and the depth map reconstruction, achieving high resolution results with fine-grained details. Based on multiple GUBs, our model outperforms the related methods on the NYU Depth V2 dataset in terms of accuracy while delivering up to 35.1 fps on the NVIDIA Jetson Nano and up to 144.5 fps on the NVIDIA Xavier NX. Similarly, on the KITTI dataset, inference is possible with up to 23.7 fps on the Jetson Nano and 102.9 fps on the Xavier NX. Our code and models are made publicly available.
Real-world perception systems in many cases build on hardware with limited resources to adhere to cost and power limitations of their carrying system. Deploying deep neural networks on resource-constrained hardware became possible with model compression techniques, as well as efficient and hardware-aware architecture design. However, model adaptation is additionally required due to the diverse operation environments. In this work, we address the problem of training deep neural networks on resource-constrained hardware in the context of visual domain adaptation. We select the task of monocular depth estimation where our goal is to transform a pre-trained model to the target's domain data. While the source domain includes labels, we assume an unlabelled target domain, as it happens in real-world applications. Then, we present an adversarial learning approach that is adapted for training on the device with limited resources. Since visual domain adaptation, i.e. neural network training, has not been previously explored for resource-constrained hardware, we present the first feasibility study for image-based depth estimation. Our experiments show that visual domain adaptation is relevant only for efficient network architectures and training sets at the order of a few hundred samples. Models and code are publicly available.
Regardless of their industrial or research application, the streamlining of robot operations is limited by the proximity of experienced users to the actual hardware. Be it massive open online robotics courses, crowd-sourcing of robot task training, or remote research on massive robot farms for machine learning, the need to create an apt remote Human-Machine Interface is quite prevalent. The paper at hand proposes a novel solution to the programming/training of remote robots employing an intuitive and accurate user-interface which offers all the benefits of working with real robots without imposing delays and inefficiency. The system includes: a vision-based 3D hand detection and gesture recognition subsystem, a simulated digital twin of a robot as visual feedback, and the "remote" robot learning/executing trajectories using dynamic motion primitives. Our results indicate that the system is a promising solution to the problem of remote training of robot tasks.