This article studies the problem of applying normal forces on a surface, using an underactuated aerial vehicle equipped with a dexterous robotic arm. A force-motion high-level controller is designed based on a Lyapunov function encompassing alignment and exerted force errors. This controller is coupled with a Control Barrier Function constraint under an optimization scheme using Quadratic Programming. This aims to enforce a prescribed relationship between the approaching motion for the end-effector and its alignment with the surface, thus ensuring safe operation. An adaptive low-level controller is devised for the aerial vehicle, capable of tracking velocity commands generated by the high-level controller. Simulations are presented to demonstrate the force exertion stability and safety of the controller in cases of large disturbances.
Control barrier functions (CBFs) have been widely applied to safety-critical robotic applications. However, the construction of control barrier functions for robotic systems remains a challenging task. Recently, collision detection using differentiable optimization has provided a way to compute the minimum uniform scaling factor that results in an intersection between two convex shapes and to also compute the Jacobian of the scaling factor. In this paper, we propose a framework that uses this scaling factor, with an offset, to systematically define a CBF for obstacle avoidance tasks. We provide a theoretical analysis that proves the continuity of the proposed CBF. Empirically, we show that the proposed CBF is continuously differentiable, and the resulting optimal control problem is computationally efficient, which makes it applicable for real-time robotic control. We validate our approach, first using a 2D mobile robot example, then on the Franka-Emika Research~3 (FR3) robot manipulator both in simulation and experiment.
The exploration of mutual-benefit cross-domains has shown great potential toward accurate self-supervised depth estimation. In this work, we revisit feature fusion between depth and semantic information and propose an efficient local adaptive attention method for geometric aware representation enhancement. Instead of building global connections or deforming attention across the feature space without restraint, we bound the spatial interaction within a learnable region of interest. In particular, we leverage geometric cues from semantic information to learn local adaptive bounding boxes to guide unsupervised feature aggregation. The local areas preclude most irrelevant reference points from attention space, yielding more selective feature learning and faster convergence. We naturally extend the paradigm into a multi-head and hierarchic way to enable the information distillation in different semantic levels and improve the feature discriminative ability for fine-grained depth estimation. Extensive experiments on the KITTI dataset show that our proposed method establishes a new state-of-the-art in self-supervised monocular depth estimation task, demonstrating the effectiveness of our approach over former Transformer variants.
The 1$^{\text{st}}$ Workshop on Maritime Computer Vision (MaCVi) 2023 focused on maritime computer vision for Unmanned Aerial Vehicles (UAV) and Unmanned Surface Vehicle (USV), and organized several subchallenges in this domain: (i) UAV-based Maritime Object Detection, (ii) UAV-based Maritime Object Tracking, (iii) USV-based Maritime Obstacle Segmentation and (iv) USV-based Maritime Obstacle Detection. The subchallenges were based on the SeaDronesSee and MODS benchmarks. This report summarizes the main findings of the individual subchallenges and introduces a new benchmark, called SeaDronesSee Object Detection v2, which extends the previous benchmark by including more classes and footage. We provide statistical and qualitative analyses, and assess trends in the best-performing methodologies of over 130 submissions. The methods are summarized in the appendix. The datasets, evaluation code and the leaderboard are publicly available at https://seadronessee.cs.uni-tuebingen.de/macvi.
The control problem of a multi-copter swarm, mechanically coupled through a modular lattice structure of connecting rods, is considered in this article. The system's structural elasticity is considered in deriving the system's dynamics. The devised controller is robust against the induced flexibilities, while an inherent adaptation scheme allows for the control of asymmetrical configurations and the transportation of unknown payloads. Certain optimization metrics are introduced for solving the individual agent thrust allocation problem while achieving maximum system flight time, resulting in a platform-independent control implementation. Experimental studies are offered to illustrate the efficiency of the suggested controller under typical flight conditions, increased rod elasticities and payload transportation.
Planetary exploration depends heavily on 3D image data to characterize the static and dynamic properties of the rock and environment. Analyzing 3D images requires many computations, causing efficiency to suffer lengthy processing time alongside large energy consumption. High-Performance Computing (HPC) provides apparent efficiency at the expense of energy consumption. However, for remote explorations, the conveyed surveillance and the robotized sensing need faster data analysis with ultimate accuracy to make real-time decisions. In such environments, access to HPC and energy is limited. Therefore, we realize that reducing the number of computations to optimal and maintaining the desired accuracy leads to higher efficiency. This paper demonstrates the semantic segmentation capability of a probabilistic decision tree algorithm, 3D Adapted Random Forest Vision (3DARFV), exceeding deep learning algorithm efficiency at the utmost accuracy.
Recent object tracking methods depend upon deep networks or convoluted architectures. Most of those trackers can hardly meet real-time processing requirements on mobile platforms with limited computing resources. In this work, we introduce the Siamese Transformer Pyramid Network (SiamTPN), which inherits the advantages from both CNN and Transformer architectures. Specifically, we exploit the inherent feature pyramid of a lightweight network (ShuffleNetV2) and reinforce it with a Transformer to construct a robust target-specific appearance model. A centralized architecture with lateral cross attention is developed for building augmented high-level feature maps. To avoid the computation and memory intensity while fusing pyramid representations with the Transformer, we further introduce the pooling attention module, which significantly reduces memory and time complexity while improving the robustness. Comprehensive experiments on both aerial and prevalent tracking benchmarks achieve competitive results while operating at high speed, demonstrating the effectiveness of SiamTPN. Moreover, our fastest variant tracker operates over 30 Hz on a single CPU-core and obtaining an AUC score of 58.1% on the LaSOT dataset. Source codes are available at https://github.com/RISCNYUAD/SiamTPNTracker
In this paper, we deal with the problem to predict the future 3D motions of 3D object scans from previous two consecutive frames. Previous methods mostly focus on sparse motion prediction in the form of skeletons. While in this paper we focus on predicting dense 3D motions in the from of 3D point clouds. To approach this problem, we propose a self-supervised approach that leverages the power of the deep neural network to learn a continuous flow function of 3D point clouds that can predict temporally consistent future motions and naturally bring out the correspondences among consecutive point clouds at the same time. More specifically, in our approach, to eliminate the unsolved and challenging process of defining a discrete point convolution on 3D point cloud sequences to encode spatial and temporal information, we introduce a learnable latent code to represent the temporal-aware shape descriptor which is optimized during model training. Moreover, a temporally consistent motion Morpher is proposed to learn a continuous flow field which deforms a 3D scan from the current frame to the next frame. We perform extensive experiments on D-FAUST, SCAPE and TOSCA benchmark data sets and the results demonstrate that our approach is capable of handling temporally inconsistent input and produces consistent future 3D motion while requiring no ground truth supervision.
Cooperative Unmanned Aerial Systems (UASs) in GPS-denied environments demand an accurate pose-localization system to ensure efficient operation. In this paper we present a novel visual relative localization system capable of monitoring a 360$^o$ Field-of-View (FoV) in the immediate surroundings of the UAS using a spherical camera. Collaborating UASs carry a set of fiducial markers which are detected by the camera-system. The spherical image is partitioned and rectified into a set of square images. An algorithm is proposed to select the number of images that balances the computational load while maintaining a minimum tracking-accuracy level. The developed system tracks UASs in the vicinity of the spherical camera and experimental studies using two UASs are offered to validate the performance of the relative visual localization against that of a motion capture system.