Recent advances of deep learning have brought exceptional performance on many computer vision tasks such as semantic segmentation and depth estimation. However, the vulnerability of deep neural networks towards adversarial examples have caused grave concerns for real-world deployment. In this paper, we present to the best of our knowledge the first systematic study of adversarial attacks on monocular depth estimation, an important task of 3D scene understanding in scenarios such as autonomous driving and robot navigation. In order to understand the impact of adversarial attacks on depth estimation, we first define a taxonomy of different attack scenarios for depth estimation, including non-targeted attacks, targeted attacks and universal attacks. We then adapt several state-of-the-art attack methods for classification on the field of depth estimation. Besides, multi-task attacks are introduced to further improve the attack performance for universal attacks. Experimental results show that it is possible to generate significant errors on depth estimation. In particular, we demonstrate that our methods can conduct targeted attacks on given objects (such as a car), resulting in depth estimation 3-4x away from the ground truth (e.g., from 20m to 80m).
Artificial Intelligence (AI) is gradually changing the practice of surgery with the advanced technological development of imaging, navigation and robotic intervention. In this article, the recent successful and influential applications of AI in surgery are reviewed from pre-operative planning and intra-operative guidance to the integration of surgical robots. We end with summarizing the current state, emerging trends and major challenges in the future development of AI in surgery.
The recent breakthroughs in computer vision have benefited from the availability of large representative datasets (e.g. ImageNet and COCO) for training. Yet, robotic vision poses unique challenges for applying visual algorithms developed from these standard computer vision datasets due to their implicit assumption over non-varying distributions for a fixed set of tasks. Fully retraining models each time a new task becomes available is infeasible due to computational, storage and sometimes privacy issues, while na\"{i}ve incremental strategies have been shown to suffer from catastrophic forgetting. It is crucial for the robots to operate continuously under open-set and detrimental conditions with adaptive visual perceptual systems, where lifelong learning is a fundamental capability. However, very few datasets and benchmarks are available to evaluate and compare emerging techniques. To fill this gap, we provide a new lifelong robotic vision dataset ("OpenLORIS-Object") collected via RGB-D cameras mounted on mobile robots. The dataset embeds the challenges faced by a robot in the real-life application and provides new benchmarks for validating lifelong object recognition algorithms. Moreover, we have provided a testbed of $9$ state-of-the-art lifelong learning algorithms. Each of them involves $48$ tasks with $4$ evaluation metrics over the OpenLORIS-Object dataset. The results demonstrate that the object recognition task in the ever-changing difficulty environments is far from being solved and the bottlenecks are at the forward/backward transfer designs. Our dataset and benchmark are publicly available at \href{https://lifelong-robotic-vision.github.io/dataset/Data_Object-Recognition.html}{\underline{this url}}.
Railway transportation is the artery of China's national economy and plays an important role in the development of today's society. Due to the late start of China's railway security inspection technology, the current railway security inspection tasks mainly rely on manual inspection, but the manual inspection efficiency is low, and a lot of manpower and material resources are needed. In this paper, we establish a steel rail fastener detection image dataset, which contains 4,000 rail fastener pictures about 4 types. We use the regional suggestion network to generate the region of interest, extracts the features using the convolutional neural network, and fuses the classifier into the detection network. With online hard sample mining to improve the accuracy of the model, we optimize the Faster RCNN detection framework by reducing the number of regions of interest. Finally, the model accuracy reaches 99% and the speed reaches 35FPS in the deployment environment of TITAN X GPU.
Railway transportation is the artery of China's national economy and plays an important role in the development of today's society. Due to the late start of China's railway security inspection technology, the current railway security inspection tasks mainly rely on manual inspection, but the manual inspection efficiency is low, and a lot of manpower and material resources are needed. In this paper, we establish a steel rail fastener detection image dataset, which contains 4,000 rail fastener pictures about 4 types. We use the regional suggestion network to generate the region of interest, extracts the features using the convolutional neural network, and fuses the classifier into the detection network. With online hard sample mining to improve the accuracy of the model, we optimize the Faster RCNN detection framework by reducing the number of regions of interest. Finally, the model accuracy reaches 99% and the speed reaches 35FPS in the deployment environment of TITAN X GPU.
In this paper, we propose a novel method for highly efficient follicular segmentation of thyroid cytopathological WSIs. Firstly, we propose a hybrid segmentation architecture, which integrates a classifier into Deeplab V3 by adding a branch. A large amount of the WSI segmentation time is saved by skipping the irrelevant areas using the classification branch. Secondly, we merge the low scale fine features into the original atrous spatial pyramid pooling (ASPP) in Deeplab V3 to accurately represent the details in cytopathological images. Thirdly, our hybrid model is trained by a criterion-oriented adaptive loss function, which leads the model converging much faster. Experimental results on a collection of thyroid patches demonstrate that the proposed model reaches 80.9% on the segmentation accuracy. Besides, 93% time is reduced for the WSI segmentation by using our proposed method, and the WSI-level accuracy achieves 53.4%.
Motion behaviors of a rigid body can be characterized by a 6-dimensional motion trajectory, which contains position vectors of a reference point on the rigid body and rotations of this rigid body over time. This paper devises a Rotation and Relative Velocity (RRV) descriptor by exploring the local translational and rotational invariants of motion trajectories of rigid bodies, which is insensitive to noise, invariant to rigid transformation and scaling. A flexible metric is also introduced to measure the distance between two RRV descriptors. The RRV descriptor is then applied to characterize motions of a human body skeleton modeled as articulated interconnections of multiple rigid bodies. To illustrate the descriptive ability of the RRV descriptor, we explore it for different rigid body motion recognition tasks. The experimental results on benchmark datasets demonstrate that this simple RRV descriptor outperforms the previous ones regarding recognition accuracy without increasing computational cost.