In this paper, we introduce VCSL (Video Copy Segment Localization), a new comprehensive segment-level annotated video copy dataset. Compared with existing copy detection datasets restricted by either video-level annotation or small-scale, VCSL not only has two orders of magnitude more segment-level labelled data, with 160k realistic video copy pairs containing more than 280k localized copied segment pairs, but also covers a variety of video categories and a wide range of video duration. All the copied segments inside each collected video pair are manually extracted and accompanied by precisely annotated starting and ending timestamps. Alongside the dataset, we also propose a novel evaluation protocol that better measures the prediction accuracy of copy overlapping segments between a video pair and shows improved adaptability in different scenarios. By benchmarking several baseline and state-of-the-art segment-level video copy detection methods with the proposed dataset and evaluation metric, we provide a comprehensive analysis that uncovers the strengths and weaknesses of current approaches, hoping to open up promising directions for future works. The VCSL dataset, metric and benchmark codes are all publicly available at https://github.com/alipay/VCSL.
With widespread applications of artificial intelligence (AI), the capabilities of the perception, understanding, decision-making and control for autonomous systems have improved significantly in the past years. When autonomous systems consider the performance of accuracy and transferability, several AI methods, like adversarial learning, reinforcement learning (RL) and meta-learning, show their powerful performance. Here, we review the learning-based approaches in autonomous systems from the perspectives of accuracy and transferability. Accuracy means that a well-trained model shows good results during the testing phase, in which the testing set shares a same task or a data distribution with the training set. Transferability means that when a well-trained model is transferred to other testing domains, the accuracy is still good. Firstly, we introduce some basic concepts of transfer learning and then present some preliminaries of adversarial learning, RL and meta-learning. Secondly, we focus on reviewing the accuracy or transferability or both of them to show the advantages of adversarial learning, like generative adversarial networks (GANs), in typical computer vision tasks in autonomous systems, including image style transfer, image superresolution, image deblurring/dehazing/rain removal, semantic segmentation, depth estimation, pedestrian detection and person re-identification (re-ID). Then, we further review the performance of RL and meta-learning from the aspects of accuracy or transferability or both of them in autonomous systems, involving pedestrian tracking, robot navigation and robotic manipulation. Finally, we discuss several challenges and future topics for using adversarial learning, RL and meta-learning in autonomous systems.
Depth information is important for autonomous systems to perceive environments and estimate their own state. Traditional depth estimation methods, like structure from motion and stereo vision matching, are built on feature correspondences of multiple viewpoints. Meanwhile, the predicted depth maps are sparse. Inferring depth information from a single image (monocular depth estimation) is an ill-posed problem. With the rapid development of deep neural networks, monocular depth estimation based on deep learning has been widely studied recently and achieved promising performance in accuracy. Meanwhile, dense depth maps are estimated from single images by deep neural networks in an end-to-end manner. In order to improve the accuracy of depth estimation, different kinds of network frameworks, loss functions and training strategies are proposed subsequently. Therefore, we survey the current monocular depth estimation methods based on deep learning in this review. Initially, we conclude several widely used datasets and evaluation indicators in deep learning-based depth estimation. Furthermore, we review some representative existing methods according to different training manners: supervised, unsupervised and semi-supervised. Finally, we discuss the challenges and provide some ideas for future researches in monocular depth estimation.
Autonomous systems possess the features of inferring their own ego-motion, autonomously understanding their surroundings, and planning trajectories. With the applications of deep learning and reinforcement learning, the perception and decision-making abilities of autonomous systems are being efficiently addressed, and many new learning-based algorithms have surfaced with respect to autonomous perception and decision-making. In this review, we focus on the applications of learning-based approaches in perception and decision-making in autonomous systems, which is different from previous reviews that discussed traditional methods. First, we delineate the existing classical simultaneous localization and mapping (SLAM) solutions and review the environmental perception and understanding methods based on deep learning, including deep learning-based monocular depth estimation, ego-motion prediction, image enhancement, object detection, semantic segmentation, and their combinations with traditional SLAM frameworks. Second, we briefly summarize the existing motion planning techniques, such as path planning and trajectory planning methods, and discuss the navigation methods based on reinforcement learning. Finally, we examine the several challenges and promising directions discussed and concluded in related research for future works in the era of computer science, automatic control, and robotics.
The ability of inferring its own ego-motion, autonomous understanding the surroundings and planning trajectory are the features of autonomous robot systems. Therefore, in order to improve the perception and decision-making ability of autonomous robot systems, Simultaneous localization and mapping (SLAM) and autonomous trajectory planning are widely researched, especially combining learning-based algorithms. In this paper, we mainly focus on the application of learning-based approaches to perception and decision-making, and survey the current state-of-the-art SLAM and trajectory planning algorithms. Firstly, we delineate the existing classical SLAM solutions and review deep learning-based methods of environment perception and understanding, including deep learning-based monocular depth estimation, ego-motion prediction, image enhancement, object detection, semantic segmentation and their combinations with previous SLAM frameworks. Secondly, we summarize the existing path planning and trajectory planning methods and survey the navigation methods based on reinforcement learning. Finally, several challenges and promising directions are concluded to related researchers for future work in both autonomous perception and decision-making.
The proliferation of fake news on social media has opened up new directions of research for timely identification and containment of fake news, and mitigation of its widespread impact on public opinion. While much of the earlier research was focused on identification of fake news based on its contents or by exploiting users' engagements with the news on social media, there has been a rising interest in proactive intervention strategies to counter the spread of misinformation and its impact on society. In this survey, we describe the modern-day problem of fake news and, in particular, highlight the technical challenges associated with it. We discuss existing methods and techniques applicable to both identification and mitigation, with a focus on the significant advances in each method and their advantages and limitations. In addition, research has often been limited by the quality of existing datasets and their specific application contexts. To alleviate this problem, we comprehensively compile and summarize characteristic features of available datasets. Furthermore, we outline new directions of research to facilitate future development of effective and interdisciplinary solutions.
Radio-frequency (RF) tomographic imaging is a promising technique for inferring multi-dimensional physical space by processing RF signals traversed across a region of interest. However, conventional RF tomography schemes are generally based on vector compressed sensing, which ignores the geometric structures of the target spaces and leads to low recovery precision. The recently proposed transform-based tensor model is more appropriate for sensory data processing, as it helps exploit the geometric structures of the three-dimensional target and improve the recovery precision. In this paper, we propose a novel tensor sensing approach that achieves highly accurate estimation for real-world three-dimensional spaces. First, we use the transform-based tensor model to formulate a tensor sensing problem, and propose a fast alternating minimization algorithm called Alt-Min. Secondly, we drive an algorithm which is optimized to reduce memory and computation requirements. Finally, we present evaluation of our Alt-Min approach using IKEA 3D data and demonstrate significant improvement in recovery error and convergence speed compared to prior tensor-based compressed sensing.
Deep Learning (DL) algorithm is the state-of-the-art algorithm of many computer science fields and applied on many intelligent mobile applications. In this paper, we propose a system called CoINF, a practical, adaptive, and flexible deep learning framework that enables cooperative inference between wearable devices (e.g., smartwatches and smart glasses) and handhelds. Our framework accelerates the processing and saves the energy consumption of generic deep learning models inference on wearables via judiciously offloading the workloads to paired handhelds at fine granularity in considering of the system environment, the application requirements, and user preference. Deployed as a user-space library, CoINF offers developer-friendly APIs that are as simple as those in traditional DL libraries such as TensorFlow, with all complicated offloading details hidden. We have implemented a prototype of CoINF on Android OS, and used real deep learning models to evaluate its performance on commercial off-the-shelf smartphone and smartwatches. The experimental results show that our framework can achieve substantial execution speedup and energy saving compared to wearable-only and handheld-only strategies.
Localization technology is important for the development of indoor location-based services (LBS). Global Positioning System (GPS) becomes invalid in indoor environments due to the non-line-of-sight issue, so it is urgent to develop a real-time high-accuracy localization approach for smartphones. However, accurate localization is challenging due to issues such as real-time response requirements, limited fingerprint samples and mobile device storage. To address these problems, we propose a novel deep learning architecture: Tensor-Generative Adversarial Network (TGAN). We first introduce a transform-based 3D tensor to model fingerprint samples. Instead of those passive methods that construct a fingerprint database as a prior, our model applies artificial neural network with deep learning to train network classifiers and then gives out estimations. Then we propose a novel tensor-based super-resolution scheme using the generative adversarial network (GAN) that adopts sparse coding as the generator network and a residual learning network as the discriminator. Further, we analyze the performance of tensor-GAN and implement a trace-based localization experiment, which achieves better performance. Compared to existing methods for smartphones indoor positioning, that are energy-consuming and high demands on devices, TGAN can give out an improved solution in localization accuracy, response time and implementation complexity.