
"Topic": models, code, and papers

Deep Learning for UAV-based Object Detection and Tracking: A Survey

Oct 25, 2021
Xin Wu, Wei Li, Danfeng Hong, Ran Tao, Qian Du

Owing to effective and flexible data acquisition, unmanned aerial vehicles (UAVs) have recently become a hotspot across the fields of computer vision (CV) and remote sensing (RS). Inspired by the recent success of deep learning (DL), many advanced object detection and tracking approaches have been widely applied to various UAV-related tasks, such as environmental monitoring, precision agriculture, and traffic management. This paper provides a comprehensive survey of the research progress and prospects of DL-based UAV object detection and tracking methods. More specifically, we first outline the challenges and statistics of existing methods, and provide solutions from the perspective of DL-based models in three research topics: object detection from images, object detection from videos, and object tracking from videos. Open datasets related to UAV-dominated object detection and tracking are exhaustively reviewed, and four benchmark datasets are employed for performance evaluation with several state-of-the-art methods. Finally, prospects and considerations for future work are discussed and summarized. We expect this survey to give researchers from the remote sensing field an overview of DL-based UAV object detection and tracking methods, along with some thoughts on their further development.

* IEEE Geoscience and Remote Sensing Magazine, 2021 


Enhancing Visual Dialog Questioner with Entity-based Strategy Learning and Augmented Guesser

Sep 06, 2021
Duo Zheng, Zipeng Xu, Fandong Meng, Xiaojie Wang, Jiaan Wang, Jie Zhou

Considering the importance of building a good Visual Dialog (VD) Questioner, many researchers study the topic under a Q-Bot-A-Bot image-guessing game setting, where the Questioner needs to raise a series of questions to collect information about an undisclosed image. Although progress has been made in Supervised Learning (SL) and Reinforcement Learning (RL), issues remain. First, previous methods do not provide explicit and effective guidance for the Questioner to generate visually related and informative questions. Second, the effect of RL is hampered by an incompetent component, i.e., the Guesser, which makes image predictions based on the generated dialogs and assigns rewards accordingly. To enhance the VD Questioner: 1) we propose a Related entity enhanced Questioner (ReeQ) that generates questions under the guidance of related entities and learns entity-based questioning strategies from human dialogs; 2) we propose an Augmented Guesser (AugG) that is strong and optimized especially for the VD setting. Experimental results on the VisDial v1.0 dataset show that our approach achieves state-of-the-art performance on both the image-guessing task and question diversity. A human study further shows that our model generates more visually related, informative, and coherent questions.

* Accepted by Findings of EMNLP 2021. Code is available at: https://github.com/zd11024/Entity_Questioner 


A Comparison of Modern General-Purpose Visual SLAM Approaches

Aug 05, 2021
Alexey Merzlyakov, Steve Macenski

Advancing maturity in mobile and legged robotics technologies is changing the landscape where robots are being deployed and found. This innovation calls for a transformation in simultaneous localization and mapping (SLAM) systems to support this new generation of service and consumer robots. No longer can traditionally robust 2D lidar systems dominate while robots are being deployed in multi-story indoor, outdoor unstructured, and urban domains with increasingly inexpensive stereo and RGB-D cameras. Visual SLAM (VSLAM) systems have been a topic of study for decades, and a small number of openly available implementations have stood out: ORB-SLAM3, OpenVSLAM, and RTABMap. This paper presents a comparison of these three modern, feature-rich, and uniquely robust VSLAM techniques that have yet to be benchmarked against each other, using several datasets spanning multiple domains negotiated by service robots. ORB-SLAM3 and OpenVSLAM had each not previously been compared on at least one of these datasets in the literature, and we provide insight through this lens. This analysis is motivated by the search for general-purpose, feature-complete, and multi-domain VSLAM options to support a broad class of robot applications for integration into the new and improved ROS 2 Nav2 system, as suitable alternatives to traditional 2D lidar solutions.

* 2021 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS) 
* IROS 2021 


MRI-based Alzheimer's disease prediction via distilling the knowledge in multi-modal data

Apr 08, 2021
Hao Guan, Chaoyue Wang, Dacheng Tao

Mild cognitive impairment (MCI) conversion prediction, i.e., identifying MCI patients at high risk of converting to Alzheimer's disease (AD), is essential for preventing or slowing the progression of AD. Although previous studies have shown that the fusion of multi-modal data can effectively improve prediction accuracy, their applications are largely restricted by the limited availability or high cost of multi-modal data. Building an effective prediction model using only magnetic resonance imaging (MRI) remains a challenging research topic. In this work, we propose a multi-modal multi-instance distillation scheme, which aims to distill the knowledge learned from multi-modal data into an MRI-based network for MCI conversion prediction. In contrast to existing distillation algorithms, the proposed multi-instance probabilities demonstrate a superior capability for representing the complicated atrophy distributions and can guide the MRI-based network to better explore the input MRI. To the best of our knowledge, this is the first study that attempts to improve an MRI-based prediction model by leveraging extra supervision distilled from multi-modal information. Experiments demonstrate the advantage of our framework, suggesting its potential in data-limited clinical settings.
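
The distillation idea described above, guiding an MRI-only student network with soft targets produced by a multi-modal teacher, is commonly implemented as a temperature-scaled KL divergence between the two output distributions. The following is a minimal sketch of that generic objective, not the authors' multi-instance scheme; all names are illustrative.

```python
import math

def softmax(logits, temperature=1.0):
    """Convert raw scores to probabilities; a higher temperature softens them."""
    exps = [math.exp(x / temperature) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """KL divergence between softened teacher and student distributions,
    the standard objective for transferring soft targets to a student."""
    p = softmax(teacher_logits, temperature)   # teacher soft targets
    q = softmax(student_logits, temperature)   # student predictions
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))

# A student that matches the teacher exactly incurs zero loss.
print(distillation_loss([2.0, 1.0], [2.0, 1.0]))  # -> 0.0
```

In practice this term is added to the ordinary supervised loss, so the student learns from ground-truth labels and the teacher's softer, more informative distribution at the same time.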

* 19 pages, 13 figures 


Free-form Description Guided 3D Visual Graph Network for Object Grounding in Point Cloud

Mar 30, 2021
Mingtao Feng, Zhen Li, Qi Li, Liang Zhang, XiangDong Zhang, Guangming Zhu, Hui Zhang, Yaonan Wang, Ajmal Mian

3D object grounding aims to locate the most relevant target object in a raw point cloud scene based on a free-form language description. Understanding complex and diverse descriptions, and lifting them directly to a point cloud, is a new and challenging topic due to the irregular and sparse nature of point clouds. There are three main challenges in 3D object grounding: to find the main focus in the complex and diverse description; to understand the point cloud scene; and to locate the target object. In this paper, we address all three challenges. Firstly, we propose a language scene graph module to capture the rich structure and long-distance phrase correlations. Secondly, we introduce a multi-level 3D proposal relation graph module to extract the object-object and object-scene co-occurrence relationships, and strengthen the visual features of the initial proposals. Lastly, we develop a description guided 3D visual graph module to encode global contexts of phrases and proposals by a node-matching strategy. Extensive experiments on challenging benchmark datasets (ScanRefer and Nr3D) show that our algorithm outperforms existing state-of-the-art methods. Our code is available at https://github.com/PNXD/FFL-3DOG.



Utilizing Import Vector Machines to Identify Dangerous Pro-active Traffic Conditions

Jan 19, 2021
Kui Yang, Wenjing Zhao, Constantinos Antoniou

Traffic accidents have become a severe issue in metropolises as traffic flow grows. This paper explores the theory and application of a recently developed machine learning technique, namely Import Vector Machines (IVMs), in real-time crash risk analysis, a topic of high interest for reducing traffic accidents. Historical crash data and corresponding traffic data from the Shanghai Urban Expressway System were employed and matched. Traffic conditions are labelled as dangerous (i.e., probably leading to a crash) or safe (i.e., a normal traffic condition) based on 5-minute measurements of average speed, volume, and occupancy. The IVM algorithm is trained to build the classifier, and its performance is compared to the popular and successfully applied technique of Support Vector Machines (SVMs). The main findings indicate that IVMs can successfully be employed in real-time identification of dangerous pro-active traffic conditions. Furthermore, similar to the "support points" of the SVM, the IVM model uses only a fraction of the training data to index kernel basis functions, typically a much smaller fraction than the SVM, while its classification rates are similar to those of SVMs. This gives the IVM a computational advantage over the SVM, especially when the training data set is large.
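
The abstract's key computational point, that the IVM evaluates kernel basis functions against only a small subset of training points, can be illustrated with a kernel logistic model over a handful of "import points". This is a toy sketch with invented data and names, not the paper's fitting procedure.

```python
import math

def rbf(x, z, gamma=0.5):
    """Gaussian (RBF) kernel between two feature vectors."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, z)))

def ivm_probability(x, basis_points, weights, bias=0.0):
    """Kernel logistic model: the decision function is a weighted sum of
    kernel evaluations against the import points only, a small subset of
    the training data, which is where the IVM's speed advantage comes from."""
    score = bias + sum(w * rbf(x, z) for w, z in zip(weights, basis_points))
    return 1.0 / (1.0 + math.exp(-score))  # probability of "dangerous"

# Toy 2-D traffic features (normalized speed, occupancy), two import points:
# one learned from crash-prone conditions, one from normal conditions.
basis = [[0.2, 0.9], [0.8, 0.1]]
weights = [2.0, -2.0]  # positive weight pulls toward "dangerous"
print(ivm_probability([0.25, 0.85], basis, weights))  # close to the crash-prone point
```

At prediction time the cost scales with the number of import points rather than the full training set, mirroring the computational advantage the abstract describes.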

* In 2020 IEEE 23rd International Conference on Intelligent Transportation Systems (ITSC) (pp. 1-6). IEEE 
* 6 pages, 3 figures, 2020 IEEE 23rd International Conference on Intelligent Transportation Systems (ITSC) 


Formalising Concepts as Grounded Abstractions

Jan 13, 2021
Stephen Clark, Alexander Lerchner, Tamara von Glehn, Olivier Tieleman, Richard Tanburn, Misha Dashevskiy, Matko Bosnjak

The notion of concept has been studied for centuries, by philosophers, linguists, cognitive scientists, and researchers in artificial intelligence (Margolis & Laurence, 1999). There is a large literature on formal, mathematical models of concepts, including a whole sub-field of AI -- Formal Concept Analysis -- devoted to this topic (Ganter & Obiedkov, 2016). Recently, researchers in machine learning have begun to investigate how methods from representation learning can be used to induce concepts from raw perceptual data (Higgins, Sonnerat, et al., 2018). The goal of this report is to provide a formal account of concepts which is compatible with this latest work in deep learning. The main technical goal of this report is to show how techniques from representation learning can be married with a lattice-theoretic formulation of conceptual spaces. The mathematics of partial orders and lattices is a standard tool for modelling conceptual spaces (Ch.2, Mitchell (1997), Ganter and Obiedkov (2016)); however, there is no formal work that we are aware of which defines a conceptual lattice on top of a representation that is induced using unsupervised deep learning (Goodfellow et al., 2016). The advantages of partially-ordered lattice structures are that these provide natural mechanisms for use in concept discovery algorithms, through the meets and joins of the lattice.
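
The closing remark about meets and joins can be made concrete in a simplified attribute-set view of concepts (full Formal Concept Analysis works with extent-intent pairs and closure operators; this sketch keeps only the intents).

```python
def meet(intent_a, intent_b):
    """Meet (greatest lower bound): the more specific concept, combining
    the attribute sets (intents) of both concepts."""
    return intent_a | intent_b

def join(intent_a, intent_b):
    """Join (least upper bound): the more general concept, keeping only
    the attributes the two concepts share."""
    return intent_a & intent_b

dog = {"animal", "four-legged", "barks"}
cat = {"animal", "four-legged", "meows"}
print(join(dog, cat))  # the shared 'four-legged animal' concept
print(meet(dog, cat))  # a (possibly empty) concept with all five attributes
```

Operations like `join` are exactly the "natural mechanism for concept discovery" the report alludes to: generalizing two observed instances yields a candidate parent concept in the lattice.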



Learning Better Sentence Representation with Syntax Information

Jan 09, 2021
Chen Yang

Sentence semantic understanding is a key topic in the field of natural language processing. Recently, contextualized word representations derived from pre-trained language models such as ELMo and BERT have shown significant improvements for a wide range of semantic tasks, e.g., question answering, text classification, and sentiment analysis. However, how to add external knowledge to further improve the semantic modeling capability of a model is worth probing. In this paper, we propose a novel approach to combining syntax information with a pre-trained language model. First, to evaluate the effect of the pre-trained model, we introduce RNN-based and Transformer-based pre-trained language models; second, to better integrate external knowledge such as syntactic information with the pre-trained model, we propose a dependency syntax expansion (DSE) model. For evaluation, we selected two subtasks: a sentence completion task and a biological relation extraction task. The experimental results show that our model achieves 91.2% accuracy on the sentence completion task, outperforming the baseline model by 37.8%. It also achieves competitive performance with a 75.1% F1 score on the relation extraction task.
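
One plausible reading of syntax-augmented input, pairing each token with its dependency head before encoding, can be sketched as follows. This is a hypothetical illustration of the general idea, not the paper's actual DSE model.

```python
def expand_with_heads(tokens, heads):
    """Pair each token with its syntactic head word so the encoder sees
    dependency context alongside the raw token.
    `heads[i]` is the index of token i's head, or -1 for the root."""
    expanded = []
    for i, tok in enumerate(tokens):
        head = "ROOT" if heads[i] < 0 else tokens[heads[i]]
        expanded.append(f"{tok}/{head}")
    return expanded

# "She reads books": 'reads' is the root; 'She' and 'books' attach to it.
print(expand_with_heads(["She", "reads", "books"], [1, -1, 1]))
# -> ['She/reads', 'reads/ROOT', 'books/reads']
```

The expanded tokens make the parse structure visible to a model that otherwise only sees the linear word sequence, which is the kind of external knowledge the abstract argues is worth injecting.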



Enhanced Pub/Sub Communications for Massive IoT Traffic with SARSA Reinforcement Learning

Jan 03, 2021
Carlos E. Arruda, Pedro F. Moraes, Nazim Agoulmine, Joberto S. B. Martins

Sensors are being extensively deployed and are expected to expand at significant rates in the coming years. They typically generate a large volume of data in internet of things (IoT) application areas like smart cities, intelligent traffic systems, smart grids, and e-health. Cloud, edge, and fog computing are potential and competitive strategies for collecting, processing, and distributing IoT data. However, cloud-, edge-, and fog-based solutions need to efficiently tackle the distribution of a high volume of IoT data through constrained and limited-resource network infrastructures. This paper addresses the issue of conveying a massive volume of IoT data through a network with limited communication resources (bandwidth) using a cognitive communications resource allocation based on Reinforcement Learning (RL) with the SARSA algorithm. The proposed network infrastructure (PSIoTRL) uses a Publish/Subscribe architecture to access massive and highly distributed IoT data. It is demonstrated that the PSIoTRL bandwidth allocation for buffer flushing based on SARSA enhances the IoT aggregator buffer occupation and network link utilization. The PSIoTRL dynamically adapts the IoT aggregator traffic flushing according to the Pub/Sub topic's priority and network constraint requirements.
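
The SARSA step underlying such an allocation scheme moves the value of a (state, action) pair toward the observed reward plus the discounted value of the action actually taken next. A minimal sketch follows; the buffer-level states and bandwidth actions are invented for illustration, not taken from the PSIoTRL design.

```python
def sarsa_update(Q, state, action, reward, next_state, next_action,
                 alpha=0.1, gamma=0.9):
    """One on-policy SARSA step: nudge Q(s, a) toward
    reward + gamma * Q(s', a'), using the action a' the agent really chose."""
    target = reward + gamma * Q[(next_state, next_action)]
    Q[(state, action)] += alpha * (target - Q[(state, action)])
    return Q

# Toy setting: state = aggregator buffer level, action = bandwidth share.
Q = {("buffer_high", "more_bw"): 0.0, ("buffer_low", "less_bw"): 0.0}
sarsa_update(Q, "buffer_high", "more_bw", reward=1.0,
             next_state="buffer_low", next_action="less_bw")
print(Q[("buffer_high", "more_bw")])  # -> 0.1, moved toward the reward
```

Because SARSA is on-policy, the learned values reflect the bandwidth-allocation policy actually being followed, including its exploratory choices, which suits an online controller adapting to Pub/Sub topic priorities.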

* 3rd International Conference on Machine Learning for Networking - MLN 2020, Paris, 20 pages, 8 figures 

