Get our free extension to see links to code for papers anywhere online!

Chrome logo  Add to Chrome

Firefox logo Add to Firefox

"Topic": models, code, and papers

3D-ANAS v2: Grafting Transformer Module on Automatically Designed ConvNet for Hyperspectral Image Classification

Oct 21, 2021
Xizhe Xue, Haokui Zhang, Zongwen Bai, Ying Li

Hyperspectral image (HSI) classification has been a hot topic for decides, as Hyperspectral image has rich spatial and spectral information, providing strong basis for distinguishing different land-cover objects. Benefiting from the development of deep learning technologies, deep learning based HSI classification methods have achieved promising performance. Recently, several neural architecture search (NAS) algorithms are proposed for HSI classification, which further improve the accuracy of HSI classification to a new level. In this paper, we revisit the search space designed in previous HSI classification NAS methods and propose a novel hybrid search space, where 3D convolution, 2D spatial convolution and 2D spectral convolution are employed. Compared search space proposed in previous works, the serach space proposed in this paper is more aligned with characteristic of HSI data that is HSIs have a relatively low spatial resolution and an extremely high spectral resolution. In addition, to further improve the classification accuracy, we attempt to graft the emerging transformer module on the automatically designed ConvNet to adding global information to local region focused features learned by ConvNet. We carry out comparison experiments on three public HSI datasets which have different spectral characteristics to evaluate the proposed method. Experimental results show that the proposed method achieves much better performance than comparison approaches, and both adopting the proposed hybrid search space and grafting transformer module improves classification accuracy. Especially on the most recently captured dataset Houston University, overall accuracy is improved by up to nearly 6 percentage points. Code will be available at:

* 15 pages, 10 figures 
Access Paper or Ask Questions

PETGEN: Personalized Text Generation Attack on Deep Sequence Embedding-based Classification Models

Sep 14, 2021
Bing He, Mustaque Ahamad, Srijan Kumar

\textit{What should a malicious user write next to fool a detection model?} Identifying malicious users is critical to ensure the safety and integrity of internet platforms. Several deep learning based detection models have been created. However, malicious users can evade deep detection models by manipulating their behavior, rendering these models of little use. The vulnerability of such deep detection models against adversarial attacks is unknown. Here we create a novel adversarial attack model against deep user sequence embedding-based classification models, which use the sequence of user posts to generate user embeddings and detect malicious users. In the attack, the adversary generates a new post to fool the classifier. We propose a novel end-to-end Personalized Text Generation Attack model, called \texttt{PETGEN}, that simultaneously reduces the efficacy of the detection model and generates posts that have several key desirable properties. Specifically, \texttt{PETGEN} generates posts that are personalized to the user's writing style, have knowledge about a given target context, are aware of the user's historical posts on the target context, and encapsulate the user's recent topical interests. We conduct extensive experiments on two real-world datasets (Yelp and Wikipedia, both with ground-truth of malicious users) to show that \texttt{PETGEN} significantly reduces the performance of popular deep user sequence embedding-based classification models. \texttt{PETGEN} outperforms five attack baselines in terms of text quality and attack efficacy in both white-box and black-box classifier settings. Overall, this work paves the path towards the next generation of adversary-aware sequence classification models.

* Accepted for publication at: 2021 ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD'2021). Code and data at: 
Access Paper or Ask Questions

Low-Rank Autoregressive Tensor Completion for Spatiotemporal Traffic Data Imputation

Apr 30, 2021
Xinyu Chen, Mengying Lei, Nicolas Saunier, Lijun Sun

Spatiotemporal traffic time series (e.g., traffic volume/speed) collected from sensing systems are often incomplete with considerable corruption and large amounts of missing values, preventing users from harnessing the full power of the data. Missing data imputation has been a long-standing research topic and critical application for real-world intelligent transportation systems. A widely applied imputation method is low-rank matrix/tensor completion; however, the low-rank assumption only preserves the global structure while ignores the strong local consistency in spatiotemporal data. In this paper, we propose a low-rank autoregressive tensor completion (LATC) framework by introducing \textit{temporal variation} as a new regularization term into the completion of a third-order (sensor $\times$ time of day $\times$ day) tensor. The third-order tensor structure allows us to better capture the global consistency of traffic data, such as the inherent seasonality and day-to-day similarity. To achieve local consistency, we design the temporal variation by imposing an AR($p$) model for each time series with coefficients as learnable parameters. Different from previous spatial and temporal regularization schemes, the minimization of temporal variation can better characterize temporal generative mechanisms beyond local smoothness, allowing us to deal with more challenging scenarios such "blackout" missing. To solve the optimization problem in LATC, we introduce an alternating minimization scheme that estimates the low-rank tensor and autoregressive coefficients iteratively. We conduct extensive numerical experiments on several real-world traffic data sets, and our results demonstrate the effectiveness of LATC in diverse missing scenarios.

Access Paper or Ask Questions

Robofleet: Secure Open Source Communication and Management for Fleets of Autonomous Robots

Mar 11, 2021
Kavan Singh Sikand, Logan Zartman, Sadegh Rabiee, Joydeep Biswas

Safe long-term deployment of a fleet of mobile robots requires reliable and secure two-way communication channels between individual robots and remote human operators for supervision and tasking. Existing open-source solutions to this problem degrade in performance in challenging real-world situations such as intermittent and low-bandwidth connectivity, do not provide security control options, and can be computationally expensive on hardware-constrained mobile robot platforms. In this paper, we present Robofleet, a lightweight open-source system which provides inter-robot communication, remote monitoring, and remote tasking for a fleet of ROS-enabled service-mobile robots that is designed with the practical goals of resilience to network variance and security control in mind. Robofleet supports multi-user, multi-robot communication via a central server. This architecture deduplicates network traffic between robots, significantly reducing overall network load when compared with native ROS communication. This server also functions as a single entrypoint into the system, enabling security control and user authentication. Individual robots run the lightweight Robofleet client, which is responsible for exchanging messages with the Robofleet server. It automatically adapts to adverse network conditions through backpressure monitoring as well as topic-level priority control, ensuring that safety-critical messages are successfully transmitted. Finally, the system includes a web-based visualization tool that can be run on any internet-connected, browser-enabled device to monitor and control the fleet. We compare Robofleet to existing methods of robotic communication, and demonstrate that it provides superior resilience to network variance while maintaining performance that exceeds that of widely-used systems.

* 7 pages, 7 figures 
Access Paper or Ask Questions

Motor-Imagery-Based Brain Computer Interface using Signal Derivation and Aggregation Functions

Jan 18, 2021
Javier Fumanal-Idocin, Yu-Kai Wang, Chin-Teng Lin, Javier Fernández, Jose Antonio Sanz, Humberto Bustince

Brain Computer Interface technologies are popular methods of communication between the human brain and external devices. One of the most popular approaches to BCI is Motor Imagery. In BCI applications, the ElectroEncephaloGraphy is a very popular measurement for brain dynamics because of its non-invasive nature. Although there is a high interest in the BCI topic, the performance of existing systems is still far from ideal, due to the difficulty of performing pattern recognition tasks in EEG signals. BCI systems are composed of a wide range of components that perform signal pre-processing, feature extraction and decision making. In this paper, we define a BCI Framework, named Enhanced Fusion Framework, where we propose three different ideas to improve the existing MI-based BCI frameworks. Firstly, we include aan additional pre-processing step of the signal: a differentiation of the EEG signal that makes it time-invariant. Secondly, we add an additional frequency band as feature for the system and we show its effect on the performance of the system. Finally, we make a profound study of how to make the final decision in the system. We propose the usage of both up to six types of different classifiers and a wide range of aggregation functions (including classical aggregations, Choquet and Sugeno integrals and their extensions and overlap functions) to fuse the information given by the considered classifiers. We have tested this new system on a dataset of 20 volunteers performing motor imagery-based brain-computer interface experiments. On this dataset, the new system achieved a 88.80% of accuracy. We also propose an optimized version of our system that is able to obtain up to 90,76%. Furthermore, we find that the pair Choquet/Sugeno integrals and overlap functions are the ones providing the best results.

Access Paper or Ask Questions

FireCommander: An Interactive, Probabilistic Multi-agent Environment for Joint Perception-Action Tasks

Oct 31, 2020
Esmaeil Seraj, Xiyang Wu, Matthew Gombolay

The purpose of this tutorial is to help individuals use the \underline{FireCommander} game environment for research applications. The FireCommander is an interactive, probabilistic joint perception-action reconnaissance environment in which a composite team of agents (e.g., robots) cooperate to fight dynamic, propagating firespots (e.g., targets). In FireCommander game, a team of agents must be tasked to optimally deal with a wildfire situation in an environment with propagating fire areas and some facilities such as houses, hospitals, power stations, etc. The team of agents can accomplish their mission by first sensing (e.g., estimating fire states), communicating the sensed fire-information among each other and then taking action to put the firespots out based on the sensed information (e.g., dropping water on estimated fire locations). The FireCommander environment can be useful for research topics spanning a wide range of applications from Reinforcement Learning (RL) and Learning from Demonstration (LfD), to Coordination, Psychology, Human-Robot Interaction (HRI) and Teaming. There are four important facets of the FireCommander environment that overall, create a non-trivial game: (1) Complex Objectives: Multi-objective Stochastic Environment, (2)Probabilistic Environment: Agents' actions result in probabilistic performance, (3) Hidden Targets: Partially Observable Environment and, (4) Uni-task Robots: Perception-only and Action-only agents. The FireCommander environment is first-of-its-kind in terms of including Perception-only and Action-only agents for coordination. It is a general multi-purpose game that can be useful in a variety of combinatorial optimization problems and stochastic games, such as applications of Reinforcement Learning (RL), Learning from Demonstration (LfD) and Inverse RL (iRL).

Access Paper or Ask Questions

An Overview of Deep-Learning-Based Audio-Visual Speech Enhancement and Separation

Aug 21, 2020
Daniel Michelsanti, Zheng-Hua Tan, Shi-Xiong Zhang, Yong Xu, Meng Yu, Dong Yu, Jesper Jensen

Speech enhancement and speech separation are two related tasks, whose purpose is to extract either one or more target speech signals, respectively, from a mixture of sounds generated by several sources. Traditionally, these tasks have been tackled using signal processing and machine learning techniques applied to the available acoustic signals. More recently, visual information from the target speakers, such as lip movements and facial expressions, has been introduced to speech enhancement and speech separation systems, because the visual aspect of speech is essentially unaffected by the acoustic environment. In order to efficiently fuse acoustic and visual information, researchers have exploited the flexibility of data-driven approaches, specifically deep learning, achieving state-of-the-art performance. The ceaseless proposal of a large number of techniques to extract features and fuse multimodal information has highlighted the need for an overview that comprehensively describes and discusses audio-visual speech enhancement and separation based on deep learning. In this paper, we provide a systematic survey of this research topic, focusing on the main elements that characterise the systems in the literature: visual features; acoustic features; deep learning methods; fusion techniques; training targets and objective functions. We also survey commonly employed audio-visual speech datasets, given their central role in the development of data-driven approaches, and evaluation methods, because they are generally used to compare different systems and determine their performance. In addition, we review deep-learning-based methods for speech reconstruction from silent videos and audio-visual sound source separation for non-speech signals, since these methods can be more or less directly applied to audio-visual speech enhancement and separation.

Access Paper or Ask Questions

NLPContributions: An Annotation Scheme for Machine Reading of Scholarly Contributions in Natural Language Processing Literature

Jun 23, 2020
Jennifer D'Souza, Sören Auer

We describe an annotation initiative to capture the scholarly contributions in natural language processing (NLP) articles, particularly, for the articles that discuss machine learning (ML) approaches for various information extraction tasks. We develop the annotation task based on a pilot annotation exercise on 50 NLP-ML scholarly articles presenting contributions to the five information extraction tasks 1. machine translation, 2. named entity recognition, 3. question answering, 4. relation classification, and 5. text classification. In this article, we describe the outcomes of this pilot annotation phase. Through the exercise we have obtained an annotation methodology; and found eight core information units that reflect the contribution of the NLP-ML scholarly investigations. The resulting annotation scheme we developed based on these information units is called NLPContributions. The overarching goal of our endeavor is: 1) to find a systematic set of patterns of subject-predicate-object statements for the semantic structuring of scholarly contributions that are more or less generically applicable for NLP-ML research articles; 2) to apply the discovered patterns in the creation of a larger annotated dataset for training machine readers of research contributions; 3) to ingest the dataset into the Open Research Knowledge Graph (ORKG) infrastructure as a showcase for creating user-friendly state-of-the-art overviews; 4) to integrate the machine readers into the ORKG to assist users in the manual curation of their respective article contributions. We envision that the NLPContributions methodology engenders a wider discussion on the topic toward its further refinement and development. Our pilot annotated dataset of 50 NLP-ML scholarly articles per the NLPContributions scheme is available at

* Submitted for review at 1st Workshop on Extraction and Evaluation of Knowledge Entities from Scientific Documents (EEKE2020) at the ACM/IEEE Joint Conference on Digital Libraries 2020 (JCDL2020), Wuhan, China 
Access Paper or Ask Questions

A multi-task convolutional neural network for mega-city analysis using very high resolution satellite imagery and geospatial data

Feb 26, 2017
Fan Zhang, Bo Du, Liangpei Zhang

Mega-city analysis with very high resolution (VHR) satellite images has been drawing increasing interest in the fields of city planning and social investigation. It is known that accurate land-use, urban density, and population distribution information is the key to mega-city monitoring and environmental studies. Therefore, how to generate land-use, urban density, and population distribution maps at a fine scale using VHR satellite images has become a hot topic. Previous studies have focused solely on individual tasks with elaborate hand-crafted features and have ignored the relationship between different tasks. In this study, we aim to propose a universal framework which can: 1) automatically learn the internal feature representation from the raw image data; and 2) simultaneously produce fine-scale land-use, urban density, and population distribution maps. For the first target, a deep convolutional neural network (CNN) is applied to learn the hierarchical feature representation from the raw image data. For the second target, a novel CNN-based universal framework is proposed to process the VHR satellite images and generate the land-use, urban density, and population distribution maps. To the best of our knowledge, this is the first CNN-based mega-city analysis method which can process a VHR remote sensing image with such a large data volume. A VHR satellite image (1.2 m spatial resolution) of the center of Wuhan covering an area of 2606 km2 was used to evaluate the proposed method. The experimental results confirm that the proposed method can achieve a promising accuracy for land-use, urban density, and population distribution maps.

Access Paper or Ask Questions

Single Image Super Resolution via Manifold Approximation

Mar 10, 2015
Chinh Dang, Hayder Radha

Image super-resolution remains an important research topic to overcome the limitations of physical acquisition systems, and to support the development of high resolution displays. Previous example-based super-resolution approaches mainly focus on analyzing the co-occurrence properties of low resolution and high-resolution patches. Recently, we proposed a novel single image super-resolution approach based on linear manifold approximation of the high-resolution image-patch space [1]. The image super-resolution problem is then formulated as an optimization problem of searching for the best matched high resolution patch in the manifold for a given low-resolution patch. We developed a novel technique based on the l1 norm sparse graph to learn a set of low dimensional affine spaces or tangent subspaces of the high-resolution patch manifold. The optimization problem is then solved based on the learned set of tangent subspaces. In this paper, we build on our recent work as follows. First, we consider and analyze each tangent subspace as one point in a Grassmann manifold, which helps to compute geodesic pairwise distances among these tangent subspaces. Second, we develop a min-max algorithm to select an optimal subset of tangent subspaces. This optimal subset reduces the computational cost while still preserving the quality of the reconstructed high-resolution image. Third, and to further achieve lower computational complexity, we perform hierarchical clustering on the optimal subset based on Grassmann manifold distances. Finally, we analytically prove the validity of the proposed Grassmann-distance based clustering. A comparison of the obtained results with other state-of-the-art methods clearly indicates the viability of the new proposed framework.

* This paper has been withdrawn by the author due to a crucial sign error in equation 1 
Access Paper or Ask Questions