Deep-learning based classification algorithms have been shown to be susceptible to adversarial attacks: minor changes to the input of classifiers can dramatically change their outputs, while being imperceptible to humans. In this paper, we present a simple hypothesis about a feature compression property of artificial intelligence (AI) classifiers and present theoretical arguments to show that this hypothesis successfully accounts for the observed fragility of AI classifiers to small adversarial perturbations. Drawing on ideas from information and coding theory, we propose a general class of defenses for detecting classifier errors caused by abnormally small input perturbations. We further show theoretical guarantees for the performance of this detection method. We present experimental results with (a) a voice recognition system, and (b) a digit recognition system using the MNIST database, to demonstrate the effectiveness of the proposed defense methods. The ideas in this paper are motivated by a simple analogy between AI classifiers and the standard Shannon model of a communication system.
Image synthesis from corrupted contrasts increases the diversity of diagnostic information available for many neurological diseases. Image-to-image translation has recently attracted significant interest in medical research, from the first successful applications of the Generative Adversarial Network (GAN) to the introduction of cyclic constraints extended to multiple domains. However, current approaches do not guarantee that the mapping between the two image domains is unique or one-to-one. In this paper, we introduce a novel approach to unpaired image-to-image translation based on an invertible architecture. The invertible property of the flow-based architecture ensures cycle-consistency of image-to-image translation without additional loss functions. We utilize the temporal information between consecutive slices to provide more constraints to the optimization for transforming one domain to another in unpaired volumetric medical images. To capture temporal structures in the medical images, we explore the displacement between consecutive slices using a deformation field. In our approach, the deformation field is used as guidance to keep the translated slices realistic and consistent across the translation. Experimental results show that images synthesized with our proposed approach achieve competitive performance in terms of mean squared error, peak signal-to-noise ratio, and structural similarity index when compared with existing deep learning-based methods on three standard datasets, i.e., HCP, MRBrainS13, and Brats2019.
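To illustrate why an invertible architecture gives cycle-consistency without an extra loss, the following is a minimal sketch of a generic additive coupling layer in NumPy. It is not the architecture of the abstract above, whose details are not given there; the names `coupling_forward`, `coupling_inverse`, and the fixed weight matrix `w` are hypothetical and chosen only for illustration.

```python
import numpy as np

def shift(x1, w):
    # Arbitrary (non-invertible) function of the untouched half; assumed for illustration.
    return np.tanh(x1 @ w)

def coupling_forward(x, w):
    # Split the features in two halves; pass the first through, shift the second.
    x1, x2 = np.split(x, 2, axis=-1)
    return np.concatenate([x1, x2 + shift(x1, w)], axis=-1)

def coupling_inverse(y, w):
    # The first half is unchanged, so the same shift can be recomputed and subtracted.
    y1, y2 = np.split(y, 2, axis=-1)
    return np.concatenate([y1, y2 - shift(y1, w)], axis=-1)

rng = np.random.default_rng(0)
w = rng.normal(size=(4, 4))      # hypothetical fixed weights
x = rng.normal(size=(3, 8))      # three samples with eight features each
x_cycle = coupling_inverse(coupling_forward(x, w), w)
```

Mapping a sample forward and then back recovers it up to floating-point error, so the round trip is consistent by construction rather than by penalty.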
We consider the problem of unsupervised domain adaptation for image classification. To learn target-domain-aware features from the unlabeled data, we create a self-supervised pretext task by augmenting the unlabeled data with a certain type of transformation (specifically, image rotation) and ask the learner to predict the properties of the transformation. However, the obtained feature representation may contain a large amount of information that is irrelevant to the main task. To provide further guidance, we force the feature representation of the augmented data to be consistent with that of the original data. Intuitively, the consistency introduces additional constraints to representation learning; therefore, the learned representation is more likely to focus on the information relevant to the main task. Our experimental results validate the proposed method and demonstrate state-of-the-art performance on classical domain adaptation benchmarks. Code is available at https://github.com/Jiaolong/ss-da-consistency.
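The rotation pretext task described above can be sketched as follows. This is a minimal illustration that assumes square grayscale images and the four rotations 0, 90, 180, and 270 degrees; the function name `make_rotation_task` is hypothetical and is not taken from the linked repository.

```python
import numpy as np

def make_rotation_task(images):
    """Return rotated copies of `images` (shape N x H x W, H == W) and rotation labels 0..3."""
    n = images.shape[0]
    # np.rot90 rotates counterclockwise by k * 90 degrees in the plane given by `axes`.
    rotated = np.concatenate([np.rot90(images, k, axes=(1, 2)) for k in range(4)])
    labels = np.repeat(np.arange(4), n)   # label k marks a rotation by k * 90 degrees
    return rotated, labels

images = np.arange(18, dtype=float).reshape(2, 3, 3)   # two toy 3x3 "images"
rotated, labels = make_rotation_task(images)
```

A classifier trained to predict `labels` from `rotated` needs no human annotation, which is what makes the task usable on the unlabeled target domain.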
Swarm dynamics is the study of collections of agents that interact with one another without central control. In natural systems, insects, birds, fish and other large mammals function in larger units to increase the overall fitness of the individuals. Their behavior is coordinated through local interactions to enhance mate selection, predator detection, migratory route identification and so forth [Andersson and Wallander 2003; Buhl et al. 2006; Nagy et al. 2010; Partridge 1982; Sumpter et al. 2008]. In artificial systems, swarms of autonomous agents can augment human activities such as search and rescue, and environmental monitoring by covering large areas with multiple nodes [Alami et al. 2007; Caruso et al. 2008; Ogren et al. 2004; Paley et al. 2007; Sibley et al. 2002]. In this paper, we explore the interplay between swarm dynamics, covert leadership and theoretical information transfer. A leader is a member of the swarm that acts upon information in addition to what is provided by local interactions. Depending upon the leadership model, leaders can use their external information either all the time or in response to local conditions [Couzin et al. 2005; Sun et al. 2013]. A covert leader is a leader that is treated no differently than others in the swarm, so leaders and followers participate equally in whatever interaction model is used [Rossi et al. 2007]. In this study, we use theoretical information transfer as a means of analyzing swarm interactions to explore whether or not it is possible to distinguish between followers and leaders based on interactions within the swarm. We find that covert leaders can be distinguished from followers in a swarm because they receive less transfer entropy than followers.
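As a concrete illustration of transfer entropy, the sketch below computes a plug-in estimate for discrete time series with a history length of one step. The estimator, the function name `transfer_entropy`, and the toy leader/follower series are assumptions made for illustration; the abstract above does not specify how the quantity was estimated.

```python
import random
from collections import Counter
from math import log2

def transfer_entropy(source, target):
    """Plug-in estimate (in bits) of transfer entropy source -> target, history length 1."""
    # Each triple is (next target value, previous target value, previous source value).
    triples = list(zip(target[1:], target[:-1], source[:-1]))
    n = len(triples)
    c_full = Counter(triples)
    c_prev_pair = Counter((xp, yp) for _, xp, yp in triples)
    c_next_prev = Counter((xn, xp) for xn, xp, _ in triples)
    c_prev = Counter(xp for _, xp, _ in triples)
    te = 0.0
    for (xn, xp, yp), c in c_full.items():
        p_given_both = c / c_prev_pair[(xp, yp)]          # p(x_next | x_prev, y_prev)
        p_given_self = c_next_prev[(xn, xp)] / c_prev[xp]  # p(x_next | x_prev)
        te += (c / n) * log2(p_given_both / p_given_self)
    return te

random.seed(0)
leader = [random.randint(0, 1) for _ in range(5000)]   # acts on external information
follower = [0] + leader[:-1]                           # copies the leader one step later
te_into_follower = transfer_entropy(leader, follower)
te_into_leader = transfer_entropy(follower, leader)
```

In this toy setting the follower receives close to one bit per step from the leader, while the leader receives almost nothing from the follower, which mirrors the qualitative finding stated above.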
The global pandemic of the novel coronavirus disease 2019 (COVID-19) has put tremendous pressure on the medical system. Imaging plays a complementary role in the management of patients with COVID-19. Computed tomography (CT) and chest X-ray (CXR) are the two dominant screening tools. However, CT and CXR imaging face several challenges: the risk of disease transmission is difficult to eliminate, patients are exposed to radiation, and the procedures are not cost-effective. These challenges motivate the use of lung ultrasound (LUS) for evaluating COVID-19, owing to its practical advantages of noninvasiveness, repeatability, and sensitive bedside property. In this paper, we utilize a deep learning model to classify COVID-19 from LUS data, which could provide objective diagnostic information for clinicians. First, all LUS images are processed to obtain their corresponding local phase filtered images and radial symmetry transformed images before being fed into a multi-scale residual convolutional neural network (CNN). Second, combinations of these images are used as the network input to explore rich and reliable features. A feature fusion strategy at different levels is adopted to investigate the relationship between the depth of feature aggregation and the classification accuracy. Our proposed method is evaluated on the point-of-care US (POCUS) dataset together with the Italian COVID-19 Lung US database (ICLUS-DB) and shows promising performance for COVID-19 prediction.
Information-theoretic quantities, such as entropy, are used to quantify the amount of information a given variable provides. Entropies can be used together to compute the mutual information, which quantifies the amount of information two variables share. However, accurately estimating these quantities from data is extremely challenging. We have developed a set of computational techniques that allow one to accurately compute marginal and joint entropies. These algorithms are probabilistic in nature and thus provide information on the uncertainty in our estimates, which enables us to establish the statistical significance of our findings. We demonstrate these methods by identifying relations between cloud data from the International Satellite Cloud Climatology Project (ISCCP) and data from other sources, such as equatorial Pacific sea surface temperatures (SST).
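The relation between entropies and mutual information mentioned above, I(X;Y) = H(X) + H(Y) - H(X,Y), can be sketched with a simple histogram (plug-in) estimate. This is not the probabilistic technique of the abstract, which also quantifies uncertainty; the function names, the bin count, and the synthetic data are assumptions made for illustration.

```python
import numpy as np

def entropy_bits(counts):
    """Plug-in entropy (in bits) of a histogram given as an array of counts."""
    p = counts[counts > 0] / counts.sum()
    return float(-(p * np.log2(p)).sum())

def mutual_information_bits(x, y, bins=8):
    """Histogram estimate of I(X;Y) = H(X) + H(Y) - H(X,Y)."""
    joint, _, _ = np.histogram2d(x, y, bins=bins)
    h_x = entropy_bits(joint.sum(axis=1))   # marginal over y
    h_y = entropy_bits(joint.sum(axis=0))   # marginal over x
    h_xy = entropy_bits(joint.ravel())      # joint entropy
    return h_x + h_y - h_xy

rng = np.random.default_rng(0)
x = rng.normal(size=5000)
y_dependent = x + 0.1 * rng.normal(size=5000)   # strongly related to x
y_independent = rng.normal(size=5000)           # unrelated to x
mi_dependent = mutual_information_bits(x, y_dependent)
mi_independent = mutual_information_bits(x, y_independent)
```

The plug-in estimate is biased and depends on the binning, which is one reason careful estimation from data is difficult.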
We propose the Parallel Residual Bi-Fusion Feature Pyramid Network (PRB-FPN) for fast and accurate single-shot object detection. The Feature Pyramid (FP) is widely used in recent visual detection; however, the top-down pathway of FP cannot preserve accurate localization due to pooling shifting. The advantage of FP is weakened as deeper backbones with more layers are used. To address this issue, we propose a new parallel FP structure with bi-directional (top-down and bottom-up) fusion and associated improvements to retain high-quality features for accurate localization. Our method is particularly suitable for detecting small objects. We provide the following design improvements: (1) A parallel bi-fusion FP structure with a Bottom-up Fusion Module (BFM) to detect both small and large objects at once with high accuracy. (2) A COncatenation and RE-organization (CORE) module provides a bottom-up pathway for feature fusion, which leads to the bi-directional fusion FP that can recover lost information from lower-layer feature maps. (3) The CORE feature is further purified to retain richer contextual information. Such purification is performed with CORE in a few iterations in both top-down and bottom-up pathways. (4) Adding a residual design to CORE leads to a new Re-CORE module that enables easy training and integration with a wide range of (deeper or lighter) backbones. The proposed network achieves state-of-the-art performance on the UAVDT17 and MS COCO datasets.
We study the fundamental online $k$-server problem in a learning-augmented setting. While in the traditional online model an algorithm has no information about the request sequence, we assume that some advice (e.g. machine-learned predictions) on an algorithm's decision is given. There is, however, no guarantee on the quality of the prediction, and it might be far from correct. Our main result is a learning-augmented variation of the well-known Double Coverage algorithm for $k$-server on the line (Chrobak et al., SIDMA 1991) in which we integrate predictions as well as our trust in their quality. We give an error-dependent competitive ratio, which is a function of a user-defined trustiness parameter, and which interpolates smoothly between an optimal consistency, the performance in case that all predictions are correct, and the best-possible robustness regardless of the prediction quality. When given good predictions, we improve upon known lower bounds for online algorithms without advice. We further show that our algorithm achieves for any $k$ an almost optimal consistency-robustness tradeoff, within a class of deterministic algorithms respecting local and memoryless properties. Our algorithm outperforms a previously proposed (more general) learning-augmented algorithm. It is remarkable that the previous algorithm heavily exploits memory, whereas our algorithm is memoryless. Finally, we demonstrate in experiments the practicability and the superior performance of our algorithm on real-world data.
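For readers unfamiliar with the baseline, the following is a minimal sketch of the classic Double Coverage rule on the line, without any predictions. It is not the learning-augmented variant of the abstract above; the function name `double_coverage` and the toy instance are chosen only for illustration.

```python
def double_coverage(servers, requests):
    """Classic Double Coverage on the line; returns (final positions, total distance moved)."""
    pos = sorted(float(s) for s in servers)
    cost = 0.0
    for r in requests:
        left = [i for i, p in enumerate(pos) if p <= r]
        right = [i for i, p in enumerate(pos) if p >= r]
        if not left:
            # Request lies left of all servers: only the leftmost server moves.
            cost += pos[0] - r
            pos[0] = r
        elif not right:
            # Request lies right of all servers: only the rightmost server moves.
            cost += r - pos[-1]
            pos[-1] = r
        else:
            # Request lies between two neighbors: both move toward it at equal
            # speed until the closer one arrives.
            i, j = left[-1], right[0]
            if r - pos[i] <= pos[j] - r:
                d = r - pos[i]
                pos[i] = r
                pos[j] -= d
            else:
                d = pos[j] - r
                pos[j] = r
                pos[i] += d
            cost += 2 * d
    return pos, cost

final_positions, total_cost = double_coverage([0, 10], [4, 12])
```

The rule uses only the current server positions and the current request, which is the memoryless property emphasized above.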
Dense Object Nets (DONs) by Florence, Manuelli and Tedrake (2018) introduced dense object descriptors as a novel visual object representation for the robotics community. It is suitable for many applications including object grasping, policy learning, etc. DONs map an RGB image depicting an object into a descriptor space image, which implicitly encodes key features of an object invariant to the relative camera pose. Impressively, the self-supervised training of DONs can be applied to arbitrary objects and can be evaluated and deployed within hours. However, the training approach relies on accurate depth images and faces challenges with small, reflective objects, typical for industrial settings, when using consumer grade depth cameras. In this paper we show that given a 3D model of an object, we can generate its descriptor space image, which allows for supervised training of DONs. We rely on Laplacian Eigenmaps (LE) to embed the 3D model of an object into an optimally generated space. While our approach uses more domain knowledge, it can be efficiently applied even for smaller and reflective objects, as it does not rely on depth information. We compare the training methods on generating 6D grasps for industrial objects and show that our novel supervised training approach improves the pick-and-place performance in industry-relevant tasks.
The success of Convolutional Neural Networks (CNNs) in computer vision is mainly driven by their strong inductive bias, which is strong enough to allow CNNs to solve vision-related tasks with random weights, meaning without learning. Similarly, Long Short-Term Memory (LSTM) has a strong inductive bias towards storing information over time. However, many real-world systems are governed by conservation laws, which lead to the redistribution of particular quantities, e.g. in physical and economic systems. Our novel Mass-Conserving LSTM (MC-LSTM) adheres to these conservation laws by extending the inductive bias of LSTM to model the redistribution of those stored quantities. MC-LSTMs set a new state-of-the-art for neural arithmetic units at learning arithmetic operations, such as addition tasks, which have a strong conservation law, as the sum is constant over time. Further, MC-LSTM is applied to traffic forecasting, modelling a pendulum, and a large benchmark dataset in hydrology, where it sets a new state-of-the-art for predicting peak flows. In the hydrology example, we show that MC-LSTM states correlate with real-world processes and are therefore interpretable.
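The idea of redistributing a stored quantity without creating or destroying it can be sketched with normalized gates. The example below is a toy single step with fixed, hypothetical gate logits rather than learned, input-dependent gates, and it is not the authors' implementation; the function name `mass_conserving_step` is an assumption made for illustration.

```python
import numpy as np

def softmax(z, axis):
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def mass_conserving_step(cells, mass_in, r_logits, i_logits, o_logits):
    """One toy update: stored mass plus incoming mass equals new stored mass plus outflow."""
    redistribution = softmax(r_logits, axis=0)   # each column sums to 1
    input_gate = softmax(i_logits, axis=0)       # entries sum to 1
    output_gate = 1.0 / (1.0 + np.exp(-o_logits))  # each entry lies in (0, 1)
    total = redistribution @ cells + input_gate * mass_in
    outflow = output_gate * total                # mass that leaves the cells
    new_cells = total - outflow                  # mass that stays stored
    return new_cells, outflow

rng = np.random.default_rng(0)
cells = np.array([1.0, 2.0, 3.0])                # currently stored mass
new_cells, outflow = mass_conserving_step(
    cells, 4.0, rng.normal(size=(3, 3)), rng.normal(size=3), rng.normal(size=3)
)
```

Because the redistribution columns and the input gate are normalized, the stored mass and the outflow after the step add up to the stored mass before the step plus the incoming mass.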