Semi-supervised learning (SSL) uses unlabeled data during training to learn better models. Previous studies on SSL for medical image segmentation focused mostly on improving model generalization to unseen data. In some applications, however, our primary interest is not generalization but to obtain optimal predictions on a specific unlabeled database that is fully available during model development. Examples include population studies for extracting imaging phenotypes. This work investigates an often overlooked aspect of SSL, transduction. It focuses on the quality of predictions made on the unlabeled data of interest when they are included for optimization during training, rather than improving generalization. We focus on the self-training framework and explore its potential for transduction. We analyze it through the lens of Information Gain and reveal that learning benefits from the use of calibrated or under-confident models. Our extensive experiments on a large MRI database for multi-class segmentation of traumatic brain lesions shows promising results when comparing transductive with inductive predictions. We believe this study will inspire further research on transductive learning, a well-suited paradigm for medical image analysis.
Recently, needs for unmanned aerial vehicles (UAVs) that are attachable to the wall have been highlighted. As one of the ways to address the need, researches on various tilting multirotors that can increase maneuverability has been employed. Unfortunately, existing studies on the tilting multirotors require considerable amounts of prior information on the complex dynamic model. Meanwhile, reinforcement learning on quadrotors has been studied to mitigate this issue. Yet, these are only been applied to standard quadrotors, whose systems are less complex than those of tilting multirotors. In this paper, a novel reinforcement learning-based method is proposed to control a tilting multirotor on real-world applications, which is the first attempt to apply reinforcement learning to a tilting multirotor. To do so, we propose a novel reward function for a neural network model that takes power efficiency into account. The model is initially trained over a simulated environment and then fine-tuned using real-world data in order to overcome the sim-to-real gap issue. Furthermore, a novel, efficient state representation with respect to the goal frame that helps the network learn optimal policy better is proposed. As verified on real-world experiments, our proposed method shows robust controllability by overcoming the complex dynamics of tilting multirotors.
Unsupervised domain adaptation (UDA) aims to learn a well-performed model in an unlabeled target domain by leveraging labeled data from one or multiple related source domains. It remains a great challenge due to 1) the lack of annotations in the target domain and 2) the rich discrepancy between the distributions of source and target data. We propose Spectral UDA (SUDA), an efficient yet effective UDA technique that works in the spectral space and is generic across different visual recognition tasks in detection, classification and segmentation. SUDA addresses UDA challenges from two perspectives. First, it mitigates inter-domain discrepancies by a spectrum transformer (ST) that maps source and target images into spectral space and learns to enhance domain-invariant spectra while suppressing domain-variant spectra simultaneously. To this end, we design novel adversarial multi-head spectrum attention that leverages contextual information to identify domain-variant and domain-invariant spectra effectively. Second, it mitigates the lack of annotations in target domain by introducing multi-view spectral learning which aims to learn comprehensive yet confident target representations by maximizing the mutual information among multiple ST augmentations capturing different spectral views of each target sample. Extensive experiments over different visual tasks (e.g., detection, classification and segmentation) show that SUDA achieves superior accuracy and it is also complementary with state-of-the-art UDA methods with consistent performance boosts but little extra computation.
The fully-convolutional network (FCN) with an encoder-decoder architecture has been the standard paradigm for semantic segmentation. The encoder-decoder architecture utilizes an encoder to capture multi-level feature maps, which are incorporated into the final prediction by a decoder. As the context is crucial for precise segmentation, tremendous effort has been made to extract such information in an intelligent fashion, including employing dilated/atrous convolutions or inserting attention modules. However, these endeavours are all based on the FCN architecture with ResNet or other backbones, which cannot fully exploit the context from the theoretical concept. By contrast, we propose the Swin Transformer as the backbone to extract the context information and design a novel decoder of densely connected feature aggregation module (DCFAM) to restore the resolution and produce the segmentation map. The experimental results on two remotely sensed semantic segmentation datasets demonstrate the effectiveness of the proposed scheme.
The current state-of-the-art generative models for open-domain question answering (ODQA) have focused on generating direct answers from unstructured textual information. However, a large amount of world's knowledge is stored in structured databases, and need to be accessed using query languages such as SQL. Furthermore, query languages can answer questions that require complex reasoning, as well as offering full explainability. In this paper, we propose a hybrid framework that takes both textual and tabular evidence as input and generates either direct answers or SQL queries depending on which form could better answer the question. The generated SQL queries can then be executed on the associated databases to obtain the final answers. To the best of our knowledge, this is the first paper that applies Text2SQL to ODQA tasks. Empirically, we demonstrate that on several ODQA datasets, the hybrid methods consistently outperforms the baseline models that only take homogeneous input by a large margin. Specifically we achieve state-of-the-art performance on OpenSQuAD dataset using a T5-base model. In a detailed analysis, we demonstrate that the being able to generate structural SQL queries can always bring gains, especially for those questions that requires complex reasoning.
Deep learning has not been routinely employed for semantic segmentation of seabed environment for synthetic aperture sonar (SAS) imagery due to the implicit need of abundant training data such methods necessitate. Abundant training data, specifically pixel-level labels for all images, is usually not available for SAS imagery due to the complex logistics (e.g., diver survey, chase boat, precision position information) needed for obtaining accurate ground-truth. Many hand-crafted feature based algorithms have been proposed to segment SAS in an unsupervised fashion. However, there is still room for improvement as the feature extraction step of these methods is fixed. In this work, we present a new iterative unsupervised algorithm for learning deep features for SAS image segmentation. Our proposed algorithm alternates between clustering superpixels and updating the parameters of a convolutional neural network (CNN) so that the feature extraction for image segmentation can be optimized. We demonstrate the efficacy of our method on a realistic benchmark dataset. Our results show that the performance of our proposed method is considerably better than current state-of-the-art methods in SAS image segmentation.
Machine learning (ML) used for static portable executable (PE) malware detection typically employs per-file numerical feature vector representations as input with one or more target labels during training. However, there is much orthogonal information that can be gleaned from the \textit{context} in which the file was seen. In this paper, we propose utilizing a static source of contextual information -- the path of the PE file -- as an auxiliary input to the classifier. While file paths are not malicious or benign in and of themselves, they do provide valuable context for a malicious/benign determination. Unlike dynamic contextual information, file paths are available with little overhead and can seamlessly be integrated into a multi-view static ML detector, yielding higher detection rates at very high throughput with minimal infrastructural changes. Here we propose a multi-view neural network, which takes feature vectors from PE file content as well as corresponding file paths as inputs and outputs a detection score. To ensure realistic evaluation, we use a dataset of approximately 10 million samples -- files and file paths from user endpoints of an actual security vendor network. We then conduct an interpretability analysis via LIME modeling to ensure that our classifier has learned a sensible representation and see which parts of the file path most contributed to change in the classifier's score. We find that our model learns useful aspects of the file path for classification, while also learning artifacts from customers testing the vendor's product, e.g., by downloading a directory of malware samples each named as their hash. We prune these artifacts from our test dataset and demonstrate reductions in false negative rate of 32.3% at a $10^{-3}$ false positive rate (FPR) and 33.1% at $10^{-4}$ FPR, over a similar topology single input PE file content only model.
Extracting patterns and useful information from Natural Language datasets is a challenging task, especially when dealing with data written in a language different from English, like Italian. Machine and Deep Learning, together with Natural Language Processing (NLP) techniques have widely spread and improved lately, providing a plethora of useful methods to address both Supervised and Unsupervised problems on textual information. We propose RECKONition, a NLP-based system for Industrial Accidents at Work Prevention. RECKONition, which is meant to provide Natural Language Understanding, Clustering and Inference, is the result of a joint partnership with the Italian National Institute for Insurance against Accidents at Work (INAIL). The obtained results showed the ability to process textual data written in Italian describing industrial accidents dynamics and consequences.
mmWave radar has been shown as an effective sensing technique in low visibility, smoke, dusty, and dense fog environment. However tapping the potential of radar sensing to reconstruct 3D object shapes remains a great challenge, due to the characteristics of radar data such as sparsity, low resolution, specularity, high noise, and multi-path induced shadow reflections and artifacts. In this paper we propose 3D Reconstruction and Imaging via mmWave Radar (3DRIMR), a deep learning based architecture that reconstructs 3D shape of an object in dense detailed point cloud format, based on sparse raw mmWave radar intensity data. The architecture consists of two back-to-back conditional GAN deep neural networks: the first generator network generates 2D depth images based on raw radar intensity data, and the second generator network outputs 3D point clouds based on the results of the first generator. The architecture exploits both convolutional neural network's convolutional operation (that extracts local structure neighborhood information) and the efficiency and detailed geometry capture capability of point clouds (other than costly voxelization of 3D space or distance fields). Our experiments have demonstrated 3DRIMR's effectiveness in reconstructing 3D objects, and its performance improvement over standard techniques.
Electromagnetic medical imaging in the microwave regime is a hard problem notorious for 1) instability 2) under-determinism. This two-pronged problem is tackled with a two-pronged solution that uses double compression to maximally utilizing the cheap unlabelled data to a) provide a priori information required to ease under-determinism and b) reduce sensitivity of inference to the input. The result is a stable solver with a high resolution output. DeepHead is a fully data-driven implementation of the paradigm proposed in the context of microwave brain imaging. It infers the dielectric distribution of the brain at a desired single frequency while making use of an input that spreads over a wide band of frequencies. The performance of the model is evaluated with both simulations and human volunteers experiments. The inference made is juxtaposed with ground-truth dielectric distribution in simulation case, and the golden MRI / CT imaging modalities of the volunteers in real-world case.