Deep Neural Networks for classification behave unpredictably when confronted with inputs not stemming from the training distribution. This motivates out-of-distribution (OOD) detection mechanisms. The usual lack of prior information on out-of-distribution data makes it difficult to estimate how detection approaches will perform on unseen data. Several contemporary evaluation protocols are based on open set simulations, which average the performance over up to five synthetic random splits of a dataset into in- and out-of-distribution samples. However, the number of possible splits may be much larger, and the performance of Deep Neural Networks is known to fluctuate significantly depending on different sources of random variation. We empirically demonstrate that current protocols may fail to provide reliable estimates of the expected performance of OOD methods. By casting this evaluation as a random process, we generalize the concept of open set simulations and propose to estimate the performance of OOD methods using a Monte Carlo approach that addresses the randomness.
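As a minimal illustration of such a Monte Carlo evaluation, the sketch below averages a detection metric (AUROC here) over many random class splits into in- and out-of-distribution samples; the number of splits, the split size, and the scoring function `scores_fn` are illustrative assumptions, not the paper's exact protocol.

```python
# Minimal sketch of a Monte Carlo open set simulation (not the authors' code).
# Assumes: `labels` holds the class index of every test sample, `scores_fn` is
# any OOD scoring function, and a higher score means "more out-of-distribution".
import numpy as np
from sklearn.metrics import roc_auc_score

def monte_carlo_ood_auroc(features, labels, scores_fn, n_splits=100,
                          n_id_classes=6, seed=0):
    rng = np.random.default_rng(seed)
    classes = np.unique(labels)
    aurocs = []
    for _ in range(n_splits):
        id_classes = rng.choice(classes, size=n_id_classes, replace=False)
        is_ood = (~np.isin(labels, id_classes)).astype(int)   # 1 = OOD
        scores = scores_fn(features, id_classes)               # hypothetical scorer
        aurocs.append(roc_auc_score(is_ood, scores))
    return np.mean(aurocs), np.std(aurocs)   # estimate and spread over splits
```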
The automatic semantic segmentation of the huge amount of acquired remote sensing data has become an important task in the last decade. Images and Point Clouds (PCs) are fundamental data representations, particularly in urban mapping applications. Textured 3D meshes integrate both data representations geometrically by wiring the PC and texturing the surface elements with available imagery. We present a mesh-centered, holistic, geometry-driven methodology that explicitly integrates entities of imagery, PC and mesh. Due to its integrative character, we choose the mesh as the core representation, which also helps to solve the visibility problem for points in imagery. Utilizing the proposed multi-modal fusion as the backbone and considering the established entity relationships, we enable the sharing of information across the modalities imagery, PC and mesh in a two-fold manner: (i) feature transfer and (ii) label transfer. By these means, we enrich the feature vectors of each representation into multi-modal feature vectors. Concurrently, we label all representations consistently while reducing the manual labeling effort to a single representation. Consequently, we facilitate the training of machine learning algorithms and the semantic segmentation of any of these data representations - both in a multi-modal and a single-modal sense. The paper presents the association mechanism and the subsequent information transfer, which we believe are cornerstones for multi-modal scene analysis. Furthermore, we discuss the preconditions and limitations of the presented approach in detail. We demonstrate the effectiveness of our methodology on the ISPRS 3D semantic labeling contest (Vaihingen 3D) and a proprietary data set (Hessigheim 3D).
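As a rough illustration of label transfer between representations, the sketch below propagates per-point labels to mesh faces by nearest-neighbor association of face centers; the paper's explicit geometry-driven entity association is more involved, and the function name and distance threshold here are assumptions.

```python
# Simplified label transfer from a labeled point cloud to mesh faces via
# nearest-neighbor association (illustrative only; the paper's association
# mechanism is geometry-driven and not a plain kd-tree lookup).
import numpy as np
from scipy.spatial import cKDTree

def transfer_labels_points_to_faces(points, point_labels, face_centers, max_dist=1.0):
    tree = cKDTree(points)
    dist, idx = tree.query(face_centers, k=1)
    face_labels = point_labels[idx]
    face_labels[dist > max_dist] = -1   # faces without a nearby point stay unlabeled
    return face_labels
```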
Logical reasoning is of vital importance to natural language understanding. Previous studies either employ graph-based models to incorporate prior knowledge about logical relations, or introduce symbolic logic into neural models through data augmentation. These methods, however, heavily depend on annotated training data and thus suffer from over-fitting and poor generalization due to data sparsity. To address these two problems, in this paper we propose MERIt, a MEta-path guided contrastive learning method for logical ReasonIng of text, which performs self-supervised pre-training on abundant unlabeled text data. Two novel strategies serve as indispensable components of our method. In particular, a meta-path-based strategy is devised to discover the logical structure in natural texts, followed by a counterfactual data augmentation strategy that eliminates the information shortcut induced by pre-training. The experimental results on two challenging logical reasoning benchmarks, i.e., ReClor and LogiQA, demonstrate that our method outperforms SOTA baselines with significant improvements.
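For intuition, the sketch below shows a generic InfoNCE-style contrastive loss over anchor/positive/negative text embeddings; the meta-path construction of positives and the counterfactual negatives are assumed to be built upstream and are not part of this snippet.

```python
# Generic contrastive pre-training loss over text-pair embeddings (a sketch,
# not the paper's implementation). Positives/negatives come from an assumed
# upstream meta-path and counterfactual augmentation pipeline.
import torch
import torch.nn.functional as F

def contrastive_loss(anchor, positive, negatives, temperature=0.07):
    # anchor, positive: (B, D); negatives: (B, N, D)
    pos = F.cosine_similarity(anchor, positive, dim=-1) / temperature            # (B,)
    neg = F.cosine_similarity(anchor.unsqueeze(1), negatives, dim=-1) / temperature  # (B, N)
    logits = torch.cat([pos.unsqueeze(1), neg], dim=1)                           # positive at index 0
    labels = torch.zeros(anchor.size(0), dtype=torch.long)
    return F.cross_entropy(logits, labels)
```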
Machine learning is being widely adopted in industrial applications owing to the capabilities of commercially available hardware and rapidly advancing research. Volkswagen Financial Services (VWFS), as a market leader in vehicle leasing services, aims to leverage existing proprietary data and the latest research to enhance existing business processes and derive new ones. The collaboration between the Information Systems and Machine Learning Lab (ISMLL) and VWFS serves to realize this goal. In this paper, we propose methods in the fields of recommender systems, object detection, and forecasting that enable data-driven decisions across the vehicle life-cycle at VWFS.
We analyze a sequential decision making model in which decision makers (or, players) take their decisions based on their own private information as well as the actions of previous decision makers. Such decision making processes often lead to what is known as the \emph{information cascade} or \emph{herding} phenomenon. Specifically, a cascade develops when it seems rational for some players to abandon their own private information and imitate the actions of earlier players. The risk, however, is that if the initial decisions were wrong, then the whole cascade will be wrong. Nonetheless, information cascades are known to be fragile: there exists a sequence of \emph{revealing} probabilities $\{p_{\ell}\}_{\ell\geq1}$ such that if, with probability $p_{\ell}$, player $\ell$ ignores the decisions of previous players and relies on his private information only, then wrong cascades can be avoided. Previous papers studying the fragility of information cascades always assume that the revealing probabilities are known to all players perfectly, which might be unrealistic in practice. Accordingly, in this paper we study a mismatch model in which players believe that the revealing probabilities are $\{q_\ell\}_{\ell\in\mathbb{N}}$ when they truly are $\{p_\ell\}_{\ell\in\mathbb{N}}$, and we study the effect of this mismatch on information cascades. We consider both adversarial and probabilistic sequential decision making models, and derive closed-form expressions for the optimal learning rates at which the error probability associated with a certain decision maker goes to zero. We prove several novel phase transitions in the behaviour of the asymptotic learning rate.
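The following toy simulation illustrates the role of revealing probabilities in a heavily simplified cascade model (not the paper's analysis): a majority vote over past actions stands in for the full Bayesian update, and the signal accuracy and the schedule of $p_\ell$ are arbitrary illustrative choices.

```python
# Toy simulation of sequential decisions with revealing probabilities (a rough
# sketch of the setup only). State theta in {0,1}; each player receives a
# private signal equal to theta with probability acc > 1/2. With probability
# p(ell) player ell ignores the history and follows the signal; otherwise a
# simple majority over past actions stands in for the Bayesian update.
import numpy as np

def simulate(theta, n_players, acc=0.7, p=lambda ell: 1.0 / (ell + 1), seed=0):
    rng = np.random.default_rng(seed)
    actions = []
    for ell in range(n_players):
        signal = theta if rng.random() < acc else 1 - theta
        if not actions or rng.random() < p(ell):
            action = signal                                   # revealing player
        else:
            votes = sum(actions)
            ties = (votes * 2 == len(actions))
            action = signal if ties else int(votes * 2 > len(actions))
        actions.append(action)
    return actions  # compare against theta to estimate the error probability
```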
In recent years, several unsupervised, "contrastive" learning algorithms in vision have been shown to learn representations that perform remarkably well on transfer tasks. We show that this family of algorithms maximizes a lower bound on the mutual information between two or more "views" of an image, where typical views come from a composition of image augmentations. Our bound generalizes the InfoNCE objective to support negative sampling from a restricted region of "difficult" contrasts. We find that the choices of negative samples and views are critical to the success of these algorithms. Reformulating previous learning objectives in terms of mutual information also simplifies and stabilizes them. In practice, our new objectives yield representations that outperform those learned with previous approaches for transfer to classification, bounding box detection, instance segmentation, and keypoint detection. The mutual information framework provides a unifying comparison of approaches to contrastive learning and uncovers the choices that impact representation learning.
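For reference, the standard InfoNCE bound that this family of objectives builds on can be written as follows, where $f$ is a critic, $v_2^{(1)}=v_2$ is the positive view and the remaining $v_2^{(k)}$ are $K-1$ negatives; the extension to restricted, "difficult" negative regions modifies this form.

\[
\mathcal{L}_{\mathrm{NCE}}
  = -\,\mathbb{E}\left[\log
      \frac{e^{f(v_1,\, v_2)}}{\sum_{k=1}^{K} e^{f(v_1,\, v_2^{(k)})}}\right],
\qquad
I(v_1; v_2) \;\ge\; \log K - \mathcal{L}_{\mathrm{NCE}}.
\]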
Out-of-distribution data is a meta-challenge for all statistical learning algorithms that strongly rely on the i.i.d. assumption. It leads to unavoidable labor costs and confidence crises in realistic applications. To address this, domain generalization aims at mining domain-irrelevant knowledge from multiple source domains that can generalize to unseen target domains with unknown distributions. In this paper, leveraging the image frequency domain, we work with two key observations: (i) the high-frequency information of images depicts object edge structure, which is naturally consistent across different domains, and (ii) the low-frequency component retains smooth object structure but is much more domain-specific. Motivated by these insights, we introduce (i) an encoder-decoder structure for disentangling high-frequency and low-frequency features, (ii) an information interaction mechanism that ensures helpful knowledge from both parts can cooperate effectively, and (iii) a novel data augmentation technique that operates in the frequency domain to encourage robustness of the network. The proposed method obtains state-of-the-art results on three widely used domain generalization benchmarks (Digit-DG, Office-Home, and PACS).
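One simple way to realize the high/low-frequency decomposition described above is a centered FFT mask, sketched below; the cutoff radius and the hard mask are illustrative assumptions and may differ from the decomposition actually used in the paper.

```python
# Split a grayscale image into low- and high-frequency components with a
# centered FFT mask (one simple realization of the decomposition).
import numpy as np

def frequency_split(img, cutoff=0.1):
    # img: 2D array; cutoff: radius as a fraction of the shorter image side
    f = np.fft.fftshift(np.fft.fft2(img))
    h, w = img.shape
    yy, xx = np.ogrid[:h, :w]
    radius = np.hypot(yy - h / 2, xx - w / 2)
    low_mask = radius <= cutoff * min(h, w)
    low = np.fft.ifft2(np.fft.ifftshift(f * low_mask)).real    # smooth structure
    high = np.fft.ifft2(np.fft.ifftshift(f * ~low_mask)).real  # edge structure
    return low, high
```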
The photoplethysmography (PPG) signal comprises physiological information related to cardiorespiratory health. However, during recording, PPG signals are easily corrupted by motion artifacts and body movements, leading to noisy, poor-quality signals. Therefore, ensuring high-quality signals is necessary to extract cardiorespiratory information accurately. Although several rule-based and machine-learning (ML)-based approaches exist for PPG signal quality estimation, the efficacy of those algorithms is questionable. Thus, this work proposes a lightweight CNN architecture for signal quality assessment employing a novel Quantum pattern recognition (QPR) technique. The proposed algorithm is validated on manually annotated data obtained from the University of Queensland database. A total of 28,366 five-second signal segments are preprocessed and transformed into image files of 20 x 500 pixels. The image files are treated as input to the 2D CNN architecture. The developed model classifies the PPG signal as `good' or `bad' with an accuracy of 98.3%, with 99.3% sensitivity, 94.5% specificity, and a 98.9% F1-score. Finally, the performance of the proposed framework is validated against the noisy `Welltory app' PPG database. Even in a noisy environment, the proposed architecture proved its competence. Experimental analysis concludes that a slim architecture along with a novel spatio-temporal pattern recognition technique improves the system's performance. Hence, the proposed approach can be useful for classifying good and bad PPG signals in a resource-constrained wearable implementation.
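A minimal sketch of a lightweight 2D CNN operating on the 20 x 500 segment images is given below; it is illustrative only and does not reproduce the QPR preprocessing or the exact layer configuration of the proposed architecture.

```python
# Illustrative slim 2D CNN for good/bad classification of 20 x 500 images
# (not the paper's architecture; QPR preprocessing is assumed upstream).
import torch
import torch.nn as nn

class SlimPPGQualityNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 8, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(8, 16, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
        )
        self.classifier = nn.Linear(16 * 5 * 125, 2)  # 20x500 -> 5x125 after two 2x poolings

    def forward(self, x):            # x: (B, 1, 20, 500)
        z = self.features(x)
        return self.classifier(z.flatten(1))
```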
We introduce a novel representation learning method to disentangle pose-dependent and view-dependent factors from 2D human poses. The method trains a network using cross-view mutual information maximization (CV-MIM), which maximizes the mutual information of the same pose performed from different viewpoints in a contrastive learning manner. We further propose two regularization terms to ensure disentanglement and smoothness of the learned representations. The resulting pose representations can be used for cross-view action recognition. To evaluate the power of the learned representations, in addition to the conventional fully-supervised action recognition settings, we introduce a novel task called single-shot cross-view action recognition. This task trains models with actions from only a single viewpoint while models are evaluated on poses captured from all possible viewpoints. We evaluate the learned representations on standard benchmarks for action recognition and show that (i) CV-MIM performs competitively compared with the state-of-the-art models in the fully-supervised scenarios; (ii) CV-MIM outperforms other competing methods by a large margin in the single-shot cross-view setting; and (iii) the learned representations can significantly boost performance when reducing the amount of supervised training data.
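The single-shot cross-view protocol can be summarized as: train a classifier on representations from one viewpoint, then evaluate it on every viewpoint. The sketch below assumes a frozen encoder `embed` and a dictionary `data` mapping each viewpoint to (poses, labels); both names are hypothetical.

```python
# Sketch of the single-shot cross-view evaluation protocol (illustrative names).
from sklearn.linear_model import LogisticRegression

def single_shot_cross_view_eval(embed, data, train_view=0):
    # data[v] = (poses, action_labels) for viewpoint v; embed() is a frozen encoder
    X_tr, y_tr = data[train_view]
    clf = LogisticRegression(max_iter=1000).fit(embed(X_tr), y_tr)
    return {v: clf.score(embed(X), y) for v, (X, y) in data.items()}
```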
This paper investigates a distant proactive eavesdropping system in cooperative cognitive radio (CR) networks. Specifically, an amplify-and-forward (AF) full-duplex (FD) secondary transmitter assists in relaying the received signal from suspicious users to a legitimate monitor for wireless information surveillance. In return, the secondary transmitter is granted access to the spectrum belonging to the suspicious users for its own information transmission. To improve eavesdropping, the transmitted secondary user's signal can also be used as a jamming signal to moderate the data rate of the suspicious link. We consider two cases, i.e., non-negligible processing delay (NNPD) and negligible processing delay (NPD) at the secondary transmitter. Our target is to maximize the network energy efficiency (NEE) by jointly optimizing the AF relay matrix and the precoding vector at the secondary transmitter, as well as the receiver combining vector at the monitor, subject to the maximum power constraint at the secondary transmitter and the minimum data rate requirement of the secondary user. We also guarantee that the achievable data rate of the eavesdropping link is no less than that of the suspicious link for efficient surveillance. Due to the non-convexity of the formulated NEE maximization problem, we develop an efficient path-following algorithm and a robust alternating optimization (AO) method as solutions under perfect and imperfect channel state information (CSI) conditions, respectively. We also analyze the convergence and computational complexity of the proposed schemes. Numerical results are provided to validate the effectiveness of our proposed schemes.
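In generic form, the NEE maximization described above can be written as below, with $\mathbf{A}$ the AF relay matrix, $\mathbf{w}$ the precoding vector at the secondary transmitter and $\mathbf{v}$ the combining vector at the monitor; the exact rate and power models of the paper are more detailed than this schematic.

\[
\max_{\mathbf{A},\,\mathbf{w},\,\mathbf{v}}\;
\mathrm{NEE}
= \frac{R_{\mathrm{sec}}(\mathbf{A},\mathbf{w},\mathbf{v})}
       {P_{\mathrm{tx}}(\mathbf{A},\mathbf{w}) + P_{\mathrm{circ}}}
\quad \text{s.t.} \quad
P_{\mathrm{tx}} \le P_{\max},\;\;
R_{\mathrm{sec}} \ge R_{\min},\;\;
R_{\mathrm{eav}} \ge R_{\mathrm{sus}}.
\]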