We propose a compressive classification framework for settings where the data dimensionality is significantly higher than the sample size. The proposed method, referred to as compressive regularized discriminant analysis (CRDA), is based on linear discriminant analysis. It can select significant features by using joint-sparsity-promoting hard thresholding in the discriminant rule. Since the number of features is larger than the sample size, the method also uses state-of-the-art regularized sample covariance matrix estimators. Several analysis examples on real data sets, including image, speech signal and gene expression data, illustrate the promising improvements offered by the proposed CRDA classifier in practice. Overall, the proposed method gives fewer misclassification errors than its competitors, while at the same time achieving accurate feature selection results. The open-source R package and MATLAB toolbox of the proposed method (named compressiveRDA) are freely available.
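The Python sketch below illustrates the two ingredients named above. The first is a regularized covariance estimate. The second is row-wise (joint-sparsity) hard thresholding of the discriminant coefficient matrix, which keeps the K features whose coefficient rows have the largest norms. It is a minimal sketch under stated assumptions, not the compressiveRDA implementation. The scaled-identity shrinkage target, the fixed shrinkage weight `alpha`, the equal class priors, and all function names are hypothetical.

```python
import numpy as np

def shrinkage_covariance(X, alpha):
    """Assumed regularized estimator: (1 - alpha) * S + alpha * (tr(S) / p) * I."""
    S = np.cov(X, rowvar=False, bias=True)
    p = S.shape[0]
    return (1 - alpha) * S + alpha * (np.trace(S) / p) * np.eye(p)

def row_hard_threshold(B, K):
    """Joint-sparsity hard thresholding: keep the K rows of B with largest l2 norm."""
    keep = np.argsort(np.linalg.norm(B, axis=1))[-K:]
    H = np.zeros_like(B)
    H[keep] = B[keep]
    return H, np.sort(keep)

def fit_sparse_lda(X, y, K, alpha=0.5):
    """Hypothetical helper: class means, regularized covariance, thresholded coefficients."""
    classes = np.unique(y)
    M = np.stack([X[y == g].mean(axis=0) for g in classes], axis=1)  # p x G class means
    Xc = X - M[:, np.searchsorted(classes, y)].T                     # within-class centering
    Sigma = shrinkage_covariance(Xc, alpha)
    B = np.linalg.solve(Sigma, M)                                    # p x G, Sigma^{-1} M
    B_sparse, features = row_hard_threshold(B, K)
    return classes, M, B_sparse, features

def predict(X, classes, M, B):
    # linear discriminant score x^T b_g - 0.5 * mu_g^T b_g (equal priors assumed)
    scores = X @ B - 0.5 * np.sum(M * B, axis=0)
    return classes[np.argmax(scores, axis=1)]
```

A zeroed row of B removes that feature from every class score at once, so the surviving rows are exactly the selected features. Here `K` is simply passed in. Choosing it, for example by cross-validation, is left out.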
Nowadays, full face synthesis and partial face manipulation enabled by generative adversarial networks (GANs) have raised wide public concern. In the digital media forensics area, detecting and ultimately locating image forgeries have become imperative. Although many methods focus on fake detection, only a few emphasize the localization of the fake regions. We analyze the imperfections in the upsampling procedures of GAN-based methods and recast the fake localization problem as a modified semantic segmentation problem. On this basis, our proposed FakeLocator obtains high localization accuracy, at full resolution, on manipulated facial images. To the best of our knowledge, this is the first attempt to solve the GAN-based fake localization problem with a semantic segmentation map. As an improvement, we propose a real-valued segmentation map that preserves more information about the fake regions, and we identify suitable loss functions for this new type of segmentation map. Experimental results on the CelebA and FFHQ databases with seven different state-of-the-art (SOTA) GAN-based face generation methods show the effectiveness of our method. Compared with the baseline, our method performs several times better on various metrics. Moreover, the proposed method is robust against various real-world facial image degradations such as JPEG compression, low resolution, noise, and blur.
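The abstract does not specify how the real-valued segmentation map is built. The Python sketch below shows one hypothetical construction together with a loss that accepts real-valued targets. The ground truth is taken as the per-pixel absolute difference between a real image and its manipulated version, scaled to [0, 1]. The loss is binary cross-entropy with soft targets. Both choices, and the function names, are illustrative assumptions and not FakeLocator's actual definitions.

```python
import numpy as np

def real_valued_fake_map(real, fake, eps=1e-8):
    """Hypothetical ground truth: channel-averaged |fake - real|, scaled to [0, 1].
    Unlike a binary mask, it keeps how strongly each pixel was altered."""
    diff = np.abs(fake.astype(np.float64) - real.astype(np.float64)).mean(axis=-1)
    return diff / (diff.max() + eps)

def soft_bce(pred, target, eps=1e-7):
    """Binary cross-entropy that accepts real-valued targets in [0, 1];
    for a fixed target it is minimized when pred equals target."""
    pred = np.clip(pred, eps, 1 - eps)
    return float(np.mean(-(target * np.log(pred) + (1 - target) * np.log(1 - pred))))
```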
Despite remarkable advances in automated visual recognition, some visual tasks remain challenging for machines. Fleuret et al. (2011) introduced the Synthetic Visual Reasoning Test (SVRT) to highlight this point. It requires classifying images of randomly generated shapes according to hidden abstract rules, using only a few examples. Ellis et al. (2015) demonstrated that a program synthesis approach could solve some of the SVRT problems with unsupervised, few-shot learning. These problems remained challenging for several convolutional neural networks trained with thousands of examples. Here we reconsidered the human and machine experiments, because they followed different protocols and yielded different statistics. We therefore proposed a quantitative reinterpretation of the data across the protocols, so that human and machine performance could be compared fairly. We improved the program synthesis classifier by correcting the image parsings, and compared its results with the performance of other machine agents and human subjects. We grouped the SVRT problems into types according to two core characteristics relevant to classification: shape specification and location relation. We found that the program synthesis classifier could not solve problems involving shape distances. It relies on symbolic computation, which scales poorly with input dimension, and adding distances to such computation would increase the dimension combinatorially with the number of shapes in an image. Therefore, although the program synthesis classifier is capable of abstract reasoning, its performance is highly constrained by the information accessible in the image parsings.
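The scaling argument above can be made concrete with a small count. The sketch assumes a hypothetical image parse with a fixed number of attributes per shape, which is then extended with one distance per pair of shapes. It counts how many two-feature combinations a symbolic rule search would have to consider. The attribute count and the two-term rule form are illustrative assumptions, not the classifier's actual representation.

```python
from math import comb

def feature_dimension(n_shapes, attrs_per_shape=3, with_distances=True):
    """Hypothetical parse size: per-shape attributes plus one distance per shape pair."""
    dim = n_shapes * attrs_per_shape
    if with_distances:
        dim += comb(n_shapes, 2)  # pairwise distances grow as n(n-1)/2
    return dim

def candidate_rules(dim, terms=2):
    """Number of ways a symbolic rule could select `terms` features out of `dim`."""
    return comb(dim, terms)

for n in (2, 4, 6):
    base = feature_dimension(n, with_distances=False)
    full = feature_dimension(n)
    print(n, base, full, candidate_rules(base), candidate_rules(full))
```

With six shapes, adding distances raises the parse from 18 to 33 features. The number of two-feature rules then grows from 153 to 528, and longer rules grow faster still.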
In this paper, triangular networks refer to feedforward neural networks with block triangular matrices as their connection weights, and they are studied for density estimation. A special two-layer triangular monotonic neural network unit is designed. It is shown to be a universal approximator for invertible mappings with triangular Jacobians, based on the simple observation that a positively weighted sum of monotonically increasing functions is still monotonic. Deep invertible neural networks consisting of stacked monotonic triangular network units and permutations are then proposed as universal density estimators. Our method is most closely related to neural autoregressive density estimation, especially the block neural autoregressive flow. However, unlike many autoregressive models, our designs are highly modular, parameter-economical, computationally efficient, and applicable to density estimation of high-dimensional data. Experimental results on image density estimation benchmarks are reported for performance comparison.
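The following NumPy sketch shows how the monotonicity observation yields an invertible unit with a cheap log-determinant. The unit computes y = W2 tanh(W1 x + b1) + b2, where W1 and W2 are block lower-triangular. Their diagonal blocks are made positive through exp(.), so each diagonal Jacobian entry dy_i/dx_i is a positively weighted sum of increasing tanh terms. The tanh activation, the exp reparameterization, and the class and function names are assumptions chosen for illustration. Stacking and the permutations between units are not shown.

```python
import numpy as np

def block_masks(d, h, hidden_rows):
    """Diagonal-block and strictly-lower-block masks for d blocks of h hidden units.
    hidden_rows=True gives shape (d*h, d) for W1; False gives (d, d*h) for W2."""
    shape = (h, 1) if hidden_rows else (1, h)
    diag = np.kron(np.eye(d), np.ones(shape))
    lower = np.kron(np.tril(np.ones((d, d)), k=-1), np.ones(shape))
    return diag, lower

class TriangularMonotoneUnit:
    """Two-layer unit y = W2 tanh(W1 x + b1) + b2 with block lower-triangular W1, W2."""

    def __init__(self, d, h, rng):
        self.h = h
        self.D1, self.L1 = block_masks(d, h, hidden_rows=True)
        self.D2, self.L2 = block_masks(d, h, hidden_rows=False)
        self.V1 = 0.5 * rng.standard_normal((d * h, d))  # unconstrained parameters
        self.V2 = 0.5 * rng.standard_normal((d, d * h))
        self.b1 = 0.1 * rng.standard_normal(d * h)
        self.b2 = np.zeros(d)

    def weights(self):
        # diagonal blocks exp(.) > 0 keep y_i increasing in x_i; strictly-lower
        # blocks are free; upper blocks are zero, so the Jacobian is lower-triangular
        W1 = self.D1 * np.exp(self.V1) + self.L1 * self.V1
        W2 = self.D2 * np.exp(self.V2) + self.L2 * self.V2
        return W1, W2

    def forward(self, x):
        """x: (n, d) -> y: (n, d) and log|det dy/dx|: (n,)."""
        W1, W2 = self.weights()
        t = np.tanh(x @ W1.T + self.b1)                   # (n, d*h)
        y = t @ W2.T + self.b2
        w_in = (W1 * self.D1).sum(axis=1)                 # W1[k, block(k)] > 0
        w_out = (W2 * self.D2).sum(axis=0)                # W2[block(k), k] > 0
        terms = w_out * (1.0 - t ** 2) * w_in             # positive terms of dy_i/dx_i
        diag_jac = terms.reshape(x.shape[0], -1, self.h).sum(axis=2)  # (n, d)
        return y, np.log(diag_jac).sum(axis=1)
```

Because the Jacobian is triangular, its log-determinant is just the sum of the logs of the diagonal entries. This is the quantity a density estimator needs for the change-of-variables formula.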
The extraction of labels from radiology text reports enables large-scale training of medical imaging models. Existing approaches to report labeling typically rely either on sophisticated feature engineering based on medical domain knowledge or on manual annotations by experts. In this work, we introduce a BERT-based approach to medical image report labeling that exploits both the scale of available rule-based systems and the quality of expert annotations. We demonstrate the superior performance of a biomedically pretrained BERT model that is first trained on annotations of a rule-based labeler. It is then fine-tuned on a small set of expert annotations augmented with automated backtranslation. We find that our final model, CheXbert, outperforms the previous best rule-based labeler with statistical significance, setting a new state of the art for report labeling on one of the largest datasets of chest X-rays.
We study why overparameterization (increasing model size well beyond the point of zero training error) can hurt test error on minority groups despite improving average test error when there are spurious correlations in the data. Through simulations and experiments on two image datasets, we identify two key properties of the training data that drive this behavior: the proportions of majority versus minority groups, and the signal-to-noise ratio of the spurious correlations. We then analyze a linear setting and show theoretically how the inductive bias of models towards "memorizing" fewer examples can cause overparameterization to hurt. Our analysis leads to a counterintuitive approach of subsampling the majority group. This approach empirically achieves low minority error in the overparameterized regime, even though the standard approach of upweighting the minority group fails. Overall, our results suggest a tension between using overparameterized models and using all the training data to achieve low worst-group error.
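The two data-balancing strategies contrasted above can be sketched in a few lines of NumPy. Subsampling discards majority-group examples until every group is as small as the smallest one. Upweighting keeps all examples but weights each one inversely to its group size. The group encoding and function names are assumptions for illustration. In practice a group is typically a combination of label and spurious attribute.

```python
import numpy as np

def subsample_to_smallest_group(groups, rng):
    """Indices of a subsample in which every group is cut down, without
    replacement, to the size of the smallest group."""
    groups = np.asarray(groups)
    labels, counts = np.unique(groups, return_counts=True)
    m = counts.min()
    keep = [rng.choice(np.flatnonzero(groups == g), size=m, replace=False) for g in labels]
    return np.sort(np.concatenate(keep))

def upweight_inverse_frequency(groups):
    """Standard alternative: keep all data, weight each example by 1 / (group size),
    rescaled so that the weights sum to the number of examples."""
    groups = np.asarray(groups)
    _, inv, counts = np.unique(groups, return_inverse=True, return_counts=True)
    w = 1.0 / counts[inv]
    return w * len(groups) / w.sum()
```

Both strategies give each group the same total influence on the training objective. The difference is that subsampling also shrinks the training set, which is what the analysis above suggests helps in the overparameterized regime.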
We propose a sparse reconstruction framework (aNETT) for solving inverse problems. Existing sparse reconstruction techniques are based on linear sparsifying transforms. In contrast, we train an autoencoder network $D \circ E$ with $E$ acting as a nonlinear sparsifying transform. We then minimize a Tikhonov functional with a learned regularizer formed by the $\ell^q$-norm of the encoder coefficients and a penalty for the distance to the data manifold. We propose a strategy for training an autoencoder on a sample set of the underlying image class such that the autoencoder is independent of the forward operator and is subsequently adapted to the specific forward model. Numerical results for sparse-view CT clearly demonstrate the feasibility and robustness of aNETT, as well as its improved generalization capability and stability compared with post-processing networks.
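One plausible way to write the functional described above uses the following symbols, which are assumptions not fixed by the text: a forward operator $\mathbf{A}$, measured data $y$, a regularization parameter $\alpha > 0$, and a weight $c > 0$. The distance to the data manifold is approximated by the autoencoder's reconstruction error $\lVert x - (D \circ E)(x) \rVert_2$. The data-fidelity term and the exact weighting used in aNETT may differ.

$$
\hat{x} \in \operatorname*{arg\,min}_{x} \; \frac{1}{2}\lVert \mathbf{A}x - y \rVert_2^2 + \alpha \Bigl( \lVert E(x) \rVert_q^q + \frac{c}{2}\, \lVert x - (D \circ E)(x) \rVert_2^2 \Bigr)
$$

The first regularization term promotes sparse encoder coefficients. The second keeps the reconstruction close to images that the autoencoder can represent.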
We model local texture patterns using the co-occurrence statistics of pixel values. We then train a generative adversarial network, conditioned on co-occurrence statistics, to synthesize new textures from the co-occurrence statistics and a random noise seed. Co-occurrences have long been used to measure similarity between textures. That is, two textures are considered similar if their corresponding co-occurrence matrices are similar. By the same token, we show that multiple textures generated from the same co-occurrence matrix are similar to each other. This gives rise to a new texture synthesis algorithm. We show that co-occurrences offer a stable, intuitive and interpretable latent representation for texture synthesis. Our technique can be used to generate a smooth texture morph between two textures by interpolating between their corresponding co-occurrence matrices. We further present an interactive texture tool that allows a user to adjust local characteristics of the synthesized texture image directly through the co-occurrence values.
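To make the representation concrete, the NumPy sketch below computes a classic normalized co-occurrence matrix of quantized pixel values for a single spatial offset. It also shows the two operations mentioned above: comparing textures through their matrices, and interpolating between two matrices for a morph. The single-offset definition, the L1 distance, and the function names are assumptions. The paper's co-occurrence statistics, for example over local windows or color clusters, may be defined differently.

```python
import numpy as np

def cooccurrence_matrix(img, levels, offset=(0, 1)):
    """Normalized co-occurrence matrix of quantized values for one offset (dy, dx).
    img: 2-D integer array with values in [0, levels).
    C[a, b] estimates P(pixel = a and the pixel at +offset = b)."""
    dy, dx = offset
    h, w = img.shape
    src = img[max(0, -dy):h - max(0, dy), max(0, -dx):w - max(0, dx)]
    dst = img[max(0, dy):h - max(0, -dy), max(0, dx):w - max(0, -dx)]
    C = np.zeros((levels, levels))
    np.add.at(C, (src.ravel(), dst.ravel()), 1.0)  # count each (src, dst) value pair
    return C / C.sum()

def cooccurrence_distance(C0, C1):
    """L1 distance between co-occurrence matrices: small distance = similar textures."""
    return float(np.abs(C0 - C1).sum())

def morph_path(C0, C1, steps):
    """Linear interpolation between two co-occurrence matrices; each blend sums to 1."""
    return [(1 - t) * C0 + t * C1 for t in np.linspace(0.0, 1.0, steps)]
```

In a conditional generator, each matrix along `morph_path` would serve as the conditioning input for one frame of the morph.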
Objective: Abdominal anatomy segmentation is crucial for numerous applications, from computer-assisted diagnosis to image-guided surgery. In this context, we address fully automated multi-organ segmentation from abdominal CT and MR images using deep learning. Methods: The proposed model extends standard conditional generative adversarial networks. In addition to the discriminator, which pushes the model to create realistic organ delineations, it embeds cascaded, partially pre-trained convolutional encoder-decoders as the generator. Fine-tuning encoders pre-trained on a large amount of non-medical images alleviates data scarcity limitations. The network is trained end-to-end to benefit from simultaneous multi-level segmentation refinements using auto-context. Results: Employed for healthy liver, kidney and spleen segmentation, our pipeline provides promising results, outperforming state-of-the-art encoder-decoder schemes. Applied to the Combined Healthy Abdominal Organ Segmentation (CHAOS) challenge organized in conjunction with the IEEE International Symposium on Biomedical Imaging 2019, it ranked first in three competition categories: liver CT, liver MR and multi-organ MR segmentation. Conclusion: Combining cascaded convolutional and adversarial networks strengthens the ability of deep learning pipelines to automatically delineate multiple abdominal organs, with good generalization capability. Significance: The comprehensive evaluation suggests that such pipelines could provide better guidance to help clinicians in abdominal image interpretation and clinical decision making.
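The abstract does not give the training objective. A common way to combine a segmentation generator with a discriminator is to add a weighted adversarial term to a segmentation loss. The NumPy sketch below shows that pattern under explicit assumptions: a soft multi-class Dice loss, a non-saturating adversarial term, and a weight `lam`. All of these, and the function names, are hypothetical and not the authors' exact losses.

```python
import numpy as np

def soft_dice_loss(prob, target, eps=1e-6):
    """1 - mean soft Dice over organ channels; prob and target have shape (C, H, W)."""
    inter = (prob * target).sum(axis=(1, 2))
    denom = prob.sum(axis=(1, 2)) + target.sum(axis=(1, 2))
    return float(1.0 - np.mean((2 * inter + eps) / (denom + eps)))

def generator_loss(prob, target, d_fake, lam=0.1, eps=1e-7):
    """Hypothetical cGAN generator objective: segmentation loss plus lam times a
    non-saturating adversarial term. d_fake holds the discriminator's probabilities
    that the predicted delineations are real."""
    adv = -np.mean(np.log(np.clip(d_fake, eps, 1.0)))
    return soft_dice_loss(prob, target) + lam * adv
```

The Dice term rewards overlap with the reference masks. The adversarial term penalizes delineations that the discriminator recognizes as generated, which encourages realistic organ shapes.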
In this paper, we propose a new approach to learning multimodal multilingual embeddings for matching images with their relevant captions in two languages. We combine two existing objective functions. Together they bring images and captions close together in a joint embedding space while adapting the alignment of word embeddings between existing languages in our model. We show that our approach generalizes better, achieving state-of-the-art performance on text-to-image and image-to-text retrieval and on a caption-caption similarity task. Two multimodal multilingual datasets are used for evaluation: Multi30k with German and English captions, and Microsoft COCO with English and Japanese captions.
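The abstract does not name the two objectives it combines. A widely used objective for pulling matched images and captions together in a joint space is a bidirectional hinge-based ranking loss over cosine similarities. The NumPy sketch below shows that loss and a Recall@K retrieval metric. Treating this as one of the paper's objectives is an assumption, as are the margin value and the function names.

```python
import numpy as np

def l2_normalize(x, eps=1e-12):
    return x / (np.linalg.norm(x, axis=1, keepdims=True) + eps)

def bidirectional_ranking_loss(img, cap, margin=0.2):
    """Sum-of-hinges ranking loss for a batch of matched (image, caption) pairs.
    Row i of img and row i of cap form a positive pair; other rows are negatives."""
    s = l2_normalize(img) @ l2_normalize(cap).T          # cosine similarities (B, B)
    pos = np.diag(s)
    cost_cap = np.maximum(0, margin - pos[:, None] + s)  # image i vs. wrong captions j
    cost_img = np.maximum(0, margin - pos[None, :] + s)  # caption j vs. wrong images i
    mask = np.eye(len(s), dtype=bool)
    cost_cap[mask] = 0                                   # ignore the positive pairs
    cost_img[mask] = 0
    return float(cost_cap.sum() + cost_img.sum())

def recall_at_k(img, cap, k=1):
    """Image-to-text retrieval: fraction of images whose own caption ranks in the top k."""
    s = l2_normalize(img) @ l2_normalize(cap).T
    top = np.argsort(-s, axis=1)[:, :k]
    return float(np.mean([i in top[i] for i in range(len(s))]))
```

The same hinge form can be applied between captions in the two languages, which is one way a caption-caption similarity objective could share the joint space.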