Partial Multi-label Learning (PML) is a type of weakly supervised learning where each training instance corresponds to a set of candidate labels, among which only some are true. In this paper, we introduce \our{}, a novel probabilistic approach to this problem that extends the binary cross entropy to the PML setup. In contrast to existing methods, it does not require suboptimal disambiguation and, as such, can be applied to any deep architecture. Furthermore, experiments conducted on artificial and real-world datasets indicate that \our{} outperforms existing approaches, especially for high noise in a candidate set.
Video generation is important, especially in medicine, as much data is given in this form. However, video generation of high-resolution data is a very demanding task for generative models, due to the large need for memory. In this paper, we propose Memory Efficient Video GAN (MeVGAN) - a Generative Adversarial Network (GAN) which uses plugin-type architecture. We use a pre-trained 2D-image GAN and only add a simple neural network to construct respective trajectories in the noise space, so that the trajectory forwarded through the GAN model constructs a real-life video. We apply MeVGAN in the task of generating colonoscopy videos. Colonoscopy is an important medical procedure, especially beneficial in screening and managing colorectal cancer. However, because colonoscopy is difficult and time-consuming to learn, colonoscopy simulators are widely used in educating young colonoscopists. We show that MeVGAN can produce good quality synthetic colonoscopy videos, which can be potentially used in virtual simulators.
Prototypical parts-based networks are becoming increasingly popular due to their faithful self-explanations. However, their similarity maps are calculated in the penultimate network layer. Therefore, the receptive field of the prototype activation region often depends on parts of the image outside this region, which can lead to misleading interpretations. We name this undesired behavior a spatial explanation misalignment and introduce an interpretability benchmark with a set of dedicated metrics for quantifying this phenomenon. In addition, we propose a method for misalignment compensation and apply it to existing state-of-the-art models. We show the expressiveness of our benchmark and the effectiveness of the proposed compensation methodology through extensive empirical studies.
Multiple Instance Learning (MIL) is a weakly-supervised problem in which one label is assigned to the whole bag of instances. An important class of MIL models is instance-based, where we first classify instances and then aggregate those predictions to obtain a bag label. The most common MIL model is when we consider a bag as positive if at least one of its instances has a positive label. However, this reasoning does not hold in many real-life scenarios, where the positive bag label is often a consequence of a certain percentage of positive instances. To address this issue, we introduce a dedicated instance-based method called ProMIL, based on deep neural networks and Bernstein polynomial estimation. An important advantage of ProMIL is that it can automatically detect the optimal percentage level for decision-making. We show that ProMIL outperforms standard instance-based MIL in real-world medical applications. We make the code available.
Nowadays artificial neural network models achieve remarkable results in many disciplines. Functions mapping the representation provided by the model to the probability distribution are the inseparable aspect of deep learning solutions. Although softmax is a commonly accepted probability mapping function in the machine learning community, it cannot return sparse outputs and always spreads the positive probability to all positions. In this paper, we propose r-softmax, a modification of the softmax, outputting sparse probability distribution with controllable sparsity rate. In contrast to the existing sparse probability mapping functions, we provide an intuitive mechanism for controlling the output sparsity level. We show on several multi-label datasets that r-softmax outperforms other sparse alternatives to softmax and is highly competitive with the original softmax. We also apply r-softmax to the self-attention module of a pre-trained transformer language model and demonstrate that it leads to improved performance when fine-tuning the model on different natural language processing tasks.
We introduce ProtoSeg, a novel model for interpretable semantic image segmentation, which constructs its predictions using similar patches from the training set. To achieve accuracy comparable to baseline methods, we adapt the mechanism of prototypical parts and introduce a diversity loss function that increases the variety of prototypes within each class. We show that ProtoSeg discovers semantic concepts, in contrast to standard segmentation models. Experiments conducted on Pascal VOC and Cityscapes datasets confirm the precision and transparency of the presented method.
Partial label learning is a type of weakly supervised learning, where each training instance corresponds to a set of candidate labels, among which only one is true. In this paper, we introduce ProPaLL, a novel probabilistic approach to this problem, which has at least three advantages compared to the existing approaches: it simplifies the training process, improves performance, and can be applied to any deep architecture. Experiments conducted on artificial and real-world datasets indicate that ProPaLL outperforms the existing approaches.
Many crucial problems in deep learning and statistics are caused by a variational gap, i.e., a difference between evidence and evidence lower bound (ELBO). As a consequence, in the classical VAE model, we obtain only the lower bound on the log-likelihood since ELBO is used as a cost function, and therefore we cannot compare log-likelihood between models. In this paper, we present a general and effective upper bound of the variational gap, which allows us to efficiently estimate the true evidence. We provide an extensive theoretical study of the proposed approach. Moreover, we show that by applying our estimation, we can easily obtain lower and upper bounds for the log-likelihood of VAE models.
We introduce ProtoPool, an interpretable image classification model with a pool of prototypes shared by the classes. The training is more straightforward than in the existing methods because it does not require the pruning stage. It is obtained by introducing a fully differentiable assignment of prototypes to particular classes. Moreover, we introduce a novel focal similarity function to focus the model on the rare foreground features. We show that ProtoPool obtains state-of-the-art accuracy on the CUB-200-2011 and the Stanford Cars datasets, substantially reducing the number of prototypes. We provide a theoretical analysis of the method and a user study to show that our prototypes are more distinctive than those obtained with competitive methods.