"Image": models, code, and papers

Towards Better Adversarial Synthesis of Human Images from Text

Jul 05, 2021
Rania Briq, Pratika Kochar, Juergen Gall

This paper proposes an approach that generates multiple 3D human meshes from text. The human shapes are represented by 3D meshes based on the SMPL model. The model's performance is evaluated on the COCO dataset, which contains challenging human shapes and intricate interactions between individuals. The model is able to capture the dynamics of the scene and the interactions between individuals based on text. We further show how using such a shape as input to image synthesis frameworks helps to constrain the network to synthesize humans with realistic human shapes.

Instagram Filter Removal on Fashionable Images

Apr 11, 2021
Furkan Kınlı, Barış Özcan, Furkan Kıraç

Social media images are generally transformed by filtering to obtain aesthetically more pleasing appearances. However, CNNs generally fail to interpret both the image and its filtered version as the same in the visual analysis of social media images. We introduce Instagram Filter Removal Network (IFRNet) to mitigate the effects of image filters for social media analysis applications. To achieve this, we assume any filter applied to an image substantially injects additional style information into it, and we consider this problem as a reverse style transfer problem. The visual effects of filtering can be directly removed by adaptively normalizing external style information in each level of the encoder. Experiments demonstrate that IFRNet outperforms all compared methods in quantitative and qualitative comparisons, and has the ability to remove the visual effects to a great extent. Additionally, we present the filter classification performance of our proposed model, and analyze the dominant color estimation on the images unfiltered by all compared methods.

* 10 pages, 7 figures, Accepted to New Trends in Image Restoration and Enhancement workshop and challenges on image and video processing in conjunction with CVPR 2021
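
The idea of normalizing away style information can be illustrated with plain instance normalization, which strips the per-channel mean and variance that are commonly treated as style statistics. This is only a minimal sketch and not IFRNet's actual module: the "filter" is assumed to be a simple per-channel gain and offset, and `instance_normalize` and `apply_toy_filter` are hypothetical helper names.

```python
import math

def instance_normalize(channel, eps=1e-5):
    """Zero-mean, unit-variance normalization of one channel (flat list of floats)."""
    n = len(channel)
    mean = sum(channel) / n
    var = sum((v - mean) ** 2 for v in channel) / n
    return [(v - mean) / math.sqrt(var + eps) for v in channel]

def apply_toy_filter(channel, gain, offset):
    """Assumed stand-in for a photo filter: a positive gain plus an offset."""
    return [gain * v + offset for v in channel]

original = [0.1, 0.4, 0.35, 0.8, 0.6]
filtered = apply_toy_filter(original, gain=1.8, offset=0.2)

# After normalization the two versions nearly coincide, because the
# gain and offset only changed the channel's mean and variance.
max_gap = max(
    abs(a - b)
    for a, b in zip(instance_normalize(original), instance_normalize(filtered))
)
print(f"largest difference after normalization: {max_gap:.2e}")
```

Real filters are not purely affine per channel, which is presumably why a learned, adaptive normalization is needed rather than this fixed one.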

SegNAS3D: Network Architecture Search with Derivative-Free Global Optimization for 3D Image Segmentation

Sep 12, 2019
Ken C. L. Wong, Mehdi Moradi

Deep learning has largely reduced the need for manual feature selection in image segmentation. Nevertheless, network architecture optimization and hyperparameter tuning are mostly manual and time-consuming. Although there are increasing research efforts on network architecture search in computer vision, most works concentrate on image classification but not segmentation, and there are very limited efforts on medical image segmentation, especially in 3D. To remedy this, here we propose a framework, SegNAS3D, for network architecture search of 3D image segmentation. In this framework, a network architecture comprises interconnected building blocks that consist of operations such as convolution and skip connection. By representing the block structure as a learnable directed acyclic graph, hyperparameters such as the number of feature channels and the option of using deep supervision can be learned together through derivative-free global optimization. Experiments on 43 3D brain magnetic resonance images with 19 structures achieved an average Dice coefficient of 82%. Each architecture search required less than three days on three GPUs and produced architectures that were much smaller than the state-of-the-art manually created architectures.

* This paper was accepted by the International Conference on Medical Image Computing and Computer-Assisted Intervention - MICCAI 2019
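
For readers unfamiliar with the metric quoted above, the Dice coefficient for one structure is twice the overlap between predicted and reference voxels divided by the sum of their sizes. The sketch below works on flattened label lists; the function names and the convention of returning 1.0 when a label is absent from both inputs are assumptions for illustration, not details taken from the paper.

```python
def dice_coefficient(pred, target, label):
    """Dice overlap for one label between two equal-length label sequences."""
    if len(pred) != len(target):
        raise ValueError("pred and target must have the same length")
    p = [x == label for x in pred]
    t = [x == label for x in target]
    intersection = sum(a and b for a, b in zip(p, t))
    denom = sum(p) + sum(t)
    # Assumed convention: a label missing from both inputs counts as a perfect match.
    return 1.0 if denom == 0 else 2.0 * intersection / denom

def mean_dice(pred, target, labels):
    """Average Dice over several structures (labels)."""
    return sum(dice_coefficient(pred, target, l) for l in labels) / len(labels)

# Toy flattened segmentations with background 0 and structures 1 and 2.
pred = [1, 1, 0, 2]
target = [1, 0, 0, 2]
print(mean_dice(pred, target, labels=[1, 2]))
```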

Axial-to-lateral super-resolution for 3D fluorescence microscopy using unsupervised deep learning

Apr 19, 2021
Hyoungjun Park, Myeongsu Na, Bumju Kim, Soohyun Park, Ki Hean Kim, Sunghoe Chang, Jong Chul Ye

Volumetric imaging by fluorescence microscopy is often limited by anisotropic spatial resolution from inferior axial resolution compared to the lateral resolution. To address this problem, here we present a deep-learning-enabled unsupervised super-resolution technique that enhances anisotropic images in volumetric fluorescence microscopy. In contrast to the existing deep learning approaches that require matched high-resolution target volume images, our method greatly reduces the effort required to put it into practice, as the training of a network requires as little as a single 3D image stack, without a priori knowledge of the image formation process, registration of training data, or separate acquisition of target data. This is achieved based on the optimal transport driven cycle-consistent generative adversarial network that learns from an unpaired matching between high-resolution 2D images in the lateral image plane and low-resolution 2D images in the other planes. Using fluorescence confocal microscopy and light-sheet microscopy, we demonstrate that the trained network not only enhances axial resolution beyond the diffraction limit, but also enhances suppressed visual details between the imaging planes and removes imaging artifacts.

DiaRet: A browser-based application for the grading of Diabetic Retinopathy with Integrated Gradients

Apr 11, 2021
Shaswat Patel, Maithili Lohakare, Samyak Prajapati, Shaanya Singh, Nancy Patel

Patients with long-standing diabetes often fall prey to Diabetic Retinopathy (DR), which changes the retina of the human eye and may lead to loss of vision in extreme cases. The aim of this study is two-fold: (a) to create deep learning models trained to grade degraded retinal fundus images and (b) to create a browser-based application that aids diagnostic procedures by highlighting the key features of the fundus image. In this research work, we emulated images plagued by distortions by degrading them with multiple combinations of Light Transmission Disturbance, Image Blurring and insertion of Retinal Artifacts. InceptionV3, ResNet-50 and InceptionResNetV2 were trained to classify retinal fundus images by severity level and then used in a browser-based application, which applies the Integrated Gradients (IG) attribution mask to the input image and shows the predictions made by the model and the probability associated with each class.

* Modified abstract and replaced figures
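
As a rough illustration of how an Integrated Gradients attribution is computed, the sketch below averages the gradient along a straight path from a baseline to the input and scales it by the input difference. A tiny logistic model stands in for the CNNs named above; the weights, the all-zero baseline, and every function name here are assumptions for illustration only.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def model(x, w, b):
    """Toy differentiable classifier: logistic regression score."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)

def model_grad(x, w, b):
    """Analytic gradient of the score with respect to each input feature."""
    s = model(x, w, b)
    return [s * (1.0 - s) * wi for wi in w]

def integrated_gradients(x, baseline, w, b, steps=200):
    """Midpoint-rule approximation of the path integral of gradients."""
    total = [0.0] * len(x)
    for k in range(steps):
        alpha = (k + 0.5) / steps
        point = [bi + alpha * (xi - bi) for xi, bi in zip(x, baseline)]
        grad = model_grad(point, w, b)
        total = [t + g for t, g in zip(total, grad)]
    return [(xi - bi) * t / steps for xi, bi, t in zip(x, baseline, total)]

w, b = [2.0, -1.0, 0.5], 0.1
x, baseline = [1.0, 2.0, 3.0], [0.0, 0.0, 0.0]
attributions = integrated_gradients(x, baseline, w, b)

# Completeness property: attributions sum to the change in model output.
print(sum(attributions), model(x, w, b) - model(baseline, w, b))
```

For an image model the same computation runs per pixel, and the resulting attribution map is what gets overlaid on the input as a mask.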

Data, Assemble: Leveraging Multiple Datasets with Heterogeneous and Partial Labels

Sep 25, 2021
Mintong Kang, Yongyi Lu, Alan L. Yuille, Zongwei Zhou

The success of deep learning relies heavily on large datasets with extensive labels, but we often only have access to several small, heterogeneous datasets associated with partial labels, particularly in the field of medical imaging. When learning from multiple datasets, existing challenges include incomparable, heterogeneous, or even conflicting labeling protocols across datasets. In this paper, we propose a new initiative, "data, assemble", which aims to unleash the full potential of partially labeled data and enormous unlabeled data from an assembly of datasets. To accommodate the supervised learning paradigm to partial labels, we introduce a dynamic adapter that encodes multiple visual tasks and aggregates image features in a question-and-answer manner. Furthermore, we employ pseudo-labeling and consistency constraints to harness images with missing labels and to mitigate the domain gap across datasets. From proof-of-concept studies on three natural imaging datasets and rigorous evaluations on two large-scale thorax X-ray benchmarks, we discover that learning from "negative examples" facilitates both classification and segmentation of classes of interest. This sheds new light on the computer-aided diagnosis of rare diseases and emerging pandemics, wherein "positive examples" are hard to collect, yet "negative examples" are relatively easier to assemble. As a result, besides exceeding the prior art in the NIH ChestXray benchmark, our model is particularly strong in identifying diseases of minority classes, yielding an improvement of over 3 points on average. Remarkably, when using existing partial labels, our model performance is on par (p>0.05) with that using a fully curated dataset with exhaustive labels, eliminating the need for an additional 40% in annotation costs.
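
One common way to train with partial labels and pseudo-labels, shown here only as a generic sketch and not as the paper's dynamic adapter, is to skip missing labels in the loss and to fill them in when the model is confident. Missing labels are represented as `None`, and the 0.9 confidence threshold and both function names are assumptions.

```python
import math

def masked_bce(probs, labels):
    """Mean binary cross-entropy over known labels; None marks a missing label."""
    terms = []
    for p, y in zip(probs, labels):
        if y is None:
            continue  # a missing label contributes nothing to the loss
        p = min(max(p, 1e-7), 1.0 - 1e-7)  # clamp to avoid log(0)
        terms.append(-(y * math.log(p) + (1 - y) * math.log(1.0 - p)))
    return sum(terms) / len(terms) if terms else 0.0

def pseudo_label(probs, labels, threshold=0.9):
    """Fill missing labels only where the prediction is confident enough."""
    filled = []
    for p, y in zip(probs, labels):
        if y is not None:
            filled.append(y)          # keep the human annotation
        elif p >= threshold:
            filled.append(1)          # confident positive
        elif p <= 1.0 - threshold:
            filled.append(0)          # confident negative
        else:
            filled.append(None)       # still unknown
    return filled

probs = [0.95, 0.5, 0.02, 0.3]
labels = [None, None, None, 1]
print(pseudo_label(probs, labels), masked_bce(probs, labels))
```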

DensePose 3D: Lifting Canonical Surface Maps of Articulated Objects to the Third Dimension

Aug 31, 2021
Roman Shapovalov, David Novotny, Benjamin Graham, Patrick Labatut, Andrea Vedaldi

We tackle the problem of monocular 3D reconstruction of articulated objects like humans and animals. We contribute DensePose 3D, a method that can learn such reconstructions in a weakly supervised fashion from 2D image annotations only. This is in stark contrast with previous deformable reconstruction methods that use parametric models such as SMPL pre-trained on a large dataset of 3D object scans. Because it does not require 3D scans, DensePose 3D can be used for learning a wide range of articulated categories such as different animal species. The method learns, in an end-to-end fashion, a soft partition of a given category-specific 3D template mesh into rigid parts together with a monocular reconstruction network that predicts the part motions such that they reproject correctly onto 2D DensePose-like surface annotations of the object. The decomposition of the object into parts is regularized by expressing part assignments as a combination of the smooth eigenfunctions of the Laplace-Beltrami operator. We show significant improvements compared to state-of-the-art non-rigid structure-from-motion baselines on both synthetic and real data on categories of humans and animals.

* Accepted for ICCV 2021

Predify: Augmenting deep neural networks with brain-inspired predictive coding dynamics

Jun 04, 2021
Bhavin Choksi, Milad Mozafari, Callum Biggs O'May, Benjamin Ador, Andrea Alamia, Rufin VanRullen

Deep neural networks excel at image classification, but their performance is far less robust to input perturbations than human perception. In this work we explore whether this shortcoming may be partly addressed by incorporating brain-inspired recurrent dynamics in deep convolutional networks. We take inspiration from a popular framework in neuroscience: 'predictive coding'. At each layer of the hierarchical model, generative feedback 'predicts' (i.e., reconstructs) the pattern of activity in the previous layer. The reconstruction errors are used to iteratively update the network's representations across timesteps, and to optimize the network's feedback weights over the natural image dataset, a form of unsupervised training. We show that implementing this strategy into two popular networks, VGG16 and EfficientNetB0, improves their robustness against various corruptions. We hypothesize that other feedforward networks could similarly benefit from the proposed framework. To promote research in this direction, we provide an open-sourced PyTorch-based package called Predify, which can be used to implement and investigate the impacts of the predictive coding dynamics in any convolutional neural network.

* Preprint under review

Morphence: Moving Target Defense Against Adversarial Examples

Aug 31, 2021
Abderrahmen Amich, Birhanu Eshete

Robustness of machine learning models to adversarial examples remains an open topic of research. Attacks often succeed by repeatedly probing a fixed target model with adversarial examples purposely crafted to fool it. In this paper, we introduce Morphence, an approach that shifts the defense landscape by making a model a moving target against adversarial examples. By regularly moving the decision function of a model, Morphence makes it significantly harder for repeated or correlated attacks to succeed. Morphence deploys a pool of models generated from a base model in a manner that introduces sufficient randomness when it responds to prediction queries. To ensure repeated or correlated attacks fail, the deployed pool of models automatically expires after a query budget is reached and the model pool is seamlessly replaced by a new model pool generated in advance. We evaluate Morphence on two benchmark image classification datasets (MNIST and CIFAR10) against five reference attacks (2 white-box and 3 black-box). In all cases, Morphence consistently outperforms the thus-far effective defense, adversarial training, even in the face of strong white-box attacks, while preserving accuracy on clean data.

* ACSAC 2021 - Annual Computer Security Applications Conference
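
The pool-with-expiry mechanism described above can be sketched as a small wrapper that serves each query from a model pool and swaps in a pre-generated pool once the query budget is spent. This is a hedged illustration, not Morphence's implementation: the abstract does not say how a model is chosen per query, so uniform random choice is an assumption, and `MovingTargetPool`, `make_toy_pool`, and the threshold "models" are hypothetical.

```python
import random

class MovingTargetPool:
    """Serve predictions from a model pool that expires after a query budget."""

    def __init__(self, make_pool, query_budget, seed=None):
        self._make_pool = make_pool
        self._budget = query_budget
        self._rng = random.Random(seed)
        self._active = make_pool()
        self._standby = make_pool()  # next pool, generated in advance
        self._queries = 0
        self.generation = 0

    def predict(self, x):
        if self._queries >= self._budget:
            # Budget spent: promote the standby pool and prepare a new one.
            self._active, self._standby = self._standby, self._make_pool()
            self._queries = 0
            self.generation += 1
        self._queries += 1
        model = self._rng.choice(self._active)  # assumed selection rule
        return model(x)

_pool_rng = random.Random(0)

def make_toy_pool(size=4):
    """Toy 'models': threshold classifiers with slightly perturbed thresholds."""
    thresholds = [0.5 + _pool_rng.uniform(-0.05, 0.05) for _ in range(size)]
    return [lambda x, t=t: int(x > t) for t in thresholds]

pool = MovingTargetPool(make_toy_pool, query_budget=3, seed=1)
outputs = [pool.predict(0.9) for _ in range(7)]
print(outputs, pool.generation)
```

Keeping a standby pool ready means the swap itself is just a reference exchange, which is one way to read "seamlessly replaced" in the abstract.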

A Bayesian Perspective on the Deep Image Prior

Apr 16, 2019
Zezhou Cheng, Matheus Gadelha, Subhransu Maji, Daniel Sheldon

The deep image prior was recently introduced as a prior for natural images. It represents images as the output of a convolutional network with random inputs. For "inference", gradient descent is performed to adjust network parameters to make the output match observations. This approach yields good performance on a range of image reconstruction tasks. We show that the deep image prior is asymptotically equivalent to a stationary Gaussian process prior in the limit as the number of channels in each layer of the network goes to infinity, and derive the corresponding kernel. This informs a Bayesian approach to inference. We show that by conducting posterior inference using stochastic gradient Langevin we avoid the need for early stopping, which is a drawback of the current approach, and improve results for denoising and inpainting tasks. We illustrate these intuitions on a number of 1D and 2D signal reconstruction tasks.

* CVPR 2019
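
Stochastic gradient Langevin dynamics adds Gaussian noise to a minibatch gradient step so that the iterates sample from the posterior instead of converging to a single point estimate. The sketch below applies it to a deliberately simple problem, the mean of Gaussian data with unit noise variance, rather than to network weights as in the paper; the step size, prior variance, batch size, and function name are all assumptions.

```python
import math
import random

def sgld_gaussian_mean(data, prior_var, step, n_steps, batch_size, seed=0):
    """SGLD samples of theta for y_i ~ N(theta, 1) with prior theta ~ N(0, prior_var)."""
    rng = random.Random(seed)
    n = len(data)
    theta = 0.0
    samples = []
    for _ in range(n_steps):
        batch = rng.sample(data, batch_size)
        grad_prior = -theta / prior_var
        # Rescale the minibatch gradient to estimate the full-data gradient.
        grad_lik = (n / batch_size) * sum(y - theta for y in batch)
        noise = rng.gauss(0.0, math.sqrt(step))  # injected noise, variance = step
        theta += 0.5 * step * (grad_prior + grad_lik) + noise
        samples.append(theta)
    return samples

data_rng = random.Random(42)
data = [data_rng.gauss(2.0, 1.0) for _ in range(100)]

samples = sgld_gaussian_mean(data, prior_var=100.0, step=1e-3,
                             n_steps=5000, batch_size=10, seed=1)
kept = samples[1000:]  # discard burn-in
sgld_mean = sum(kept) / len(kept)

# Closed-form posterior mean for this conjugate model, for comparison.
posterior_mean = sum(data) / (len(data) + 1.0 / 100.0)
print(sgld_mean, posterior_mean)
```

Because the chain keeps exploring around the posterior rather than overfitting to the observations, averaging its samples plays the role that early stopping plays for plain gradient descent.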
