"Image To Image Translation": models, code, and papers

A Time-Frequency Generative Adversarial based method for Audio Packet Loss Concealment

Jul 28, 2023
Carlo Aironi, Samuele Cornell, Luca Serafini, Stefano Squartini

Packet loss is a major cause of voice quality degradation in VoIP transmissions, with a serious impact on intelligibility and user experience. This paper describes a system based on a generative adversarial approach that aims to repair fragments lost during the transmission of audio streams. Inspired by the powerful image-to-image translation capability of Generative Adversarial Networks (GANs), we propose bin2bin, an improved pix2pix framework that translates magnitude spectrograms of audio frames with lost packets into non-corrupted speech spectrograms. To better preserve structural information after spectrogram translation, this paper combines two STFT-based loss functions with the traditional GAN objective. Furthermore, we employ a modified PatchGAN structure as the discriminator, and we reduce concealment time through a proper initialization of the phase reconstruction algorithm. Experimental results show that the proposed method has clear advantages over current state-of-the-art methods, as it better handles both high packet loss rates and large gaps.

* Accepted at EUSIPCO - 31st European Signal Processing Conference, 2023
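
The abstract does not name the two STFT-based losses. The sketch below assumes the spectral convergence and log-magnitude pair that is common in neural vocoders, applied to magnitude spectrograms like those bin2bin translates. The Hann window, FFT size, hop, and the weights `lam_sc` and `lam_mag` are illustrative assumptions; this is not the authors' implementation.

```python
import numpy as np

def stft_magnitude(x, n_fft=512, hop=128):
    """Magnitude STFT of a 1-D signal with a Hann window; shape (frames, n_fft // 2 + 1)."""
    window = np.hanning(n_fft)
    n_frames = 1 + (len(x) - n_fft) // hop
    frames = np.stack([x[i * hop:i * hop + n_fft] * window for i in range(n_frames)])
    return np.abs(np.fft.rfft(frames, axis=-1))

def spectral_convergence(mag_ref, mag_est):
    # Frobenius norm of the magnitude error, relative to the reference magnitude.
    return np.linalg.norm(mag_ref - mag_est) / np.linalg.norm(mag_ref)

def log_magnitude_loss(mag_ref, mag_est, eps=1e-7):
    # Mean absolute difference of log magnitudes; eps guards against log(0).
    return np.mean(np.abs(np.log(mag_ref + eps) - np.log(mag_est + eps)))

def generator_loss(adv_loss, mag_ref, mag_est, lam_sc=1.0, lam_mag=1.0):
    # Hypothetical weighting: adversarial term plus the two STFT-based terms.
    return (adv_loss
            + lam_sc * spectral_convergence(mag_ref, mag_est)
            + lam_mag * log_magnitude_loss(mag_ref, mag_est))

# Usage: a 440 Hz tone at 16 kHz with one 20 ms packet (320 samples) zeroed out.
sr = 16000
clean = np.sin(2 * np.pi * 440 * np.arange(sr) / sr)
lossy = clean.copy()
lossy[4000:4320] = 0.0
print(spectral_convergence(stft_magnitude(clean), stft_magnitude(lossy)))
```

In training, `mag_ref` would be the clean target spectrogram and `mag_est` the generator's output, so both terms vanish only when the translated spectrogram matches the target.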

Test-time image-to-image translation ensembling improves out-of-distribution generalization in histopathology

Jun 30, 2022
Marin Scalbert, Maria Vakalopoulou, Florent Couzinié-Devy

Histopathology whole slide images (WSIs) can reveal significant inter-hospital variability, such as differences in illumination, color, or optical artifacts. These variations, caused by the use of different scanning protocols across medical centers (staining, scanner), can strongly harm the generalization of algorithms to unseen protocols. This motivates the development of new methods to limit such performance drops. In this paper, to enhance robustness on unseen target protocols, we propose a new test-time data augmentation based on multi-domain image-to-image translation. It projects images from an unseen protocol into each source domain before classifying them and ensembling the predictions. This test-time augmentation method significantly boosts domain generalization performance. To demonstrate its effectiveness, we evaluated our method on two different histopathology tasks, where it outperforms conventional domain generalization, standard H&E-specific color augmentation/normalization, and standard test-time augmentation techniques. Our code is publicly available at https://gitlab.com/vitadx/articles/test-time-i2i-translation-ensembling.

* Accepted at MICCAI2022 Conference
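
The inference procedure described above (translate into each source domain, classify, ensemble) can be sketched as follows. Averaging class probabilities is one common ensembling rule and is an assumption here; the `translators` and `classifier` callables are hypothetical placeholders, not the interfaces of the linked repository.

```python
from typing import Callable, Sequence
import numpy as np

# Hypothetical interfaces: a translator maps an image into one source domain,
# a classifier maps an image to a vector of class probabilities.
Translator = Callable[[np.ndarray], np.ndarray]
Classifier = Callable[[np.ndarray], np.ndarray]

def translation_ensemble(image: np.ndarray,
                         translators: Sequence[Translator],
                         classifier: Classifier) -> np.ndarray:
    """Project `image` into every source domain, classify each projection, average the probabilities."""
    probs = np.stack([classifier(translate(image)) for translate in translators])
    return probs.mean(axis=0)

def softmax(z: np.ndarray) -> np.ndarray:
    e = np.exp(z - z.max())
    return e / e.sum()

def toy_classifier(img: np.ndarray) -> np.ndarray:
    # Stand-in classifier whose two logits depend only on mean intensity.
    return softmax(np.array([img.mean(), 1.0 - img.mean()]))

# Toy usage: two stand-in "domain translators" that shift color.
toy_translators = [lambda img: np.clip(img * 0.9, 0.0, 1.0),
                   lambda img: np.clip(img + 0.1, 0.0, 1.0)]
patch = np.full((4, 4, 3), 0.5)
print(translation_ensemble(patch, toy_translators, toy_classifier))
```

In a real pipeline each translator would be the multi-domain image-to-image model conditioned on one source domain, and the classifier the model trained on those source domains.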

Multi-View Image-to-Image Translation Supervised by 3D Pose

Apr 12, 2021
Idit Diamant, Oranit Dror, Hai Victor Habi, Arnon Netzer

We address the task of multi-view image-to-image translation for person image generation. The goal is to synthesize photo-realistic multi-view images that are pose-consistent across all views. Our proposed end-to-end framework jointly learns multiple unpaired image-to-image translation models, one per camera viewpoint. The joint learning is enforced through constraints on the shared 3D human pose, which encourage the 2D pose projections in all views to be consistent. Experimental results on the CMU-Panoptic dataset demonstrate that, compared with a standard image-to-image baseline, the suggested framework generates photo-realistic images of persons with new poses that are more consistent across all views. The code is available at: https://github.com/sony-si/MultiView-Img2Img

* *equal contribution
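
One way to express the shared-3D-pose constraint is to project a single 3D skeleton into every calibrated camera and penalize the distance to the 2D pose found in each generated view. The sketch assumes pinhole cameras given as 3x4 projection matrices, 2D poses supplied by some pose estimator, and a squared-error penalty; it illustrates the idea and is not the paper's loss.

```python
import numpy as np

def project(points_3d, P):
    """Project (J, 3) world points with a 3x4 camera matrix P to (J, 2) pixel coordinates."""
    homogeneous = np.hstack([points_3d, np.ones((points_3d.shape[0], 1))])
    uvw = homogeneous @ P.T
    return uvw[:, :2] / uvw[:, 2:3]

def pose_consistency_loss(pose_3d, cameras, poses_2d):
    """Mean over views of the mean squared joint distance between projected and observed 2D poses."""
    per_view = [np.mean(np.sum((project(pose_3d, P) - p2d) ** 2, axis=1))
                for P, p2d in zip(cameras, poses_2d)]
    return float(np.mean(per_view))

# Toy setup: two cameras sharing intrinsics K, 5 units in front of the skeleton.
K = np.array([[100.0, 0.0, 32.0], [0.0, 100.0, 32.0], [0.0, 0.0, 1.0]])
cam_a = K @ np.hstack([np.eye(3), np.array([[0.0], [0.0], [5.0]])])
cam_b = K @ np.hstack([np.eye(3), np.array([[1.0], [0.0], [5.0]])])
skeleton = np.array([[0.0, 0.0, 0.0], [0.5, 0.2, 0.1], [-0.3, 0.4, -0.2]])
views = [project(skeleton, cam_a), project(skeleton, cam_b)]
print(pose_consistency_loss(skeleton, [cam_a, cam_b], views))
```

Because every view is compared against the same `pose_3d`, a generator that drifts in one viewpoint raises the loss even if the other views are correct.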

Image-to-image Translation via Hierarchical Style Disentanglement

Mar 02, 2021
Xinyang Li, Shengchuan Zhang, Jie Hu, Liujuan Cao, Xiaopeng Hong, Xudong Mao, Feiyue Huang, Yongjian Wu, Rongrong Ji

Recently, image-to-image translation has made significant progress in achieving both multi-label (i.e., translation conditioned on different labels) and multi-style (i.e., generation with diverse styles) tasks. However, because the independence and exclusiveness of the labels remain unexplored, existing methods suffer from uncontrolled manipulations of the translation results. In this paper, we propose Hierarchical Style Disentanglement (HiSD) to address this issue. Specifically, we organize the labels into a hierarchical tree structure, in which independent tags, exclusive attributes, and disentangled styles are allocated from top to bottom. Correspondingly, we design a new translation process that fits this structure, in which the styles are identified for controllable translations. Both qualitative and quantitative results on the CelebA-HQ dataset verify the ability of the proposed HiSD. We hope our method will serve as a solid baseline and provide fresh insights with the hierarchically organized annotations for future research in image-to-image translation. The code has been released at https://github.com/imlixinyang/HiSD.

* CVPR 2021. The code will be released at https://github.com/imlixinyang/HiSD
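
The tag/attribute/style hierarchy can be pictured as a small data structure: tags are independent, so a request may touch any subset of them; attributes under one tag are exclusive, so each tag receives exactly one target attribute; and each choice carries a style code. In this sketch the tag names are illustrative, and sampling the style as a random Gaussian vector of length `style_dim` is an assumption; HiSD's networks are not modeled.

```python
import random
from dataclasses import dataclass

@dataclass(frozen=True)
class Tag:
    """An independent tag; its attributes are mutually exclusive."""
    name: str
    attributes: tuple[str, ...]

def plan_translation(tags, targets, style_dim=4, seed=None):
    """Return {tag name: (attribute, style code)} for the tags being edited.

    `targets` maps a tag name to one attribute, so exclusivity holds by construction;
    tags absent from `targets` are left untouched, reflecting their independence.
    """
    rng = random.Random(seed)
    by_name = {tag.name: tag for tag in tags}
    plan = {}
    for tag_name, attribute in targets.items():
        tag = by_name[tag_name]
        if attribute not in tag.attributes:
            raise ValueError(f"{attribute!r} is not an attribute of tag {tag_name!r}")
        # Assumed style source: a random Gaussian latent code.
        plan[tag_name] = (attribute, [rng.gauss(0.0, 1.0) for _ in range(style_dim)])
    return plan

# Illustrative tags (not necessarily the paper's label set).
tags = [Tag("glasses", ("with", "without")),
        Tag("hair_color", ("black", "blond", "brown"))]
print(plan_translation(tags, {"hair_color": "blond"}, style_dim=2, seed=0))
```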

ProSpire: Proactive Spatial Prediction of Radio Environment Using Deep Learning

Aug 20, 2023
Shamik Sarkar, Dongning Guo, Danijela Cabric

Spatial prediction of the radio propagation environment of a transmitter can assist and improve various aspects of wireless networks. Most research in this domain can be categorized as 'reactive' spatial prediction, where predictions are made from a small set of measurements from an active transmitter whose radio environment is to be predicted. Emerging spectrum-sharing paradigms would benefit from 'proactive' spatial prediction of the radio environment, where spatial predictions must be made for a transmitter for which no measurements have been collected. This paper proposes ProSpire, a novel supervised deep learning-based framework that enables spectrum sharing by leveraging proactive spatial prediction. We carefully address several challenges in ProSpire, such as designing a framework that conveniently collects training data, making predictions quickly, operating without an area map, and ensuring that the predictions do not lead to undesired interference. ProSpire relies on crowdsourcing transmitters and receivers during their normal operation to address some of these challenges. The core component of ProSpire is a deep learning-based image-to-image translation method, which we call RSSu-net. We generate several diverse datasets using ray tracing software and numerically evaluate ProSpire. Our evaluations show that RSSu-net performs reasonably well at signal strength prediction, with a mean absolute error of 5 dB, which is comparable to the average error of other relevant methods. Importantly, due to the merits of RSSu-net, ProSpire creates proactive boundaries around transmitters such that they can be activated with a 97% probability of not causing interference. In this regard, RSSu-net performs 19% better than other comparable methods.

* 9 pages
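
The 5 dB figure above is a mean absolute error over predicted received-signal-strength (RSS) maps, the kind of output an image-to-image model such as RSSu-net produces. The sketch below scores a predicted map and derives a simple protection region from it. The threshold rule, the `margin_db` parameter, and the miss-rate metric are hypothetical illustrations of how a boundary trades coverage for safety, not ProSpire's actual procedure.

```python
import numpy as np

def mae_db(pred_rss_dbm, true_rss_dbm):
    """Mean absolute error (dB) between predicted and true RSS maps given in dBm."""
    return float(np.mean(np.abs(pred_rss_dbm - true_rss_dbm)))

def protection_region(pred_rss_dbm, threshold_dbm, margin_db=0.0):
    """Cells where the predicted RSS reaches the interference threshold.

    A positive, hypothetical `margin_db` enlarges the region to absorb prediction error.
    """
    return pred_rss_dbm >= threshold_dbm - margin_db

def miss_rate(region, true_rss_dbm, threshold_dbm):
    """Fraction of truly interfered cells that the predicted region fails to cover."""
    truly_interfered = true_rss_dbm >= threshold_dbm
    if not truly_interfered.any():
        return 0.0
    return float(np.mean(~region[truly_interfered]))

# Toy 2x2 grid: true RSS and a prediction with 3 to 7 dB errors.
true_map = np.array([[-50.0, -70.0], [-90.0, -110.0]])
pred_map = true_map + np.array([[3.0, -5.0], [5.0, -7.0]])
print(mae_db(pred_map, true_map))                                                   # 5.0
print(miss_rate(protection_region(pred_map, -72.0), true_map, -72.0))               # 0.5
print(miss_rate(protection_region(pred_map, -72.0, margin_db=5.0), true_map, -72.0))  # 0.0
```

With no margin, the 5 dB under-prediction at one cell leaves it outside the region; a 5 dB margin covers it at the cost of a larger protected area.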

Generative AI for Medical Imaging: extending the MONAI Framework

Jul 27, 2023
Walter H. L. Pinaya, Mark S. Graham, Eric Kerfoot, Petru-Daniel Tudosiu, Jessica Dafflon, Virginia Fernandez, Pedro Sanchez, Julia Wolleb, Pedro F. da Costa, Ashay Patel, Hyungjin Chung, Can Zhao, Wei Peng, Zelong Liu, Xueyan Mei, Oeslle Lucena, Jong Chul Ye, Sotirios A. Tsaftaris, Prerna Dogra, Andrew Feng, Marc Modat, Parashkev Nachev, Sebastien Ourselin, M. Jorge Cardoso

Recent advances in generative AI have brought major breakthroughs in several areas, including medical imaging. These generative models have tremendous potential not only to help safely share medical data via synthetic datasets but also to perform an array of diverse applications, such as anomaly detection, image-to-image translation, denoising, and MRI reconstruction. However, due to the complexity of these models, their implementation and reproducibility can be difficult. This complexity can hinder progress, act as a barrier to use, and discourage the comparison of new methods with existing work. In this study, we present MONAI Generative Models, a freely available open-source platform that allows researchers and developers to easily train, evaluate, and deploy generative models and related applications. Our platform reproduces state-of-the-art studies in a standardised way involving different architectures (such as diffusion models, autoregressive transformers, and GANs), and provides pre-trained models for the community. We have implemented these models in a generalisable fashion, illustrating that their results can be extended to 2D or 3D scenarios, including medical images with different modalities (like CT, MRI, and X-Ray data) and from different anatomical areas. Finally, we adopt a modular and extensible approach, ensuring long-term maintainability and the extension of current applications for future features.

Wavelet Knowledge Distillation: Towards Efficient Image-to-Image Translation

Mar 12, 2022
Linfeng Zhang, Xin Chen, Xiaobing Tu, Pengfei Wan, Ning Xu, Kaisheng Ma

Remarkable achievements have been attained with Generative Adversarial Networks (GANs) in image-to-image translation. However, due to their tremendous number of parameters, state-of-the-art GANs usually suffer from low efficiency and bulky memory usage. To tackle this challenge, this paper first investigates GAN performance from a frequency perspective. The results show that GANs, especially small ones, lack the ability to generate high-quality high-frequency information. To address this problem, we propose a novel knowledge distillation method referred to as wavelet knowledge distillation. Instead of directly distilling the generated images of teachers, wavelet knowledge distillation first decomposes the images into different frequency bands with the discrete wavelet transform and then distills only the high-frequency bands. As a result, the student GAN can focus its learning on the high-frequency bands. Experiments demonstrate that our method achieves 7.08 times compression and 6.80 times acceleration on CycleGAN with almost no performance drop. Additionally, we have studied the relation between discriminators and generators, which shows that compressing the discriminators can improve the performance of compressed generators.

* Accepted by CVPR2022
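
A single-level 2-D Haar transform is enough to illustrate distilling only the high-frequency bands. The sketch assumes grayscale (H, W) outputs with even sides, one decomposition level, and an L1 distance; the paper's wavelet choice, number of levels, and loss weighting are not given in the abstract.

```python
import numpy as np

def haar_dwt2(img):
    """Single-level orthonormal 2-D Haar DWT of an (H, W) array with even H and W.

    Returns the low-frequency band and a tuple of three high-frequency (detail) bands.
    """
    a, b = img[0::2, 0::2], img[0::2, 1::2]   # top-left, top-right of each 2x2 block
    c, d = img[1::2, 0::2], img[1::2, 1::2]   # bottom-left, bottom-right
    low = (a + b + c + d) / 2
    horizontal = (a + b - c - d) / 2          # top row minus bottom row
    vertical = (a - b + c - d) / 2            # left column minus right column
    diagonal = (a - b - c + d) / 2
    return low, (horizontal, vertical, diagonal)

def wavelet_distillation_loss(student_img, teacher_img):
    """Mean L1 gap between student and teacher outputs on the high-frequency bands only."""
    _, student_high = haar_dwt2(student_img)
    _, teacher_high = haar_dwt2(teacher_img)
    return float(np.mean([np.mean(np.abs(s - t)) for s, t in zip(student_high, teacher_high)]))

# Usage: compare a brightness shift with a one-pixel checkerboard perturbation.
rng = np.random.default_rng(0)
teacher = rng.random((8, 8))
checker = (np.indices((8, 8)).sum(axis=0) % 2).astype(float)
print(wavelet_distillation_loss(teacher + 0.3, teacher))      # close to 0
print(wavelet_distillation_loss(teacher + checker, teacher))  # close to 1/3
```

A uniform brightness shift only moves the low-frequency band, so this loss ignores it, while the checkerboard lands entirely in the diagonal band; that is the behavior that lets the student concentrate on high-frequency detail.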

Conditional Invertible Neural Networks for Diverse Image-to-Image Translation

May 05, 2021
Lynton Ardizzone, Jakob Kruse, Carsten Lüth, Niels Bracher, Carsten Rother, Ullrich Köthe

We introduce a new architecture called a conditional invertible neural network (cINN) and use it to address the task of diverse image-to-image translation for natural images. This is not easily possible with existing INN models due to some fundamental limitations. The cINN combines the purely generative INN model with an unconstrained feed-forward network, which efficiently preprocesses the conditioning image into maximally informative features. All parameters of a cINN are jointly optimized with a stable, maximum likelihood-based training procedure. Even though INN-based models have received far less attention in the literature than GANs, they have been shown to have some remarkable properties absent in GANs, e.g. apparent immunity to mode collapse. We find that our cINNs leverage these properties for image-to-image translation, as demonstrated on day-to-night translation and image colorization. Furthermore, we take advantage of our bidirectional cINN architecture to explore and manipulate emergent properties of the latent space, such as changing the image style in an intuitive way.

* arXiv admin note: text overlap with arXiv:1907.02392
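
INNs are commonly built from affine coupling blocks, which are invertible by construction and have a cheap log-determinant, the two ingredients maximum-likelihood training needs. The sketch below shows one such block whose scale/shift subnetwork also receives condition features `c`; in the cINN those features come from the feed-forward network that preprocesses the conditioning image. The random linear subnetwork, the tanh bound on the scale, and the standard-normal latent are assumptions for illustration, not the paper's architecture.

```python
import numpy as np

class ConditionalAffineCoupling:
    """One affine coupling block conditioned on features c (sketch)."""

    def __init__(self, dim, cond_dim, seed=0):
        assert dim % 2 == 0, "dim must be even so the input splits into two halves"
        rng = np.random.default_rng(seed)
        self.half = dim // 2
        # Hypothetical subnetwork: one random linear layer producing scale and shift.
        self.W = rng.normal(scale=0.1, size=(self.half + cond_dim, 2 * self.half))
        self.b = np.zeros(2 * self.half)

    def _scale_shift(self, x1, c):
        h = np.concatenate([x1, c], axis=-1) @ self.W + self.b
        s, t = h[..., :self.half], h[..., self.half:]
        return np.tanh(s), t  # tanh keeps the log-scale bounded

    def forward(self, x, c):
        x1, x2 = x[..., :self.half], x[..., self.half:]
        s, t = self._scale_shift(x1, c)
        y2 = x2 * np.exp(s) + t
        return np.concatenate([x1, y2], axis=-1), s.sum(axis=-1)  # output, log|det J|

    def inverse(self, y, c):
        y1, y2 = y[..., :self.half], y[..., self.half:]
        s, t = self._scale_shift(y1, c)  # y1 == x1, so s and t are recomputed exactly
        return np.concatenate([y1, (y2 - t) * np.exp(-s)], axis=-1)

def negative_log_likelihood(z, log_det):
    # Standard-normal latent: -log p(x | c) = 0.5 * ||z||^2 - log|det J| + const.
    return 0.5 * np.sum(z ** 2, axis=-1) - log_det

# Usage: encode a batch, then invert it under the same condition.
block = ConditionalAffineCoupling(dim=6, cond_dim=3, seed=1)
rng = np.random.default_rng(2)
x, c = rng.normal(size=(5, 6)), rng.normal(size=(5, 3))
z, log_det = block.forward(x, c)
print(np.abs(block.inverse(z, c) - x).max())
```

Sampling diverse translations would run `inverse` on fresh Gaussian `z` with the same `c`, which is where the bidirectional structure pays off.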

Semantic Relation Preserving Knowledge Distillation for Image-to-Image Translation

May 19, 2021
Zeqi Li, Ruowei Jiang, Parham Aarabi

Generative adversarial networks (GANs) have shown significant potential in modeling high-dimensional distributions of image data, especially on image-to-image translation tasks. However, due to the complexity of these tasks, state-of-the-art models often contain a tremendous number of parameters, which results in large model size and long inference time. In this work, we propose a novel method to address this problem by applying knowledge distillation together with distillation of a semantic relation preserving matrix. This matrix, derived from the teacher's feature encoding, helps the student model learn better semantic relations. In contrast to existing compression methods designed for classification tasks, our proposed method adapts well to the image-to-image translation task on GANs. Experiments conducted on five different datasets and three different pairs of teacher and student models provide strong evidence that our method achieves impressive results both qualitatively and quantitatively.

* Accepted to ECCV 2020
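
The abstract does not define the semantic relation matrix. A plausible reading, assumed here, is a matrix of cosine similarities between all spatial positions of an encoder feature map, with the student trained to match the teacher's matrix under a mean squared error. The sketch is illustrative and not the paper's exact formulation.

```python
import numpy as np

def relation_matrix(features, eps=1e-8):
    """Cosine similarities between all spatial positions of a (C, H, W) feature map."""
    channels = features.shape[0]
    flat = features.reshape(channels, -1).T                        # (H*W, C)
    flat = flat / (np.linalg.norm(flat, axis=1, keepdims=True) + eps)
    return flat @ flat.T                                           # (H*W, H*W)

def relation_distillation_loss(student_feats, teacher_feats):
    """Mean squared difference between student and teacher relation matrices."""
    diff = relation_matrix(student_feats) - relation_matrix(teacher_feats)
    return float(np.mean(diff ** 2))

# Usage: a narrow student (16 channels) compared with a wide teacher (64 channels).
rng = np.random.default_rng(0)
teacher = rng.normal(size=(64, 4, 4))
student = rng.normal(size=(16, 4, 4))
print(relation_distillation_loss(student, teacher))
```

Because both matrices are (H*W) x (H*W), the student can have far fewer channels than the teacher without an extra projection layer, which suits compression.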
