Iasonas Kokkinos

Deformably-Scaled Transposed Convolution

Oct 17, 2022
Stefano B. Blumberg, Daniele Raví, Mou-Cheng Xu, Matteo Figini, Iasonas Kokkinos, Daniel C. Alexander

Transposed convolution is crucial for generating high-resolution outputs, yet has received little attention compared to convolution layers. In this work we revisit transposed convolution and introduce a novel layer that allows us to place information in the image selectively and choose the 'stroke breadth' at which the image is synthesized, whilst incurring a small additional parameter cost. For this we introduce three ideas: firstly, we regress offsets to the positions where the transposed convolution results are placed; secondly, we broadcast the offset weight locations over a learnable neighborhood; and thirdly, we use a compact parametrization to share weights and restrict offsets. We show that simply substituting upsampling operators with our novel layer produces substantial improvements across tasks as diverse as instance segmentation, object detection, semantic segmentation, generative image modeling, and 3D magnetic resonance image enhancement, while outperforming all existing variants of transposed convolutions. Our novel layer can be used as a drop-in replacement for 2D and 3D upsampling operators and the code will be publicly available.
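
A rough sketch of the core idea, not the authors' implementation: the layer below pairs a standard transposed convolution with offsets regressed from the input, and applies them with a gather-based bilinear warp rather than the paper's scatter-and-broadcast mechanism. All names and the parametrization are illustrative assumptions.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class OffsetTransposedConv2d(nn.Module):
        """Transposed convolution whose output is displaced by learned offsets."""
        def __init__(self, in_ch, out_ch, kernel_size=4, stride=2, padding=1):
            super().__init__()
            self.up = nn.ConvTranspose2d(in_ch, out_ch, kernel_size,
                                         stride=stride, padding=padding)
            # Regress one 2D offset per input location (hypothetical parametrization).
            self.offset = nn.Conv2d(in_ch, 2, kernel_size=3, padding=1)
            nn.init.zeros_(self.offset.weight)
            nn.init.zeros_(self.offset.bias)
            self.stride = stride

        def forward(self, x):
            y = self.up(x)                                  # standard upsampling path
            off = self.offset(x) * self.stride              # offsets in output pixels
            off = F.interpolate(off, size=y.shape[-2:], mode='bilinear',
                                align_corners=False)
            _, _, h, w = y.shape
            ys, xs = torch.meshgrid(
                torch.arange(h, device=y.device, dtype=y.dtype),
                torch.arange(w, device=y.device, dtype=y.dtype), indexing='ij')
            gx = (xs + off[:, 0]) / (w - 1) * 2 - 1         # normalized sampling grid
            gy = (ys + off[:, 1]) / (h - 1) * 2 - 1
            grid = torch.stack([gx, gy], dim=-1)            # (B, H, W, 2)
            return F.grid_sample(y, grid, mode='bilinear', align_corners=True)

    # Drop-in usage in place of an upsampling operator in a decoder.
    layer = OffsetTransposedConv2d(64, 32)
    out = layer(torch.randn(1, 64, 16, 16))                 # -> (1, 32, 32, 32)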

Beyond Deterministic Translation for Unsupervised Domain Adaptation

Mar 11, 2022
Eleni Chiou, Eleftheria Panagiotaki, Iasonas Kokkinos

In this work we challenge the common approach of using a one-to-one mapping ('translation') between the source and target domains in unsupervised domain adaptation (UDA). Instead, we rely on stochastic translation to capture inherent translation ambiguities. This allows us to (i) train more accurate target networks by generating multiple outputs conditioned on the same source image, leveraging both accurate translation and data augmentation for appearance variability, (ii) impute robust pseudo-labels for the target data by averaging the predictions of a source network on multiple translated versions of a single target image, and (iii) train and ensemble diverse networks in the target domain by modulating the degree of stochasticity in the translations. We report improvements over strong recent baselines, leading to state-of-the-art UDA results on two challenging semantic segmentation benchmarks.
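
A minimal sketch of the pseudo-label imputation step (item (ii) above), under assumed interfaces: `translator(image, z)` is a stochastic target-to-source translator with a `latent_dim` attribute and `source_net` is a segmentation network trained on source data; both are hypothetical stand-ins for the paper's models.

    import torch

    @torch.no_grad()
    def impute_pseudo_labels(target_img, translator, source_net,
                             num_samples=8, conf_thresh=0.9):
        # Average the source network's predictions over several stochastic
        # translations of the same target image.
        probs = []
        for _ in range(num_samples):
            z = torch.randn(target_img.size(0), translator.latent_dim,
                            device=target_img.device)   # fresh stochastic code
            translated = translator(target_img, z)      # target -> source appearance
            probs.append(source_net(translated).softmax(dim=1))
        mean_prob = torch.stack(probs).mean(dim=0)       # (B, C, H, W)
        conf, labels = mean_prob.max(dim=1)
        labels[conf < conf_thresh] = 255                 # ignore low-confidence pixels
        return labels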

Unsupervised Domain Adaptation with Semantic Consistency across Heterogeneous Modalities for MRI Prostate Lesion Segmentation

Sep 19, 2021
Eleni Chiou, Francesco Giganti, Shonit Punwani, Iasonas Kokkinos, Eleftheria Panagiotaki

Any novel medical imaging modality that differs from previous protocols, e.g. in the number of imaging channels, introduces a new domain that is heterogeneous from previous ones. This common medical imaging scenario is rarely considered in the domain adaptation literature, which handles shifts across domains of the same dimensionality. In our work we rely on stochastic generative modeling to translate across two heterogeneous domains at the pixel level and introduce two new loss functions that promote semantic consistency. Firstly, we introduce a semantic cycle-consistency loss in the source domain to ensure that the translation preserves the semantics. Secondly, we introduce a pseudo-labelling loss, where we translate target data to source, label them with a source-domain network, and use the generated pseudo-labels to supervise the target-domain network. Our results show that this allows us to extract systematically better representations for the target domain. In particular, we address the challenge of enhancing performance on VERDICT-MRI, an advanced diffusion-weighted imaging technique, by exploiting labeled mp-MRI data. When compared to several unsupervised domain adaptation approaches, our approach yields substantial improvements that consistently carry over to the semi-supervised and supervised learning settings.

* Accepted at MICCAI 2021 Workshop on Domain Adaptation and Representation Transfer (DART). arXiv admin note: text overlap with arXiv:2010.07411 
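
An illustrative sketch of the two losses, with hypothetical names: G_st and G_ts translate between the heterogeneous source and target modalities, and S_src and S_tgt are the source- and target-domain segmentation networks. This is a simplified reading of the abstract, not the released code.

    import torch
    import torch.nn.functional as F

    def semantic_cycle_loss(x_src, y_src, G_st, G_ts, S_src):
        # Translate source -> target -> back to source and require that the
        # reconstruction is still segmented correctly, so that the translation
        # preserves the semantics.
        x_cycle = G_ts(G_st(x_src))
        return F.cross_entropy(S_src(x_cycle), y_src)

    def pseudo_label_loss(x_tgt, G_ts, S_src, S_tgt, conf_thresh=0.9):
        # Translate target data to the source modality, label it with the
        # source network, and supervise the target network with the confident
        # pseudo-labels.
        with torch.no_grad():
            prob = S_src(G_ts(x_tgt)).softmax(dim=1)
            conf, pseudo = prob.max(dim=1)
            pseudo[conf < conf_thresh] = 255
        return F.cross_entropy(S_tgt(x_tgt), pseudo, ignore_index=255)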

To The Point: Correspondence-driven monocular 3D category reconstruction

Jun 10, 2021
Filippos Kokkinos, Iasonas Kokkinos

We present To The Point (TTP), a method for reconstructing 3D objects from a single image using 2D to 3D correspondences learned from weak supervision. We recover a 3D shape from a 2D image by first regressing the 2D positions corresponding to the 3D template vertices and then jointly estimating a rigid camera transform and non-rigid template deformation that optimally explain the 2D positions through the 3D shape projection. By relying on 3D-2D correspondences we use a simple per-sample optimization problem to replace CNN-based regression of camera pose and non-rigid deformation and thereby obtain substantially more accurate 3D reconstructions. We treat this optimization as a differentiable layer and train the whole system in an end-to-end manner. We report systematic quantitative improvements on multiple categories and provide qualitative results comprising diverse shape, pose and texture prediction examples. Project website: https://fkokkinos.github.io/to_the_point/.
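
A hedged sketch of the per-sample optimization described above: given the regressed 2D positions of the 3D template vertices, jointly fit a weak-perspective camera and a per-vertex deformation that minimize the reprojection error. The axis-angle rotation, the L2 deformation prior and the optimizer settings are illustrative assumptions, not the paper's exact layer.

    import torch

    def fit_camera_and_deformation(verts_3d, kp_2d, steps=200, lam=1e-2):
        # verts_3d: (V, 3) template vertices; kp_2d: (V, 2) regressed 2D positions.
        w = torch.zeros(3, requires_grad=True)               # axis-angle rotation
        s = torch.ones(1, requires_grad=True)                # weak-perspective scale
        t = torch.zeros(2, requires_grad=True)               # image-plane translation
        dv = torch.zeros_like(verts_3d, requires_grad=True)  # non-rigid deformation
        opt = torch.optim.Adam([w, s, t, dv], lr=0.05)
        zero = torch.zeros(())
        for _ in range(steps):
            opt.zero_grad()
            # Rotation matrix from the axis-angle vector via the matrix exponential.
            K = torch.stack([torch.stack([zero, -w[2], w[1]]),
                             torch.stack([w[2], zero, -w[0]]),
                             torch.stack([-w[1], w[0], zero])])
            R = torch.matrix_exp(K)
            proj = (s * (verts_3d + dv) @ R.T)[:, :2] + t    # weak-perspective projection
            # Reprojection error plus a simple prior keeping the deformation small.
            loss = ((proj - kp_2d) ** 2).mean() + lam * (dv ** 2).mean()
            loss.backward()
            opt.step()
        return R.detach(), s.detach(), t.detach(), dv.detach()

Because every step is differentiable, such a loop can be treated as a layer and trained end-to-end, which is the role the per-sample optimization plays in the pipeline described above.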

Learning monocular 3D reconstruction of articulated categories from motion

Apr 27, 2021
Filippos Kokkinos, Iasonas Kokkinos

Monocular 3D reconstruction of articulated object categories is challenging due to the lack of training data and the inherent ill-posedness of the problem. In this work we use video self-supervision, forcing the consistency of consecutive 3D reconstructions by a motion-based cycle loss. This largely improves both optimization-based and learning-based 3D mesh reconstruction. We further introduce an interpretable model of 3D template deformations that controls a 3D surface through the displacement of a small number of local, learnable handles. We formulate this operation as a structured layer relying on mesh-Laplacian regularization and show that it can be trained in an end-to-end manner. We finally introduce a per-sample numerical optimization approach that jointly optimizes over mesh displacements and cameras within a video, boosting accuracy both during training and as test-time post-processing. While relying exclusively on a small set of videos collected per category for supervision, we obtain state-of-the-art reconstructions with diverse shapes, viewpoints and textures for multiple articulated object categories.

* Accepted to CVPR2021. For project website see https://fkokkinos.github.io/video_3d_reconstruction/ 
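
A small sketch of handle-based deformation and a mesh-Laplacian smoothness term, under simplifying assumptions (soft per-vertex attachment weights, a uniform graph Laplacian); the paper's structured layer is more elaborate, and all names here are illustrative.

    import torch
    import torch.nn as nn

    class HandleDeformation(nn.Module):
        """Deform a template mesh by interpolating a few handle displacements."""
        def __init__(self, num_verts, num_handles=16):
            super().__init__()
            # Learnable soft attachment of every vertex to every handle.
            self.attach = nn.Parameter(torch.randn(num_verts, num_handles) * 0.01)

        def forward(self, template, handle_disp):
            # template: (V, 3); handle_disp: (B, H, 3) predicted per sample.
            weights = self.attach.softmax(dim=1)             # (V, H)
            disp = torch.einsum('vh,bhc->bvc', weights, handle_disp)
            return template.unsqueeze(0) + disp              # (B, V, 3) deformed meshes

    def laplacian_loss(verts, edges):
        # Uniform mesh-Laplacian smoothness: each vertex should stay close to
        # the average of its neighbors. verts: (B, V, 3); edges: (E, 2) long.
        V = verts.shape[1]
        A = torch.zeros(V, V, device=verts.device)
        A[edges[:, 0], edges[:, 1]] = 1.0
        A[edges[:, 1], edges[:, 0]] = 1.0
        deg = A.sum(dim=1, keepdim=True).clamp(min=1)
        neighbor_avg = (A / deg) @ verts                     # broadcasts to (B, V, 3)
        return (verts - neighbor_avg).pow(2).sum(dim=-1).mean()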

Harnessing Uncertainty in Domain Adaptation for MRI Prostate Lesion Segmentation

Oct 14, 2020
Eleni Chiou, Francesco Giganti, Shonit Punwani, Iasonas Kokkinos, Eleftheria Panagiotaki

The need for training data can impede the adoption of novel imaging modalities for learning-based medical image analysis. Domain adaptation methods partially mitigate this problem by translating training data from a related source domain to a novel target domain, but typically assume that a one-to-one translation is possible. Our work addresses the challenge of adapting to a more informative target domain where multiple target samples can emerge from a single source sample. In particular we consider translating from mp-MRI to VERDICT, a richer MRI modality involving an optimized acquisition protocol for cancer characterization. We explicitly account for the inherent uncertainty of this mapping and exploit it to generate multiple outputs conditioned on a single input. Our results show that this allows us to extract systematically better image representations for the target domain, when used in tandem both with simple CycleGAN-based baselines and with more powerful approaches that integrate discriminative segmentation losses and/or residual adapters. When compared to its deterministic counterparts, our approach yields substantial improvements across a broad range of dataset sizes, increasingly strong baselines, and evaluation measures.

* Accepted at MICCAI 2020 
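
A brief sketch of how stochastic translation can serve as training-time augmentation: each labeled mp-MRI source image is translated with a fresh latent code at every step, so a single source sample yields many target-style training examples. `translator` and `seg_net` are hypothetical stand-ins for the paper's models.

    import torch
    import torch.nn.functional as F

    def train_step(x_src, y_src, translator, seg_net, optimizer):
        # Sample one of the many possible translations of this source batch.
        z = torch.randn(x_src.size(0), translator.latent_dim, device=x_src.device)
        with torch.no_grad():
            x_fake_tgt = translator(x_src, z)
        optimizer.zero_grad()
        loss = F.cross_entropy(seg_net(x_fake_tgt), y_src)  # source labels still apply
        loss.backward()
        optimizer.step()
        return loss.item()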

Holistic Multi-View Building Analysis in the Wild with Projection Pooling

Sep 25, 2020
Zbigniew Wojna, Krzysztof Maziarz, Łukasz Jocz, Robert Pałuba, Robert Kozikowski, Iasonas Kokkinos

We address six classification tasks related to fine-grained building attributes: construction type, number of floors, pitch and geometry of the roof, facade material, and occupancy class. Tackling this remote building analysis problem has become possible only recently, owing to growing large-scale datasets of urban scenes. To this end, we introduce a new benchmarking dataset consisting of 49426 top-view and street-view images of 9674 buildings, assembled together with geometric metadata. The dataset showcases a variety of real-world challenges, such as occlusions, blur, partially visible objects, and a broad spectrum of buildings. We propose a new projection pooling layer that creates a unified, top-view representation of the top-view and side views in a high-dimensional space, allowing us to utilize the building and imagery metadata seamlessly. Introducing this layer improves classification accuracy over highly tuned baseline models, indicating its suitability for building analysis.
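
A rough sketch of a projection-pooling-style operation, under the assumption that the imagery metadata provides, for every side view, the normalized pixel coordinates at which each top-view grid cell projects; the names and interfaces are illustrative, not the paper's implementation.

    import torch
    import torch.nn.functional as F

    def projection_pool(side_feats, proj, valid):
        # side_feats: (N, C, H, W) features from N street-view images.
        # proj:       (N, Gh, Gw, 2) normalized [-1, 1] coordinates of each
        #             top-view cell in every view, derived from camera geometry.
        # valid:      (N, Gh, Gw) mask of cells visible in each view.
        sampled = F.grid_sample(side_feats, proj, mode='bilinear',
                                align_corners=False)         # (N, C, Gh, Gw)
        mask = valid.unsqueeze(1).float()
        pooled = (sampled * mask).sum(dim=0) / mask.sum(dim=0).clamp(min=1e-6)
        return pooled                                         # (C, Gh, Gw) top-view map

One plausible use is to concatenate the pooled top-view tensor with features extracted from the actual top-view image before the per-task classification heads.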
