Joakim Lindblad

Roadmap on Deep Learning for Microscopy

Mar 07, 2023
Giovanni Volpe, Carolina Wählby, Lei Tian, Michael Hecht, Artur Yakimovich, Kristina Monakhova, Laura Waller, Ivo F. Sbalzarini, Christopher A. Metzler, Mingyang Xie, Kevin Zhang, Isaac C. D. Lenton, Halina Rubinsztein-Dunlop, Daniel Brunner, Bijie Bai, Aydogan Ozcan, Daniel Midtvedt, Hao Wang, Nataša Sladoje, Joakim Lindblad, Jason T. Smith, Marien Ochoa, Margarida Barroso, Xavier Intes, Tong Qiu, Li-Yu Yu, Sixian You, Yongtao Liu, Maxim A. Ziatdinov, Sergei V. Kalinin, Arlo Sheridan, Uri Manor, Elias Nehme, Ofri Goldenberg, Yoav Shechtman, Henrik K. Moberg, Christoph Langhammer, Barbora Špačková, Saga Helgadottir, Benjamin Midtvedt, Aykut Argun, Tobias Thalheim, Frank Cichos, Stefano Bo, Lars Hubatsch, Jesus Pineda, Carlo Manzo, Harshith Bachimanchi, Erik Selander, Antoni Homs-Corbera, Martin Fränzl, Kevin de Haan, Yair Rivenson, Zofia Korczak, Caroline Beck Adiels, Mite Mijalkov, Dániel Veréb, Yu-Wei Chang, Joana B. Pereira, Damian Matuszewski, Gustaf Kylberg, Ida-Maria Sintorn, Juan C. Caicedo, Beth A Cimini, Muyinatu A. Lediju Bell, Bruno M. Saraiva, Guillaume Jacquemet, Ricardo Henriques, Wei Ouyang, Trang Le, Estibaliz Gómez-de-Mariscal, Daniel Sage, Arrate Muñoz-Barrutia, Ebba Josefson Lindqvist, Johanna Bergman

Through digital imaging, microscopy has evolved from primarily being a means for visual observation of life at the micro- and nano-scale, to a quantitative tool with ever-increasing resolution and throughput. Artificial intelligence, deep neural networks, and machine learning are all niche terms describing computational methods that have gained a pivotal role in microscopy-based research over the past decade. This Roadmap is written collectively by prominent researchers and encompasses selected aspects of how machine learning is applied to microscopy image data, with the aim of gaining scientific knowledge by improved image quality, automated detection, segmentation, classification and tracking of objects, and efficient merging of information from multiple imaging modalities. We aim to give the reader an overview of the key developments and an understanding of possibilities and limitations of machine learning for microscopy. It will be of interest to a wide cross-disciplinary audience in the physical sciences and life sciences.

Can representation learning for multimodal image registration be improved by supervision of intermediate layers?

Mar 01, 2023
Elisabeth Wetzer, Joakim Lindblad, Nataša Sladoje

Multimodal imaging and correlative analysis typically require image alignment. Contrastive learning can generate representations of multimodal images, reducing the challenging task of multimodal image registration to a monomodal one. Previously, additional supervision on intermediate layers in contrastive learning has improved biomedical image classification. We evaluate whether a similar approach improves representations learned for registration, so as to boost registration performance. We explore three approaches to adding contrastive supervision to the latent features of the bottleneck layer in the U-Nets encoding the multimodal images, and evaluate three different critic functions. Our results show that representations learned without additional supervision on latent features perform best in the downstream task of registration on two public biomedical datasets. We investigate the performance drop, drawing on recent insights from contrastive learning for classification and self-supervised learning. We visualize the spatial relations of the learned representations by means of multidimensional scaling, and show that additional supervision on the bottleneck layer can lead to partial dimensional collapse of the intermediate embedding space.
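
A minimal sketch of this kind of setup, assuming two U-Net-style encoders that each return both their output representation and their bottleneck features; the cosine-similarity critic and the loss weight lam are illustrative assumptions, not the exact critics or weightings evaluated in the paper:

```python
import torch
import torch.nn.functional as F

def info_nce(z_a, z_b, tau=0.1):
    """InfoNCE over a batch: matching (a_i, b_i) pairs are positives,
    all other pairings in the batch serve as negatives."""
    z_a = F.normalize(z_a.flatten(1), dim=1)
    z_b = F.normalize(z_b.flatten(1), dim=1)
    logits = z_a @ z_b.t() / tau                        # cosine-similarity critic
    targets = torch.arange(z_a.size(0), device=z_a.device)
    return F.cross_entropy(logits, targets)

def training_step(unet_a, unet_b, x_a, x_b, lam=0.5):
    # Each (hypothetical) U-Net returns (output_representation, bottleneck_features).
    rep_a, mid_a = unet_a(x_a)
    rep_b, mid_b = unet_b(x_b)
    loss_out = info_nce(rep_a, rep_b)                   # main loss on output representations
    loss_mid = info_nce(mid_a, mid_b)                   # auxiliary supervision on the bottleneck
    return loss_out + lam * loss_mid
```

The reported finding is that the auxiliary term (lam > 0) does not help and can hurt the downstream registration, consistent with the observed partial dimensional collapse of the bottleneck embeddings.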

* 15 Pages + 9 Pages Appendix, 10 Figures 

End-to-end Multiple Instance Learning with Gradient Accumulation

Mar 08, 2022
Axel Andersson, Nadezhda Koriakina, Nataša Sladoje, Joakim Lindblad

Being able to learn from weakly labeled data and to provide interpretability are two of the main reasons why attention-based deep multiple instance learning (ABMIL) methods have become particularly popular for classification of histopathological images. Such image data usually come in the form of gigapixel-sized whole-slide images (WSI) that are cropped into smaller patches (instances). However, the sheer size of the data makes training of ABMIL models challenging: all the instances from one WSI cannot be processed at once by conventional GPUs. Existing solutions compromise training by relying on pre-trained models, strategic sampling or selection of instances, or self-supervised learning. We propose a training strategy based on gradient accumulation that enables direct end-to-end training of ABMIL models without being limited by GPU memory. We conduct experiments on both QMNIST and Imagenette to investigate performance and training time, and compare with the conventional memory-expensive baseline and a recent sampling-based approach. This memory-efficient approach, although slower, reaches performance indistinguishable from the memory-expensive baseline.
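
As a rough illustration of how gradient accumulation can decouple bag size from GPU memory in ABMIL, the sketch below uses a generic two-pass scheme: instance embeddings are first computed without gradients, the bag-level loss is backpropagated through the small attention head only, and the encoder is then re-run chunk by chunk to accumulate its parameter gradients. The interfaces (encoder, attention_head) and the two-pass formulation are assumptions for illustration, not necessarily the authors' exact algorithm:

```python
import torch
import torch.nn.functional as F

def train_bag(encoder, attention_head, bag, label, optimizer, chunk=64):
    """bag: (n_instances, C, H, W) tensor of patches from one WSI; label: scalar tensor."""
    device = next(encoder.parameters()).device
    optimizer.zero_grad()

    # Pass 1: instance embeddings, without building the encoder's autograd graph.
    with torch.no_grad():
        embs = torch.cat([encoder(bag[i:i + chunk].to(device))
                          for i in range(0, len(bag), chunk)])

    # Bag-level forward/backward through the (small) attention head only.
    embs_leaf = embs.requires_grad_(True)
    bag_logit = attention_head(embs_leaf)                # attention pooling + classifier
    loss = F.binary_cross_entropy_with_logits(bag_logit.view(()),
                                              label.float().to(device))
    loss.backward()                                      # fills embs_leaf.grad

    # Pass 2: re-encode chunk by chunk, accumulating encoder gradients.
    for i in range(0, len(bag), chunk):
        out = encoder(bag[i:i + chunk].to(device))
        out.backward(embs_leaf.grad[i:i + chunk])

    optimizer.step()
    return loss.item()
```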

Oral cancer detection and interpretation: Deep multiple instance learning versus conventional deep single instance learning

Feb 03, 2022
Nadezhda Koriakina, Nataša Sladoje, Vladimir Bašić, Joakim Lindblad

The current medical standard for setting an oral cancer (OC) diagnosis is histological examination of a tissue sample from the oral cavity. This process is time-consuming and more invasive than the alternative approach of acquiring a brush sample followed by cytological analysis. Skilled cytotechnologists are able to detect changes due to malignancy; however, introducing this approach into clinical routine is associated with challenges such as a lack of experts and labour-intensive work. To design a trustworthy OC detection system that would assist cytotechnologists, we are interested in AI-based methods that can reliably detect cancer given only per-patient labels (minimizing annotation bias), and that also provide information on which cells are most relevant for the diagnosis (enabling supervision and understanding). We therefore compare a conventional single instance learning (SIL) approach with a modern multiple instance learning (MIL) method suitable for OC detection and interpretation, utilizing three different neural network architectures. To facilitate systematic evaluation of the considered approaches, we introduce a synthetic PAP-QMNIST dataset, which serves as a model of OC data while offering access to per-instance ground truth. Our study indicates that, on PAP-QMNIST, SIL performs better on average than the MIL approach. Performance at the bag level on real-world cytological data is similar for both methods, yet the single instance approach performs better on average. Visual examination by cytotechnologists indicates that both methods manage to identify cells which deviate from normality, including malignant cells as well as cells suspicious for dysplasia. We share the code as open source at https://github.com/MIDA-group/OralCancerMILvsSIL
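
To make the SIL/MIL distinction concrete, here is a toy sketch in which SIL scores each cell embedding independently and averages the scores, while attention-based MIL (ABMIL) pools the embeddings with learned attention weights before a single bag-level prediction; the attention weights (or the per-instance scores) are what allow pointing at individual cells. Module names and sizes are illustrative, not the three architectures used in the study:

```python
import torch
import torch.nn as nn

class SILHead(nn.Module):
    """Single instance learning: one score per cell, bag score by aggregation."""
    def __init__(self, dim):
        super().__init__()
        self.clf = nn.Linear(dim, 1)

    def forward(self, instance_embs):                    # (n_instances, dim)
        scores = torch.sigmoid(self.clf(instance_embs)).squeeze(-1)
        return scores.mean(), scores                     # bag score, per-cell scores

class ABMILHead(nn.Module):
    """Attention-based MIL: attention-weighted pooling, then one bag prediction."""
    def __init__(self, dim, att_dim=128):
        super().__init__()
        self.attention = nn.Sequential(nn.Linear(dim, att_dim), nn.Tanh(),
                                       nn.Linear(att_dim, 1))
        self.clf = nn.Linear(dim, 1)

    def forward(self, instance_embs):                    # (n_instances, dim)
        a = torch.softmax(self.attention(instance_embs), dim=0)   # attention weights
        bag_emb = (a * instance_embs).sum(dim=0)                  # weighted pooling
        return torch.sigmoid(self.clf(bag_emb)).squeeze(-1), a.squeeze(-1)
```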

Cross-Modality Sub-Image Retrieval using Contrastive Multimodal Image Representations

Jan 10, 2022
Eva Breznik, Elisabeth Wetzer, Joakim Lindblad, Nataša Sladoje

In tissue characterization and cancer diagnostics, multimodal imaging has emerged as a powerful technique. Thanks to computational advances, large datasets can be exploited to improve diagnosis and to discover patterns in pathologies. However, this requires efficient and scalable image retrieval methods. Cross-modality image retrieval is particularly demanding, as images of the same content captured in different modalities may display little common information. We propose a content-based image retrieval (CBIR) system for reverse (sub-)image search that retrieves microscopy images in one modality given a corresponding image captured by a different modality, where images are not aligned and share only a few structures. We propose to combine deep learning, to generate representations which embed both modalities in a common space, with classic, fast, and robust feature extractors (SIFT, SURF) to create a bag-of-words model for efficient and reliable retrieval. Our application-independent approach shows promising results on a publicly available dataset of brightfield and second harmonic generation microscopy images: we obtain 75.4% and 83.6% top-10 retrieval success for the two retrieval directions. The proposed method significantly outperforms direct retrieval of the original multimodal (sub-)images as well as retrieval of their generative adversarial network (GAN)-based image-to-image translations. We further establish that it performs better than a recent sub-image retrieval toolkit and than learnt feature extractors for the downstream task of cross-modal image retrieval. We highlight the shortcomings of the latter methods and observe the importance of equivariance and invariance properties of the learnt representations and feature extractors in the CBIR pipeline. Code will be available at github.com/MIDA-group.
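
A condensed sketch of such a retrieval pipeline, assuming the shared-space representations of both modalities have already been computed; the vocabulary size, MiniBatchKMeans clustering, and cosine similarity are illustrative choices rather than the paper's exact configuration:

```python
import cv2
import numpy as np
from sklearn.cluster import MiniBatchKMeans

sift = cv2.SIFT_create()

def sift_descriptors(rep_img):
    """rep_img: single-channel representation image with float values in [0, 1]."""
    img8 = (np.clip(rep_img, 0, 1) * 255).astype(np.uint8)
    _, desc = sift.detectAndCompute(img8, None)
    return desc if desc is not None else np.empty((0, 128), np.float32)

def build_vocabulary(rep_images, k=512):
    """Cluster descriptors from a set of representation images into visual words."""
    all_desc = np.vstack([sift_descriptors(r) for r in rep_images])
    return MiniBatchKMeans(n_clusters=k).fit(all_desc)

def bow_histogram(rep_img, vocab):
    desc = sift_descriptors(rep_img)
    if len(desc) == 0:
        return np.zeros(vocab.n_clusters)
    hist = np.bincount(vocab.predict(desc), minlength=vocab.n_clusters).astype(float)
    return hist / (np.linalg.norm(hist) + 1e-8)          # unit-normalise

def retrieve(query_rep, database_hists, vocab, top_k=10):
    # database_hists: (n_images, k) matrix of unit-normalised BoW histograms.
    sims = database_hists @ bow_histogram(query_rep, vocab)   # cosine similarity
    return np.argsort(sims)[::-1][:top_k]
```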

Cross-Sim-NGF: FFT-Based Global Rigid Multimodal Alignment of Image Volumes using Normalized Gradient Fields

Oct 19, 2021
Johan Öfverstedt, Joakim Lindblad, Nataša Sladoje

Multimodal image alignment involves finding spatial correspondences between volumes that vary in appearance and structure. Automated alignment methods are often based on local optimization, which can be highly sensitive to initialization. We propose a global optimization method for rigid multimodal 3D image alignment, based on a novel efficient algorithm for computing the similarity of normalized gradient fields (NGF) in the frequency domain. We validate the method experimentally on a dataset comprising 20 brain volumes acquired in four modalities (T1w, FLAIR, CT, [18F] FDG PET), synthetically displaced with known transformations. The proposed method exhibits excellent performance on all six possible modality combinations and outperforms all four reference methods by a large margin. The method is fast; a 3.4 Mvoxel global rigid alignment requires approximately 40 seconds of computation, and the proposed algorithm outperforms a direct algorithm for the same task by more than three orders of magnitude. An open-source implementation is provided.
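
The core trick can be illustrated compactly: the squared inner product of normalized gradient fields, summed over the image domain, expands into cross-correlations of gradient-component products, each of which the FFT evaluates for all integer shifts at once. The 2D, translation-only toy below (circular boundaries, fixed noise parameter eps) shows the principle; the paper's algorithm additionally handles 3D volumes, rotations, and partial overlap:

```python
import numpy as np

def normalized_gradient(img, eps=1e-2):
    gy, gx = np.gradient(img.astype(float))
    norm = np.sqrt(gx**2 + gy**2 + eps**2)
    return gx / norm, gy / norm

def xcorr_fft(a, b):
    # Circular cross-correlation of two equally sized images via the FFT.
    return np.real(np.fft.ifft2(np.fft.fft2(a) * np.conj(np.fft.fft2(b))))

def ngf_similarity_all_shifts(img_a, img_b):
    ax, ay = normalized_gradient(img_a)
    bx, by = normalized_gradient(img_b)
    # (ax*bx + ay*by)^2 summed over the image expands into three correlation terms.
    return (xcorr_fft(ax**2, bx**2)
            + 2.0 * xcorr_fft(ax * ay, bx * by)
            + xcorr_fft(ay**2, by**2))                   # entry [dy, dx] = score for that shift

# Best integer shift:
# dy, dx = np.unravel_index(np.argmax(ngf_similarity_all_shifts(A, B)), A.shape)
```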

* 5 pages, 3 figures, 3 tables. This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible 

Fast computation of mutual information in the frequency domain with applications to global multimodal image alignment

Jun 28, 2021
Johan Öfverstedt, Joakim Lindblad, Nataša Sladoje

Multimodal image alignment is the process of finding spatial correspondences between images formed by different imaging techniques or under different conditions, to facilitate heterogeneous data fusion and correlative analysis. The information-theoretic concept of mutual information (MI) is widely used as a similarity measure to guide multimodal alignment processes, where most works have focused on local maximization of MI, which typically works well only for small displacements; this points to a need for global maximization of MI, which has previously been computationally infeasible due to the high run-time complexity of existing algorithms. We propose an efficient algorithm for computing MI for all discrete displacements (formalized as the cross-mutual information function (CMIF)), which is based on cross-correlation computed in the frequency domain. We show that the algorithm is equivalent to a direct method while asymptotically superior in terms of run-time. Furthermore, we propose a method for multimodal image alignment for transformation models with few degrees of freedom (e.g. rigid) based on the proposed CMIF algorithm. We evaluate the efficacy of the proposed method on three distinct benchmark datasets of aerial, cytological, and histological images, and we observe excellent success rates (in recovering known rigid transformations), overall outperforming alternative methods, including local optimization of MI as well as several recent deep learning-based approaches. We also evaluate the run-times of a GPU implementation of the proposed algorithm and observe speed-ups from 100 to more than 10,000 times, for realistic image sizes, compared to a GPU implementation of a direct method. Code is shared as open source at github.com/MIDA-group/globalign.
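
A compact sketch of the underlying idea, under simplifying assumptions (2D images, circular boundaries, no masking of non-overlapping regions): after quantizing both images into L grey levels, each joint-histogram entry, viewed as a function of the displacement, is a cross-correlation of two indicator images, which the FFT computes for all displacements simultaneously; MI then follows per displacement:

```python
import numpy as np

def quantize(img, levels):
    lo, hi = img.min(), img.max()
    return np.clip(((img - lo) / (hi - lo + 1e-12) * levels).astype(int), 0, levels - 1)

def mi_all_displacements(img_a, img_b, levels=16):
    A, B = quantize(img_a, levels), quantize(img_b, levels)
    n = A.size
    fft_a = [np.fft.fft2(A == i) for i in range(levels)]
    fft_b = [np.fft.fft2(B == j) for j in range(levels)]
    # joint[i, j] holds P(A=i, B=j) for every circular displacement at once.
    joint = np.zeros((levels, levels) + A.shape)
    for i in range(levels):
        for j in range(levels):
            joint[i, j] = np.real(np.fft.ifft2(fft_a[i] * np.conj(fft_b[j]))) / n
    joint = np.clip(joint, 0.0, None)
    p_a = joint.sum(axis=1)                              # marginals, per displacement
    p_b = joint.sum(axis=0)
    with np.errstate(divide="ignore", invalid="ignore"):
        ratio = joint / (p_a[:, None] * p_b[None, :])
        terms = np.where(joint > 1e-12, joint * np.log(ratio), 0.0)
    return terms.sum(axis=(0, 1))                        # MI value for every displacement

# Displacement maximizing MI:
# dy, dx = np.unravel_index(np.argmax(mi_all_displacements(A_img, B_img)), A_img.shape)
```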

* 7 pages, 4 figures, 2 tables. The article is under consideration at Pattern Recognition Letters 

Is Image-to-Image Translation the Panacea for Multimodal Image Registration? A Comparative Study

Mar 30, 2021
Jiahao Lu, Johan Öfverstedt, Joakim Lindblad, Nataša Sladoje

Despite recent advances in biomedical image processing propelled by the deep learning revolution, multimodal image registration, due to its several challenges, is still often performed manually by specialists. The recent success of image-to-image (I2I) translation in computer vision applications and its growing use in biomedical areas provide a tempting possibility of transforming the multimodal registration problem into a, potentially easier, monomodal one. We conduct an empirical study of the applicability of modern I2I translation methods for the task of multimodal biomedical image registration. We compare the performance of four Generative Adversarial Network (GAN)-based methods and one contrastive representation learning method, subsequently combined with two representative monomodal registration methods, to judge the effectiveness of modality translation for multimodal image registration. We evaluate these method combinations on three publicly available multimodal datasets of increasing difficulty, and compare with the performance of registration by Mutual Information maximisation and one modern data-specific multimodal registration method. Our results suggest that, although I2I translation may be helpful when the modalities to register are clearly correlated, registration of modalities which express distinctly different properties of the sample is not well handled by the I2I translation approach. When less information is shared between the modalities, the I2I translation methods struggle to provide good predictions, which impairs the registration performance. The evaluated representation learning method, which aims to find an in-between representation, manages better, and so does the Mutual Information maximisation approach. We share our complete experimental setup as open-source (https://github.com/Noodles-321/Registration).
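
For concreteness, the evaluated pipeline amounts to something like the sketch below: a trained I2I generator (a placeholder here for a CycleGAN/pix2pix-style model) maps one modality into the appearance of the other, after which a monomodal registration method is applied. Phase correlation stands in for the monomodal step and only recovers translation, unlike the representative registration methods used in the study:

```python
import numpy as np
import torch
from skimage.registration import phase_cross_correlation

def register_via_translation(generator, img_a, img_b):
    """img_a, img_b: 2D numpy arrays from two different modalities."""
    with torch.no_grad():
        fake_b = generator(torch.from_numpy(img_a).float()[None, None])  # A -> "B"
    fake_b = fake_b.squeeze().cpu().numpy()
    shift, error, _ = phase_cross_correlation(img_b, fake_b)   # monomodal registration step
    return shift                                               # estimated (dy, dx)
```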

* 32 pages, 7 figures 

INSPIRE: Intensity and Spatial Information-Based Deformable Image Registration

Dec 14, 2020
Johan Öfverstedt, Joakim Lindblad, Nataša Sladoje

We present INSPIRE, a top-performing general-purpose method for deformable image registration. INSPIRE extends our existing symmetric registration framework, based on distances combining intensity and spatial information, to an elastic B-spline-based transformation model. We also present several theoretical and algorithmic improvements which provide high computational efficiency and thereby applicability of the framework in a wide range of real scenarios. We show that the proposed method delivers highly accurate as well as stable and robust registration results. We evaluate the method on a synthetic dataset created from retinal images, consisting of thin networks of vessels, where INSPIRE exhibits excellent performance, substantially outperforming the reference methods. We also evaluate the method on four benchmark datasets of 3D images of brains, for a total of 2088 pairwise registrations; a comparison with 15 other state-of-the-art methods reveals that INSPIRE provides the best overall performance. Code is available at github.com/MIDA-group/inspire.
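
As background on the transformation model class, the sketch below implements a generic free-form deformation: a coarse grid of control-point displacements is upsampled (bicubic interpolation here, standing in for cubic B-spline interpolation) to a dense displacement field, the moving image is warped with it, and the control points are optimized against a similarity loss. The MSE loss and fixed control grid are placeholders; INSPIRE's symmetric formulation and its intensity-and-spatial-information-based distance are not reproduced here:

```python
import torch
import torch.nn.functional as F

def warp(moving, ctrl, out_size):
    """moving: (1,1,H,W); ctrl: (1,2,h,w) control-point displacements in normalized
    [-1,1] coordinates, channel 0 = x-displacement, channel 1 = y-displacement."""
    H, W = out_size
    disp = F.interpolate(ctrl, size=(H, W), mode="bicubic", align_corners=True)
    ys, xs = torch.meshgrid(torch.linspace(-1, 1, H), torch.linspace(-1, 1, W),
                            indexing="ij")
    grid = torch.stack((xs, ys), dim=-1)[None]           # identity grid, (1,H,W,2)
    grid = grid + disp.permute(0, 2, 3, 1)               # add the displacement field
    return F.grid_sample(moving, grid, align_corners=True)

def register(fixed, moving, grid_size=(8, 8), iters=200, lr=0.05):
    ctrl = torch.zeros(1, 2, *grid_size, requires_grad=True)
    opt = torch.optim.Adam([ctrl], lr=lr)
    for _ in range(iters):
        opt.zero_grad()
        loss = F.mse_loss(warp(moving, ctrl, fixed.shape[-2:]), fixed)
        loss.backward()
        opt.step()
    return ctrl.detach()                                 # optimized control-point grid
```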

* 13 pages, 7 figures, 3 tables 

CoMIR: Contrastive Multimodal Image Representation for Registration

Jun 11, 2020
Nicolas Pielawski, Elisabeth Wetzer, Johan Öfverstedt, Jiahao Lu, Carolina Wählby, Joakim Lindblad, Nataša Sladoje

We propose contrastive coding to learn shared, dense image representations, referred to as CoMIRs (Contrastive Multimodal Image Representations). CoMIRs enable the registration of multimodal images where existing registration methods often fail due to a lack of sufficiently similar image structures. CoMIRs reduce the multimodal registration problem to a monomodal one, in which general intensity-based as well as feature-based registration algorithms can be applied. The method involves training one neural network per modality on aligned images, using a contrastive loss based on noise-contrastive estimation (InfoNCE). Unlike other contrastive coding methods, which are used for, e.g., classification, our approach generates image-like representations that contain the information shared between modalities. We introduce a novel, hyperparameter-free modification to InfoNCE to enforce rotational equivariance of the learnt representations, a property essential to the registration task. We assess the extent of achieved rotational equivariance and the stability of the representations with respect to weight initialization, training set, and hyperparameter settings on a remote sensing dataset of RGB and near-infrared images. We evaluate the learnt representations through registration of a biomedical dataset of bright-field and second-harmonic generation microscopy images: two modalities with very little apparent correlation. The proposed approach based on CoMIRs significantly outperforms registration of representations created by GAN-based image-to-image translation, as well as a state-of-the-art, application-specific method which takes additional knowledge about the data into account. Code is available at: https://github.com/dqiamsdoayehccdvulyy/CoMIR.
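
One common way to encourage the rotational equivariance mentioned above is sketched below: rotate the input of one branch by a random multiple of 90 degrees and rotate its dense output representation back before evaluating the contrastive loss, pushing the networks towards f(rot(x)) = rot(f(x)). The cosine-similarity critic and this particular augmentation scheme are generic illustrations, not the paper's hyperparameter-free InfoNCE modification:

```python
import torch
import torch.nn.functional as F

def contrastive_loss(rep_a, rep_b, tau=0.1):
    """InfoNCE-style loss; matching pairs in the batch are positives."""
    za = F.normalize(rep_a.flatten(1), dim=1)
    zb = F.normalize(rep_b.flatten(1), dim=1)
    logits = za @ zb.t() / tau
    labels = torch.arange(za.size(0), device=za.device)
    return F.cross_entropy(logits, labels)

def equivariant_step(net_a, net_b, x_a, x_b):
    k = int(torch.randint(0, 4, (1,)))                   # random element of C4
    rep_a = net_a(torch.rot90(x_a, k, dims=(-2, -1)))    # encode a rotated input
    rep_a = torch.rot90(rep_a, -k, dims=(-2, -1))        # undo the rotation on the output
    rep_b = net_b(x_b)
    return contrastive_loss(rep_a, rep_b)
```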

* 21 pages, 11 figures 