"Image": models, code, and papers

Two-Stage Convolutional Neural Network for Breast Cancer Histology Image Classification

Apr 01, 2018
Kamyar Nazeri, Azad Aminpour, Mehran Ebrahimi

This paper explores the problem of breast tissue classification of microscopy images. Based on the predominant cancer type, the goal is to classify images into four categories: normal, benign, in situ carcinoma, and invasive carcinoma. Given a suitable training dataset, we utilize deep learning techniques to address the classification problem. Due to the large size of each image in the training dataset, we propose a patch-based technique which consists of two consecutive convolutional neural networks. The first "patch-wise" network acts as an auto-encoder that extracts the most salient features of image patches, while the second "image-wise" network performs classification of the whole image. The first network is pre-trained and aimed at extracting local information, while the second network obtains global information of an input image. We trained the networks using the ICIAR 2018 grand challenge on BreAst Cancer Histology (BACH) dataset. The proposed method yields 95% accuracy on the validation set, compared to previously reported accuracy rates of 77% in the literature. Our code is publicly available at https://github.com/ImagingLab/ICIAR2018

* LNCS 10882 (2018) 717-726
* 10 pages, 5 figures, ICIAR 2018 conference
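
The patch-based idea in the abstract above can be illustrated with a minimal sketch of the patch extraction step only. This is not the authors' code (see their repository for that): the function name `extract_patches`, the list-of-rows image layout, and the toy patch size and stride are all assumptions chosen for illustration.

```python
def extract_patches(image, patch_size, stride):
    """Slide a square window over a 2D image (a list of rows).

    Returns a list of (top, left, patch) tuples. Hypothetical helper:
    the real pipeline's patch size and stride are not given in the abstract.
    """
    height, width = len(image), len(image[0])
    patches = []
    for top in range(0, height - patch_size + 1, stride):
        for left in range(0, width - patch_size + 1, stride):
            # Cut patch_size rows, then patch_size columns from each row.
            patch = [row[left:left + patch_size]
                     for row in image[top:top + patch_size]]
            patches.append((top, left, patch))
    return patches


# Toy 4x6 "image" whose pixel value encodes its position.
image = [[r * 6 + c for c in range(6)] for r in range(4)]
patches = extract_patches(image, patch_size=2, stride=2)
```

In a two-stage setup of this kind, each patch would be passed through the patch-wise network, and the resulting feature maps would be assembled by position before the image-wise network classifies the whole image.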

Multi-Stage Fusion for One-Click Segmentation

Oct 20, 2020
Soumajit Majumder, Ansh Khurana, Abhinav Rai, Angela Yao

Segmenting objects of interest in an image is an essential building block of applications such as photo-editing and image analysis. Under interactive settings, one should achieve good segmentations while minimizing user input. Current deep learning-based interactive segmentation approaches use early fusion and incorporate user cues at the image input layer. Since segmentation CNNs have many layers, early fusion may weaken the influence of user interactions on the final prediction results. As such, we propose a new multi-stage guidance framework for interactive segmentation. By incorporating user cues at different stages of the network, we allow user interactions to impact the final segmentation output in a more direct way. Our proposed framework has a negligible increase in parameter count compared to early-fusion frameworks. We perform extensive experimentation on the standard interactive instance segmentation and one-click segmentation benchmarks and report state-of-the-art performance.

* A preprint of the accepted paper at GCPR 2020

Trainable Activation Function in Image Classification

Jun 05, 2020
Zhaohe Liao

In current neural network research, the activation function is specified manually and cannot change during training. This paper focuses on how to make the activation function trainable for deep neural networks. We use series expansions and linear combinations of different activation functions to make activation functions continuously variable. We also test the performance of CNNs with a Fourier-series-simulated activation (Fourier-CNN) and CNNs with a linearly combined activation function (LC-CNN) on the CIFAR-10 dataset. The results show that our trainable activation functions perform better than the widely used ReLU activation function. Finally, we improve the performance of Fourier-CNN with an autoencoder and test the performance of the PSO algorithm in optimizing network parameters.
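
A minimal sketch of the two kinds of trainable activation named in the abstract above. The choice of basis functions (ReLU, tanh, sigmoid), the truncated Fourier form, and all function names are assumptions for illustration; the abstract does not specify the paper's exact parameterization.

```python
import math


def relu(x):
    return max(0.0, x)


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


# Assumed basis for the linear combination; the paper may use a different set.
BASIS = (relu, math.tanh, sigmoid)


def lc_activation(x, weights):
    """Linear combination of fixed activations with trainable weights."""
    return sum(w * f(x) for w, f in zip(weights, BASIS))


def lc_weight_grads(x, upstream_grad):
    """Gradient of the output w.r.t. each weight: d/dw_i = f_i(x)."""
    return [upstream_grad * f(x) for f in BASIS]


def fourier_activation(x, a0, a, b):
    """Truncated Fourier series: a0 + sum_k a_k cos(kx) + b_k sin(kx)."""
    return a0 + sum(
        a_k * math.cos(k * x) + b_k * math.sin(k * x)
        for k, (a_k, b_k) in enumerate(zip(a, b), start=1)
    )
```

Because the output is linear in the weights (or Fourier coefficients), their gradients are simply the basis function values, so they can be updated by the same backpropagation step as the other network parameters.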

Frustratingly Simple Domain Generalization via Image Stylization

Jul 10, 2020
Nathan Somavarapu, Chih-Yao Ma, Zsolt Kira

Convolutional Neural Networks (CNNs) show impressive performance in the standard classification setting where training and testing data are drawn i.i.d. from a given domain. However, CNNs do not readily generalize to new domains with different statistics, a setting that is simple for humans. In this work, we address the Domain Generalization problem, where the classifier must generalize to an unknown target domain. Inspired by recent works that have shown a difference in biases between CNNs and humans, we demonstrate an extremely simple yet effective method, namely correcting this bias by augmenting the dataset with stylized images. In contrast with existing stylization works, which use external data sources such as art, we further introduce a method that is entirely in-domain, using no such extra sources of data. We provide a detailed analysis of the mechanism by which the method works, verifying our claim that it changes the shape/texture bias, and demonstrate results surpassing or comparable to state-of-the-art approaches that utilize much more complex methods.

* Code: https://github.com/GT-RIPL/DomainGeneralization-Stylization

Understanding image motion with group representations

Feb 26, 2018
Andrew Jaegle, Stephen Phillips, Daphne Ippolito, Kostas Daniilidis

Motion is an important signal for agents in dynamic environments, but learning to represent motion from unlabeled video is a difficult and underconstrained problem. We propose a model of motion based on elementary group properties of transformations and use it to train a representation of image motion. While most methods of estimating motion are based on pixel-level constraints, we use these group properties to constrain the abstract representation of motion itself. We demonstrate that a deep neural network trained using this method captures motion in both synthetic 2D sequences and real-world sequences of vehicle motion, without requiring any labels. Networks trained to respect these constraints implicitly identify the image characteristic of motion in different sequence types. In the context of vehicle motion, this method extracts information useful for localization, tracking, and odometry. Our results demonstrate that this representation is useful for learning motion in the general setting where explicit labels are difficult to obtain.

* Published as a conference paper at ICLR 2018; 14 pages, including references and supplement

Parallax Attention for Unsupervised Stereo Correspondence Learning

Sep 16, 2020
Longguang Wang, Yulan Guo, Yingqian Wang, Zhengfa Liang, Zaiping Lin, Jungang Yang, Wei An

Stereo image pairs encode 3D scene cues into stereo correspondences between the left and right images. To exploit 3D cues within stereo images, recent CNN-based methods commonly use cost volume techniques to capture stereo correspondence over large disparities. However, since disparities can vary significantly for stereo cameras with different baselines, focal lengths and resolutions, the fixed maximum disparity used in cost volume techniques prevents them from handling different stereo image pairs with large disparity variations. In this paper, we propose a generic parallax-attention mechanism (PAM) to capture stereo correspondence regardless of disparity variations. Our PAM integrates epipolar constraints with an attention mechanism to calculate feature similarities along the epipolar line to capture stereo correspondence. Based on our PAM, we propose a parallax-attention stereo matching network (PASMnet) and a parallax-attention stereo image super-resolution network (PASSRnet) for stereo matching and stereo image super-resolution tasks. Moreover, we introduce a new large-scale dataset named Flickr1024 for stereo image super-resolution. Experimental results show that our PAM is generic and can effectively learn stereo correspondence under large disparity variations in an unsupervised manner. Comparative results show that our PASMnet and PASSRnet achieve state-of-the-art performance.

* Accepted by IEEE TPAMI 2020. arXiv admin note: text overlap with arXiv:1903.05784
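
The core idea of attention along the epipolar line can be sketched as follows. This is a simplified illustration, not the PAM architecture itself: it assumes a rectified stereo pair (so epipolar lines are image rows), uses a plain dot product as the similarity, and the names `row_attention` and `expected_disparity` are hypothetical.

```python
import math


def softmax(scores):
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def row_attention(left_row, right_row):
    """Attention over one scanline of a rectified pair (assumed).

    left_row / right_row are lists of per-pixel feature vectors.
    attention[i][j] is the weight of right pixel j for left pixel i.
    """
    attention = []
    for f_left in left_row:
        scores = [sum(a * b for a, b in zip(f_left, f_right))
                  for f_right in right_row]
        attention.append(softmax(scores))
    return attention


def expected_disparity(attention):
    """Soft disparity per left pixel: sum_j attention[i][j] * (i - j)."""
    return [sum(w * (i - j) for j, w in enumerate(weights))
            for i, weights in enumerate(attention)]


# Toy scanline: the right view is the left view shifted by one pixel.
A, B, C, D = [8, 0, 0, 0], [0, 8, 0, 0], [0, 0, 8, 0], [0, 0, 0, 8]
attention = row_attention([A, B, C], [B, C, D])
disparity = expected_disparity(attention)
```

Every position on the row is a candidate match, so no maximum disparity has to be fixed in advance, which is the limitation of cost volumes that the abstract points out. Left pixel `A` has no counterpart in the toy right row, so its attention is spread uniformly and its disparity value is not meaningful.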

A Comparative Study of Quality and Content-Based Spatial Pooling Strategies in Image Quality Assessment

Nov 21, 2018
Dogancan Temel, Ghassan AlRegib

The process of quantifying image quality consists of engineering the quality features and pooling these features to obtain a value or a map. There has been significant research interest in designing quality features, but pooling is usually overlooked by comparison. In this work, we compare state-of-the-art quality-based and content-based spatial pooling strategies and show that although features are the key in any image quality assessment, pooling also matters. We also propose a quality-based spatial pooling strategy that is based on linearly weighted percentile pooling (WPP). Pooling strategies are analyzed for squared error, SSIM and PerSIM on the LIVE, multiply distorted LIVE and TID2013 image databases.

* 2015 IEEE GlobalSIP, Orlando, FL, 2015, pp. 732-736
* Paper: 5 pages, 8 figures, Presentation: 21 slides [Ancillary files]
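
The abstract above does not define WPP, so the following is only one plausible reading, offered as a sketch: plain percentile pooling averages the worst `percent` of local quality values, and the optional linear weights (an assumption here, not the paper's formula) emphasize the worst values by rank. The function name and the convention that higher values mean better quality are also assumptions.

```python
def percentile_pool(quality_map, percent, linear_weights=False):
    """Pool a 2D map of local quality scores into one number.

    Assumes higher score = better quality (as with SSIM); for an error
    map such as squared error the sort order would be reversed.
    """
    values = sorted(v for row in quality_map for v in row)  # worst first
    k = max(1, round(len(values) * percent / 100))
    worst = values[:k]
    if not linear_weights:
        return sum(worst) / k
    # Hypothetical linear weighting by rank: k for the worst value, down to 1.
    weights = [k - i for i in range(k)]
    return sum(w * v for w, v in zip(weights, worst)) / sum(weights)


quality_map = [[0.9, 0.2], [0.4, 1.0]]
plain = percentile_pool(quality_map, percent=50)
weighted = percentile_pool(quality_map, percent=50, linear_weights=True)
```

The motivation for this family of pooling strategies is that a few badly distorted regions tend to dominate perceived quality, so the pooled score is driven by the low-quality tail rather than by the mean over the whole map.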

A Straightforward Framework For Video Retrieval Using CLIP

Feb 24, 2021
Jesús Andrés Portillo-Quintero, José Carlos Ortiz-Bayliss, Hugo Terashima-Marín

Video Retrieval is a challenging task where a text query is matched to a video or vice versa. Most of the existing approaches for addressing such a problem rely on annotations made by the users. Although simple, this approach is not always feasible in practice. In this work, we explore the application of the language-image model, CLIP, to obtain video representations without the need for said annotations. This model was explicitly trained to learn a common space where images and text can be compared. Using various techniques described in this document, we extended its application to videos, obtaining state-of-the-art results on the MSR-VTT and MSVD benchmarks.

* 10 pages, 1 figure, submitted to Mexican Conference for Pattern Recognition (MCPR 2021)
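
One simple way to extend an image-text model to video, sketched under stated assumptions: average the per-frame embeddings into a single video vector and rank videos by cosine similarity to the text query. The toy 2D vectors stand in for real CLIP embeddings, the function names are hypothetical, and mean pooling is an assumed aggregation; the abstract mentions "various techniques" without listing them.

```python
import math


def normalize(vector):
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / norm for x in vector]


def video_embedding(frame_embeddings):
    """Mean-pool frame embeddings, then L2-normalize (assumed aggregation)."""
    count = len(frame_embeddings)
    dim = len(frame_embeddings[0])
    mean = [sum(frame[d] for frame in frame_embeddings) / count
            for d in range(dim)]
    return normalize(mean)


def rank_videos(text_embedding, videos):
    """Return video names sorted by cosine similarity to the text query."""
    query = normalize(text_embedding)
    scores = {
        name: sum(q * v for q, v in zip(query, video_embedding(frames)))
        for name, frames in videos.items()
    }
    return sorted(scores, key=scores.get, reverse=True)


# Placeholder embeddings; a real system would obtain these from CLIP encoders.
videos = {
    "cat_clip": [[1.0, 0.0], [0.8, 0.2]],
    "dog_clip": [[0.0, 1.0], [0.1, 0.9]],
}
ranking = rank_videos([1.0, 0.1], videos)
```

Since both encoders map into the same space, no video-specific annotations or fine-tuning are needed for this kind of zero-shot ranking.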

Pulmonary embolism identification in computerized tomography pulmonary angiography scans with deep learning technologies in COVID-19 patients

May 28, 2021
Chairi Kiourt, Georgios Feretzakis, Konstantinos Dalamarinis, Dimitris Kalles, Georgios Pantos, Ioannis Papadopoulos, Spyros Kouris, George Ioannakis, Evangelos Loupelis, Petros Antonopoulos, Aikaterini Sakagianni

The main objective of this work is to utilize state-of-the-art deep learning approaches for the identification of pulmonary embolism in CTPA scans of COVID-19 patients, provide an initial assessment of their performance and, ultimately, provide a fast-track prototype solution (system). We adopted and assessed some of the most popular convolutional neural network architectures through transfer learning approaches, striving to combine good model accuracy with fast training. Additionally, we exploited one of the most popular one-stage object detection models for the localization (through object detection) of the pulmonary embolism regions of interest. The models of both approaches are trained on an original CTPA scan dataset, in which we annotated 673 CTPA scan images with 1,465 bounding boxes in total, highlighting pulmonary embolism regions of interest. We provide a brief assessment of some state-of-the-art image classification models, achieving validation accuracies of 91% in pulmonary embolism classification. Additionally, we achieved a precision of about 68% on average in the object detection model for pulmonary embolism localization under a 50% IoU threshold. For both approaches, we provide the entire training pipelines for future studies (step-by-step processes through source code). In this study, we present some of the most accurate and fast deep learning models for pulmonary embolism identification in CTPA scan images, through classification and localization (object detection) approaches for patients infected by COVID-19. We provide a fast-track solution (system) for the research community in this area, which combines both classification and object detection models to improve the precision of identifying pulmonary embolisms.

* 16 pages, 6 figures, 1 table, Submitted to the European Radiology journal of Springer
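
The "50% IoU threshold" in the abstract above refers to intersection over union between a predicted and an annotated bounding box. A minimal sketch follows; the `(x1, y1, x2, y2)` box format, the greedy one-to-one matching, and the function names are assumptions, and this simplified precision is not necessarily the exact metric the authors report.

```python
def iou(box_a, box_b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    inter_w = max(0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0, min(ay2, by2) - max(ay1, by1))
    intersection = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - intersection
    return intersection / union if union > 0 else 0.0


def precision_at_iou(predictions, ground_truth, threshold=0.5):
    """Fraction of predictions that match a distinct ground-truth box."""
    unmatched = list(ground_truth)
    true_positives = 0
    for pred in predictions:
        best = max(unmatched, key=lambda gt: iou(pred, gt), default=None)
        if best is not None and iou(pred, best) >= threshold:
            true_positives += 1
            unmatched.remove(best)  # each annotation can be matched once
    return true_positives / len(predictions) if predictions else 0.0


predictions = [(0, 0, 2, 2), (10, 10, 12, 12)]
ground_truth = [(0, 0, 2, 3)]
precision = precision_at_iou(predictions, ground_truth)
```

In the toy data the first prediction overlaps the annotation with IoU 4/6 and counts as a true positive, while the second overlaps nothing and counts as a false positive.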

End-to-end learning of keypoint detection and matching for relative pose estimation

Apr 02, 2021
Antoine Fond, Luca Del Pero, Nikola Sivacki, Marco Paladini

We propose a new method for estimating the relative pose between two images, where we jointly learn keypoint detection, description extraction, matching and robust pose estimation. While our architecture follows the traditional pipeline for pose estimation from geometric computer vision, all steps are learnt in an end-to-end fashion, including feature matching. We demonstrate our method on the task of visual localization of a query image within a database of images with known pose. Pairwise pose estimation has many practical applications in robotic mapping, navigation, and AR. For example, the display of persistent AR objects in a scene relies on precise camera localization to make the digital models appear anchored to the physical environment. We train our pipeline end-to-end specifically for the problem of visual localization. We evaluate our proposed approach on localization accuracy, robustness and runtime speed. Our method achieves state-of-the-art localization accuracy on the 7 Scenes dataset.
