"cancer detection": models, code, and papers

Identifying Cancer Patients at Risk for Heart Failure Using Machine Learning Methods

Oct 01, 2019
Xi Yang, Yan Gong, Nida Waheed, Keith March, Jiang Bian, William R. Hogan, Yonghui Wu

Cardiotoxicity related to cancer therapies has become a serious issue, diminishing cancer treatment outcomes and quality of life. Early detection of cancer patients at risk for cardiotoxicity before cardiotoxic treatments and providing preventive measures are potential solutions to improve cancer patients' quality of life. This study focuses on predicting the development of heart failure in cancer patients after cancer diagnoses using historical electronic health record (EHR) data. We examined four machine learning algorithms using 143,199 cancer patients from the University of Florida Health (UF Health) Integrated Data Repository (IDR). We identified a total of 1,958 qualified cases and matched them to 15,488 controls by gender, age, race, and major cancer type. Two feature encoding strategies were compared to encode variables as machine learning features. The gradient boosting (GB) based model achieved the best AUC score of 0.9077 (with a sensitivity of 0.8520 and a specificity of 0.8138), outperforming other machine learning methods. We also looked into the subgroup of cancer patients with exposure to chemotherapy drugs and observed a lower specificity score (0.7089). The experimental results show that machine learning methods are able to capture clinical factors that are known to be associated with heart failure and that it is feasible to use machine learning methods to identify cancer patients at risk for cancer therapy-related heart failure.

* 6 pages, 1 figure, 3 tables, accepted by AMIA 2019 
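
The abstract above reports AUC, sensitivity, and specificity. As a reminder of how these three metrics are defined, here is a minimal Python sketch. The function names and the toy labels and scores are illustrative assumptions; they are not taken from the paper or its data.

```python
def sensitivity_specificity(y_true, y_pred):
    """Return (sensitivity, specificity) for binary labels: 1 = case, 0 = control."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # cases correctly flagged
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # cases missed
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # controls correctly cleared
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # controls wrongly flagged
    return tp / (tp + fn), tn / (tn + fp)


def auc(case_scores, control_scores):
    """Probability that a random case outranks a random control (ties count half)."""
    wins = 0.0
    for c in case_scores:
        for d in control_scores:
            if c > d:
                wins += 1.0
            elif c == d:
                wins += 0.5
    return wins / (len(case_scores) * len(control_scores))


# Toy example (hypothetical values, chosen only to exercise the functions).
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 1, 1]
sens, spec = sensitivity_specificity(y_true, y_pred)
toy_auc = auc([0.9, 0.8, 0.4], [0.7, 0.3, 0.2])
```

The pairwise AUC is quadratic in the number of samples, which is fine for illustration but would typically be replaced by a rank-based computation on a cohort of this size.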

Handling uncertainty using features from pathology: opportunities in primary care data for developing high risk cancer survival methods

Dec 17, 2020
Goce Ristanoski, Jon Emery, Javiera Martinez-Gutierrez, Damien Mccarthy, Uwe Aickelin

More than 144 000 Australians were diagnosed with cancer in 2019. The majority will first present to their GP symptomatically, even for cancers for which screening programs exist. Diagnosing cancer in primary care is challenging due to the non-specific nature of cancer symptoms and its low prevalence. Understanding the epidemiology of cancer symptoms and patterns of presentation in patients' medical histories from primary care data could be important to improve earlier detection and cancer outcomes. As past medical data about a patient can be incomplete, irregular or missing, this creates additional challenges when attempting to use the patient's history for any new diagnosis. Our research aims to investigate the opportunities in a patient's pathology history available to a GP, initially focused on the results within the frequently ordered full blood count to determine relevance to a future high-risk cancer prognosis and treatment outcome. We investigated how past pathology test results can lead to deriving features that can be used to predict cancer outcomes, with emphasis on patients at risk of not surviving the cancer within a 2-year period. This initial work focuses on patients with lung cancer, although the methodology can be applied to other types of cancer and other data within the medical record. Our findings indicate that even in cases of incomplete or obscure patient history, hematological measures can be useful in generating features relevant for predicting cancer risk and survival. The results strongly support using pathology test data for potential high-risk cancer diagnosis, and making greater use of additional pathology metrics or other primary care datasets for similar purposes.

* 14th Australasian Conference on Health Informatics and Knowledge Management HIKM 2021 

From Human Mesenchymal Stromal Cells to Osteosarcoma Cells Classification by Deep Learning

Aug 04, 2020
Mario D'Acunto, Massimo Martinelli, Davide Moroni

Early diagnosis of cancer often allows for a wider choice of therapy options. After a cancer diagnosis, staging provides essential information about the extent of disease in the body and the expected response to a particular treatment. The importance of classifying cancer patients at an early stage into high- or low-risk groups has led many research teams, from both the biomedical and bioinformatics fields, to study the application of Deep Learning (DL) methods. The ability of DL to detect critical features from complex datasets is a significant achievement in early diagnosis and cell cancer progression. In this paper, we focus on osteosarcoma. Osteosarcoma is one of the primary malignant bone tumors, and it usually afflicts people in adolescence. Our contribution to the classification of osteosarcoma cells is as follows: a DL approach is applied to discriminate human Mesenchymal Stromal Cells (MSCs) from osteosarcoma cells and to classify the different cell populations under investigation. Glass slides of different cell populations were cultured, including MSCs, differentiated in healthy bone cells (osteoblasts) and osteosarcoma cells, both single cell populations or mixed. Images of such samples of isolated cells (single-type or mixed) are recorded with traditional optical microscopy. DL is then applied to identify and classify single cells. Proper data augmentation techniques and cross-fold validation are used to assess the capabilities of a convolutional neural network to address the cell detection and classification problem. Based on the results obtained on individual cells, and on the versatility and scalability of our DL approach, the next step will be its application to discriminate and classify healthy or cancer tissues to advance digital pathology.

* Journal of Intelligent & Fuzzy Systems, vol. 37, no. 6, pp. 7199-7206, 2019 
* Submitted authors' version 

Coarse-to-Fine Classification via Parametric and Nonparametric Models for Computer-Aided Diagnosis

May 16, 2014
Meizhu Liu, Le Lu, Xiaojing Ye, Shipeng Yu

Classification is one of the core problems in Computer-Aided Diagnosis (CAD), targeting early cancer detection using 3D medical imaging interpretation. High detection sensitivity with a desirably low false positive (FP) rate is critical for a CAD system to be accepted as a valuable or even indispensable tool in radiologists' workflow. Given various spurious imagery noises which cause observation uncertainties, this remains a very challenging task. In this paper, we propose a novel, two-tiered coarse-to-fine (CTF) classification cascade framework to tackle this problem. We first obtain classification-critical data samples (e.g., samples on the decision boundary) extracted from the holistic data distributions using a robust parametric model (e.g., \cite{Raykar08}); then we build a graph-embedding based nonparametric classifier on sampled data, which can more accurately preserve or formulate the complex classification boundary. These two steps can also be considered as effective "sample pruning" and "feature pursuing + $k$NN/template matching", respectively. Our approach is validated comprehensively in colorectal polyp detection and lung nodule detection CAD systems, as the top two deadly cancers, using hospital scale, multi-site clinical datasets. The results show that our method achieves overall better classification/detection performance than existing state-of-the-art algorithms using single-layer classifiers, such as the support vector machine variants \cite{Wang08}, boosting \cite{Slabaugh10}, logistic regression \cite{Ravesteijn10}, relevance vector machine \cite{Raykar08}, $k$-nearest neighbor \cite{Murphy09} or spectral projections on graph \cite{Cai08}.
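
The two-tier idea described above ("sample pruning" followed by a nearest-neighbour style classifier) can be illustrated with a small Python sketch. This is not the authors' method: a fixed logistic scorer stands in for the robust parametric model, plain kNN in the raw feature space stands in for the graph-embedding classifier, and the weights, margin, and toy data are all hypothetical.

```python
import math


def logistic_score(x, weights, bias):
    """Parametric (coarse) model: probability that sample x is positive."""
    z = sum(w * xi for w, xi in zip(weights, x)) + bias
    return 1.0 / (1.0 + math.exp(-z))


def prune_to_boundary(samples, labels, weights, bias, margin=0.3):
    """Stage 1: keep only samples whose score lies within `margin` of 0.5,
    i.e. samples near the coarse model's decision boundary."""
    return [(x, y) for x, y in zip(samples, labels)
            if abs(logistic_score(x, weights, bias) - 0.5) <= margin]


def knn_predict(query, kept, k=3):
    """Stage 2: majority vote among the k nearest retained samples."""
    nearest = sorted(kept, key=lambda xy: math.dist(query, xy[0]))[:k]
    votes = sum(y for _, y in nearest)
    return 1 if 2 * votes > len(nearest) else 0


# Hypothetical 2-D toy data: two easy samples far from the boundary, five near it.
samples = [(-3.0, -3.0), (-0.5, 0.2), (-0.2, -0.4), (0.1, -0.3),
           (0.4, 0.3), (0.6, -0.1), (3.0, 3.0)]
labels = [0, 0, 0, 1, 1, 1, 1]
weights, bias = (1.0, 1.0), 0.0  # assumed, already-fitted coarse model

kept = prune_to_boundary(samples, labels, weights, bias)
pred_a = knn_predict((0.3, 0.0), kept)
pred_b = knn_predict((-0.4, 0.0), kept)
```

In this toy setup the two far-away samples are dropped by the coarse stage, so the nonparametric stage only has to model the region where the classes actually meet.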


Augmented Networks for Faster Brain Metastases Detection in T1-Weighted Contrast-Enhanced 3D MRI

May 27, 2021
Engin Dikici, Xuan V. Nguyen, Matthew Bigelow, Luciano M. Prevedello

Early detection of brain metastases (BM) is one of the determining factors for the successful treatment of patients with cancer; however, the accurate detection of small BM lesions (< 15 mm) remains a challenging task. We previously described a framework for the detection of small BM in single-sequence gadolinium-enhanced T1-weighted 3D MRI datasets. It combined classical image processing (IP) with a dedicated convolutional neural network, taking approximately 30 seconds to process each dataset due to computation-intensive IP stages. To overcome the speed limitation, this study aims to reformulate the framework via an augmented pair of CNNs (eliminating the IP) to reduce the processing times while preserving the BM detection performance. Our previous implementation of the BM detection algorithm utilized Laplacian of Gaussians (LoG) for the candidate selection portion of the solution. In this study, we introduce a novel BM candidate detection CNN (cdCNN) to replace this classical IP stage. The network is formulated to have (1) a similar receptive field as the LoG method, and (2) a bias for the detection of BM lesion loci. The proposed CNN is later augmented with a classification CNN to perform the BM detection task. The cdCNN achieved 97.4% BM detection sensitivity when producing 60K candidates per 3D MRI dataset, while the LoG achieved 96.5% detection sensitivity with 73K candidates. The augmented BM detection framework generated on average 9.20 false-positive BM detections per patient for 90% sensitivity, which is comparable with our previous results. However, it processes each 3D dataset in 1.9 seconds, presenting a 93.5% reduction in the computation time.
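
For readers unfamiliar with Laplacian of Gaussian (LoG) candidate selection, the following Python sketch shows the principle on a 1-D toy signal. It is only an illustration under simplifying assumptions: the framework described above works on 3D MRI volumes, and the kernel size, sigma, and signal here are hypothetical.

```python
import math


def log_kernel_1d(sigma, radius):
    """Sampled second derivative of a Gaussian (the 1-D analogue of LoG)."""
    kernel = []
    for x in range(-radius, radius + 1):
        g = math.exp(-x * x / (2 * sigma * sigma)) / (sigma * math.sqrt(2 * math.pi))
        kernel.append((x * x - sigma * sigma) / sigma ** 4 * g)
    return kernel


def filter_same(signal, kernel):
    """Apply a symmetric kernel with zero padding; output has the input's length."""
    radius = len(kernel) // 2
    out = []
    for i in range(len(signal)):
        acc = 0.0
        for j, k in enumerate(kernel):
            idx = i + j - radius
            if 0 <= idx < len(signal):
                acc += signal[idx] * k
        out.append(acc)
    return out


# Toy signal: flat background with one small bright blob centred at index 10.
signal = [0.0] * 21
signal[9], signal[10], signal[11] = 0.5, 1.0, 0.5

response = filter_same(signal, log_kernel_1d(sigma=1.5, radius=5))
# A bright blob gives a strongly negative LoG response at its centre,
# so the candidate location is the index of the minimum response.
candidate = min(range(len(response)), key=lambda i: response[i])
```

In practice one would keep all sufficiently strong local minima across several scales as candidates, which is why a LoG stage can produce tens of thousands of candidates per volume.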


Machine Learning Characterization of Cancer Patients-Derived Extracellular Vesicles using Vibrational Spectroscopies

Aug 25, 2021
Abicumaran Uthamacumaran, Samir Elouatik, Mohamed Abdouh, Michael Berteau-Rainville, Zu-hua Gao, Goffredo Arena

The early detection of cancer is a challenging problem in medicine. The blood sera of cancer patients are enriched with heterogeneous secretory lipid-bound extracellular vesicles (EVs), which present a complex repertoire of information and biomarkers, representing their cell of origin, that are currently being studied in the field of liquid biopsy and cancer screening. Vibrational spectroscopies provide non-invasive approaches for the assessment of structural and biophysical properties in complex biological samples. In this study, multiple Raman spectroscopy measurements were performed on the EVs extracted from the blood sera of 9 patients consisting of four different cancer subtypes (colorectal cancer, hepatocellular carcinoma, breast cancer and pancreatic cancer) and five healthy patients (controls). FTIR (Fourier Transform Infrared) spectroscopy measurements were performed as a complementary approach to Raman analysis, on two of the four cancer subtypes. The AdaBoost Random Forest Classifier, Decision Trees, and Support Vector Machines (SVM) distinguished the baseline corrected Raman spectra of cancer EVs from those of healthy controls (18 spectra) with a classification accuracy of greater than 90% when reduced to a spectral frequency range of 1800 to 1940 inverse cm, and subjected to a 0.5 training/testing split. FTIR classification on 14 spectra showed an accuracy of 80%. Our findings demonstrate that basic machine learning algorithms are powerful tools to distinguish the complex vibrational spectra of cancer patient EVs from those of healthy patients. These experimental methods hold promise as valid and efficient liquid biopsy for machine intelligence-assisted early cancer screening.

* 50 pages 

Dependency detection with similarity constraints

Jan 31, 2011
Leo Lahti, Samuel Myllykangas, Sakari Knuutila, Samuel Kaski

Unsupervised two-view learning, or detection of dependencies between two paired data sets, is typically done by some variant of canonical correlation analysis (CCA). CCA searches for a linear projection for each view, such that the correlations between the projections are maximized. The solution is invariant to any linear transformation of either or both of the views; for tasks with small sample size such flexibility implies overfitting, which is even worse for more flexible nonparametric or kernel-based dependency discovery methods. We develop variants which reduce the degrees of freedom by assuming constraints on similarity of the projections in the two views. A particular example is provided by a cancer gene discovery application where chromosomal distance affects the dependencies between gene copy number and activity levels. Similarity constraints are shown to improve detection performance of known cancer genes.

* In Tülay Adali, Jocelyn Chanussot, Christian Jutten, and Jan Larsen, editors, Proceedings of the 2009 IEEE International Workshop on Machine Learning for Signal Processing XIX, pages 89-94. IEEE, Piscataway, NJ, USA, 2009 
* 9 pages, 3 figures. Appeared in proceedings of the 2009 IEEE International Workshop on Machine Learning for Signal Processing XIX (MLSP'09). Implementation of the method available at 
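
For reference, the standard CCA objective mentioned in the abstract above can be written as follows, where $X$ and $Y$ are the two centred, paired data sets, $\Sigma_{xx}$ and $\Sigma_{yy}$ are their covariance matrices, and $\Sigma_{xy}$ is the cross-covariance. The notation is chosen here for illustration and is not taken from the paper.

```latex
\[
(w_x^{*}, w_y^{*}) \;=\; \arg\max_{w_x,\, w_y}\;
\frac{w_x^{\top} \Sigma_{xy}\, w_y}
     {\sqrt{w_x^{\top} \Sigma_{xx}\, w_x}\;\sqrt{w_y^{\top} \Sigma_{yy}\, w_y}}
\]
% Invariance: replacing X by XA for any invertible matrix A leaves the maximal
% correlation unchanged, because the optimum simply moves to A^{-1} w_x.
% One possible similarity constraint (an assumption for illustration, not
% necessarily the authors' formulation) is to tie the projections, w_x = w_y,
% which removes this freedom and reduces the number of free parameters.
```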

A Pulmonary Nodule Detection Model Based on Progressive Resolution and Hierarchical Saliency

Jul 02, 2018
Junjie Zhang, Yong Xia, Yanning Zhang

Detection of pulmonary nodules on chest CT is an essential step in the early diagnosis of lung cancer, which is critical for best patient care. Although a number of computer-aided nodule detection methods have been published in the literature, these methods still have two major drawbacks: missing true nodules during the detection of nodule candidates and less accurate discrimination of nodules from non-nodules. In this paper, we propose an automated pulmonary nodule detection algorithm that jointly combines progressive resolution and hierarchical saliency. Specifically, we design a 3D progressive resolution-based densely dilated FCN, namely the progressive resolution network (PRN), to detect nodule candidates inside the lung, and construct a densely dilated 3D CNN with hierarchical saliency, namely the hierarchical saliency network (HSN), to simultaneously identify genuine nodules from those candidates and estimate the diameters of nodules. We evaluated our algorithm on the benchmark LUng Nodule Analysis 2016 (LUNA16) dataset and achieved a state-of-the-art detection score. Our results suggest that the proposed algorithm can effectively detect pulmonary nodules on chest CT and accurately estimate their diameters.

* 8 pages, 4 figures, 1 table