What is cancer detection? Cancer detection using Artificial Intelligence (AI) involves leveraging advanced machine learning algorithms and techniques to identify and diagnose cancer from various medical data sources. The goal is to enhance early detection, improve diagnostic accuracy, and potentially reduce the need for invasive procedures.
Papers and Code
May 08, 2025
Abstract:Colorectal cancer is one of the deadliest cancers today, but it can be prevented through early detection of malignant polyps in the colon, primarily via colonoscopies. While this method has saved many lives, human error remains a significant challenge, as missing a polyp could have fatal consequences for the patient. Deep learning (DL) polyp detectors offer a promising solution. However, existing DL polyp detectors often mistake white light reflections from the endoscope for polyps, which can lead to false positives.To address this challenge, in this paper, we propose a novel data augmentation approach that artificially adds more white light reflections to create harder training scenarios. Specifically, we first generate a bank of artificial lights using the training dataset. Then we find the regions of the training images that we should not add these artificial lights on. Finally, we propose a sliding window method to add the artificial light to the areas that fit of the training images, resulting in augmented images. By providing the model with more opportunities to make mistakes, we hypothesize that it will also have more chances to learn from those mistakes, ultimately improving its performance in polyp detection. Experimental results demonstrate the effectiveness of our new data augmentation method.
* 5 pages, 4 Figures, paper accepted by the ISBI (International
Symposium on Biomedical Imaging) 2025 Conference
Via

Apr 28, 2025
Abstract:Purpose: The scarcity of high-quality curated labeled medical training data remains one of the major limitations in applying artificial intelligence (AI) systems to breast cancer diagnosis. Deep models for mammogram analysis and mass (or micro-calcification) detection require training with a large volume of labeled images, which are often expensive and time-consuming to collect. To reduce this challenge, we proposed a novel method that leverages self-supervised learning (SSL) and a deep hybrid model, named \textbf{HybMNet}, which combines local self-attention and fine-grained feature extraction to enhance breast cancer detection on screening mammograms. Approach: Our method employs a two-stage learning process: (1) SSL Pretraining: We utilize EsViT, a SSL technique, to pretrain a Swin Transformer (Swin-T) using a limited set of mammograms. The pretrained Swin-T then serves as the backbone for the downstream task. (2) Downstream Training: The proposed HybMNet combines the Swin-T backbone with a CNN-based network and a novel fusion strategy. The Swin-T employs local self-attention to identify informative patch regions from the high-resolution mammogram, while the CNN-based network extracts fine-grained local features from the selected patches. A fusion module then integrates global and local information from both networks to generate robust predictions. The HybMNet is trained end-to-end, with the loss function combining the outputs of the Swin-T and CNN modules to optimize feature extraction and classification performance. Results: The proposed method was evaluated for its ability to detect breast cancer by distinguishing between benign (normal) and malignant mammograms. Leveraging SSL pretraining and the HybMNet model, it achieved AUC of 0.864 (95% CI: 0.852, 0.875) on the CMMD dataset and 0.889 (95% CI: 0.875, 0.903) on the INbreast dataset, highlighting its effectiveness.
Via

Apr 28, 2025
Abstract:Colorectal polyps are key indicators for early detection of colorectal cancer. However, traditional endoscopic imaging often struggles with accurate polyp localization and lacks comprehensive contextual awareness, which can limit the explainability of diagnoses. To address these issues, we propose the Dynamic Contextual Attention Network (DCAN). This novel approach transforms spatial representations into adaptive contextual insights, using an attention mechanism that enhances focus on critical polyp regions without explicit localization modules. By integrating contextual awareness into the classification process, DCAN improves decision interpretability and overall diagnostic performance. This advancement in imaging could lead to more reliable colorectal cancer detection, enabling better patient outcomes.
* Accepted at 47th Annual International Conference of the IEEE
Engineering in Medicine and Biology Society (EMBC) 2025
Via

Apr 28, 2025
Abstract:This paper proposes an Incremental Learning (IL) approach to enhance the accuracy and efficiency of deep learning models in analyzing T2-weighted (T2w) MRI medical images prostate cancer detection using the PI-CAI dataset. We used multiple health centers' artificial intelligence and radiology data, focused on different tasks that looked at prostate cancer detection using MRI (PI-CAI). We utilized Knowledge Distillation (KD), as it employs generated images from past tasks to guide the training of models for subsequent tasks. The approach yielded improved performance and faster convergence of the models. To demonstrate the versatility and robustness of our approach, we evaluated it on the PI-CAI dataset, a diverse set of medical imaging modalities including OCT and PathMNIST, and the benchmark continual learning dataset CIFAR-10. Our results indicate that KD can be a promising technique for IL in medical image analysis in which data is sourced from individual health centers and the storage of large datasets is not feasible. By using generated images from prior tasks, our method enables the model to retain and apply previously acquired knowledge without direct access to the original data.
* 15 Pages, 3 Figures, 3 Tables, 1 Algorithm, This paper will be
updated
Via

Apr 30, 2025
Abstract:Objective: A number of machine learning models have utilized semantic features, deep features, or both to assess lung nodule malignancy. However, their reliance on manual annotation during inference, limited interpretability, and sensitivity to imaging variations hinder their application in real-world clinical settings. Thus, this research aims to integrate semantic features derived from radiologists' assessments of nodules, allowing the model to learn clinically relevant, robust, and explainable features for predicting lung cancer. Methods: We obtained 938 low-dose CT scans from the National Lung Screening Trial with 1,246 nodules and semantic features. The Lung Image Database Consortium dataset contains 1,018 CT scans, with 2,625 lesions annotated for nodule characteristics. Three external datasets were obtained from UCLA Health, the LUNGx Challenge, and the Duke Lung Cancer Screening. We finetuned a pretrained Contrastive Language-Image Pretraining model with a parameter-efficient fine-tuning approach to align imaging and semantic features and predict the one-year lung cancer diagnosis. Results: We evaluated the performance of the one-year diagnosis of lung cancer with AUROC and AUPRC and compared it to three state-of-the-art models. Our model demonstrated an AUROC of 0.90 and AUPRC of 0.78, outperforming baseline state-of-the-art models on external datasets. Using CLIP, we also obtained predictions on semantic features, such as nodule margin (AUROC: 0.81), nodule consistency (0.81), and pleural attachment (0.84), that can be used to explain model predictions. Conclusion: Our approach accurately classifies lung nodules as benign or malignant, providing explainable outputs, aiding clinicians in comprehending the underlying meaning of model predictions. This approach also prevents the model from learning shortcuts and generalizes across clinical settings.
Via

May 14, 2025
Abstract:Research projects, including those focused on cancer, rely on the manual extraction of information from clinical reports. This process is time-consuming and prone to errors, limiting the efficiency of data-driven approaches in healthcare. To address these challenges, Natural Language Processing (NLP) offers an alternative for automating the extraction of relevant data from electronic health records (EHRs). In this study, we focus on lung and breast cancer due to their high incidence and the significant impact they have on public health. Early detection and effective data management in both types of cancer are crucial for improving patient outcomes. To enhance the accuracy and efficiency of data extraction, we utilized GMV's NLP tool uQuery, which excels at identifying relevant entities in clinical texts and converting them into standardized formats such as SNOMED and OMOP. uQuery not only detects and classifies entities but also associates them with contextual information, including negated entities, temporal aspects, and patient-related details. In this work, we explore the use of NLP techniques, specifically Named Entity Recognition (NER), to automatically identify and extract key clinical information from EHRs related to these two cancers. A dataset from Health Research Institute Hospital La Fe (IIS La Fe), comprising 200 annotated breast cancer and 400 lung cancer reports, was used, with eight clinical entities manually labeled using the Doccano platform. To perform NER, we fine-tuned the bsc-bio-ehr-en3 model, a RoBERTa-based biomedical linguistic model pre-trained in Spanish. Fine-tuning was performed using the Transformers architecture, enabling accurate recognition of clinical entities in these cancer types. Our results demonstrate strong overall performance, particularly in identifying entities like MET and PAT, although challenges remain with less frequent entities like EVOL.
Via

Jun 08, 2025
Abstract:Brain tumors, regardless of being benign or malignant, pose considerable health risks, with malignant tumors being more perilous due to their swift and uncontrolled proliferation, resulting in malignancy. Timely identification is crucial for enhancing patient outcomes, particularly in nations such as Bangladesh, where healthcare infrastructure is constrained. Manual MRI analysis is arduous and susceptible to inaccuracies, rendering it inefficient for prompt diagnosis. This research sought to tackle these problems by creating an automated brain tumor classification system utilizing MRI data obtained from many hospitals in Bangladesh. Advanced deep learning models, including VGG16, VGG19, and ResNet50, were utilized to classify glioma, meningioma, and various brain cancers. Explainable AI (XAI) methodologies, such as Grad-CAM and Grad-CAM++, were employed to improve model interpretability by emphasizing the critical areas in MRI scans that influenced the categorization. VGG16 achieved the most accuracy, attaining 99.17%. The integration of XAI enhanced the system's transparency and stability, rendering it more appropriate for clinical application in resource-limited environments such as Bangladesh. This study highlights the capability of deep learning models, in conjunction with explainable artificial intelligence (XAI), to enhance brain tumor detection and identification in areas with restricted access to advanced medical technologies.
* 2024 6th International Conference on Sustainable Technologies for
Industry 5.0 (STI)
Via

Jun 04, 2025
Abstract:Gene set analysis (GSA) is a foundational approach for interpreting genomic data of diseases by linking genes to biological processes. However, conventional GSA methods overlook clinical context of the analyses, often generating long lists of enriched pathways with redundant, nonspecific, or irrelevant results. Interpreting these requires extensive, ad-hoc manual effort, reducing both reliability and reproducibility. To address this limitation, we introduce cGSA, a novel AI-driven framework that enhances GSA by incorporating context-aware pathway prioritization. cGSA integrates gene cluster detection, enrichment analysis, and large language models to identify pathways that are not only statistically significant but also biologically meaningful. Benchmarking on 102 manually curated gene sets across 19 diseases and ten disease-related biological mechanisms shows that cGSA outperforms baseline methods by over 30%, with expert validation confirming its increased precision and interpretability. Two independent case studies in melanoma and breast cancer further demonstrate its potential to uncover context-specific insights and support targeted hypothesis generation.
* 56 pages, 9 figures, 1 table
Via

May 27, 2025
Abstract:Effective and accurate diagnosis of diseases such as cancer, diabetes, and heart failure is crucial for timely medical intervention and improving patient survival rates. Machine learning has revolutionized diagnostic methods in recent years by developing classification models that detect diseases based on selected features. However, these classification tasks are often highly imbalanced, limiting the performance of classical models. Quantum models offer a promising alternative, exploiting their ability to express complex patterns by operating in a higher-dimensional computational space through superposition and entanglement. These unique properties make quantum models potentially more effective in addressing the challenges of imbalanced datasets. This work evaluates the potential of quantum classifiers in healthcare, focusing on Quantum Neural Networks (QNNs) and Quantum Support Vector Machines (QSVMs), comparing them with popular classical models. The study is based on three well-known healthcare datasets -- Prostate Cancer, Heart Failure, and Diabetes. The results indicate that QSVMs outperform QNNs across all datasets due to their susceptibility to overfitting. Furthermore, quantum models prove the ability to overcome classical models in scenarios with high dataset imbalance. Although preliminary, these findings highlight the potential of quantum models in healthcare classification tasks and lead the way for further research in this domain.
Via

Apr 15, 2025
Abstract:Deep learning has been reported to achieve high performances in the detection of skin cancer, yet many challenges regarding the reproducibility of results and biases remain. This study is a replication (different data, same analysis) of a study on Alzheimer's disease [28] which studied robustness of logistic regression (LR) and convolutional neural networks (CNN) across patient sexes. We explore sex bias in skin cancer detection, using the PAD-UFES-20 dataset with LR trained on handcrafted features reflecting dermatological guidelines (ABCDE and the 7-point checklist), and a pre-trained ResNet-50 model. We evaluate these models in alignment with [28]: across multiple training datasets with varied sex composition to determine their robustness. Our results show that both the LR and the CNN were robust to the sex distributions, but the results also revealed that the CNN had a significantly higher accuracy (ACC) and area under the receiver operating characteristics (AUROC) for male patients than for female patients. We hope these findings to contribute to the growing field of investigating potential bias in popular medical machine learning methods. The data and relevant scripts to reproduce our results can be found in our Github.
* 16 pages (excluding appendix), 2 figures (excluding appendix),
submitted to MIUA 2025 conference (response pending)
Via
