Cancer detection using Artificial Intelligence (AI) involves leveraging advanced machine learning algorithms and techniques to identify and diagnose cancer from various medical data sources. The goal is to enhance early detection, improve diagnostic accuracy, and potentially reduce the need for invasive procedures.
Breast cancer is a major global health concern, and mammography screening plays a central role in early detection. The large volume of screening examinations creates a substantial workload for radiologists, making accurate and consistent report generation a critical clinical challenge. Existing automated mammography report generation methods primarily focus on direct visual-to-text mapping, while overlooking the structured clinical reasoning process followed by radiologists in real-world practice. To address this limitation, we propose MammoRG, a mammography report generation framework that explicitly simulates the clinical reporting workflow by following the BI-RADS guideline and incorporating prior clinical knowledge to produce diagnostic reports. Specifically, MammoRG adopts a two-stage training framework. In the first stage, the model learns to integrate clinically relevant prior knowledge from a patient's four-view mammograms through classification-based supervision. In the second stage, a terminology-aware supervised fine-tuning strategy is introduced to model mammography-specific clinical terms as atomic semantic units, enabling the generation of high-quality reports with improved clinical consistency. To facilitate clinical efficacy evaluation of generated reports, we further develop MammoRGTool, a dedicated mammography report parsing tool that extracts structured clinical information from free-text reports. Extensive experiments demonstrate that MammoRG consistently outperforms existing methods across multiple clinical efficacy metrics, particularly in diagnosis-related BI-RADS F1, where it surpasses the second-best model by 2.73%, 2.04%, 1.90%, and 3.27% on the internal, external 1, external 2, and VinDr-Mammo datasets, respectively.
Biological systems are governed by structured molecular interactions, where pathways, regulatory circuits, and functional gene relationships shape cellular behavior and disease progression. Much of this knowledge is naturally represented as graphs. However, most biomedical AI models cannot directly use graph-encoded biological knowledge and instead require compressed low-dimensional representations, which can lose important structure and reduce performance, especially in limited-sample clinical studies. Here, we introduce Graph-in-Graph (GiG), a knowledge graph-modulated deep learning framework for data-efficient clinical prediction. GiG represents each patient as a standalone modular graph, in which curated biological knowledge graphs define edges and patient-specific measurements, such as gene expression, define node features. This design allows multiple biological knowledge graphs to be integrated while preserving gene-gene interactions and pathway topology during patient-level representation learning. Across cohorts comprising nearly 9,700 patients and five clinical tasks, including liquid biopsy cancer detection, prostate cancer diagnosis, and 32-class pan-cancer classification, GiG consistently outperforms traditional and state-of-the-art methods, with the largest gains in limited-sample settings. On the challenging prostate cancer diagnosis task, GiG improves macro-F1 by up to 49 percentage points relative to competing methods. Control experiments replacing real pathway graphs with random topologies confirm that these gains arise from biologically grounded knowledge graph structure rather than graph modeling alone. These findings show that knowledge graph-modulated deep learning can improve robustness, interpretability, and sample efficiency in clinical data analysis, and provide a principled framework for integrating biological knowledge graphs into predictive modeling.
Scintillation detectors with excellent timing resolution enable more precise localization of radiation sources in positron emission tomography, leading to substantial improvements in diagnostic capability for diseases such as cancer and dementia. At the extreme timing precision required for such applications at the picosecond scale, detector performance is governed by the microscopic dynamics of scintillation photons generated within the detector and their subsequent detection processes. However, detector signals have conventionally been treated only as collective responses of many photons due to structural constraints inherent to photodetectors. In this study, we overcome this fundamental limitation using deep learning, enabling direct access to the timing information of individual photons. The proposed method estimates photon-by-photon arrival times directly from detector waveforms without requiring any modification to the detector structure; the method operates on an event-by-event basis without ground-truth labels by integrating an unsupervised learning framework with a physically informed detector-response model. Through comprehensive validation combining Monte Carlo simulation and experimental measurements across various detector configurations, we experimentally demonstrate improved timing resolution, visualized depth-of-interaction-dependent photon transport, and classified Cherenkov and scintillation photons based on the estimated photon-level timing information using a unified deep learning-based framework. These results provide experimental access to photon dynamics, bridging the gap between theoretical modeling and experimental observation, and they open a new data-driven pathway for discovery in detector physics and optimization.
In histopathology, human experts primarily rely on color as a means of enhancing contrast to interpret tissue morphology, whereas machine vision models process color as raw statistical information. This distinction raises a fundamental question: to what extent can pixel intensity alone, independent of structural and morphological cues, support cancer classification? To address this question, we systematically evaluated the standalone discriminative power of global color features while deliberately excluding all morphological information. Specifically, we extracted statistical color moments and discretized RGB and HSV color histograms, and assessed their performance across ten diverse experimental settings using classical machine learning classifiers. Our results demonstrate that color features alone can achieve strong performance in binary diagnostic tasks (e.g., benign versus malignant), with classification accuracies reaching up to 89%. This performance is likely attributable to global chromatic shifts associated with malignancy. Importantly, these simple color-based representations consistently outperformed random baselines by a substantial margin, indicating that raw color distributions encode a non-random and diagnostically relevant signal for cancer detection. Consequently, this study suggests that simple, computationally efficient color features can serve as an effective pre-screening tool. By identifying samples with strong chromatic indicators of malignancy, these lightweight models could function as a first-pass triage system, reducing the computational burden on complex deep learning architectures.
Cancer is one of the leading causes of death worldwide, making the development of rapid, minimally invasive, label-free and scalable diagnostic strategies a major challenge in modern oncology. In this context, spectroscopic liquid biopsy has emerged as a promising alternative, as it enables the holistic characterization of biochemical alterations in biological fluids. In this work, we propose a multimodal spectroscopic liquid biopsy framework for multicancer detection based on the combination of Fourier Transform Infrared (FTIR) spectroscopy, Raman spectroscopy, and Excitation-Emission Matrix (EEM) fluorescence spectroscopy together with Machine Learning (ML) methodologies. Serum samples from breast cancer patients, colorectal cancer patients, and healthy controls were analyzed through the three spectroscopic modalities. After modality-specific preprocessing, low-level data fusion (LLDF) was employed to integrate the complementary biochemical information encoded within the different spectroscopic measurements, and classification was performed using XGBoost models. Seven experimental configurations were evaluated, including the three unimodal approaches, all pairwise bimodal configurations, and the full multimodal approach of FTIR, Raman, and EEM fluorescence. The results show that although several individual modalities achieved high discrimination performance, the multimodal fusion provided the most balanced overall results, reaching a ROC-AUC of 0.997 for breast cancer and 0.994 for colorectal cancer, together with highly balanced sensitivity and specificity values.
Cancer screening is a reasoning task. A radiologist observes findings, compares them to prior scans, integrates clinical context, and reaches a diagnostic conclusion confirmed by pathology. We present RadThinking, a Visual Question Answering (VQA) dataset that makes this reasoning explicit and trainable. RadThinking releases VQA pairs at three difficulty tiers. Foundation VQAs are atomic perception questions. Single-step reasoning VQAs apply one clinical rule. Compositional VQAs require multi-step chain-of-thought to reach a guideline category such as LI-RADS-5. For every compositional VQA, we release the chain of foundation VQAs that solves it. The chain follows the rules of the governing clinical reporting standard. The dataset spans 20,362 CT scans from 9,131 patients across 43 cancer groups, plus 2,077 verified healthy controls with >1-year follow-up. To our knowledge, RadThinking is the first cancer-screening VQA corpus that stratifies questions by reasoning depth and grounds compositions in clinical reporting standards. The foundation tier supplies atomic perception supervision. The compositional tier supplies chain-of-thought data and verifiable rewards for reinforcement-learning recipes such as DeepSeek-R1 and OpenAI o1. RadThinking enables systematic training and evaluation of whether AI systems can reason about cancer, not merely detect it.
Oral cancer is a significant global health burden, and early detection remains a critical clinical need. Electrical impedance spectroscopy (EIS) offers a promising non-invasive approach for real-time tissue characterization, but classification frameworks that jointly leverage multiple impedance features for in vivo oral lesion discrimination remain underdeveloped. This paper presents a machine-learning (ML) pipeline to optimize classification of in vivo oral pathology from EIS data collected using a handheld, bedside device. Impedance measurements were acquired from 104 patients undergoing oral cancer resection or biopsy. Three classification tasks were evaluated: (1) healthy vs. cancer, (2) multi-class lesion-type discrimination (cancer, high-grade dysplasia, non-malignant), and (3) multi-class discrimination between the three lesion pathologies and healthy tissue. For each task, signal frequencies were independently ranked and reduced using PCA, and different current injection/voltage measurement (IIVV) pattern geometries were tested. Classification performance was assessed through leave-one-patient-group-out cross-validation to ensure robustness on unseen patients. Input data dimensionality was reduced by up to 99% across all tasks while improving diagnostic accuracy over baseline models trained on the full dataset. A logistic regression model achieved the highest binary classification accuracy of 80% with an AUC of 0.90, while multi-class scenarios maintained AUCs above 0.82. All top-performing models utilized the significantly reduced IIVV set as input. The proposed pipeline advances EIS-based cancer detection by providing a robust, computationally efficient, and clinically practical framework for early diagnosis of oral cancer lesions, with a methodology readily generalizable to other EIS devices and applications.
Clinical trial studies indicate benefit of watch-and-wait (WW) surveillance for patients with rectal cancer showing a complete or near clinical response (CR) directly after treatment (restaging). However, there are no objectively accurate methods to early detect local tumor regrowth (LR) in patients undergoing WW from follow-up exams. Hence, we developed Temporal Rectal Endoscopy Cross-attention (TREX), a longitudinal deep learning approach that combines pairs of images acquired at restaging and follow-up to distinguish CR from LR. TREX uses pretrained Swin Transformers in a siamese setting to extract features from longitudinal images and dual cross-attention to combine the features without spatial co-registration between image pairs. TREX and Swin-based baselines were trained under two settings: (a) detecting LR or CR at the last available follow-up and (b) early detection of LR at 3--6, 6--12, and 12--24 months before clinical confirmation. TREX achieved the highest accuracy in detecting LR with a high sensitivity of 97% $\pm$ 6% and a balanced accuracy of 90% $\pm$ 3%, and outperformed all baselines in early detection at both 3--6 (74% $\pm$ 1%) and 6--12 months (62% $\pm$ 4%) prior to clinical detection. Clinical validation via a surgeon survey showed that TREX matched attending-level overall accuracy (TREX: 86.21% vs.\ Clinicians: 87.84% $\pm$ 1.28%). Finally, we explored TREX's ability to predict treatment response by combining pre-treatment (pre-TNT) and restaging endoscopies, achieving a balanced accuracy of 73% $\pm$ 12%. These results show that longitudinal deep learning analysis of endoscopy may improve surveillance and enable earlier identification of rectal cancer regrowth.
Low-dose computed tomography (LDCT) is the standard modality for lung cancer screening, known for its low radiation dose but high noise levels. While existing literature focuses on denoising LDCT images, comparative research on simulating LDCT characteristics to directly use these images for model development is lacking. This study shifts the focus from denoising images to degrading available standard-dose CT (SDCT) data, generating synthetic images for data augmentation to train classifiers for screening-detected nodules. We compare three degradation methods: (1) a sinogram domain statistical noise insertion; (2) replicate a validated physics-based simulation using Pix2Pix; and (3) unpaired CycleGAN. The generated images were utilized to simulate LDCT screening scenario replacing 695 SDCT cases from the LIDC-IDRI dataset, from which radiomic features were extracted to train machine learning models for lung nodule classification. Regarding image quality, CycleGAN achieved the best Fréchet inception distance (0.1734) and kernel inception distance (0.0813; 0.1002) scores, indicating distributional alignment with the target low-dose domain. In the nodule classification task, results confirmed the necessity of domain adaptation since a baseline model trained on non-degraded SDCT data failed to generalize to the real LDCT set (AUC 0.789) with a low sensitivity (0.571). Degraded images generated using CycleGAN approach led to the most balanced performance on the classification task using Adam Booster classifier, achieving an AUC of 0.861, sensitivity of 0.743 and specificity of 0.858 in the independent test. Our findings confirm that generating synthetic LDCT data from standard-dose scans is a viable strategy for training robust nodule classifiers for screening detected nodules.
Accurate classification of breast cancer subtypes from gene expression data is critical for diagnosis and treatment selection. However, such datasets are characterized by high dimensionality and limited sample size, posing challenges for machine learning models. In this study, we evaluate the impact of model complexity and feature selection on subtype classification performance using TCGA-BRCA gene expression data. Logistic regression, random forest, and support vector machine (SVM) models were trained using varying numbers of highly variable genes (50 to 20,518). Performance was evaluated using stratified 5-fold cross-validation and assessed with accuracy and macro F1 score. While all models achieved high accuracy, macro F1 analysis revealed substantial differences in subtype-level performance. Logistic regression demonstrated the most stable and balanced performance across subtypes, including improved detection of rare classes. Random forest underperformed on minority subtypes despite strong overall accuracy, while SVM showed sensitivity to feature dimensionality. These findings highlight the importance of model simplicity, evaluation metrics, and feature selection in high-dimensional biological classification tasks.