Deep learning has shown to have great potential in medical applications. In critical domains as such, it is of high interest to have trustworthy algorithms which are able to tell when reliable assessments cannot be guaranteed. Detecting out-of-distribution (OOD) samples is a crucial step towards building a safe classifier. Following a previous study, showing that it is possible to classify breast cancer in point-of-care ultrasound images, this study investigates OOD detection using three different methods: softmax, energy score and deep ensembles. All methods are tested on three different OOD data sets. The results show that the energy score method outperforms the softmax method, performing well on two of the data sets. The ensemble method is the most robust, performing the best at detecting OOD samples for all three OOD data sets.
Lung cancer (LC) remains the primary cause of cancer-related mortality, largely due to late-stage diagnoses. Effective strategies for early detection are therefore of paramount importance. In recent years, machine learning (ML) has demonstrated considerable potential in healthcare by facilitating the detection of various diseases. In this retrospective development and validation study, we developed an ML model based on dynamic ensemble selection (DES) for LC detection. The model leverages standard blood sample analysis and smoking history data from a large population at risk in Denmark. The study includes all patients examined on suspicion of LC in the Region of Southern Denmark from 2009 to 2018. We validated and compared the predictions by the DES model with diagnoses provided by five pulmonologists. Among the 38,944 patients, 9,940 had complete data of which 2,505 (25\%) had LC. The DES model achieved an area under the roc curve of 0.77$\pm$0.01, sensitivity of 76.2\%$\pm$2.4\%, specificity of 63.8\%$\pm$2.3\%, positive predictive value of 41.6\%$\pm$1.2\%, and F\textsubscript{1}-score of 53.8\%$\pm$1.1\%. The DES model outperformed all five pulmonologists, achieving a sensitivity 9\% higher than their average. The model identified smoking status, age, total calcium levels, neutrophil count, and lactate dehydrogenase as the most important factors for the detection of LC. The results highlight the successful application of the ML approach in detecting LC, surpassing pulmonologists' performance. Incorporating clinical and laboratory data in future risk assessment models can improve decision-making and facilitate timely referrals.
Lung cancer is one of the significant causes of cancer-related deaths globally. Early detection and treatment improve the chances of survival. Traditionally CT scans have been used to extract the most significant lung infection information and diagnose cancer. This process is carried out manually by an expert radiologist. The imbalance in the radiologists-to-population ratio in a country like India implies significant work pressure on them and thus raises the need to automate a few of their responsibilities. The tendency of modern-day Deep Neural networks to make overconfident mistakes limit their usage to detect cancer. In this paper, we propose a new task-specific loss function to calibrate the neural network to reduce the risk of overconfident mistakes. We use the state-of-the-art Multi-class Difference in Confidence and Accuracy (MDCA) loss in conjunction with the proposed task-specific loss function to achieve the same. We also integrate post-hoc calibration by performing temperature scaling on top of the train-time calibrated model. We demonstrate 5.98% improvement in the Expected Calibration Error (ECE) and a 17.9% improvement in Maximum Calibration Error (MCE) as compared to the best-performing SOTA algorithm.
Detection and diagnosis of colon polyps are key to preventing colorectal cancer. Recent evidence suggests that AI-based computer-aided detection (CADe) and computer-aided diagnosis (CADx) systems can enhance endoscopists' performance and boost colonoscopy effectiveness. However, most available public datasets primarily consist of still images or video clips, often at a down-sampled resolution, and do not accurately represent real-world colonoscopy procedures. We introduce the REAL-Colon (Real-world multi-center Endoscopy Annotated video Library) dataset: a compilation of 2.7M native video frames from sixty full-resolution, real-world colonoscopy recordings across multiple centers. The dataset contains 350k bounding-box annotations, each created under the supervision of expert gastroenterologists. Comprehensive patient clinical data, colonoscopy acquisition information, and polyp histopathological information are also included in each video. With its unprecedented size, quality, and heterogeneity, the REAL-Colon dataset is a unique resource for researchers and developers aiming to advance AI research in colonoscopy. Its openness and transparency facilitate rigorous and reproducible research, fostering the development and benchmarking of more accurate and reliable colonoscopy-related algorithms and models.
This paper discusses the role of Transfer Learning (TL) and transformers in cancer detection based on image analysis. With the enormous evolution of cancer patients, the identification of cancer cells in a patient's body has emerged as a trend in the field of Artificial Intelligence (AI). This process involves analyzing medical images, such as Computed Tomography (CT) scans and Magnetic Resonance Imaging (MRIs), to identify abnormal growths that may help in cancer detection. Many techniques and methods have been realized to improve the quality and performance of cancer classification and detection, such as TL, which allows the transfer of knowledge from one task to another with the same task or domain. TL englobes many methods, particularly those used in image analysis, such as transformers and Convolutional Neural Network (CNN) models trained on the ImageNet dataset. This paper analyzes and criticizes each method of TL based on image analysis and compares the results of each method, showing that transformers have achieved the best results with an accuracy of 97.41% for colon cancer detection and 94.71% for Histopathological Lung cancer. Future directions for cancer detection based on image analysis are also discussed.
Lung cancer is one of the prevalence diseases in the world which cause many deaths. Detecting early stages of lung cancer is so necessary. So, modeling and simulating some intelligent medical systems is an essential which can help specialist to accurately determine and diagnose the disease. So this paper contributes a new lung cancer detection model in CT images which use machine learning methods. There are three steps in this model: noise reduction (pre-processing), segmentation (middle-processing) and optimize segmentation for detect exact are of nodules. This article use some filters for noise reduction and then use Independent Recurrent Neural Networks (IndRNN) as deep learning methods for segmentation which optimize and tune by Genetic Algorithm. The results represented that the proposed method can detect exact area of nodules in CT images.
The Deep Learning revolution has enabled groundbreaking achievements in recent years. From breast cancer detection to protein folding, deep learning algorithms have been at the core of very important advancements. However, these modern advancements are becoming more and more data-hungry, especially on labeled data whose availability is scarce: this is even more prevalent in the medical context. In this work, we show how active learning could be very effective in data scarcity situations, where obtaining labeled data (or annotation budget is very limited). We compare several selection criteria (BALD, MeanSTD, and MaxEntropy) on the ISIC 2016 dataset. We also explored the effect of acquired pool size on the model's performance. Our results suggest that uncertainty is useful to the Melanoma detection task, and confirms the hypotheses of the author of the paper of interest, that \textit{bald} performs on average better than other acquisition functions. Our extended analyses however revealed that all acquisition functions perform badly on the positive (cancerous) samples, suggesting exploitation of class unbalance, which could be crucial in real-world settings. We finish by suggesting future work directions that would be useful to improve this current work. The code of our implementation is open-sourced at \url{https://github.com/bonaventuredossou/ece526_course_project}
Large language models (LLMs) such as ChatGPT have gained considerable interest across diverse research communities. Their notable ability for text completion and generation has inaugurated a novel paradigm for language-interfaced problem solving. However, the potential and efficacy of these models in bioinformatics remain incompletely explored. In this work, we study the performance LLMs on a wide spectrum of crucial bioinformatics tasks. These tasks include the identification of potential coding regions, extraction of named entities for genes and proteins, detection of antimicrobial and anti-cancer peptides, molecular optimization, and resolution of educational bioinformatics problems. Our findings indicate that, given appropriate prompts, LLMs like GPT variants can successfully handle most of these tasks. In addition, we provide a thorough analysis of their limitations in the context of complicated bioinformatics tasks. In conclusion, we believe that this work can provide new perspectives and motivate future research in the field of LLMs applications, AI for Science and bioinformatics.
Diagnosis of breast cancer malignancy at the early stages is a crucial step for controlling its side effects. Histopathological analysis provides a unique opportunity for malignant breast cancer detection. However, such a task would be tedious and time-consuming for the histopathologists. Deep Neural Networks enable us to learn informative features directly from raw histopathological images without manual feature extraction. Although Convolutional Neural Networks (CNNs) have been the dominant architectures in the computer vision realm, Transformer-based architectures have shown promising results in different computer vision tasks. Although harnessing the capability of Transformer-based architectures for medical image analysis seems interesting, these architectures are large, have a significant number of trainable parameters, and require large datasets to be trained on, which are usually rare in the medical domain. It has been claimed and empirically proved that at least part of the superior performance of Transformer-based architectures in Computer Vision domain originates from patch embedding operation. In this paper, we borrowed the previously introduced idea of integrating a fully Convolutional Neural Network architecture with Patch Embedding operation and presented an efficient CNN architecture for breast cancer malignancy detection from histopathological images. Despite the number of parameters that is significantly smaller than other methods, the accuracy performance metrics achieved 97.65%, 98.92%, 99.21%, and 98.01% for 40x, 100x, 200x, and 400x magnifications respectively. We took a step forward and modified the architecture using Group Convolution and Channel Shuffling ideas and reduced the number of trainable parameters even more with a negligible decline in performance and achieved 95.42%, 98.16%, 96.05%, and 97.92% accuracy for the mentioned magnifications respectively.
A standard treatment protocol for breast cancer entails administering neoadjuvant therapy followed by surgical removal of the tumor and surrounding tissue. Pathologists typically rely on cabinet X-ray radiographs, known as Faxitron, to examine the excised breast tissue and diagnose the extent of residual disease. However, accurately determining the location, size, and focality of residual cancer can be challenging, and incorrect assessments can lead to clinical consequences. The utilization of automated methods can improve the histopathology process, allowing pathologists to choose regions for sampling more effectively and precisely. Despite the recognized necessity, there are currently no such methods available. Training such automated detection models require accurate ground truth labels on ex-vivo radiology images, which can be acquired through registering Faxitron and histopathology images and mapping the extent of cancer from histopathology to x-ray images. This study introduces a deep learning-based image registration approach trained on mono-modal synthetic image pairs. The models were trained using data from 50 women who received neoadjuvant chemotherapy and underwent surgery. The results demonstrate that our method is faster and yields significantly lower average landmark error ($2.1\pm1.96$ mm) over the state-of-the-art iterative ($4.43\pm4.1$ mm) and deep learning ($4.02\pm3.15$ mm) approaches. Improved performance of our approach in integrating radiology and pathology information facilitates generating large datasets, which allows training models for more accurate breast cancer detection.