Abstract:Early exits (EEs) offer a promising approach to reducing computational costs and latency by dynamically terminating inference once a satisfactory prediction confidence on a data sample is achieved. Although many works integrate EEs into encoder-only Transformers, their application to decoder-only architectures and, more importantly, Mamba models, a novel family of state-space architectures in the LLM realm, remains insufficiently explored. This work introduces DYNAMAX, the first framework to exploit the unique properties of Mamba architectures for early exit mechanisms. We not only integrate EEs into Mamba but also repurpose Mamba as an efficient EE classifier for both Mamba-based and transformer-based LLMs, showcasing its versatility. Our experiments employ the Mistral 7B transformer compared to the Codestral 7B Mamba model, using data sets such as TruthfulQA, CoQA, and TriviaQA to evaluate computational savings, accuracy, and consistency. The results highlight the adaptability of Mamba as a powerful EE classifier and its efficiency in balancing computational cost and performance quality across NLP tasks. By leveraging Mamba's inherent design for dynamic processing, we open pathways for scalable and efficient inference in embedded applications and resource-constrained environments. This study underscores the transformative potential of Mamba in redefining dynamic computing paradigms for LLMs.
Abstract:This study focuses on analyzing dermoscopy images to determine the depth of melanomas, which is a critical factor in diagnosing and treating skin cancer. The Breslow depth, measured from the top of the granular layer to the deepest point of tumor invasion, serves as a crucial parameter for staging melanoma and guiding treatment decisions. This research aims to improve the prediction of the depth of melanoma through the use of machine learning models, specifically deep learning, while also providing an analysis of the possible existance of graduation in the images characteristics which correlates with the depth of the melanomas. Various datasets, including ISIC and private collections, were used, comprising a total of 1162 images. The datasets were combined and balanced to ensure robust model training. The study utilized pre-trained Convolutional Neural Networks (CNNs). Results indicated that the models achieved significant improvements over previous methods. Additionally, the study conducted a correlation analysis between model's predictions and actual melanoma thickness, revealing a moderate correlation that improves with higher thickness values. Explainability methods such as feature visualization through Principal Component Analysis (PCA) demonstrated the capability of deep features to distinguish between different depths of melanoma, providing insight into the data distribution and model behavior. In summary, this research presents a dual contribution: enhancing the state-of-the-art classification results through advanced training techniques and offering a detailed analysis of the data and model behavior to better understand the relationship between dermoscopy images and melanoma thickness.