Topic:Multi Label Classification
What is Multi Label Classification? Multi-label classification is the task of assigning labels to entities where multiple labels may be assigned to each entity, allowing it to belong to more than one category simultaneously.
Papers and Code
Jun 24, 2025
Abstract:Diabetic Peripheral Neuropathy (DPN) affects nearly half of diabetes patients, requiring early detection. Corneal Confocal Microscopy (CCM) enables non-invasive diagnosis, but automated methods suffer from inefficient feature extraction, reliance on handcrafted priors, and data limitations. We propose HMSViT, a novel Hierarchical Masked Self-Supervised Vision Transformer (HMSViT) designed for corneal nerve segmentation and DPN diagnosis. Unlike existing methods, HMSViT employs pooling-based hierarchical and dual attention mechanisms with absolute positional encoding, enabling efficient multi-scale feature extraction by capturing fine-grained local details in early layers and integrating global context in deeper layers, all at a lower computational cost. A block-masked self supervised learning framework is designed for the HMSViT that reduces reliance on labelled data, enhancing feature robustness, while a multi-scale decoder is used for segmentation and classification by fusing hierarchical features. Experiments on clinical CCM datasets showed HMSViT achieves state-of-the-art performance, with 61.34% mIoU for nerve segmentation and 70.40% diagnostic accuracy, outperforming leading hierarchical models like the Swin Transformer and HiViT by margins of up to 6.39% in segmentation accuracy while using fewer parameters. Detailed ablation studies further reveal that integrating block-masked SSL with hierarchical multi-scale feature extraction substantially enhances performance compared to conventional supervised training. Overall, these comprehensive experiments confirm that HMSViT delivers excellent, robust, and clinically viable results, demonstrating its potential for scalable deployment in real-world diagnostic applications.
Via

Jun 23, 2025
Abstract:Ensuring the moral reasoning capabilities of Large Language Models (LLMs) is a growing concern as these systems are used in socially sensitive tasks. Nevertheless, current evaluation benchmarks present two major shortcomings: a lack of annotations that justify moral classifications, which limits transparency and interpretability; and a predominant focus on English, which constrains the assessment of moral reasoning across diverse cultural settings. In this paper, we introduce MFTCXplain, a multilingual benchmark dataset for evaluating the moral reasoning of LLMs via hate speech multi-hop explanation using Moral Foundation Theory (MFT). The dataset comprises 3,000 tweets across Portuguese, Italian, Persian, and English, annotated with binary hate speech labels, moral categories, and text span-level rationales. Empirical results highlight a misalignment between LLM outputs and human annotations in moral reasoning tasks. While LLMs perform well in hate speech detection (F1 up to 0.836), their ability to predict moral sentiments is notably weak (F1 < 0.35). Furthermore, rationale alignment remains limited mainly in underrepresented languages. These findings show the limited capacity of current LLMs to internalize and reflect human moral reasoning.
* Under Review
Via

Jun 18, 2025
Abstract:Medical imaging data contain sensitive patient information requiring strong privacy protection. Many analytical setups require data to be sent to a server for inference purposes. Homomorphic encryption (HE) provides a solution by allowing computations to be performed on encrypted data without revealing the original information. However, HE inference is computationally expensive, particularly for large images (e.g., chest X-rays). In this study, we propose an HE inference framework for medical images that uses VQGAN to compress images into latent representations, thereby significantly reducing the computational burden while preserving image quality. We approximate the activation functions with lower-degree polynomials to balance the accuracy and efficiency in compliance with HE requirements. We observed that a downsampling factor of eight for compression achieved an optimal balance between performance and computational cost. We further adapted the squeeze and excitation module, which is known to improve traditional CNNs, to enhance the HE framework. Our method was tested on two chest X-ray datasets for multi-label classification tasks using vanilla CNN backbones. Although HE inference remains relatively slow and introduces minor performance differences compared with unencrypted inference, our approach shows strong potential for practical use in medical images
* 11 pages, 5 figures
Via

Jun 18, 2025
Abstract:In 2012, the United Nations introduced 17 Sustainable Development Goals (SDGs) aimed at creating a more sustainable and improved future by 2030. However, tracking progress toward these goals is difficult because of the extensive scale and complexity of the data involved. Text classification models have become vital tools in this area, automating the analysis of vast amounts of text from a variety of sources. Additionally, large language models (LLMs) have recently proven indispensable for many natural language processing tasks, including text classification, thanks to their ability to recognize complex linguistic patterns and semantics. This study analyzes various proprietary and open-source LLMs for a single-label, multi-class text classification task focused on the SDGs. Then, it also evaluates the effectiveness of task adaptation techniques (i.e., in-context learning approaches), namely Zero-Shot and Few-Shot Learning, as well as Fine-Tuning within this domain. The results reveal that smaller models, when optimized through prompt engineering, can perform on par with larger models like OpenAI's GPT (Generative Pre-trained Transformer).
* Submitted to IEEE Access
Via

Jun 17, 2025
Abstract:In the era of digitalization, as individuals increasingly rely on digital platforms for communication and news consumption, various actors employ linguistic strategies to influence public perception. While models have become proficient at detecting explicit patterns, which typically appear in texts as single remarks referred to as utterances, such as social media posts, malicious actors have shifted toward utilizing implicit influential verbal patterns embedded within conversations. These verbal patterns aim to mentally penetrate the victim's mind in order to influence them, enabling the actor to obtain the desired information through implicit means. This paper presents an improved approach for detecting such implicit influential patterns. Furthermore, the proposed model is capable of identifying the specific locations of these influential elements within a conversation. To achieve this, the existing dataset was augmented using the reasoning capabilities of state-of-the-art language models. Our designed framework resulted in a 6% improvement in the detection of implicit influential patterns in conversations. Moreover, this approach improved the multi-label classification tasks related to both the techniques used for influence and the vulnerability of victims by 33% and 43%, respectively.
* Accepted at the HCI International conference 2025
Via

Jun 16, 2025
Abstract:Advancements in Industrial Internet of Things (IIoT) sensors enable sophisticated Predictive Maintenance (PM) with high temporal resolution. For cost-efficient solutions, vibration-based condition monitoring is especially of interest. However, analyzing high-resolution vibration data via traditional cloud approaches incurs significant energy and communication costs, hindering battery-powered edge deployments. This necessitates shifting intelligence to the sensor edge. Due to their event-driven nature, Spiking Neural Networks (SNNs) offer a promising pathway toward energy-efficient on-device processing. This paper investigates a recurrent SNN for simultaneous regression (flow, pressure, pump speed) and multi-label classification (normal, overpressure, cavitation) for an industrial progressing cavity pump (PCP) using 3-axis vibration data. Furthermore, we provide energy consumption estimates comparing the SNN approach on conventional (x86, ARM) and neuromorphic (Loihi) hardware platforms. Results demonstrate high classification accuracy (>97%) with zero False Negative Rates for critical Overpressure and Cavitation faults. Smoothed regression outputs achieve Mean Relative Percentage Errors below 1% for flow and pump speed, approaching industrial sensor standards, although pressure prediction requires further refinement. Energy estimates indicate significant power savings, with the Loihi consumption (0.0032 J/inf) being up to 3 orders of magnitude less compared to the estimated x86 CPU (11.3 J/inf) and ARM CPU (1.18 J/inf) execution. Our findings underscore the potential of SNNs for multi-task PM directly on resource-constrained edge devices, enabling scalable and energy-efficient industrial monitoring solutions.
* This paper has been accepted and will be presented at the
International Conference on Neuromorphic Systems (ICONS) 2025, July 29-31,
2025. The proceedings will be published later
Via

Jun 17, 2025
Abstract:Fairness has been identified as an important aspect of Machine Learning and Artificial Intelligence solutions for decision making. Recent literature offers a variety of approaches for debiasing, however many of them fall short when the data collection is imbalanced. In this paper, we focus on a particular case, fairness in doubly imbalanced datasets, such that the data collection is imbalanced both for the label and the groups in the sensitive attribute. Firstly, we present an exploratory analysis to illustrate limitations in debiasing on a doubly imbalanced dataset. Then, a multi-criteria based solution is proposed for finding the most suitable sampling and distribution for label and sensitive attribute, in terms of fairness and classification accuracy
* 33 pages, 3 figures, submitted to AI Review
Via

Jun 17, 2025
Abstract:Label-free imaging has gained broad interest because of its potential to omit elaborate staining procedures which is especially relevant for in vivo use. Label-free multiphoton microscopy (MPM), for instance, exploits two-photon excitation of natural autofluorescence (AF) from native, metabolic proteins, making it ideal for in vivo endomicroscopy. Deep learning (DL) models have been widely used in other optical imaging technologies to predict specific target annotations and thereby digitally augment the specificity of these label-free images. However, this computational specificity has only rarely been implemented for MPM. In this work, we used a data set of label-free MPM images from a series of different immune cell types (5,075 individual cells for binary classification in mixed samples and 3,424 cells for a multi-class classification task) and trained a convolutional neural network (CNN) to classify cell types based on this label-free AF as input. A low-complexity squeezeNet architecture was able to achieve reliable immune cell classification results (0.89 ROC-AUC, 0.95 PR-AUC, for binary classification in mixed samples; 0.689 F1 score, 0.697 precision, 0.748 recall, and 0.683 MCC for six-class classification in isolated samples). Perturbation tests confirmed that the model is not confused by extracellular environment and that both input AF channels (NADH and FAD) are about equally important to the classification. In the future, such predictive DL models could directly detect specific immune cells in unstained images and thus, computationally improve the specificity of label-free MPM which would have great potential for in vivo endomicroscopy.
Via

Jun 17, 2025
Abstract:In this paper, we construct two research objectives: i) explore the learned embedding space of BiomedCLIP, an open-source large vision language model, to analyse meaningful class separations, and ii) quantify the limitations of BiomedCLIP when applied to a highly imbalanced, out-of-distribution multi-label medical dataset. We experiment on IU-xray dataset, which exhibits the aforementioned criteria, and evaluate BiomedCLIP in classifying images (radiographs) in three contexts: zero-shot inference, full finetuning, and linear probing. The results show that the model under zero-shot settings over-predicts all labels, leading to poor precision and inter-class separability. Full fine-tuning improves classification of distinct diseases, while linear probing detects overlapping features. We demonstrate visual understanding of the model using Grad-CAM heatmaps and compare with 15 annotations by a radiologist. We highlight the need for careful adaptations of the models to foster reliability and applicability in a real-world setting. The code for the experiments in this work is available and maintained on GitHub.
Via

Jun 13, 2025
Abstract:Remote sensing has emerged as a critical tool for large-scale Earth monitoring and land management. In this paper, we introduce AgriPotential, a novel benchmark dataset composed of Sentinel-2 satellite imagery spanning multiple months. The dataset provides pixel-level annotations of agricultural potentials for three major crop types - viticulture, market gardening, and field crops - across five ordinal classes. AgriPotential supports a broad range of machine learning tasks, including ordinal regression, multi-label classification, and spatio-temporal modeling. The data covers diverse areas in Southern France, offering rich spectral information. AgriPotential is the first public dataset designed specifically for agricultural potential prediction, aiming to improve data-driven approaches to sustainable land use planning. The dataset and the code are freely accessible at: https://zenodo.org/records/15556484
Via
