What is ResNet? ResNet (Residual Neural Network) is a deep learning architecture that uses residual (skip) connections to enable the training of very deep neural networks.
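In code, the idea is a block whose output adds the input back to a small stack of layers, so gradients can flow through the skip path. A minimal PyTorch sketch (channel sizes are illustrative, not tied to any specific ResNet variant):

```python
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    """Basic residual block: output = activation(F(x) + x)."""
    def __init__(self, channels: int):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(channels)
        self.conv2 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(channels)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        identity = x                      # the residual (skip) connection
        out = self.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        return self.relu(out + identity)  # gradients flow through the skip path

x = torch.randn(1, 64, 56, 56)
y = ResidualBlock(64)(x)                  # same shape as x: (1, 64, 56, 56)
```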
Papers and Code
Sep 11, 2025
Abstract: Magnetic resonance imaging (MRI) is the gold standard for brain imaging. Deep learning (DL) algorithms have been proposed to aid in the diagnosis of diseases such as Alzheimer's disease (AD) from MRI scans. However, DL algorithms can suffer from shortcut learning, in which spurious features, not directly related to the output label, are used for prediction. When these features are related to protected attributes, they can lead to performance bias against underrepresented protected groups, such as those defined by race and sex. In this work, we explore the potential for shortcut learning and demographic bias in DL-based AD diagnosis from MRI. We first investigate whether DL algorithms can identify race or sex from 3D brain MRI scans, to establish the presence or absence of race- and sex-based distributional shifts. Next, we investigate whether training-set imbalance by race or sex can cause a drop in model performance, indicating shortcut learning and bias. Finally, we conduct a quantitative and qualitative analysis of feature attributions in different brain regions for both the protected-attribute and AD classification tasks. Through these experiments, using multiple datasets and DL models (ResNet and SwinTransformer), we demonstrate the existence of both race- and sex-based shortcut learning and bias in DL-based AD classification. Our work lays the foundation for fairer DL diagnostic tools in brain MRI. The code is provided at https://github.com/acharaakshit/ShortMR
* FAIMI @ MICCAI 2025
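One simple way to surface the kind of subgroup performance bias this paper probes is to compare accuracy across protected groups. A minimal sketch with made-up data, not the paper's pipeline:

```python
import numpy as np

def subgroup_accuracy_gap(y_true, y_pred, groups):
    """Per-group accuracy and the worst-case gap across protected groups."""
    accs = {g: float((y_true[groups == g] == y_pred[groups == g]).mean())
            for g in np.unique(groups)}
    return accs, max(accs.values()) - min(accs.values())

# Hypothetical toy data: AD labels, model predictions, protected attribute.
y_true = np.array([1, 0, 1, 1, 0, 1, 0, 0])
y_pred = np.array([1, 0, 0, 1, 0, 1, 1, 1])
groups = np.array(["A", "A", "A", "A", "B", "B", "B", "B"])
print(subgroup_accuracy_gap(y_true, y_pred, groups))
# ({'A': 0.75, 'B': 0.5}, 0.25) -- a 25-point gap flags potential bias
```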

Sep 10, 2025
Abstract: The robustness of contrastive self-supervised learning (CSSL) to imbalanced datasets is largely unexplored. CSSL usually relies on a multi-view assumption to learn discriminative features from similar and dissimilar data samples. CSSL works well on balanced datasets but does not generalize well to imbalanced ones. In a very recent paper, as part of future work, Yann LeCun pointed out that the self-supervised multi-view framework can be extended to cases involving more than two views. Taking a cue from this insight, we propose a theoretical justification, based on the concept of mutual information, for the more-than-two-views objective and apply it to the problem of dataset imbalance in self-supervised learning. The proposed method helps extract representative characteristics of the tail classes by separating intra- from inter-class discriminatory characteristics. We introduce a loss function that learns better representations by filtering out extreme features. Experimental evaluation on a variety of self-supervised frameworks (both contrastive and non-contrastive) also shows that the more-than-two-views objective works well for imbalanced datasets. We achieve new state-of-the-art accuracy in self-supervised imbalanced-dataset classification (a 2% improvement on CIFAR-10-LT using ResNet-18, 5% on CIFAR-100-LT using ResNet-18, and 3% on ImageNet-LT (1k) using ResNet-50).
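The more-than-two-views objective can be sketched as an InfoNCE-style loss in which all V views of an image are mutual positives. This is a generic illustration of that setup, not the paper's exact loss or its extreme-feature filtering:

```python
import torch
import torch.nn.functional as F

def multiview_info_nce(views: torch.Tensor, temperature: float = 0.2) -> torch.Tensor:
    """InfoNCE generalized to V > 2 views: all views of an image are positives.

    views: (V, N, D) -- V augmented views of N images, embedded to D dims.
    """
    V, N, D = views.shape
    z = F.normalize(views.reshape(V * N, D), dim=1)
    sim = z @ z.t() / temperature                      # pairwise similarities
    eye = torch.eye(V * N, dtype=torch.bool, device=z.device)
    sim = sim.masked_fill(eye, float("-inf"))          # exclude self-pairs
    labels = torch.arange(V * N, device=z.device) % N  # image identity per row
    pos = (labels[None, :] == labels[:, None]) & ~eye  # positives: same image
    log_prob = sim - sim.logsumexp(dim=1, keepdim=True)
    return -log_prob[pos].mean()

loss = multiview_info_nce(torch.randn(4, 8, 128))      # 4 views, batch of 8
```

With V = 2 this reduces to the usual two-view contrastive loss; each anchor gains V - 1 positives as views are added.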

Sep 09, 2025
Abstract: Implicit neural representations have recently been extended to represent convolutional neural network weights via neural representations for neural networks (NeRN), offering promising parameter-compression benefits. However, the standard multi-layer perceptrons used in NeRN exhibit a pronounced spectral bias, hampering their ability to reconstruct high-frequency details effectively. In this paper, we propose SBS, a parameter-efficient enhancement to NeRN that suppresses spectral bias using two techniques: (1) unidirectional ordering-based smoothing, which improves kernel smoothness in the output space, and (2) unidirectional ordering-based smoothing-aware random Fourier features, which adaptively modulate the frequency bandwidth of input encodings based on layer-wise parameter count. Extensive evaluations on various ResNet models with the CIFAR-10, CIFAR-100, and ImageNet datasets demonstrate that SBS achieves significantly better reconstruction accuracy with fewer parameters than state-of-the-art methods.
* Accepted by ICONIP 2025
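For context, the random Fourier feature encoding that SBS adapts can be sketched as below. The adaptive, layer-wise bandwidth modulation is the paper's contribution and is not shown; the fixed `bandwidth` here is an assumed hyperparameter:

```python
import math
import torch
import torch.nn as nn

class RandomFourierFeatures(nn.Module):
    """x -> [sin(2*pi*xB), cos(2*pi*xB)] with random frequencies B.

    A higher `bandwidth` lets a downstream MLP fit higher-frequency
    targets, counteracting its spectral bias.
    """
    def __init__(self, in_dim: int, num_freqs: int, bandwidth: float = 10.0):
        super().__init__()
        self.register_buffer("B", torch.randn(in_dim, num_freqs) * bandwidth)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        proj = 2 * math.pi * (x @ self.B)
        return torch.cat([torch.sin(proj), torch.cos(proj)], dim=-1)

coords = torch.rand(1024, 3)                    # e.g. normalized weight coordinates
feats = RandomFourierFeatures(3, 128)(coords)   # (1024, 256), fed to the weight MLP
```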

Sep 09, 2025
Abstract: One of the key issues with Deep Neural Networks (DNNs) is the black-box nature of their internal feature extraction process. Targeting vision-related domains, this paper focuses on analysing the feature space of a DNN by proposing a decoder that can generate images whose features are guaranteed to closely match a user-specified feature. Owing to this guarantee, which past studies lack, our decoder allows us to show which attributes of an image are encoded into a feature by the DNN, by generating images whose features are in proximity to that feature. Our decoder is implemented as a guided diffusion model that guides the reverse image generation of a pre-trained diffusion model to minimise the Euclidean distance between the feature of the clean image estimated at each step and the user-specified feature. One practical advantage of our decoder is that it can analyse the feature spaces of different DNNs with no additional training and runs on a single commercial off-the-shelf (COTS) GPU. Experimental results targeting CLIP's image encoder, ResNet-50, and a vision transformer demonstrate that images generated by our decoder have features remarkably similar to the user-specified ones and reveal valuable insights into these DNNs' feature spaces.
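The guidance idea can be sketched as a gradient step on the feature distance at each reverse step. The `denoiser` and `encoder` interfaces below are assumed placeholders, not the paper's implementation:

```python
import torch

def feature_guided_step(x_t, t, denoiser, encoder, target_feat, scale=1.0):
    """One guidance update: steer reverse diffusion toward a target feature.

    Assumed interfaces: denoiser(x_t, t) returns the clean-image estimate
    x0; encoder(x0) returns the feature vector of the DNN under analysis.
    """
    x_t = x_t.detach().requires_grad_(True)
    x0_hat = denoiser(x_t, t)                            # estimated clean image
    dist = (encoder(x0_hat) - target_feat).pow(2).sum()  # squared Euclidean distance
    grad, = torch.autograd.grad(dist, x_t)
    return x_t.detach() - scale * grad                   # descend the feature distance
```

Because the encoder is only queried, not trained, swapping in a different DNN's feature extractor requires no retraining, which is the "no additional training" property claimed above.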

Sep 09, 2025
Abstract: Deep Neural Networks (DNNs) are vulnerable to adversarial attacks, posing significant security threats to their deployment in remote sensing applications. Research on adversarial attacks not only reveals model vulnerabilities but also provides critical insights for enhancing robustness. Although current mixing-based strategies have been proposed to increase the transferability of adversarial examples, they either blend images globally or directly swap a region between images, which can destroy global semantic features and mislead the optimization of adversarial examples. Furthermore, their reliance on cross-entropy loss for perturbation optimization leads to diminishing gradients during iterative updates, compromising adversarial-example quality. To address these limitations, we focus on non-targeted attacks and propose a novel framework based on local mixing and logit optimization. First, we present a local mixing strategy that generates diverse yet semantically consistent inputs. Unlike MixUp, which globally blends two images, and CutMix, which stitches images together, our method blends only local regions to preserve global semantic information. Second, we adapt the logit loss from targeted attacks to non-targeted scenarios, mitigating the gradient-vanishing problem of cross-entropy loss. Third, a perturbation-smoothing loss is applied to suppress high-frequency noise and enhance transferability. Extensive experiments on the FGSCR-42 and MTARSI datasets demonstrate superior performance over 12 state-of-the-art methods across 6 surrogate models. Notably, with ResNet as the surrogate on MTARSI, our method achieves a 17.28% average improvement in black-box attack success rate.
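A plausible sketch of adapting the targeted logit loss to the non-targeted case is to push down the true-class logit directly, sidestepping the softmax saturation that shrinks cross-entropy gradients. This is an illustration of the idea, not necessarily the paper's exact formulation:

```python
import torch

def nontargeted_logit_loss(logits: torch.Tensor, y: torch.Tensor) -> torch.Tensor:
    """Attack loss on raw logits rather than cross-entropy.

    Unlike cross-entropy, whose gradient vanishes as the softmax saturates,
    the raw true-class logit keeps a usable gradient across iterations.
    """
    true_logit = logits.gather(1, y.unsqueeze(1)).squeeze(1)
    return -true_logit.mean()  # maximizing this drives predictions off the label

# Hypothetical single attack iteration (model, x_adv, y, alpha assumed):
# x_adv.requires_grad_(True)
# nontargeted_logit_loss(model(x_adv), y).backward()
# x_adv = (x_adv + alpha * x_adv.grad.sign()).detach()
```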

Sep 08, 2025
Abstract: Data scarcity hinders deep learning for medical imaging. We propose a framework for breast cancer classification in thermograms that addresses this using a Diffusion Probabilistic Model (DPM) for data augmentation. Our DPM-based augmentation is shown to be superior to both traditional methods and a ProGAN baseline. The framework fuses deep features from a pre-trained ResNet-50 with handcrafted nonlinear features (e.g., fractal dimension) derived from U-Net-segmented tumors. An XGBoost classifier trained on these fused features achieves 98.0% accuracy and 98.1% sensitivity. Ablation studies and statistical tests confirm that both the DPM augmentation and the nonlinear feature fusion are critical, statistically significant components of this success. This work validates the synergy between advanced generative models and interpretable features for creating highly accurate medical diagnostic tools.
* Accepted to IEEE-EMBS International Conference on Biomedical and Health Informatics (BHI 2025)
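The fusion step might look like the following sketch: a frozen ResNet-50 trunk produces deep features that are concatenated with handcrafted descriptors before XGBoost. Names and shapes are assumptions for illustration, not the paper's code:

```python
import numpy as np
import torch
from torchvision import models
from xgboost import XGBClassifier

# Frozen ResNet-50 trunk exposing its 2048-d pooled features.
resnet = models.resnet50(weights="IMAGENET1K_V2")
resnet.fc = torch.nn.Identity()
resnet.eval()

@torch.no_grad()
def deep_features(batch: torch.Tensor) -> np.ndarray:  # batch: (N, 3, 224, 224)
    return resnet(batch).numpy()

def fuse(batch: torch.Tensor, handcrafted: np.ndarray) -> np.ndarray:
    # handcrafted: (N, k) nonlinear descriptors, e.g. fractal dimension
    return np.concatenate([deep_features(batch), handcrafted], axis=1)

# clf = XGBClassifier().fit(fuse(train_imgs, train_desc), train_labels)
```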

Sep 05, 2025
Abstract: Breast cancer remains a leading cause of cancer-related mortality among women worldwide. Ultrasound imaging, widely used due to its safety and cost-effectiveness, plays a key role in early detection, especially in patients with dense breast tissue. This paper presents a comprehensive study on the application of machine learning and deep learning techniques for breast cancer classification using ultrasound images. Using datasets such as BUSI, BUS-BRA, and BrEaST-Lesions USG, we evaluate classical machine learning models (SVM, KNN) and deep convolutional neural networks (ResNet-18, EfficientNet-B0, GoogLeNet). Experimental results show that ResNet-18 achieves the highest accuracy (99.7%) and perfect sensitivity for malignant lesions. Classical ML models, though outperformed by CNNs, achieve competitive performance when enhanced with deep feature extraction. Grad-CAM visualizations further improve model transparency by highlighting diagnostically relevant image regions. These findings support the integration of AI-based diagnostic tools into clinical workflows and demonstrate the feasibility of deploying high-performing, interpretable systems for ultrasound-based breast cancer detection.
* 6 pages, 2 figures and 1 table
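The "deep feature extraction" enhancement for the classical models can be sketched as a frozen ResNet-18 feeding an SVM. This is a hypothetical illustration of the pattern, not the paper's pipeline:

```python
import torch
from torchvision import models
from sklearn.svm import SVC

# Frozen ResNet-18 as a feature extractor: 512-d vectors instead of logits.
backbone = models.resnet18(weights="IMAGENET1K_V1")
backbone.fc = torch.nn.Identity()
backbone.eval()

@torch.no_grad()
def extract(images: torch.Tensor):          # images: (N, 3, 224, 224) ultrasound batch
    return backbone(images).numpy()

# svm = SVC(kernel="rbf").fit(extract(x_train), y_train)
# preds = svm.predict(extract(x_test))
```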

Sep 05, 2025
Abstract: Robustifying convolutional neural networks (CNNs) against adversarial attacks remains challenging and often requires resource-intensive countermeasures. We explore the use of sparse mixture-of-experts (MoE) layers to improve robustness by replacing selected residual blocks or convolutional layers, thereby increasing model capacity without additional inference cost. On ResNet architectures trained on CIFAR-100, we find that inserting a single MoE layer in the deeper stages leads to consistent improvements in robustness under PGD and AutoPGD attacks when combined with adversarial training. Furthermore, we discover that when switch loss is used for balancing, it causes routing to collapse onto a small set of overused experts, thereby concentrating adversarial training on these paths and inadvertently making them more robust. As a result, some individual experts outperform the gated MoE model in robustness, suggesting that robust subpaths emerge through specialization. Our code is available at https://github.com/KASTEL-MobilityLab/robust-sparse-moes.
* Accepted for publication at the STREAM workshop at ICCV 2025
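A top-1-routed convolutional MoE layer of the kind described might be sketched as below; top-1 routing is why capacity grows while per-sample inference cost stays at a single expert. This is illustrative only, with the linked repository being authoritative:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SparseConvMoE(nn.Module):
    """Top-1 routed mixture of convolutional experts (illustrative sketch)."""
    def __init__(self, channels: int, num_experts: int = 4):
        super().__init__()
        self.experts = nn.ModuleList(
            [nn.Conv2d(channels, channels, 3, padding=1) for _ in range(num_experts)]
        )
        self.router = nn.Linear(channels, num_experts)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        gates = F.softmax(self.router(x.mean(dim=(2, 3))), dim=1)  # route on pooled features
        top_gate, top_idx = gates.max(dim=1)                       # top-1 expert per sample
        out = torch.zeros_like(x)
        for e, expert in enumerate(self.experts):
            mask = top_idx == e                                    # samples routed to expert e
            if mask.any():
                out[mask] = top_gate[mask, None, None, None] * expert(x[mask])
        return out
```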

Sep 05, 2025
Abstract: Rapid and accurate post-hurricane damage assessment is vital for disaster response and recovery. Yet existing CNN-based methods struggle to capture multi-scale spatial features and to distinguish visually similar or co-occurring damage types. To address these issues, we propose MCANet, a multi-label classification framework that learns multi-scale representations and adaptively attends to spatially relevant regions for each damage category. MCANet employs a Res2Net-based hierarchical backbone to enrich spatial context across scales and a multi-head class-specific residual attention module to enhance discrimination. Each attention branch focuses on a different spatial granularity, balancing local detail with global context. We evaluate MCANet on the RescueNet dataset of 4,494 UAV images collected after Hurricane Michael. MCANet achieves a mean average precision (mAP) of 91.75%, outperforming ResNet, Res2Net, VGG, MobileNet, EfficientNet, and ViT. With eight attention heads, performance further improves to 92.35%, boosting average precision for challenging classes such as "Road Blocked" by over 6%. Class activation mapping confirms MCANet's ability to localize damage-relevant regions, supporting interpretability. Outputs from MCANet can inform post-disaster risk mapping, emergency routing, and digital-twin-based disaster response. Future work could integrate disaster-specific knowledge graphs and multimodal large language models to improve adaptability to unseen disasters and enrich semantic understanding for real-world decision-making.
* 34 pages, 7 figures
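A single class-specific residual attention head, in the spirit of the module MCANet builds on, can be sketched as follows. This is an illustration of the residual-attention idea only; MCANet's multi-head, multi-scale design is not reproduced:

```python
import torch
import torch.nn as nn

class ClassSpecificResidualAttention(nn.Module):
    """One attention head: global pooling plus class-specific spatial attention."""
    def __init__(self, in_channels: int, num_classes: int,
                 temperature: float = 1.0, lam: float = 0.2):
        super().__init__()
        self.fc = nn.Conv2d(in_channels, num_classes, kernel_size=1, bias=False)
        self.T, self.lam = temperature, lam

    def forward(self, feats: torch.Tensor) -> torch.Tensor:
        score = self.fc(feats).flatten(2)            # (N, classes, H*W) location scores
        base = score.mean(dim=2)                     # global average pooling (context)
        attn = torch.softmax(score * self.T, dim=2)  # class-specific spatial attention
        local = (score * attn).sum(dim=2)            # attention-weighted pooling (detail)
        return base + self.lam * local               # residual combination

logits = ClassSpecificResidualAttention(2048, 10)(torch.randn(4, 2048, 7, 7))
```

A multi-head variant would run several such heads with different temperatures, matching the paper's account of branches attending to different spatial granularities.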

Sep 05, 2025
Abstract: Sebocytes are lipid-secreting cells whose differentiation is marked by the accumulation of intracellular lipid droplets, making their quantification a key readout in sebocyte biology. Manual counting is labor-intensive and subjective, motivating automated solutions. Here, we introduce a simple attention-based multiple instance learning (MIL) framework for sebocyte image analysis. Nile Red-stained sebocyte images were annotated into 14 classes according to droplet counts, expanded via data augmentation to about 50,000 cells. Two models were benchmarked: a baseline multi-layer perceptron (MLP) trained on aggregated patch-level counts, and an attention-based MIL model leveraging ResNet-50 features with instance weighting. Experiments using five-fold cross-validation showed that the baseline MLP achieved more stable performance (mean MAE = 5.6) compared with the attention-based MIL, which was less consistent (mean MAE = 10.7) but occasionally superior in specific folds. These findings indicate that simple bag-level aggregation provides a robust baseline for slide-level droplet counting, while attention-based MIL requires task-aligned pooling and regularization to fully realize its potential in sebocyte image analysis.
* 8 pages, 1 figure, 2 tables
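Attention-based MIL pooling over instance features can be sketched as below, following the common learned-attention formulation; the dimensions and the use of raw ResNet-50 features are assumptions for illustration, not the paper's configuration:

```python
import torch
import torch.nn as nn

class AttentionMILPooling(nn.Module):
    """Weight instances by a learned attention score, then sum into a bag embedding."""
    def __init__(self, feat_dim: int = 2048, hidden: int = 128):
        super().__init__()
        self.attn = nn.Sequential(
            nn.Linear(feat_dim, hidden), nn.Tanh(), nn.Linear(hidden, 1)
        )

    def forward(self, instances: torch.Tensor):
        # instances: (num_instances, feat_dim) for one bag (one slide)
        w = torch.softmax(self.attn(instances), dim=0)  # (num_instances, 1) weights
        bag = (w * instances).sum(dim=0)                # (feat_dim,) bag embedding
        return bag, w                                   # weights aid interpretability

bag_feat, weights = AttentionMILPooling()(torch.randn(32, 2048))
# a counting head on bag_feat would predict the slide-level droplet count
```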
