Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Tracy Hammond

MambaLiteUNet: Cross-Gated Adaptive Feature Fusion for Robust Skin Lesion Segmentation

Apr 22, 2026

Md Maklachur Rahman, Soon Ki Jung, Tracy Hammond

Abstract:Recent segmentation models have demonstrated promising efficiency by aggressively reducing parameter counts and computational complexity. However, these models often struggle to accurately delineate fine lesion boundaries and texture patterns essential for early skin cancer diagnosis and treatment planning. In this paper, we propose MambaLiteUNet, a compact yet robust segmentation framework that integrates Mamba state space modeling into a U-Net architecture, along with three key modules: Adaptive Multi-Branch Mamba Feature Fusion (AMF), Local-Global Feature Mixing (LGFM), and Cross-Gated Attention (CGA). These modules are designed to enhance local-global feature interaction, preserve spatial details, and improve the quality of skip connections. MambaLiteUNet achieves an average IoU of 87.12% and average Dice score of 93.09% across ISIC2017, ISIC2018, HAM10000, and PH2 benchmarks, outperforming state-of-the-art models. Compared to U-Net, our model improves average IoU and Dice by 7.72 and 4.61 points, respectively, while reducing parameters by 93.6% and GFLOPs by 97.6%. Additionally, in domain generalization with six unseen lesion categories, MambaLiteUNet achieves 77.61% IoU and 87.23% Dice, performing best among all evaluated models. Our extensive experiments demonstrate that MambaLiteUNet achieves a strong balance between accuracy and efficiency, making it a competitive and practical solution for dermatological image segmentation. Our code is publicly available at: https://github.com/maklachur/MambaLiteUNet.

* Accepted at CVPR 2026 Main

Via

Access Paper or Ask Questions

Masked Contrastive Pre-Training Improves Music Audio Key Detection

Apr 11, 2026

Ori Yonay, Tracy Hammond, Tianbao Yang

Abstract:Self-supervised music foundation models underperform on key detection, which requires pitch-sensitive representations. In this work, we present the first systematic study showing that the design of self-supervised pretraining directly impacts pitch sensitivity, and demonstrate that masked contrastive embeddings uniquely enable state-of-the-art (SOTA) performance in key detection in the supervised setting. First, we discover that linear evaluation after masking-based contrastive pretraining on Mel spectrograms leads to competitive performance on music key detection out of the box. This leads us to train shallow but wide multi-layer perceptrons (MLPs) on features extracted from our base model, leading to SOTA performance without the need for sophisticated data augmentation policies. We further analyze robustness and show empirically that the learned representations naturally encode common augmentations. Our study establishes self-supervised pretraining as an effective approach for pitch-sensitive MIR tasks and provides insights for designing and probing music foundation models.

* Code and models available at github.com/echo-cipher/keymyna

Via

Access Paper or Ask Questions

DependencyAI: Detecting AI Generated Text through Dependency Parsing

Feb 17, 2026

Sara Ahmed, Tracy Hammond

Abstract:As large language models (LLMs) become increasingly prevalent, reliable methods for detecting AI-generated text are critical for mitigating potential risks. We introduce DependencyAI, a simple and interpretable approach for detecting AI-generated text using only the labels of linguistic dependency relations. Our method achieves competitive performance across monolingual, multi-generator, and multilingual settings. To increase interpretability, we analyze feature importance to reveal syntactic structures that distinguish AI-generated from human-written text. We also observe a systematic overprediction of certain models on unseen domains, suggesting that generator-specific writing styles may affect cross-domain generalization. Overall, our results demonstrate that dependency relations alone provide a robust signal for AI-generated text detection, establishing DependencyAI as a strong linguistically grounded, interpretable, and non-neural network baseline.

Via

Access Paper or Ask Questions

It's Not Just Labeling" -- A Research on LLM Generated Feedback Interpretability and Image Labeling Sketch Features

May 26, 2025

Baichuan Li, Larry Powell, Tracy Hammond

Figure 1 for It's Not Just Labeling" -- A Research on LLM Generated Feedback Interpretability and Image Labeling Sketch Features

Figure 2 for It's Not Just Labeling" -- A Research on LLM Generated Feedback Interpretability and Image Labeling Sketch Features

Figure 3 for It's Not Just Labeling" -- A Research on LLM Generated Feedback Interpretability and Image Labeling Sketch Features

Figure 4 for It's Not Just Labeling" -- A Research on LLM Generated Feedback Interpretability and Image Labeling Sketch Features

Abstract:The quality of training data is critical to the performance of machine learning applications in domains like transportation, healthcare, and robotics. Accurate image labeling, however, often relies on time-consuming, expert-driven methods with limited feedback. This research introduces a sketch-based annotation approach supported by large language models (LLMs) to reduce technical barriers and enhance accessibility. Using a synthetic dataset, we examine how sketch recognition features relate to LLM feedback metrics, aiming to improve the reliability and interpretability of LLM-assisted labeling. We also explore how prompting strategies and sketch variations influence feedback quality. Our main contribution is a sketch-based virtual assistant that simplifies annotation for non-experts and advances LLM-driven labeling tools in terms of scalability, accessibility, and explainability.

Via

Access Paper or Ask Questions

Hashigo: A Next Generation Sketch Interactive System for Japanese Kanji

Apr 15, 2025

Paul Taele, Tracy Hammond

Abstract:Language students can increase their effectiveness in learning written Japanese by mastering the visual structure and written technique of Japanese kanji. Yet, existing kanji handwriting recognition systems do not assess the written technique sufficiently enough to discourage students from developing bad learning habits. In this paper, we describe our work on Hashigo, a kanji sketch interactive system which achieves human instructor-level critique and feedback on both the visual structure and written technique of students' sketched kanji. This type of automated critique and feedback allows students to target and correct specific deficiencies in their sketches that, if left untreated, are detrimental to effective long-term kanji learning.

* Vol. 21 (2009): The Twenty-Second Conference on Innovative Applications of Artificial Intelligence

Via

Access Paper or Ask Questions

Myna: Masking-Based Contrastive Learning of Musical Representations

Feb 19, 2025

Ori Yonay, Tracy Hammond, Tianbao Yang

Figure 1 for Myna: Masking-Based Contrastive Learning of Musical Representations

Figure 2 for Myna: Masking-Based Contrastive Learning of Musical Representations

Figure 3 for Myna: Masking-Based Contrastive Learning of Musical Representations

Figure 4 for Myna: Masking-Based Contrastive Learning of Musical Representations

Abstract:We present Myna, a simple yet effective approach for self-supervised musical representation learning. Built on a contrastive learning framework, Myna introduces two key innovations: (1) the use of a Vision Transformer (ViT) on mel-spectrograms as the backbone and (2) a novel data augmentation strategy, token masking, that masks 90 percent of spectrogram tokens. These innovations deliver both effectiveness and efficiency: (i) Token masking enables a significant increase in per-GPU batch size, from 48 or 120 in prior methods (CLMR, MULE) to 4096. (ii) By avoiding traditional augmentations, Myna retains pitch sensitivity, enhancing performance in tasks like key detection. (iii) The use of vertical patches allows the model to better capture critical features for key detection. Our hybrid model, Myna-22M-Hybrid, processes both 16x16 and 128x2 patches, achieving state-of-the-art results. Trained on a single GPU, it outperforms MULE (62M) on average and rivals MERT-95M, which was trained on 16 and 64 GPUs, respectively. Additionally, it surpasses MERT-95M-public, establishing itself as the best-performing model trained on publicly available data. We release our code and models to promote reproducibility and facilitate future research.

Via

Access Paper or Ask Questions

Mamba in Vision: A Comprehensive Survey of Techniques and Applications

Oct 04, 2024

Md Maklachur Rahman, Abdullah Aman Tutul, Ankur Nath, Lamyanba Laishram, Soon Ki Jung, Tracy Hammond

Figure 1 for Mamba in Vision: A Comprehensive Survey of Techniques and Applications

Figure 2 for Mamba in Vision: A Comprehensive Survey of Techniques and Applications

Figure 3 for Mamba in Vision: A Comprehensive Survey of Techniques and Applications

Figure 4 for Mamba in Vision: A Comprehensive Survey of Techniques and Applications

Abstract:Mamba is emerging as a novel approach to overcome the challenges faced by Convolutional Neural Networks (CNNs) and Vision Transformers (ViTs) in computer vision. While CNNs excel at extracting local features, they often struggle to capture long-range dependencies without complex architectural modifications. In contrast, ViTs effectively model global relationships but suffer from high computational costs due to the quadratic complexity of their self-attention mechanisms. Mamba addresses these limitations by leveraging Selective Structured State Space Models to effectively capture long-range dependencies with linear computational complexity. This survey analyzes the unique contributions, computational benefits, and applications of Mamba models while also identifying challenges and potential future research directions. We provide a foundational resource for advancing the understanding and growth of Mamba models in computer vision. An overview of this work is available at https://github.com/maklachur/Mamba-in-Computer-Vision.

* Under Review

Via

Access Paper or Ask Questions