Abstract:Music emotion recognition is a key task in symbolic music understanding (SMER). Recent approaches have shown promising results by fine-tuning large-scale pre-trained models (e.g., MIDIBERT, a benchmark in symbolic music understanding) to map musical semantics to emotional labels. While these models effectively capture distributional musical semantics, they often overlook tonal structures, particularly musical modes, which play a critical role in emotional perception according to music psychology. In this paper, we investigate the representational capacity of MIDIBERT and identify its limitations in capturing mode-emotion associations. To address this issue, we propose a Mode-Guided Enhancement (MoGE) strategy that incorporates psychological insights on mode into the model. Specifically, we first conduct a mode augmentation analysis, which reveals that MIDIBERT fails to effectively encode emotion-mode correlations. We then identify the least emotion-relevant layer within MIDIBERT and introduce a Mode-guided Feature-wise linear modulation injection (MoFi) framework to inject explicit mode features, thereby enhancing the model's capability in emotional representation and inference. Extensive experiments on the EMOPIA and VGMIDI datasets demonstrate that our mode injection strategy significantly improves SMER performance, achieving accuracies of 75.2% and 59.1%, respectively. These results validate the effectiveness of mode-guided modeling in symbolic music emotion recognition.




Abstract:Diabetic retinopathy (DR), a serious ocular complication of diabetes, is one of the primary causes of vision loss among retinal vascular diseases. Deep learning methods have been extensively applied in the grading of diabetic retinopathy (DR). However, their performance declines significantly when applied to data outside the training distribution due to domain shifts. Domain generalization (DG) has emerged as a solution to this challenge. However, most existing DG methods overlook lesion-specific features, resulting in insufficient accuracy. In this paper, we propose a novel approach that enhances existing DG methods by incorporating structural priors, inspired by the observation that DR grading is heavily dependent on vessel and lesion structures. We introduce Low-rank Adaptive Structural Priors (LoASP), a plug-and-play framework designed for seamless integration with existing DG models. LoASP improves generalization by learning adaptive structural representations that are finely tuned to the complexities of DR diagnosis. Extensive experiments on eight diverse datasets validate its effectiveness in both single-source and multi-source domain scenarios. Furthermore, visualizations reveal that the learned structural priors intuitively align with the intricate architecture of the vessels and lesions, providing compelling insights into their interpretability and diagnostic relevance.