Abstract:Speech Emotion Recognition models typically use single categorical labels, overlooking the inherent ambiguity of human emotions. Ambiguous Emotion Recognition addresses this by representing emotions as probability distributions, but progress is limited by unreliable ground-truth distributions inferred from sparse human annotations. This paper explores whether Large Audio-Language Models (ALMs) can mitigate the annotation bottleneck by generating high-quality synthetic annotations. We introduce a framework leveraging ALMs to create Synthetic Perceptual Proxies, augmenting human annotations to improve ground-truth distribution reliability. We validate these proxies through statistical analysis of their alignment with human distributions and evaluate their impact by fine-tuning ALMs with the augmented emotion distributions. Furthermore, to address class imbalance and enable unbiased evaluation, we propose DiME-Aug, a Distribution-aware Multimodal Emotion Augmentation strategy. Experiments on IEMOCAP and MSP-Podcast show that synthetic annotations enhance emotion distribution, especially in low-ambiguity regions where annotation agreement is high. However, benefits diminish for highly ambiguous emotions with greater human disagreement. This work provides the first evidence that ALMs could address annotation scarcity in ambiguous emotion recognition, but highlights the need for more advanced prompting or generation strategies to handle highly ambiguous cases.
Abstract:The advancements of Large language models (LLMs) have provided great opportunities to text-to-SQL tasks to overcome the main challenges to understand complex domain information and complex database structures in business applications. In this paper, we propose a meta-aware learning framework to integrate domain knowledge, database schema, chain-of-thought reasoning processes, and metadata relationships to improve the SQL generation quality. The proposed framework includes four learning strategies: schema-based learning, Chain-of-Thought (CoT) learning, knowledge-enhanced learning, and key information tokenization. This approach provides a comprehensive understanding of database structure and metadata information towards LLM through fine-tuning to improve its performance on SQL generation within business domains. Through two experimental studies, we have demonstrated the superiority of the proposed methods in execution accuracy, multi-task SQL generation capability, and reduction of catastrophic forgetting.