Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Chunyan Miao

Continual Optimization with Symmetry Teleportation for Multi-Task Learning

Mar 06, 2025

Zhipeng Zhou, Ziqiao Meng, Pengcheng Wu, Peilin Zhao, Chunyan Miao

Figure 1 for Continual Optimization with Symmetry Teleportation for Multi-Task Learning

Figure 2 for Continual Optimization with Symmetry Teleportation for Multi-Task Learning

Figure 3 for Continual Optimization with Symmetry Teleportation for Multi-Task Learning

Figure 4 for Continual Optimization with Symmetry Teleportation for Multi-Task Learning

Abstract:Multi-task learning (MTL) is a widely explored paradigm that enables the simultaneous learning of multiple tasks using a single model. Despite numerous solutions, the key issues of optimization conflict and task imbalance remain under-addressed, limiting performance. Unlike existing optimization-based approaches that typically reweight task losses or gradients to mitigate conflicts or promote progress, we propose a novel approach based on Continual Optimization with Symmetry Teleportation (COST). During MTL optimization, when an optimization conflict arises, we seek an alternative loss-equivalent point on the loss landscape to reduce conflict. Specifically, we utilize a low-rank adapter (LoRA) to facilitate this practical teleportation by designing convergent, loss-invariant objectives. Additionally, we introduce a historical trajectory reuse strategy to continually leverage the benefits of advanced optimizers. Extensive experiments on multiple mainstream datasets demonstrate the effectiveness of our approach. COST is a plug-and-play solution that enhances a wide range of existing MTL methods. When integrated with state-of-the-art methods, COST achieves superior performance.

* 10 pages,8 figures

Via

Access Paper or Ask Questions

Audio-Reasoner: Improving Reasoning Capability in Large Audio Language Models

Mar 04, 2025

Zhifei Xie, Mingbao Lin, Zihang Liu, Pengcheng Wu, Shuicheng Yan, Chunyan Miao

Figure 1 for Audio-Reasoner: Improving Reasoning Capability in Large Audio Language Models

Figure 2 for Audio-Reasoner: Improving Reasoning Capability in Large Audio Language Models

Figure 3 for Audio-Reasoner: Improving Reasoning Capability in Large Audio Language Models

Figure 4 for Audio-Reasoner: Improving Reasoning Capability in Large Audio Language Models

Abstract:Recent advancements in multimodal reasoning have largely overlooked the audio modality. We introduce Audio-Reasoner, a large-scale audio language model for deep reasoning in audio tasks. We meticulously curated a large-scale and diverse multi-task audio dataset with simple annotations. Then, we leverage closed-source models to conduct secondary labeling, QA generation, along with structured COT process. These datasets together form a high-quality reasoning dataset with 1.2 million reasoning-rich samples, which we name CoTA. Following inference scaling principles, we train Audio-Reasoner on CoTA, enabling it to achieve great logical capabilities in audio reasoning. Experiments show state-of-the-art performance across key benchmarks, including MMAU-mini (+25.42%), AIR-Bench chat/foundation(+14.57%/+10.13%), and MELD (+8.01%). Our findings stress the core of structured CoT training in advancing audio reasoning.

* Technical report, in process

Via

Access Paper or Ask Questions

SoftCoT: Soft Chain-of-Thought for Efficient Reasoning with LLMs

Feb 17, 2025

Yige Xu, Xu Guo, Zhiwei Zeng, Chunyan Miao

Figure 1 for SoftCoT: Soft Chain-of-Thought for Efficient Reasoning with LLMs

Figure 2 for SoftCoT: Soft Chain-of-Thought for Efficient Reasoning with LLMs

Figure 3 for SoftCoT: Soft Chain-of-Thought for Efficient Reasoning with LLMs

Figure 4 for SoftCoT: Soft Chain-of-Thought for Efficient Reasoning with LLMs

Abstract:Chain-of-Thought (CoT) reasoning enables Large Language Models (LLMs) to solve complex reasoning tasks by generating intermediate reasoning steps. However, most existing approaches focus on hard token decoding, which constrains reasoning within the discrete vocabulary space and may not always be optimal. While recent efforts explore continuous-space reasoning, they often suffer from catastrophic forgetting, limiting their applicability to state-of-the-art LLMs that already perform well in zero-shot settings with a proper instruction. To address this challenge, we propose a novel approach for continuous-space reasoning that does not require modifying the underlying LLM. Specifically, we employ a lightweight assistant model to generate instance-specific soft thought tokens speculatively as the initial chain of thoughts, which are then mapped into the LLM's representation space via a projection module. Experimental results on five reasoning benchmarks demonstrate that our method enhances LLM reasoning performance through supervised, parameter-efficient fine-tuning.

Via

Access Paper or Ask Questions

MedRAG: Enhancing Retrieval-augmented Generation with Knowledge Graph-Elicited Reasoning for Healthcare Copilot

Feb 06, 2025

Xuejiao Zhao, Siyan Liu, Su-Yin Yang, Chunyan Miao

Figure 1 for MedRAG: Enhancing Retrieval-augmented Generation with Knowledge Graph-Elicited Reasoning for Healthcare Copilot

Figure 2 for MedRAG: Enhancing Retrieval-augmented Generation with Knowledge Graph-Elicited Reasoning for Healthcare Copilot

Figure 3 for MedRAG: Enhancing Retrieval-augmented Generation with Knowledge Graph-Elicited Reasoning for Healthcare Copilot

Figure 4 for MedRAG: Enhancing Retrieval-augmented Generation with Knowledge Graph-Elicited Reasoning for Healthcare Copilot

Abstract:Retrieval-augmented generation (RAG) is a well-suited technique for retrieving privacy-sensitive Electronic Health Records (EHR). It can serve as a key module of the healthcare copilot, helping reduce misdiagnosis for healthcare practitioners and patients. However, the diagnostic accuracy and specificity of existing heuristic-based RAG models used in the medical domain are inadequate, particularly for diseases with similar manifestations. This paper proposes MedRAG, a RAG model enhanced by knowledge graph (KG)-elicited reasoning for the medical domain that retrieves diagnosis and treatment recommendations based on manifestations. MedRAG systematically constructs a comprehensive four-tier hierarchical diagnostic KG encompassing critical diagnostic differences of various diseases. These differences are dynamically integrated with similar EHRs retrieved from an EHR database, and reasoned within a large language model. This process enables more accurate and specific decision support, while also proactively providing follow-up questions to enhance personalized medical decision-making. MedRAG is evaluated on both a public dataset DDXPlus and a private chronic pain diagnostic dataset (CPDD) collected from Tan Tock Seng Hospital, and its performance is compared against various existing RAG methods. Experimental results show that, leveraging the information integration and relational abilities of the KG, our MedRAG provides more specific diagnostic insights and outperforms state-of-the-art models in reducing misdiagnosis rates. Our code will be available at https://github.com/SNOWTEAM2023/MedRAG

Via

Access Paper or Ask Questions

Motif Guided Graph Transformer with Combinatorial Skeleton Prototype Learning for Skeleton-Based Person Re-Identification

Dec 12, 2024

Haocong Rao, Chunyan Miao

Figure 1 for Motif Guided Graph Transformer with Combinatorial Skeleton Prototype Learning for Skeleton-Based Person Re-Identification

Figure 2 for Motif Guided Graph Transformer with Combinatorial Skeleton Prototype Learning for Skeleton-Based Person Re-Identification

Figure 3 for Motif Guided Graph Transformer with Combinatorial Skeleton Prototype Learning for Skeleton-Based Person Re-Identification

Figure 4 for Motif Guided Graph Transformer with Combinatorial Skeleton Prototype Learning for Skeleton-Based Person Re-Identification

Abstract:Person re-identification (re-ID) via 3D skeleton data is a challenging task with significant value in many scenarios. Existing skeleton-based methods typically assume virtual motion relations between all joints, and adopt average joint or sequence representations for learning. However, they rarely explore key body structure and motion such as gait to focus on more important body joints or limbs, while lacking the ability to fully mine valuable spatial-temporal sub-patterns of skeletons to enhance model learning. This paper presents a generic Motif guided graph transformer with Combinatorial skeleton prototype learning (MoCos) that exploits structure-specific and gait-related body relations as well as combinatorial features of skeleton graphs to learn effective skeleton representations for person re-ID. In particular, motivated by the locality within joints' structure and the body-component collaboration in gait, we first propose the motif guided graph transformer (MGT) that incorporates hierarchical structural motifs and gait collaborative motifs, which simultaneously focuses on multi-order local joint correlations and key cooperative body parts to enhance skeleton relation learning. Then, we devise the combinatorial skeleton prototype learning (CSP) that leverages random spatial-temporal combinations of joint nodes and skeleton graphs to generate diverse sub-skeleton and sub-tracklet representations, which are contrasted with the most representative features (prototypes) of each identity to learn class-related semantics and discriminative skeleton representations. Extensive experiments validate the superior performance of MoCos over existing state-of-the-art models. We further show its generality under RGB-estimated skeletons, different graph modeling, and unsupervised scenarios.

* Accepted by AAAI 2025. Codes are available at https://github.com/Kali-Hac/MoCos

Via

Access Paper or Ask Questions

The Curse of Multi-Modalities: Evaluating Hallucinations of Large Multimodal Models across Language, Visual, and Audio

Oct 16, 2024

Sicong Leng, Yun Xing, Zesen Cheng, Yang Zhou, Hang Zhang, Xin Li, Deli Zhao, Shijian Lu, Chunyan Miao, Lidong Bing

Figure 1 for The Curse of Multi-Modalities: Evaluating Hallucinations of Large Multimodal Models across Language, Visual, and Audio

Figure 2 for The Curse of Multi-Modalities: Evaluating Hallucinations of Large Multimodal Models across Language, Visual, and Audio

Figure 3 for The Curse of Multi-Modalities: Evaluating Hallucinations of Large Multimodal Models across Language, Visual, and Audio

Figure 4 for The Curse of Multi-Modalities: Evaluating Hallucinations of Large Multimodal Models across Language, Visual, and Audio

Abstract:Recent advancements in large multimodal models (LMMs) have significantly enhanced performance across diverse tasks, with ongoing efforts to further integrate additional modalities such as video and audio. However, most existing LMMs remain vulnerable to hallucinations, the discrepancy between the factual multimodal input and the generated textual output, which has limited their applicability in various real-world scenarios. This paper presents the first systematic investigation of hallucinations in LMMs involving the three most common modalities: language, visual, and audio. Our study reveals two key contributors to hallucinations: overreliance on unimodal priors and spurious inter-modality correlations. To address these challenges, we introduce the benchmark The Curse of Multi-Modalities (CMM), which comprehensively evaluates hallucinations in LMMs, providing a detailed analysis of their underlying issues. Our findings highlight key vulnerabilities, including imbalances in modality integration and biases from training data, underscoring the need for balanced cross-modal learning and enhanced hallucination mitigation strategies. Based on our observations and findings, we suggest potential research directions that could enhance the reliability of LMMs.

* Project Page: cmm-damovl.site

Via

Access Paper or Ask Questions

Generating Synthetic Datasets for Few-shot Prompt Tuning

Oct 08, 2024

Xu Guo, Zilin Du, Boyang Li, Chunyan Miao

Figure 1 for Generating Synthetic Datasets for Few-shot Prompt Tuning

Figure 2 for Generating Synthetic Datasets for Few-shot Prompt Tuning

Figure 3 for Generating Synthetic Datasets for Few-shot Prompt Tuning

Figure 4 for Generating Synthetic Datasets for Few-shot Prompt Tuning

Abstract:A major limitation of prompt tuning is its dependence on large labeled training datasets. Under few-shot learning settings, prompt tuning lags far behind full-model fine-tuning, limiting its scope of application. In this paper, we leverage the powerful LLMs to synthesize task-specific labeled data for training the soft prompts. We first introduce a distribution-aligned weighted generator tuning (DawGen) method to encourage generating in-distribution data that aligns with the few-shot real data. Then, we train soft prompts on both synthetic and real datasets using a gradient surgery approach, which eliminates the conflicting gradients from different data sources. Experiments on seven sentence-pair classification datasets demonstrate the effectiveness of our proposed method for boosting prompt tuning in few-shot learning settings. Results on QQP, MRPC, and SICK datasets are even comparable to the performance of transfer learning from large real-world datasets, showing the promise of synthetic data as an alternative for enhancing soft prompt tuning.

Via

Access Paper or Ask Questions

RevMUX: Data Multiplexing with Reversible Adapters for Efficient LLM Batch Inference

Oct 06, 2024

Yige Xu, Xu Guo, Zhiwei Zeng, Chunyan Miao

Figure 1 for RevMUX: Data Multiplexing with Reversible Adapters for Efficient LLM Batch Inference

Figure 2 for RevMUX: Data Multiplexing with Reversible Adapters for Efficient LLM Batch Inference

Figure 3 for RevMUX: Data Multiplexing with Reversible Adapters for Efficient LLM Batch Inference

Figure 4 for RevMUX: Data Multiplexing with Reversible Adapters for Efficient LLM Batch Inference

Abstract:Large language models (LLMs) have brought a great breakthrough to the natural language processing (NLP) community, while leading the challenge of handling concurrent customer queries due to their high throughput demands. Data multiplexing addresses this by merging multiple inputs into a single composite input, allowing more efficient inference through a shared forward pass. However, as distinguishing individuals from a composite input is challenging, conventional methods typically require training the entire backbone, yet still suffer from performance degradation. In this paper, we introduce RevMUX, a parameter-efficient data multiplexing framework that incorporates a reversible design in the multiplexer, which can be reused by the demultiplexer to perform reverse operations and restore individual samples for classification. Extensive experiments on four datasets and three types of LLM backbones demonstrate the effectiveness of RevMUX for enhancing LLM inference efficiency while retaining a satisfactory classification performance.

* EMNLP 2024 Main Conference

Via

Access Paper or Ask Questions

PairCFR: Enhancing Model Training on Paired Counterfactually Augmented Data through Contrastive Learning

Jun 09, 2024

Xiaoqi Qiu, Yongjie Wang, Xu Guo, Zhiwei Zeng, Yue Yu, Yuhong Feng, Chunyan Miao

Figure 1 for PairCFR: Enhancing Model Training on Paired Counterfactually Augmented Data through Contrastive Learning

Figure 2 for PairCFR: Enhancing Model Training on Paired Counterfactually Augmented Data through Contrastive Learning

Figure 3 for PairCFR: Enhancing Model Training on Paired Counterfactually Augmented Data through Contrastive Learning

Figure 4 for PairCFR: Enhancing Model Training on Paired Counterfactually Augmented Data through Contrastive Learning

Abstract:Counterfactually Augmented Data (CAD) involves creating new data samples by applying minimal yet sufficient modifications to flip the label of existing data samples to other classes. Training with CAD enhances model robustness against spurious features that happen to correlate with labels by spreading the casual relationships across different classes. Yet, recent research reveals that training with CAD may lead models to overly focus on modified features while ignoring other important contextual information, inadvertently introducing biases that may impair performance on out-ofdistribution (OOD) datasets. To mitigate this issue, we employ contrastive learning to promote global feature alignment in addition to learning counterfactual clues. We theoretically prove that contrastive loss can encourage models to leverage a broader range of features beyond those modified ones. Comprehensive experiments on two human-edited CAD datasets demonstrate that our proposed method outperforms the state-of-the-art on OOD datasets.

* Accepted by ACL 2024 main conference

Via

Access Paper or Ask Questions

A Survey of Artificial Intelligence in Gait-Based Neurodegenerative Disease Diagnosis

May 21, 2024

Haocong Rao, Minlin Zeng, Xuejiao Zhao, Chunyan Miao

Figure 1 for A Survey of Artificial Intelligence in Gait-Based Neurodegenerative Disease Diagnosis

Figure 2 for A Survey of Artificial Intelligence in Gait-Based Neurodegenerative Disease Diagnosis

Figure 3 for A Survey of Artificial Intelligence in Gait-Based Neurodegenerative Disease Diagnosis

Figure 4 for A Survey of Artificial Intelligence in Gait-Based Neurodegenerative Disease Diagnosis

Abstract:Recent years have witnessed an increasing global population affected by neurodegenerative diseases (NDs), which traditionally require extensive healthcare resources and human effort for medical diagnosis and monitoring. As a crucial disease-related motor symptom, human gait can be exploited to characterize different NDs. The current advances in artificial intelligence (AI) models enable automatic gait analysis for NDs identification and classification, opening a new avenue to facilitate faster and more cost-effective diagnosis of NDs. In this paper, we provide a comprehensive survey on recent progress of machine learning and deep learning based AI techniques applied to diagnosis of five typical NDs through gait. We provide an overview of the process of AI-assisted NDs diagnosis, and present a systematic taxonomy of existing gait data and AI models. Through an extensive review and analysis of 164 studies, we identify and discuss the challenges, potential solutions, and future directions in this field. Finally, we envision the prospective utilization of 3D skeleton data for human gait representation and the development of more efficient AI models for NDs diagnosis. We provide a public resource repository to track and facilitate developments in this emerging field: https://github.com/Kali-Hac/AI4NDD-Survey.

* 35 pages, 9 figures, 5 tables, citing 272 papers, under review at ACM Computing Survey (CSUR) journal. A up-to-date resource (papers, data, etc.) of this survey (AI4NDD) is provided at https://github.com/Kali-Hac/AI4NDD-Survey

Via

Access Paper or Ask Questions