Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Ming Gao

Structure-aware Fine-tuning for Code Pre-trained Models

Apr 11, 2024

Jiayi Wu, Renyu Zhu, Nuo Chen, Qiushi Sun, Xiang Li, Ming Gao

Abstract:Over the past few years, we have witnessed remarkable advancements in Code Pre-trained Models (CodePTMs). These models achieved excellent representation capabilities by designing structure-based pre-training tasks for code. However, how to enhance the absorption of structural knowledge when fine-tuning CodePTMs still remains a significant challenge. To fill this gap, in this paper, we present Structure-aware Fine-tuning (SAT), a novel structure-enhanced and plug-and-play fine-tuning method for CodePTMs. We first propose a structure loss to quantify the difference between the information learned by CodePTMs and the knowledge extracted from code structure. Specifically, we use the attention scores extracted from Transformer layer as the learned structural information, and the shortest path length between leaves in abstract syntax trees as the structural knowledge. Subsequently, multi-task learning is introduced to improve the performance of fine-tuning. Experiments conducted on four pre-trained models and two generation tasks demonstrate the effectiveness of our proposed method as a plug-and-play solution. Furthermore, we observed that SAT can benefit CodePTMs more with limited training data.

* Accepted by COLING 2024

Via

Access Paper or Ask Questions

Message Passing Variational Autoregressive Network for Solving Intractable Ising Models

Apr 09, 2024

Qunlong Ma, Zhi Ma, Jinlong Xu, Hairui Zhang, Ming Gao

Abstract:Many deep neural networks have been used to solve Ising models, including autoregressive neural networks, convolutional neural networks, recurrent neural networks, and graph neural networks. Learning a probability distribution of energy configuration or finding the ground states of a disordered, fully connected Ising model is essential for statistical mechanics and NP-hard problems. Despite tremendous efforts, a neural network architecture with the ability to high-accurately solve these fully connected and extremely intractable problems on larger systems is still lacking. Here we propose a variational autoregressive architecture with a message passing mechanism, which can effectively utilize the interactions between spin variables. The new network trained under an annealing framework outperforms existing methods in solving several prototypical Ising spin Hamiltonians, especially for larger spin systems at low temperatures. The advantages also come from the great mitigation of mode collapse during the training process of deep neural networks. Considering these extremely difficult problems to be solved, our method extends the current computational limits of unsupervised neural networks to solve combinatorial optimization problems.

* 18 pages, 14 figures

Via

Access Paper or Ask Questions

InstructGraph: Boosting Large Language Models via Graph-centric Instruction Tuning and Preference Alignment

Feb 13, 2024

Jianing Wang, Junda Wu, Yupeng Hou, Yao Liu, Ming Gao, Julian McAuley

Figure 1 for InstructGraph: Boosting Large Language Models via Graph-centric Instruction Tuning and Preference Alignment

Figure 2 for InstructGraph: Boosting Large Language Models via Graph-centric Instruction Tuning and Preference Alignment

Figure 3 for InstructGraph: Boosting Large Language Models via Graph-centric Instruction Tuning and Preference Alignment

Figure 4 for InstructGraph: Boosting Large Language Models via Graph-centric Instruction Tuning and Preference Alignment

Abstract:Do current large language models (LLMs) better solve graph reasoning and generation tasks with parameter updates? In this paper, we propose InstructGraph, a framework that empowers LLMs with the abilities of graph reasoning and generation by instruction tuning and preference alignment. Specifically, we first propose a structured format verbalizer to unify all graph data into a universal code-like format, which can simply represent the graph without any external graph-specific encoders. Furthermore, a graph instruction tuning stage is introduced to guide LLMs in solving graph reasoning and generation tasks. Finally, we identify potential hallucination problems in graph tasks and sample negative instances for preference alignment, the target of which is to enhance the output's reliability of the model. Extensive experiments across multiple graph-centric tasks exhibit that InstructGraph can achieve the best performance and outperform GPT-4 and LLaMA2 by more than 13\% and 38\%, respectively.

* 19 pages

Via

Access Paper or Ask Questions

Optimal estimation of Gaussian (poly)trees

Feb 09, 2024

Yuhao Wang, Ming Gao, Wai Ming Tai, Bryon Aragam, Arnab Bhattacharyya

Abstract:We develop optimal algorithms for learning undirected Gaussian trees and directed Gaussian polytrees from data. We consider both problems of distribution learning (i.e. in KL distance) and structure learning (i.e. exact recovery). The first approach is based on the Chow-Liu algorithm, and learns an optimal tree-structured distribution efficiently. The second approach is a modification of the PC algorithm for polytrees that uses partial correlation as a conditional independence tester for constraint-based structure learning. We derive explicit finite-sample guarantees for both approaches, and show that both approaches are optimal by deriving matching lower bounds. Additionally, we conduct numerical experiments to compare the performance of various algorithms, providing further insights and empirical evidence.

Via

Access Paper or Ask Questions

Survey of Natural Language Processing for Education: Taxonomy, Systematic Review, and Future Trends

Jan 31, 2024

Yunshi Lan, Xinyuan Li, Hanyue Du, Xuesong Lu, Ming Gao, Weining Qian, Aoying Zhou

Figure 1 for Survey of Natural Language Processing for Education: Taxonomy, Systematic Review, and Future Trends

Figure 2 for Survey of Natural Language Processing for Education: Taxonomy, Systematic Review, and Future Trends

Figure 3 for Survey of Natural Language Processing for Education: Taxonomy, Systematic Review, and Future Trends

Figure 4 for Survey of Natural Language Processing for Education: Taxonomy, Systematic Review, and Future Trends

Abstract:Natural Language Processing (NLP) aims to analyze the text via techniques in the computer science field. It serves the applications in healthcare, commerce, and education domains. Particularly, NLP has been applied to the education domain to help teaching and learning. In this survey, we review recent advances in NLP with a focus on solving problems related to the education domain. In detail, we begin with introducing the relevant background. Then, we present the taxonomy of NLP in the education domain. Next, we illustrate the task definition, challenges, and corresponding techniques based on the above taxonomy. After that, we showcase some off-the-shelf demonstrations in this domain and conclude with future directions.

Via

Access Paper or Ask Questions

FaKnow: A Unified Library for Fake News Detection

Jan 27, 2024

Yiyuan Zhu, Yongjun Li, Jialiang Wang, Ming Gao, Jiali Wei

Abstract:Over the past years, a large number of fake news detection algorithms based on deep learning have emerged. However, they are often developed under different frameworks, each mandating distinct utilization methodologies, consequently hindering reproducibility. Additionally, a substantial amount of redundancy characterizes the code development of such fake news detection models. To address these concerns, we propose FaKnow, a unified and comprehensive fake news detection algorithm library. It encompasses a variety of widely used fake news detection models, categorized as content-based and social context-based approaches. This library covers the full spectrum of the model training and evaluation process, effectively organizing the data, models, and training procedures within a unified framework. Furthermore, it furnishes a series of auxiliary functionalities and tools, including visualization, and logging. Our work contributes to the standardization and unification of fake news detection research, concurrently facilitating the endeavors of researchers in this field. The open-source code and documentation can be accessed at https://github.com/NPURG/FaKnow and https://faknow.readthedocs.io, respectively.

Via

Access Paper or Ask Questions

Federated Learning via Consensus Mechanism on Heterogeneous Data: A New Perspective on Convergence

Nov 21, 2023

Shu Zheng, Tiandi Ye, Xiang Li, Ming Gao

Abstract:Federated learning (FL) on heterogeneous data (non-IID data) has recently received great attention. Most existing methods focus on studying the convergence guarantees for the global objective. While these methods can guarantee the decrease of the global objective in each communication round, they fail to ensure risk decrease for each client. In this paper, to address the problem,we propose FedCOME, which introduces a consensus mechanism to enforce decreased risk for each client after each training round. In particular, we allow a slight adjustment to a client's gradient on the server side, which generates an acute angle between the corrected gradient and the original ones of other clients. We theoretically show that the consensus mechanism can guarantee the convergence of the global objective. To generalize the consensus mechanism to the partial participation FL scenario, we devise a novel client sampling strategy to select the most representative clients for the global data distribution. Training on these selected clients with the consensus mechanism could empirically lead to risk decrease for clients that are not selected. Finally, we conduct extensive experiments on four benchmark datasets to show the superiority of FedCOME against other state-of-the-art methods in terms of effectiveness, efficiency and fairness. For reproducibility, we make our source code publicly available at: \url{https://github.com/fedcome/fedcome}.

Via

Access Paper or Ask Questions

Prioritized Propagation in Graph Neural Networks

Nov 06, 2023

Yao Cheng, Minjie Chen, Xiang Li, Caihua Shan, Ming Gao

Figure 1 for Prioritized Propagation in Graph Neural Networks

Figure 2 for Prioritized Propagation in Graph Neural Networks

Figure 3 for Prioritized Propagation in Graph Neural Networks

Figure 4 for Prioritized Propagation in Graph Neural Networks

Abstract:Graph neural networks (GNNs) have recently received significant attention. Learning node-wise message propagation in GNNs aims to set personalized propagation steps for different nodes in the graph. Despite the success, existing methods ignore node priority that can be reflected by node influence and heterophily. In this paper, we propose a versatile framework PPro, which can be integrated with most existing GNN models and aim to learn prioritized node-wise message propagation in GNNs. Specifically, the framework consists of three components: a backbone GNN model, a propagation controller to determine the optimal propagation steps for nodes, and a weight controller to compute the priority scores for nodes. We design a mutually enhanced mechanism to compute node priority, optimal propagation step and label prediction. We also propose an alternative optimization strategy to learn the parameters in the backbone GNN model and two parametric controllers. We conduct extensive experiments to compare our framework with other 11 state-of-the-art competitors on 8 benchmark datasets. Experimental results show that our framework can lead to superior performance in terms of propagation strategies and node representations.

Via

Access Paper or Ask Questions

CDSD: Chinese Dysarthria Speech Database

Oct 24, 2023

Mengyi Sun, Ming Gao, Xinchen Kang, Shiru Wang, Jun Du, Dengfeng Yao, Su-Jing Wang

Figure 1 for CDSD: Chinese Dysarthria Speech Database

Figure 2 for CDSD: Chinese Dysarthria Speech Database

Figure 3 for CDSD: Chinese Dysarthria Speech Database

Figure 4 for CDSD: Chinese Dysarthria Speech Database

Abstract:We present the Chinese Dysarthria Speech Database (CDSD) as a valuable resource for dysarthria research. This database comprises speech data from 24 participants with dysarthria. Among these participants, one recorded an additional 10 hours of speech data, while each recorded one hour, resulting in 34 hours of speech material. To accommodate participants with varying cognitive levels, our text pool primarily consists of content from the AISHELL-1 dataset and speeches by primary and secondary school students. When participants read these texts, they must use a mobile device or the ZOOM F8n multi-track field recorder to record their speeches. In this paper, we elucidate the data collection and annotation processes and present an approach for establishing a baseline for dysarthric speech recognition. Furthermore, we conducted a speaker-dependent dysarthric speech recognition experiment using an additional 10 hours of speech data from one of our participants. Our research findings indicate that, through extensive data-driven model training, fine-tuning limited quantities of specific individual data yields commendable results in speaker-dependent dysarthric speech recognition. However, we observe significant variations in recognition results among different dysarthric speakers. These insights provide valuable reference points for speaker-dependent dysarthric speech recognition.

* 9 pages, 3 figures

Via

Access Paper or Ask Questions

DialCoT Meets PPO: Decomposing and Exploring Reasoning Paths in Smaller Language Models

Oct 23, 2023

Chengcheng Han, Xiaowei Du, Che Zhang, Yixin Lian, Xiang Li, Ming Gao, Baoyuan Wang

Abstract:Chain-of-Thought (CoT) prompting has proven to be effective in enhancing the reasoning capabilities of Large Language Models (LLMs) with at least 100 billion parameters. However, it is ineffective or even detrimental when applied to reasoning tasks in Smaller Language Models (SLMs) with less than 10 billion parameters. To address this limitation, we introduce Dialogue-guided Chain-of-Thought (DialCoT) which employs a dialogue format to generate intermediate reasoning steps, guiding the model toward the final answer. Additionally, we optimize the model's reasoning path selection using the Proximal Policy Optimization (PPO) algorithm, further enhancing its reasoning capabilities. Our method offers several advantages compared to previous approaches. Firstly, we transform the process of solving complex reasoning questions by breaking them down into a series of simpler sub-questions, significantly reducing the task difficulty and making it more suitable for SLMs. Secondly, we optimize the model's reasoning path selection through the PPO algorithm. We conduct comprehensive experiments on four arithmetic reasoning datasets, demonstrating that our method achieves significant performance improvements compared to state-of-the-art competitors.

* Accepted to EMNLP 2023

Via

Access Paper or Ask Questions