Feiyang Ye

A Scale-Invariant Task Balancing Approach for Multi-Task Learning

Aug 23, 2023
Baijiong Lin, Weisen Jiang, Feiyang Ye, Yu Zhang, Pengguang Chen, Ying-Cong Chen, Shu Liu

Multi-task learning (MTL), a learning paradigm that learns multiple related tasks simultaneously, has achieved great success in various fields. However, task balancing remains a significant challenge in MTL, as disparities in loss and gradient scales often compromise performance. In this paper, we propose a Scale-Invariant Multi-Task Learning (SI-MTL) method to alleviate the task-balancing problem from both the loss and gradient perspectives. Specifically, SI-MTL applies a logarithm transformation to all task losses to ensure scale invariance at the loss level, and uses a gradient balancing method, SI-G, which normalizes all task gradients to the magnitude of the maximum gradient norm. Extensive experiments on several benchmark datasets consistently demonstrate the effectiveness of SI-G and the state-of-the-art performance of SI-MTL.
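
As a rough illustration of the two components described above, the following PyTorch-style sketch applies the logarithm transformation to each task loss and rescales every task gradient to the largest gradient norm before aggregating. It is a minimal sketch under assumed interfaces (shared_params, task_losses, and optimizer are placeholders), not the authors' implementation; the final aggregation by summation is also an assumption of this sketch.

```python
import torch

def si_mtl_step(shared_params, task_losses, optimizer):
    """One illustrative SI-MTL-style update (sketch, not the official code).

    shared_params: list of shared parameter tensors (requires_grad=True).
    task_losses: list of scalar task losses (assumed positive).
    """
    # Loss-level scale invariance: take the logarithm of each task loss,
    # so multiplying a loss by a constant only shifts it additively.
    log_losses = [torch.log(loss) for loss in task_losses]

    # Compute a flattened per-task gradient w.r.t. the shared parameters.
    grads = []
    for ll in log_losses:
        g = torch.autograd.grad(ll, shared_params, retain_graph=True)
        grads.append(torch.cat([p.reshape(-1) for p in g]))

    # Gradient-level balancing (SI-G style): rescale every task gradient
    # to the magnitude of the largest gradient norm, then sum
    # (the summation is an assumption of this sketch).
    norms = torch.stack([g.norm() for g in grads])
    max_norm = norms.max()
    balanced = sum((max_norm / (n + 1e-12)) * g for g, n in zip(grads, norms))

    # Write the combined gradient back into the shared parameters and step.
    optimizer.zero_grad()
    offset = 0
    for p in shared_params:
        numel = p.numel()
        p.grad = balanced[offset:offset + numel].view_as(p)
        offset += numel
    optimizer.step()
```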

* Technical Report 

A Fast Attention Network for Joint Intent Detection and Slot Filling on Edge Devices

May 16, 2022
Liang Huang, Senjie Liang, Feiyang Ye, Nan Gao

Intent detection and slot filling are two main tasks in natural language understanding and play an essential role in task-oriented dialogue systems. Jointly learning both tasks can improve inference accuracy and is popular in recent work. However, most joint models ignore inference latency and cannot meet the requirements for deploying dialogue systems at the edge. In this paper, we propose a Fast Attention Network (FAN) for joint intent detection and slot filling that delivers both high accuracy and low latency. Specifically, we introduce a clean and parameter-refined attention module to enhance the information exchange between intent and slot, improving semantic accuracy by more than 2%. FAN can be implemented on different encoders and delivers more accurate models at every speed level. Our experiments on the Jetson Nano platform show that FAN performs inference on fifteen utterances per second with only a small accuracy drop, demonstrating its effectiveness and efficiency on edge devices.
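
To make the joint setup concrete, here is a minimal sketch of a generic joint intent-detection and slot-filling model in which a single attention block exchanges information between the utterance-level (intent) and token-level (slot) representations. The layer choices, shapes, and attention wiring are assumptions for illustration only and do not reproduce the FAN architecture.

```python
import torch
import torch.nn as nn

class JointIntentSlotSketch(nn.Module):
    """Generic joint intent-detection / slot-filling model with one
    attention block shared between the two tasks (illustrative only,
    not the FAN architecture)."""

    def __init__(self, vocab_size, hidden, n_intents, n_slots):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, hidden)
        self.encoder = nn.GRU(hidden, hidden, batch_first=True)
        # One lightweight attention layer reused for both directions
        # (hidden must be divisible by num_heads).
        self.attn = nn.MultiheadAttention(hidden, num_heads=4, batch_first=True)
        self.intent_head = nn.Linear(hidden, n_intents)
        self.slot_head = nn.Linear(hidden, n_slots)

    def forward(self, token_ids):
        h, _ = self.encoder(self.embed(token_ids))         # (B, T, H)
        intent_query = h.mean(dim=1, keepdim=True)          # pooled utterance vector
        # The intent vector attends over token states; tokens then attend
        # back to the intent context, exchanging information both ways.
        intent_ctx, _ = self.attn(intent_query, h, h)        # (B, 1, H)
        slot_ctx, _ = self.attn(h, intent_ctx, intent_ctx)   # (B, T, H)
        intent_logits = self.intent_head(intent_ctx.squeeze(1))
        slot_logits = self.slot_head(h + slot_ctx)
        return intent_logits, slot_logits
```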

* 9 pages, 4 figures 

A Closer Look at Loss Weighting in Multi-Task Learning

Nov 20, 2021
Baijiong Lin, Feiyang Ye, Yu Zhang

Multi-Task Learning (MTL) has achieved great success in various fields; however, how to balance different tasks to avoid negative effects remains a key problem. Many existing works pursue this balance by reweighting task losses or gradients. In this paper, we unify eight representative task-balancing methods from the perspective of loss weighting and provide a consistent experimental comparison. Moreover, we surprisingly find that training an MTL model with random weights sampled from a distribution achieves performance comparable to state-of-the-art baselines. Based on this finding, we propose a simple yet effective weighting strategy called Random Loss Weighting (RLW), which can be implemented with only one additional line of code over existing works. Theoretically, we analyze the convergence of RLW and reveal that RLW has a higher probability of escaping local minima than models with fixed task weights, resulting in better generalization. Empirically, we extensively evaluate RLW on six image datasets and four multilingual tasks from the XTREME benchmark, showing its effectiveness compared with state-of-the-art weighting strategies.
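
The RLW strategy itself is easy to state in code: re-draw the loss weights from a distribution at every training step and combine the task losses with them. A minimal sketch, assuming the weights are a softmax over standard-normal samples (the paper studies several sampling distributions):

```python
import torch

def rlw_weighted_loss(task_losses):
    """Random Loss Weighting sketch: draw fresh weights at every step.

    task_losses: list of scalar task losses.
    Assumes the weights are a softmax over standard-normal samples; other
    sampling distributions can be substituted.
    """
    weights = torch.softmax(torch.randn(len(task_losses)), dim=-1)
    return sum(w * loss for w, loss in zip(weights, task_losses))

# Usage inside an ordinary training loop:
# total = rlw_weighted_loss([loss_a, loss_b, loss_c])
# total.backward(); optimizer.step()
```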

Safe Multi-Task Learning

Nov 20, 2021
Pengxin Guo, Feiyang Ye, Yu Zhang

In recent years, Multi-Task Learning (MTL) has attracted much attention due to its good performance in many applications. However, many existing MTL models cannot guarantee that their performance on each task is no worse than that of the single-task counterpart. Although this phenomenon has been empirically observed in some works, little work aims to handle the resulting problem, which we formally define as negative sharing in this paper. To achieve safe multi-task learning, where no negative sharing occurs, we propose a Safe Multi-Task Learning (SMTL) model, which consists of a public encoder shared by all tasks together with private encoders, gates, and private decoders. Specifically, each task has a private encoder, a gate, and a private decoder, where the gate learns how to combine the private and public encoders for the downstream private decoder. To reduce the storage cost at inference time, a lite version of SMTL allows the gate to choose either the public encoder or the corresponding private encoder. Moreover, we propose a variant of SMTL that places the gates after the decoders of all tasks. Experiments on several benchmark datasets demonstrate the effectiveness of the proposed methods.
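
The architecture described above can be sketched as follows. The concrete layer types and the scalar sigmoid gate are illustrative assumptions, not the exact SMTL design; the point is the per-task gated mixture of the public and private encoders. The lite version mentioned above would replace this soft mixture with a hard choice of one encoder per task at inference time.

```python
import torch
import torch.nn as nn

class SMTLSketch(nn.Module):
    """Illustrative SMTL-style model: one public encoder shared by all
    tasks, plus a private encoder, a gate, and a private decoder per task.
    Layer choices and the scalar gate are assumptions for this sketch."""

    def __init__(self, in_dim, hidden, out_dims):
        super().__init__()
        self.public_encoder = nn.Linear(in_dim, hidden)
        self.private_encoders = nn.ModuleList(
            [nn.Linear(in_dim, hidden) for _ in out_dims])
        # One learnable gate per task, squashed to (0, 1) with a sigmoid.
        self.gates = nn.Parameter(torch.zeros(len(out_dims)))
        self.decoders = nn.ModuleList(
            [nn.Linear(hidden, d) for d in out_dims])

    def forward(self, x):
        shared = torch.relu(self.public_encoder(x))
        outputs = []
        for t, (enc, dec) in enumerate(zip(self.private_encoders, self.decoders)):
            private = torch.relu(enc(x))
            g = torch.sigmoid(self.gates[t])
            # The gate mixes the public and private representations
            # before the task's private decoder.
            outputs.append(dec(g * shared + (1.0 - g) * private))
        return outputs
```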

Multi-Objective Meta Learning

Feb 14, 2021
Feiyang Ye, Baijiong Lin, Zhixiong Yue, Pengxin Guo, Qiao Xiao, Yu Zhang

Meta learning with multiple objectives can be formulated as a Multi-Objective Bi-Level optimization Problem (MOBLP), where the upper-level subproblem optimizes several possibly conflicting objectives for the meta learner. However, existing studies either apply an inefficient evolutionary algorithm or linearly combine the objectives into a single-objective problem, which requires tuning the combination weights. In this paper, we propose a unified gradient-based Multi-Objective Meta Learning (MOML) framework and devise the first gradient-based optimization algorithm to solve the MOBLP by alternately solving the lower-level and upper-level subproblems via gradient descent and a gradient-based multi-objective optimization method, respectively. Theoretically, we prove convergence properties of the proposed gradient-based optimization algorithm. Empirically, we show the effectiveness of the MOML framework on several meta learning problems, including few-shot learning, neural architecture search, domain adaptation, and multi-task learning.
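
The alternating scheme can be illustrated for two upper-level objectives, where the min-norm (MGDA-style) combination of gradients has a closed form. This sketch performs plain gradient descent on the lower level and a single multi-objective step on the upper level; it omits the hypergradient through the lower-level solution for brevity and is not the paper's algorithm verbatim.

```python
import torch

def min_norm_weights_2(g1, g2):
    """Closed-form min-norm combination weight for two gradients."""
    diff = g1 - g2
    alpha = torch.dot(g2 - g1, g2) / (diff.dot(diff) + 1e-12)
    return alpha.clamp(0.0, 1.0)

def moml_sketch(upper, lower, upper_losses, lower_loss,
                inner_steps=5, inner_lr=1e-2, outer_lr=1e-2):
    """One outer iteration of an alternating bi-level scheme (illustrative).

    upper, lower: lists of parameter tensors (requires_grad=True).
    upper_losses(upper, lower) -> (loss1, loss2); lower_loss(upper, lower) -> scalar.
    The hypergradient through the lower-level solution is omitted here.
    """
    # Lower level: plain gradient descent on the lower-level objective.
    for _ in range(inner_steps):
        grads = torch.autograd.grad(lower_loss(upper, lower), lower)
        with torch.no_grad():
            for p, g in zip(lower, grads):
                p -= inner_lr * g

    # Upper level: combine the two objective gradients with min-norm
    # weights and take one multi-objective descent step.
    l1, l2 = upper_losses(upper, lower)
    g1 = torch.cat([g.reshape(-1) for g in
                    torch.autograd.grad(l1, upper, retain_graph=True)])
    g2 = torch.cat([g.reshape(-1) for g in torch.autograd.grad(l2, upper)])
    a = min_norm_weights_2(g1, g2)
    combined = a * g1 + (1.0 - a) * g2
    with torch.no_grad():
        offset = 0
        for p in upper:
            n = p.numel()
            p -= outer_lr * combined[offset:offset + n].view_as(p)
            offset += n
```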
