Title: HyperGrid: Efficient Multi-Task Transformers with Grid-wise Decomposable Hyper Projections

Jul 12, 2020

Yi Tay, Zhe Zhao, Dara Bahri, Donald Metzler, Da-Cheng Juan

Abstract: Achieving state-of-the-art performance on natural language understanding tasks typically relies on fine-tuning a fresh model for every task. Consequently, this approach leads to a higher overall parameter cost, along with higher technical maintenance for serving multiple models. Learning a single multi-task model that is able to do well for all the tasks has been a challenging and yet attractive proposition. In this paper, we propose HyperGrid, a new approach for highly effective multi-task learning. The proposed approach is based on a decomposable hypernetwork that learns grid-wise projections that help to specialize regions in weight matrices for different tasks. In order to construct the proposed hypernetwork, our method learns the interactions and composition between a global (task-agnostic) state and a local task-specific state. We apply our proposed HyperGrid on the current state-of-the-art T5 model, demonstrating strong performance across the GLUE and SuperGLUE benchmarks when using only a single multi-task model. Our method helps bridge the gap between fine-tuning and multi-task learning approaches.
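The abstract describes the mechanism only at a high level. What follows is one possible reading, sketched in NumPy, and it is not the authors' implementation. It assumes that the grid of gates is the outer product of a task-conditioned (local) vector and a learned task-agnostic (global) vector, each squashed by a sigmoid, and that every grid cell gates one rectangular block of a shared weight matrix. Assigning the local vector to grid rows and the global vector to grid columns is an arbitrary choice made here. The names `hypergrid_gate`, `apply_hypergrid`, `w_local` and `global_vec` are hypothetical.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def hypergrid_gate(task_embedding, w_local, global_vec, block_rows, block_cols):
    """Hypothetical grid-wise gate (an assumption, not the paper's exact formulation)."""
    # Local, task-specific state: one gate value per grid row, derived from the task embedding.
    local_state = sigmoid(w_local @ task_embedding)      # shape (grid_rows,)
    # Global, task-agnostic state: one learned gate value per grid column.
    global_state = sigmoid(global_vec)                   # shape (grid_cols,)
    # Compose the two states with an outer product to get a small (grid_rows, grid_cols) grid.
    grid = np.outer(local_state, global_state)
    # Expand each grid cell so it covers a (block_rows, block_cols) region of the weight matrix.
    return np.kron(grid, np.ones((block_rows, block_cols)))


def apply_hypergrid(weight, task_embedding, w_local, global_vec):
    """Gate a shared weight matrix region by region for one task (hypothetical helper)."""
    grid_rows, grid_cols = w_local.shape[0], global_vec.shape[0]
    d_in, d_out = weight.shape
    # The sketch assumes the weight matrix divides evenly into grid blocks.
    assert d_in % grid_rows == 0 and d_out % grid_cols == 0
    gate = hypergrid_gate(task_embedding, w_local, global_vec,
                          d_in // grid_rows, d_out // grid_cols)
    # Element-wise product: every block of the weight matrix is scaled by its own gate value.
    return weight * gate


# Usage: an 8x12 weight matrix split into a 2x3 grid of 4x4 blocks, with a 4-dim task embedding.
rng = np.random.default_rng(0)
weight = rng.standard_normal((8, 12))
task_embedding = rng.standard_normal(4)
w_local = rng.standard_normal((2, 4))
global_vec = rng.standard_normal(3)
task_weight = apply_hypergrid(weight, task_embedding, w_local, global_vec)
```

In this sketch the weight matrix and the projections are shared by all tasks, and only the task embedding differs per task. A different task therefore rescales different regions of the same weights instead of requiring a separately fine-tuned copy of the model.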
