"Information": models, code, and papers

Foundation Models for Natural Language Processing -- Pre-trained Language Models Integrating Media

Feb 16, 2023
Gerhard Paaß, Sven Giesselbach

This open access book provides a comprehensive overview of the state of the art in research and applications of Foundation Models and is intended for readers familiar with basic Natural Language Processing (NLP) concepts. In recent years, a revolutionary new paradigm has been developed for training models for NLP. These models are first pre-trained on large collections of text documents to acquire general syntactic knowledge and semantic information. Then, they are fine-tuned for specific tasks, which they can often solve with superhuman accuracy. When the models are large enough, they can be instructed by prompts to solve new tasks without any fine-tuning. Moreover, they can be applied to a wide range of different media and problem domains, ranging from image and video processing to robot control learning. Because they provide a blueprint for solving many tasks in artificial intelligence, they have been called Foundation Models. After a brief introduction to basic NLP models, the main pre-trained language models (BERT, GPT, and sequence-to-sequence transformers) are described, as well as the concepts of self-attention and context-sensitive embedding. Then, different approaches to improving these models are discussed, such as expanding the pre-training criteria, increasing the length of input texts, or including extra knowledge. An overview of the best-performing models for about twenty application areas is then presented, e.g., question answering, translation, story generation, dialog systems, and generating images from text. For each application area, the strengths and weaknesses of current models are discussed, and an outlook on further developments is given. In addition, links are provided to freely available program code. A concluding chapter summarizes the economic opportunities, mitigation of risks, and potential developments of AI.
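
The self-attention mechanism named in the abstract can be illustrated with a minimal sketch. This is not code from the book: it assumes a single attention head and uses the token embeddings directly as queries, keys, and values, omitting the learned projection matrices that real models apply. The function names and the toy embeddings are hypothetical.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def self_attention(queries, keys, values):
    """Scaled dot-product attention for a single head.

    Each argument is a list of vectors, one per token.
    Returns (outputs, weights), where weights[i][j] is how strongly
    token i attends to token j.
    """
    d_k = len(keys[0])
    outputs, weights = [], []
    for q in queries:
        # Similarity of this query with every key, scaled by sqrt(d_k).
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k)
                  for k in keys]
        w = softmax(scores)
        weights.append(w)
        # Context-sensitive embedding: weighted average of the value vectors.
        out = [sum(w[j] * values[j][c] for j in range(len(values)))
               for c in range(len(values[0]))]
        outputs.append(out)
    return outputs, weights

# Toy embeddings for three tokens (assumed values, for illustration only).
tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
contextual, attn = self_attention(tokens, tokens, tokens)
```

Each row of `attn` sums to one, so every output vector in `contextual` is a convex combination of the input embeddings, which is what makes the resulting embedding depend on its context.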

* This book has been accepted by Springer Nature and will be published as an open access monograph. https://link.springer.com/book/9783031231896. It is licensed under the CC BY-NC-SA license (https://creativecommons.org/licenses/by-nc-sa/4.0/), except for the material included from other authors, which may have different licenses

MMA-RNN: A Multi-level Multi-task Attention-based Recurrent Neural Network for Discrimination and Localization of Atrial Fibrillation

Feb 09, 2023
Yifan Sun, Jingyan Shen, Yunfan Jiang, Zhaohui Huang, Minsheng Hao, Xuegong Zhang

The automatic detection of atrial fibrillation (AF) based on electrocardiograph (ECG) signals has received wide attention both clinically and practically. It is challenging to process ECG signals with cyclical patterns, varying length, and unstable quality due to noise and distortion. In addition, there has been insufficient research on separating persistent atrial fibrillation from paroxysmal atrial fibrillation, and little discussion on locating the onsets and end points of AF episodes. It is even more arduous to perform well on these two distinct but interrelated tasks while avoiding the mistakes inherent in stage-by-stage approaches. This paper proposes the Multi-level Multi-task Attention-based Recurrent Neural Network (MMA-RNN) for three-class discrimination on patients and localization of the exact timing of AF episodes. Our model captures three-level sequential features based on a hierarchical architecture utilizing Bidirectional Long Short-Term Memory (Bi-LSTM) networks and attention layers, and accomplishes the two tasks simultaneously with a multi-head classifier. The model is designed as an end-to-end framework to enhance information interaction and reduce error accumulation. Finally, we conduct experiments on the CPSC 2021 dataset, and the results demonstrate the superior performance of our method, indicating the potential application of MMA-RNN to wearable mobile devices for routine AF monitoring and early diagnosis.

* 9 pages, 5 figures

Learning Complex Teamwork Tasks using a Sub-task Curriculum

Feb 09, 2023
Elliot Fosong, Arrasy Rahman, Ignacio Carlucho, Stefano V. Albrecht

Training a team to complete a complex task via multi-agent reinforcement learning can be difficult due to challenges such as policy search in a large policy space and non-stationarity caused by mutually adapting agents. To facilitate efficient learning of complex multi-agent tasks, we propose an approach which uses an expert-provided curriculum of simpler multi-agent sub-tasks. In each sub-task of the curriculum, a subset of the entire team is trained to acquire sub-task-specific policies. The sub-teams are then merged and transferred to the target task, where their policies are collectively fine-tuned to solve the more complex target task. We present MEDoE, a flexible method which identifies situations in the target task where each agent can use its sub-task-specific skills, and uses this information to modulate hyperparameters for learning and exploration during the fine-tuning process. We compare MEDoE to multi-agent reinforcement learning baselines that train from scratch in the full task, and to naïve applications of standard multi-agent reinforcement learning techniques for fine-tuning. We show that MEDoE outperforms baselines which train from scratch or use naïve fine-tuning approaches, requiring significantly fewer total training timesteps to solve a range of complex teamwork tasks.

Machine Learning Capability: A standardized metric using case difficulty with applications to individualized deployment of supervised machine learning

Feb 09, 2023
Adrienne Kline, Joon Lee

Model evaluation is a critical component in supervised machine learning classification analyses. Traditional metrics do not currently incorporate case difficulty. This renders the classification results unbenchmarked for generalization. Item Response Theory (IRT) and Computer Adaptive Testing (CAT) with machine learning can benchmark datasets independent of the end-classification results. This provides high levels of case-level information regarding evaluation utility. To showcase this, two datasets were used: 1) health-related and 2) physical science. A two-parameter IRT model was used for the health dataset and a polytomous IRT model for the physical science dataset, in each case to analyze predictive features and place each case on a difficulty continuum. A CAT approach was used to ascertain the algorithms' performance and applicability to new data. This method provides an efficient way to benchmark data, using only a fraction of the dataset (less than 1%) and being 22-60x more computationally efficient than traditional metrics. This novel metric, termed Machine Learning Capability (MLC), has additional benefits, as it is unbiased to outcome classification and offers a standardized way to make model comparisons within and across datasets. MLC provides a metric on the limitation of supervised machine learning algorithms. In situations where the algorithm falls short, other input(s) are required for decision-making.
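
The two-parameter IRT model and the adaptive selection step used in CAT can be sketched as follows. This is a textbook illustration, not the authors' implementation: it assumes each case plays the role of a test item with a discrimination `a` and difficulty `b`, that the algorithm under evaluation has an ability estimate `theta`, and that the next case is chosen by maximum Fisher information. The case names and parameter values are invented for the example.

```python
import math

def two_pl_probability(theta, a, b):
    """Standard 2PL model: P(correct) = 1 / (1 + exp(-a * (theta - b)))."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def item_information(theta, a, b):
    """Fisher information of a 2PL item at ability theta: a^2 * P * (1 - P)."""
    p = two_pl_probability(theta, a, b)
    return a * a * p * (1.0 - p)

def most_informative_case(theta, cases):
    """Pick the case whose information is highest at the current ability.

    `cases` maps a case id to its (a, b) parameters.
    """
    return max(cases, key=lambda cid: item_information(theta, *cases[cid]))

# Hypothetical case bank: id -> (discrimination a, difficulty b).
cases = {"easy": (1.0, -2.0), "medium": (1.5, 0.0), "hard": (1.0, 2.0)}
next_case = most_informative_case(0.0, cases)
```

Because information peaks where difficulty matches ability, an adaptive procedure of this kind needs to present only a small number of well-chosen cases, which is consistent with the abstract's claim that a fraction of the dataset suffices.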

Leveraging task dependency and contrastive learning for Legal Judgement Prediction on the European Court of Human Rights

Feb 01, 2023
Santosh T. Y. S. S, Marcel Perez San Blas, Phillip Kemper, Matthias Grabmair

We report on an experiment in legal judgement prediction on European Court of Human Rights cases where our model first learns to predict the convention articles allegedly violated by the state from case facts descriptions, and subsequently utilizes that information to predict a finding of a violation by the court. We assess the dependency between these two tasks at the feature and outcome level. Furthermore, we leverage a hierarchical contrastive loss that pulls together article-specific representations of cases at the higher level, leading to distinctive article clusters, and further pulls together the cases in each article cluster based on their outcome, leading to sub-clusters of cases with similar outcomes. Our experimental results demonstrate that, given a static pre-trained encoder, our models produce a small but consistent improvement in prediction performance over single-task and joint models without contrastive loss.

* EACL 2023

The RW3D: A multi-modal panel dataset to understand the psychological impact of the pandemic

Feb 01, 2023
Isabelle van der Vegt, Bennett Kleinberg

Besides far-reaching public health consequences, the COVID-19 pandemic had a significant psychological impact on people around the world. To gain further insight into this matter, we introduce the Real World Worry Waves Dataset (RW3D). The dataset combines rich open-ended free-text responses with survey data on emotions, significant life events, and psychological stressors in a repeated-measures design in the UK over three years (2020: n=2441, 2021: n=1716 and 2022: n=1152). This paper provides background information on the data collection procedure, the recorded variables, participants' demographics, and higher-order psychological and text-based derived variables that emerged from the data. The RW3D is a unique primary data resource that could inspire new research questions on the psychological impact of the pandemic, especially those that connect modalities (here: text data, psychological survey variables and demographics) over time.

* preprint

Reliable Natural Language Understanding with Large Language Models and Answer Set Programming

Feb 07, 2023
Abhiramon Rajasekharan, Yankai Zeng, Parth Padalkar, Gopal Gupta

Humans understand language by extracting information (meaning) from sentences, combining it with existing commonsense knowledge, and then performing reasoning to draw conclusions. While large language models (LLMs) such as GPT-3 and ChatGPT are able to leverage patterns in the text to solve a variety of NLP tasks, they fall short in problems that require reasoning. They also cannot reliably explain the answers generated for a given question. In order to emulate humans better, we propose STAR, a framework that combines LLMs with Answer Set Programming (ASP). We show how LLMs can be used to effectively extract knowledge -- represented as predicates -- from language. Goal-directed ASP is then employed to reliably reason over this knowledge. We apply the STAR framework to three different NLU tasks requiring reasoning: qualitative reasoning, mathematical reasoning, and goal-directed conversation. Our experiments reveal that STAR is able to bridge the gap of reasoning in NLU tasks, leading to significant performance improvements, especially for smaller LLMs, i.e., LLMs with a smaller number of parameters. NLU applications developed using the STAR framework are also explainable: along with the predicates generated, a justification in the form of a proof tree can be produced for a given output.

Tetris-inspired detector with neural network for radiation mapping

Feb 07, 2023
Ryotaro Okabe, Shangjie Xue, Jiankai Yu, Tongtong Liu, Benoit Forget, Stefanie Jegelka, Gordon Kohse, Lin-wen Hu, Mingda Li

In recent years, radiation mapping has attracted widespread research attention and increased public concern about environmental monitoring. Radiation detectors have been developed, in terms of both materials and their configurations, to locate the directions and positions of radiation sources. In this process, the algorithm is essential for converting detector signals into radiation source information. However, due to the complex mechanisms of radiation-matter interaction and the current limitations of data collection, high-performance, low-cost radiation mapping is still challenging. Here we present a computational framework using Tetris-inspired detector pixels and machine learning for radiation mapping. Using inter-pixel padding to increase the contrast between pixels and a neural network to analyze the detector readings, a detector with as few as four pixels can achieve high-resolution directional mapping. By further imposing Maximum a Posteriori (MAP) estimation with a moving detector, localization of the radiation source position is also achieved. A non-square, Tetris-shaped detector can further improve performance beyond the conventional grid-shaped detector. Our framework offers a new avenue for high-quality radiation mapping with the fewest detector pixels possible, and is anticipated to be deployable for real-world radiation detection after moderate validation.

* 29 pages, 20 figures. Ryotaro Okabe and Shangjie Xue contributed equally to this work

Mind the Gap: Offline Policy Optimization for Imperfect Rewards

Feb 03, 2023
Jianxiong Li, Xiao Hu, Haoran Xu, Jingjing Liu, Xianyuan Zhan, Qing-Shan Jia, Ya-Qin Zhang

The reward function is essential in reinforcement learning (RL), serving as the guiding signal that incentivizes agents to solve given tasks; however, it is also notoriously difficult to design. In many cases, only imperfect rewards are available, which inflicts substantial performance loss on RL agents. In this study, we propose a unified offline policy optimization approach, RGM (Reward Gap Minimization), which can handle diverse types of imperfect rewards. RGM is formulated as a bi-level optimization problem: the upper layer optimizes a reward correction term that performs visitation distribution matching w.r.t. some expert data; the lower layer solves a pessimistic RL problem with the corrected rewards. By exploiting the duality of the lower layer, we derive a tractable algorithm that enables sample-based learning without any online interactions. Comprehensive experiments demonstrate that RGM achieves superior performance to existing methods under diverse settings of imperfect rewards. Further, RGM can effectively correct wrong or inconsistent rewards against expert preference and retrieve useful information from biased rewards.

* Accepted by ICLR 2023. The first two authors contributed equally

Revisiting Intermediate Layer Distillation for Compressing Language Models: An Overfitting Perspective

Feb 03, 2023
Jongwoo Ko, Seungjoon Park, Minchan Jeong, Sukjin Hong, Euijai Ahn, Du-Seong Chang, Se-Young Yun

Knowledge distillation (KD) is a highly promising method for mitigating the computational problems of pre-trained language models (PLMs). Among various KD approaches, Intermediate Layer Distillation (ILD) has become a de facto standard KD method in the NLP field owing to its performance. In this paper, we find that existing ILD methods are prone to overfitting to training datasets, although these methods transfer more information than the original KD. Next, we present two simple observations that mitigate the overfitting of ILD: distilling only the last Transformer layer and conducting ILD on supplementary tasks. Based on these two findings, we propose a simple yet effective consistency-regularized ILD (CR-ILD), which prevents the student model from overfitting the training dataset. Substantial experiments on distilling BERT on the GLUE benchmark and several synthetic datasets demonstrate that our proposed ILD method outperforms other KD techniques. Our code is available at https://github.com/jongwooko/CR-ILD.
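
The first observation, distilling only the last Transformer layer, can be illustrated with a minimal sketch. This is not the CR-ILD code from the linked repository, and it leaves out the consistency regularization entirely: it assumes the student and teacher share the same hidden size (in practice a learned projection may be needed) and uses a plain mean squared error as the distillation loss. The function name and the toy hidden states are hypothetical.

```python
def last_layer_ild_loss(student_hidden_states, teacher_hidden_states):
    """Mean squared error between the last-layer hidden states only.

    Each argument is a list of layers; each layer is a list of token
    vectors. Earlier layers are ignored, so the student and teacher may
    have different depths.
    """
    s_last = student_hidden_states[-1]
    t_last = teacher_hidden_states[-1]
    squared_errors = [
        (s - t) ** 2
        for s_vec, t_vec in zip(s_last, t_last)
        for s, t in zip(s_vec, t_vec)
    ]
    return sum(squared_errors) / len(squared_errors)

# Toy hidden states: a 2-layer student and a 4-layer teacher,
# two tokens each, hidden size 2 (assumed values).
student = [[[0.0, 0.0], [0.0, 0.0]],
           [[1.0, 2.0], [0.0, 1.0]]]
teacher = [[[9.0, 9.0], [9.0, 9.0]]] * 3 + [[[1.0, 0.0], [0.0, 1.0]]]
loss = last_layer_ild_loss(student, teacher)
```

Only the final layers enter the loss, so the mismatch in the earlier teacher layers has no effect on `loss`; this is the sense in which the student is given fewer intermediate targets to overfit.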

* The 17th Conference of the European Chapter of the Association for Computational Linguistics (Findings)
