Taeuk Kim

Prompt-Augmented Linear Probing: Scaling Beyond The Limit of Few-shot In-Context Learners

Dec 28, 2022
Hyunsoo Cho, Hyuhng Joon Kim, Junyeob Kim, Sang-Woo Lee, Sang-goo Lee, Kang Min Yoo, Taeuk Kim

Through in-context learning (ICL), large-scale language models are effective few-shot learners without additional model fine-tuning. However, ICL performance does not scale well with the number of available training samples, as it is limited by the inherent input-length constraint of the underlying language model. Meanwhile, many studies have revealed that language models are also powerful feature extractors, allowing them to be utilized in a black-box manner and enabling the linear probing paradigm, where lightweight discriminators are trained on top of pre-extracted input representations. This paper proposes prompt-augmented linear probing (PALP), a hybrid of linear probing and ICL that leverages the best of both worlds. PALP inherits the scalability of linear probing and the ability to encourage language models to derive more meaningful representations by tailoring the input into a more comprehensible form. Through in-depth investigations on various datasets, we verify that PALP significantly enhances the input representations, closing the gap between ICL in the data-hungry scenario and fine-tuning in the data-abundant scenario with little training overhead, potentially making PALP a strong alternative in a black-box setting.
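
The probing side of this recipe is straightforward to sketch. Below is a minimal illustration, assuming a Hugging Face causal LM as the frozen black-box feature extractor and a logistic-regression probe; the prompt template, last-token pooling, model choice, and toy data are placeholders, not the paper's exact configuration.

```python
# Sketch of prompt-augmented linear probing (illustrative, not the paper's exact setup):
# a frozen LM encodes prompt-wrapped inputs; a lightweight linear classifier is trained
# on the pooled hidden states.
import torch
from transformers import AutoTokenizer, AutoModel
from sklearn.linear_model import LogisticRegression

tokenizer = AutoTokenizer.from_pretrained("gpt2")          # any frozen LM works here
model = AutoModel.from_pretrained("gpt2").eval()

PROMPT = "Review: {text}\nSentiment:"                      # hypothetical template

@torch.no_grad()
def embed(texts):
    feats = []
    for t in texts:
        enc = tokenizer(PROMPT.format(text=t), return_tensors="pt", truncation=True)
        hidden = model(**enc).last_hidden_state            # (1, seq_len, dim)
        feats.append(hidden[0, -1].numpy())                # last-token pooling
    return feats

train_texts = ["great movie", "terrible plot"]             # toy training data
train_labels = [1, 0]
probe = LogisticRegression(max_iter=1000).fit(embed(train_texts), train_labels)
print(probe.predict(embed(["I loved it"])))
```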

* AAAI 2023 

Self-Generated In-Context Learning: Leveraging Auto-regressive Language Models as a Demonstration Generator

Jun 16, 2022
Hyuhng Joon Kim, Hyunsoo Cho, Junyeob Kim, Taeuk Kim, Kang Min Yoo, Sang-goo Lee

Large-scale pre-trained language models (PLMs) are well known for being able to solve a task simply by conditioning on a few input-label pairs, dubbed demonstrations, in a prompt, without being explicitly tuned for the desired downstream task. Such a process (i.e., in-context learning), however, naturally leads to a high reliance on the demonstrations, which are usually selected from external datasets. In this paper, we propose self-generated in-context learning (SG-ICL), which generates demonstrations for in-context learning from the PLM itself to minimize reliance on external demonstrations. We conduct experiments on four different text classification tasks and show that SG-ICL significantly outperforms zero-shot learning and is generally worth approximately 0.6 gold training samples. Moreover, our generated demonstrations show more consistent performance with lower variance compared to randomly selected demonstrations from the training dataset.
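
As a rough illustration of the idea, the sketch below has a PLM generate one demonstration per class and prepends them to the test input. The templates, model, and decoding settings are assumptions for illustration, not the paper's setup.

```python
# Sketch of self-generated in-context learning (SG-ICL): the PLM first writes its own
# demonstrations for each class, which are then prepended to the test input.
from transformers import pipeline

generator = pipeline("text-generation", model="gpt2")      # stand-in PLM

LABELS = ["positive", "negative"]

def generate_demo(label):
    # Hypothetical generation template; the paper's templates may differ.
    prompt = f"Write a movie review with {label} sentiment.\nReview:"
    out = generator(prompt, max_new_tokens=30, do_sample=True, num_return_sequences=1)
    return out[0]["generated_text"][len(prompt):].strip(), label

def build_prompt(test_input):
    demos = [generate_demo(lbl) for lbl in LABELS]
    demo_block = "\n".join(f"Review: {text}\nSentiment: {lbl}" for text, lbl in demos)
    return f"{demo_block}\nReview: {test_input}\nSentiment:"

print(build_prompt("The acting was superb."))
```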

* NAACL 2022 Workshop on Large-scale Pre-trained Language Models 

Ground-Truth Labels Matter: A Deeper Look into Input-Label Demonstrations

May 25, 2022
Junyeob Kim, Hyuhng Joon Kim, Hyunsoo Cho, Hwiyeol Jo, Sang-Woo Lee, Sang-goo Lee, Kang Min Yoo, Taeuk Kim

Despite a recent explosion of research interest, the mechanism of in-context learning and the precise impact of demonstration quality remain elusive. While the current literature suggests that in-context learning shares a mechanism similar to supervised learning, Min et al. (2022) recently reported the surprising finding that input-label correspondence is less important than other aspects of prompt demonstrations. Inspired by this counter-intuitive observation, we re-examine the importance of ground-truth labels in in-context learning from diverse and statistical points of view. With the aid of newly introduced metrics, i.e., the Ground-truth Label Effect Ratio (GLER), demo-gain, and label sensitivity, we find that the impact of correct input-label matching can vary according to different configurations. Expanding upon the previous key finding on the role of demonstrations, these complementary and contrastive results suggest that one might need to take more care when estimating the impact of each component in in-context learning demonstrations.
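
The abstract names the metrics without defining them, so the following is only one plausible way to operationalize quantities of this kind from accuracies measured under gold-label, random-label, and zero-shot conditions; the paper's exact formulations may differ.

```python
# Illustrative (assumed, not the paper's exact) formulations of the quantities these
# metrics are meant to capture, given accuracies under three prompting conditions.
def demo_gain(acc_gold_demos, acc_zero_shot):
    """Gain from adding correctly labeled demonstrations over zero-shot."""
    return acc_gold_demos - acc_zero_shot

def label_sensitivity(acc_gold_demos, acc_random_labels):
    """Drop when demonstration labels are replaced with random ones."""
    return acc_gold_demos - acc_random_labels

def gler(acc_gold_demos, acc_random_labels, acc_zero_shot):
    """Share of the demonstration gain attributable to correct labels
    (assumed definition, for illustration only)."""
    gain = demo_gain(acc_gold_demos, acc_zero_shot)
    return label_sensitivity(acc_gold_demos, acc_random_labels) / gain if gain else 0.0

print(gler(acc_gold_demos=0.80, acc_random_labels=0.72, acc_zero_shot=0.60))  # 0.4
```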


Self-Guided Contrastive Learning for BERT Sentence Representations

Jun 03, 2021
Taeuk Kim, Kang Min Yoo, Sang-goo Lee

Although BERT and its variants have reshaped the NLP landscape, it remains unclear how best to derive sentence embeddings from such pre-trained Transformers. In this work, we propose a contrastive learning method that utilizes self-guidance to improve the quality of BERT sentence representations. Our method fine-tunes BERT in a self-supervised fashion, does not rely on data augmentation, and enables the usual [CLS] token embeddings to function as sentence vectors. Moreover, we redesign the contrastive learning objective (NT-Xent) and apply it to sentence representation learning. We demonstrate with extensive experiments that our approach is more effective than competitive baselines on diverse sentence-related tasks. We also show that it is efficient at inference and robust to domain shifts.
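
For reference, the sketch below implements the standard NT-Xent objective that the paper takes as its starting point (the paper's redesigned variant differs): two encoded views of the same batch of sentences are treated as positives, and all other pairs as negatives.

```python
# Standard NT-Xent contrastive loss (baseline form, not the paper's redesigned variant).
import torch
import torch.nn.functional as F

def nt_xent(z1, z2, temperature=0.05):
    """z1, z2: (batch, dim) embeddings of two views of the same sentences."""
    z = F.normalize(torch.cat([z1, z2], dim=0), dim=1)              # (2B, dim)
    sim = z @ z.t() / temperature                                   # scaled cosine sims
    mask = torch.eye(sim.size(0), dtype=torch.bool, device=z.device)
    sim = sim.masked_fill(mask, float("-inf"))                      # drop self-pairs
    # the positive for row i is its counterpart in the other view
    targets = torch.arange(sim.size(0), device=z.device).roll(sim.size(0) // 2)
    return F.cross_entropy(sim, targets)

loss = nt_xent(torch.randn(8, 128), torch.randn(8, 128))
print(loss.item())
```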

* ACL 2021 

Heads-up! Unsupervised Constituency Parsing via Self-Attention Heads

Oct 19, 2020
Bowen Li, Taeuk Kim, Reinald Kim Amplayo, Frank Keller

Transformer-based pre-trained language models (PLMs) have dramatically improved the state of the art in NLP across many tasks. This has led to substantial interest in analyzing the syntactic knowledge PLMs learn. Previous approaches to this question have been limited, mostly using test suites or probes. Here, we propose a novel, fully unsupervised parsing approach that extracts constituency trees from PLM attention heads. We rank transformer attention heads based on their inherent properties and create an ensemble of high-ranking heads to produce the final tree. Our method is adaptable to low-resource languages, as it does not rely on development sets, which can be expensive to annotate. Our experiments show that the proposed method often outperforms existing approaches when no development set is available. Our unsupervised parser can also be used as a tool to analyze the grammars PLMs learn implicitly. For this, we use the parse trees induced by our method to train a neural PCFG and compare it to a grammar derived from a human-annotated treebank.
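
A minimal sketch of the raw material the method works with: per-head self-attention matrices collected from a PLM, plus a placeholder head-ranking score. The locality heuristic used here is purely illustrative and is not the paper's ranking criterion or tree-building procedure.

```python
# Sketch: collect per-head self-attention matrices from a PLM and rank heads with a
# simple placeholder score (attention concentrated near the diagonal).
import torch
from transformers import AutoTokenizer, AutoModel

tok = AutoTokenizer.from_pretrained("bert-base-cased")
model = AutoModel.from_pretrained("bert-base-cased", output_attentions=True).eval()

sentence = "The quick brown fox jumps over the lazy dog ."
enc = tok(sentence, return_tensors="pt")
with torch.no_grad():
    attentions = model(**enc).attentions        # tuple of (1, heads, seq, seq) per layer

heads = torch.cat(attentions, dim=0)            # (layers, heads, seq, seq) since batch=1
# Placeholder ranking: prefer heads whose attention mass stays close to the diagonal,
# a rough proxy for sensitivity to local, phrase-like structure.
seq = heads.size(-1)
idx = torch.arange(seq)
locality = (heads * -(idx[None, :] - idx[:, None]).abs().float()).sum(dim=(-1, -2))
ranked = locality.flatten().argsort(descending=True)
print("top head (layer, head):", divmod(ranked[0].item(), heads.size(1)))
```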

* AACL-IJCNLP 2020 

IDS at SemEval-2020 Task 10: Does Pre-trained Language Model Know What to Emphasize?

Jul 24, 2020
Jaeyoul Shin, Taeuk Kim, Sang-goo Lee

We propose a novel method that enables us to determine words that deserve to be emphasized from written text in visual media, relying only on the information from the self-attention distributions of pre-trained language models (PLMs). With extensive experiments and analyses, we show that 1) our zero-shot approach is superior to a reasonable baseline that adopts TF-IDF and that 2) there exist several attention heads in PLMs specialized for emphasis selection, confirming that PLMs are capable of recognizing important words in sentences.
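
A minimal sketch of the underlying signal: scoring each token by the attention mass it receives under a single self-attention head. The head index below is an arbitrary assumption, whereas the paper identifies emphasis-specialized heads empirically; the TF-IDF baseline and evaluation details are omitted.

```python
# Sketch: rank tokens by the attention they receive from one self-attention head.
import torch
from transformers import AutoTokenizer, AutoModel

tok = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased", output_attentions=True).eval()

text = "never give up on your dreams"
enc = tok(text, return_tensors="pt")
with torch.no_grad():
    att = model(**enc).attentions[8][0, 3]      # layer 9, head 4 (arbitrary choice)

# Attention received by each token = column sums of the attention matrix.
received = att.sum(dim=0)
tokens = tok.convert_ids_to_tokens(enc["input_ids"][0].tolist())
for token, score in sorted(zip(tokens, received.tolist()), key=lambda x: -x[1]):
    print(f"{token:12s} {score:.3f}")
```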


Multilingual Zero-shot Constituency Parsing

Apr 08, 2020
Taeuk Kim, Sang-goo Lee

Zero-shot constituency parsing aims to extract parse trees from neural models such as pre-trained language models (PLMs) without further training or the need to train an additional parser. This paper improves upon existing zero-shot parsing paradigms by introducing a novel chart-based parsing method, showing gains in zero-shot parsing performance. Furthermore, we attempt to broaden the range of zero-shot parsing applications by examining languages other than English and by utilizing multilingual models, demonstrating that it is feasible to generate parse tree-like structures for sentences in eight other languages using our method.
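
The sketch below illustrates what "chart-based" means in this context: score every span, then recover the best binary tree with CKY-style dynamic programming. The span-scoring function (negative variance of stand-in word vectors, so cohesive spans score higher) is a toy assumption, not the paper's scoring method.

```python
# Sketch of chart-based zero-shot parsing: span scores + CKY-style dynamic programming.
import numpy as np

def best_tree(words, span_score):
    """span_score(i, j) -> float for span words[i:j]; returns a bracketed string."""
    n = len(words)
    chart, back = {}, {}
    for i in range(n):
        chart[(i, i + 1)] = 0.0
    for length in range(2, n + 1):
        for i in range(n - length + 1):
            j = i + length
            k_best = max(range(i + 1, j), key=lambda k: chart[(i, k)] + chart[(k, j)])
            chart[(i, j)] = span_score(i, j) + chart[(i, k_best)] + chart[(k_best, j)]
            back[(i, j)] = k_best

    def build(i, j):
        if j - i == 1:
            return words[i]
        k = back[(i, j)]
        return f"({build(i, k)} {build(k, j)})"

    return build(0, n)

words = ["the", "cat", "sat", "on", "the", "mat"]
vecs = {w: np.random.randn(8) for w in set(words)}          # stand-in for PLM vectors
score = lambda i, j: -np.var([vecs[w] for w in words[i:j]]) # cohesive spans score higher
print(best_tree(words, score))
```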

* preprint 

Are Pre-trained Language Models Aware of Phrases? Simple but Strong Baselines for Grammar Induction

Jan 30, 2020
Taeuk Kim, Jihun Choi, Daniel Edmiston, Sang-goo Lee

With the recent success and popularity of pre-trained language models (LMs) in natural language processing, there has been a rise in efforts to understand their inner workings. In line with such interest, we propose a novel method that assists us in investigating the extent to which pre-trained LMs capture the syntactic notion of constituency. Our method provides an effective way of extracting constituency trees from the pre-trained LMs without training. In addition, we report intriguing findings in the induced trees, including the fact that pre-trained LMs outperform other approaches in correctly demarcating adverb phrases in sentences.
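
As a toy illustration of training-free tree extraction, the sketch below derives a "syntactic distance" between adjacent words from frozen word vectors and splits recursively at the largest gap. The distance measure and the random vectors are stand-ins, not the paper's exact procedure.

```python
# Sketch: recursive top-down tree induction from adjacent-word distances.
import numpy as np

def induce_tree(words, vecs):
    """vecs: per-word vectors aligned with words; returns a bracketed string."""
    if len(words) == 1:
        return words[0]

    def dist(a, b):
        # 1 - cosine similarity as an illustrative "syntactic distance".
        return 1.0 - float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

    gaps = [dist(vecs[i], vecs[i + 1]) for i in range(len(words) - 1)]
    split = int(np.argmax(gaps)) + 1               # split where adjacent words differ most
    left = induce_tree(words[:split], vecs[:split])
    right = induce_tree(words[split:], vecs[split:])
    return f"({left} {right})"

words = ["the", "quick", "fox", "jumped"]
vecs = [np.random.randn(16) for _ in words]        # stand-in for PLM word vectors
print(induce_tree(words, vecs))
```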

* ICLR 2020 

Summary Level Training of Sentence Rewriting for Abstractive Summarization

Sep 26, 2019
Sanghwan Bae, Taeuk Kim, Jihoon Kim, Sang-goo Lee

As an attempt to combine extractive and abstractive summarization, Sentence Rewriting models adopt the strategy of first extracting salient sentences from a document and then paraphrasing the selected ones to generate a summary. However, existing models in this framework mostly rely on sentence-level rewards or suboptimal labels, causing a mismatch between the training objective and the evaluation metric. In this paper, we present a novel training signal that directly maximizes summary-level ROUGE scores through reinforcement learning. In addition, we incorporate BERT into our model, making good use of its natural language understanding capability. In extensive experiments, we show that the combination of our proposed model and training procedure obtains new state-of-the-art performance on both the CNN/Daily Mail and New York Times datasets. We also demonstrate that it generalizes better on the DUC-2002 test set.
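
A minimal sketch of the training signal described here: the reward is the ROUGE of the whole generated summary against the reference, plugged into a REINFORCE-style loss over the sampled sentences. The toy ROUGE-1 F1 and the exact loss shape are illustrative assumptions, not the paper's full objective.

```python
# Sketch: summary-level reward (whole summary vs. reference) in a REINFORCE-style loss.
import torch

def rouge1_f(candidate, reference):
    """Toy ROUGE-1 F1 over whitespace tokens; stands in for the full metric."""
    c, r = candidate.lower().split(), reference.lower().split()
    overlap = sum(min(c.count(w), r.count(w)) for w in set(c))
    if not overlap:
        return 0.0
    p, rec = overlap / len(c), overlap / len(r)
    return 2 * p * rec / (p + rec)

def summary_level_loss(log_probs, generated_sentences, reference_summary, baseline=0.0):
    """log_probs: per-sentence log-probabilities of the sampled actions."""
    reward = rouge1_f(" ".join(generated_sentences), reference_summary)
    return -(reward - baseline) * log_probs.sum()      # REINFORCE with a baseline

log_probs = torch.tensor([-0.7, -1.2], requires_grad=True)
loss = summary_level_loss(log_probs, ["the cat sat", "on the mat"],
                          "the cat sat on the mat")
loss.backward()
print(loss.item(), log_probs.grad)
```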

* EMNLP 2019 Workshop on New Frontiers in Summarization 