Aixin Sun

Few-shot Event Detection: An Empirical Study and a Unified View

May 03, 2023
Yubo Ma, Zehao Wang, Yixin Cao, Aixin Sun

Few-shot event detection (ED) has been widely studied, but this has also produced noticeable discrepancies, e.g., in motivations, tasks, and experimental settings, that hinder understanding of the models and future progress. This paper presents a thorough empirical study, a unified view of ED models, and a better unified baseline. For fair evaluation, we choose two practical settings: a low-resource setting to assess generalization ability and a class-transfer setting to assess transferability. We compare ten representative methods on three datasets; the methods are roughly grouped into prompt-based and prototype-based models for detailed analysis. To investigate why prototype-based methods perform better, we break down their design and build a unified framework. Based on this framework, we not only propose a simple yet effective method (e.g., 2.7% F1 gains under the low-resource setting) but also offer many valuable insights for future research.
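
At their simplest, prototype-based models classify a candidate trigger by comparing its embedding with one prototype per event type. The following is a minimal sketch in the style of prototypical networks, assuming precomputed 2-d trigger embeddings; the function names, toy vectors, and event labels are hypothetical, and this is not the paper's unified framework.

```python
import numpy as np

def build_prototypes(support_emb, support_labels):
    """Average the support embeddings of each event type into one prototype."""
    labels = sorted(set(support_labels))
    protos = np.stack([
        support_emb[[i for i, y in enumerate(support_labels) if y == lab]].mean(axis=0)
        for lab in labels
    ])
    return labels, protos

def classify(query_emb, labels, protos):
    """Assign each query to the event type whose prototype is nearest (squared Euclidean)."""
    # dists[q, c] = ||query_q - proto_c||^2
    dists = ((query_emb[:, None, :] - protos[None, :, :]) ** 2).sum(axis=-1)
    return [labels[i] for i in dists.argmin(axis=1)]

# Toy support set: two labelled examples per (hypothetical) event type.
support_emb = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]])
support_labels = ["Attack", "Attack", "Transport", "Transport"]
labels, protos = build_prototypes(support_emb, support_labels)
queries = np.array([[0.8, 0.2], [0.2, 0.7]])
print(classify(queries, labels, protos))  # ['Attack', 'Transport']
```

Real ED systems also need a "no event" decision, which this sketch leaves out.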

* Accepted by ACL 2023 main conference

FreeLM: Fine-Tuning-Free Language Model

May 02, 2023
Xiang Li, Xin Jiang, Xuying Meng, Aixin Sun, Yequan Wang

Pre-trained language models (PLMs) have achieved remarkable success in NLP tasks. Despite this success, mainstream solutions largely follow the pre-training then fine-tuning paradigm, which brings both high deployment costs and low training efficiency. Nevertheless, fine-tuning on a specific task is essential because PLMs are pre-trained only with the language signal from large raw data. In this paper, we propose a novel fine-tuning-free strategy for language models that considers both the language signal and a teacher signal. The teacher signal is an abstraction of a battery of downstream tasks, provided in a unified proposition format. Trained with both language and strong task-aware teacher signals in an interactive manner, our FreeLM model demonstrates strong generalization and robustness. In experiments, FreeLM outperforms large models, e.g., GPT-3 and InstructGPT, on a range of language understanding tasks. FreeLM is much smaller, with 0.3B parameters compared to 175B in these models.

DiffuRec: A Diffusion Model for Sequential Recommendation

Apr 09, 2023
Zihao Li, Aixin Sun, Chenliang Li

Mainstream solutions to Sequential Recommendation (SR) represent items with fixed vectors. These vectors have limited capability in capturing items' latent aspects and users' diverse preferences. As a new generative paradigm, diffusion models have achieved excellent performance in areas like computer vision and natural language processing. In our understanding, their unique merit in representation generation fits the problem setting of sequential recommendation well. In this paper, we make the very first attempt to adapt diffusion models to SR and propose DiffuRec for item representation construction and uncertainty injection. Rather than modeling item representations as fixed vectors, DiffuRec represents them as distributions, which adaptively reflect users' multiple interests and items' various aspects. In the diffusion phase, DiffuRec corrupts the target item embedding into a Gaussian distribution by adding noise; this noised representation is then used to generate distribution representations of the sequential items and to inject uncertainty. Afterwards, the item representation is fed into an Approximator to reconstruct the target item representation. In the reversion phase, based on the user's historical interaction behaviors, we reverse Gaussian noise into the target item representation and then apply a rounding operation to predict the target item. Experiments over four datasets show that DiffuRec outperforms strong baselines by a large margin.
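
The diffusion phase (noise adding) and the final rounding step can be illustrated in isolation. The following is a minimal sketch assuming the standard DDPM closed-form forward process with a linear beta schedule, and rounding by maximum inner product over item embeddings; the schedule, function names, and toy catalogue are assumptions rather than DiffuRec's actual configuration, and the Approximator and reverse sampler are omitted.

```python
import numpy as np

def make_alpha_bar(num_steps, beta_start=1e-4, beta_end=0.02):
    """Cumulative product of (1 - beta_t) for an assumed linear noise schedule."""
    betas = np.linspace(beta_start, beta_end, num_steps)
    return np.cumprod(1.0 - betas)

def q_sample(x0, t, alpha_bar, rng):
    """Closed-form forward step: x_t = sqrt(abar_t) * x0 + sqrt(1 - abar_t) * eps."""
    eps = rng.standard_normal(x0.shape)
    return np.sqrt(alpha_bar[t]) * x0 + np.sqrt(1.0 - alpha_bar[t]) * eps

def round_to_item(x_hat, item_emb):
    """Rounding: pick the item whose embedding has the largest inner product with x_hat."""
    return int(np.argmax(item_emb @ x_hat))

rng = np.random.default_rng(0)
alpha_bar = make_alpha_bar(1000)
item_emb = np.eye(8)  # toy catalogue: 8 orthonormal item embeddings
x_t = q_sample(item_emb[3], t=10, alpha_bar=alpha_bar, rng=rng)
print(round_to_item(x_t, item_emb))  # 3: after only 10 small-noise steps, x_t stays close to item 3
```

At large t the sample approaches pure Gaussian noise, which is the starting point that the reversion phase turns back into a target item representation.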

GCRE-GPT: A Generative Model for Comparative Relation Extraction

Mar 15, 2023
Yequan Wang, Hengran Zhang, Aixin Sun, Xuying Meng

Given comparative text, comparative relation extraction aims to extract the two targets being compared (e.g., two cameras) and the aspect on which they are compared (e.g., image quality). The extracted comparative relations form the basis of further opinion analysis. Existing solutions formulate this task as a sequence labeling task to extract targets and aspects. However, they cannot directly extract comparative relation(s) from text. In this paper, we show that comparative relations can be directly extracted with high accuracy by a generative model. Based on GPT-2, we propose a Generation-based Comparative Relation Extractor (GCRE-GPT). Experiment results show that GCRE-GPT achieves state-of-the-art accuracy on two datasets.
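
Because a generative extractor emits relations as text, a post-processing step has to parse the generated string back into (target, target, aspect) tuples. The following is a minimal sketch assuming a hypothetical linearized output format `target_a | target_b | aspect`, with multiple relations separated by `;`; GCRE-GPT's actual output format is not described here.

```python
from typing import List, NamedTuple

class ComparativeRelation(NamedTuple):
    target_a: str
    target_b: str
    aspect: str

def parse_relations(generated: str) -> List[ComparativeRelation]:
    """Parse 'A | B | aspect ; A | B | aspect' (hypothetical format), skipping malformed chunks."""
    relations = []
    for chunk in generated.split(";"):
        parts = [p.strip() for p in chunk.split("|")]
        if len(parts) == 3 and all(parts):
            relations.append(ComparativeRelation(*parts))
    return relations

out = "Camera X | Camera Y | image quality ; Camera X | Camera Y | battery life ; broken | chunk"
for rel in parse_relations(out):
    print(rel)
```

Skipping malformed chunks, rather than raising an error, is one design choice for tolerating imperfect generations.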

Large Language Model Is Not a Good Few-shot Information Extractor, but a Good Reranker for Hard Samples!

Mar 15, 2023
Yubo Ma, Yixin Cao, YongChing Hong, Aixin Sun

Large Language Models (LLMs) have made remarkable strides in various tasks. However, whether they are competitive few-shot solvers for information extraction (IE) tasks and surpass fine-tuned small Pre-trained Language Models (SLMs) remains an open problem. This paper aims to provide a thorough answer to this problem and, moreover, to explore an approach towards effective and economical IE systems that combine the strengths of LLMs and SLMs. Through extensive experiments on eight datasets across three IE tasks, we show that LLMs are not effective few-shot information extractors in general, given their unsatisfactory performance in most settings as well as their high latency and budget requirements. However, we demonstrate that LLMs can complement SLMs well and effectively solve hard samples that SLMs struggle with. Building on these findings, we propose an adaptive filter-then-rerank paradigm, in which SLMs act as filters and LLMs act as rerankers. By using LLMs to rerank a small portion of difficult samples identified by SLMs, our preliminary system consistently achieves promising improvements (2.1% F1 gain on average) on various IE tasks, at an acceptable cost in time and money.
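
The filter-then-rerank control flow can be outlined as follows. This is a minimal sketch assuming that "hard" samples are those where the SLM's top label probability falls below a fixed threshold, and that the LLM is asked to choose among the SLM's top-k labels. The threshold rule, parameter values, and the `llm_rerank` callable are hypothetical stand-ins, not the paper's adaptive filter or prompts.

```python
from typing import Callable, List, Sequence

def filter_then_rerank(
    slm_probs: Sequence[Sequence[float]],
    labels: List[str],
    llm_rerank: Callable[[int, List[str]], str],  # hypothetical LLM call: (sample index, candidates) -> label
    threshold: float = 0.8,
    top_k: int = 3,
) -> List[str]:
    """Keep confident SLM predictions; send low-confidence samples to the LLM with top-k candidates."""
    predictions = []
    for i, probs in enumerate(slm_probs):
        ranked = sorted(range(len(labels)), key=lambda j: probs[j], reverse=True)
        if probs[ranked[0]] >= threshold:
            predictions.append(labels[ranked[0]])  # easy sample: trust the SLM
        else:
            candidates = [labels[j] for j in ranked[:top_k]]
            predictions.append(llm_rerank(i, candidates))  # hard sample: LLM reranks
    return predictions

# Stand-in for an LLM (hypothetical): it simply picks the last candidate.
def fake_llm_rerank(sample_index, candidates):
    return candidates[-1]

probs = [[0.90, 0.05, 0.05], [0.40, 0.35, 0.25]]
print(filter_then_rerank(probs, ["PER", "ORG", "LOC"], fake_llm_rerank))  # ['PER', 'LOC']
```

Only the second sample reaches the (expensive) LLM, which is where the cost savings of this paradigm come from.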

Dataset vs Reality: Understanding Model Performance from the Perspective of Information Need

Dec 06, 2022
Mengying Yu, Aixin Sun

Deep learning technologies have brought us many models that outperform human beings on a few benchmarks. An interesting question is: can these models solve real-world problems well when the settings (e.g., the same input/output) are similar to those of the benchmark datasets? We argue that a model is trained to answer the same information need for which its training dataset was created. Although some datasets share high structural similarity, e.g., question-answer pairs for the question answering (QA) task and image-caption pairs for the image captioning (IC) task, not all datasets are created for the same information need. To support our argument, we conduct a comprehensive analysis of widely used benchmark datasets for both QA and IC tasks. We compare the dataset creation processes (e.g., crowdsourcing versus collecting data from real users or content providers) from the perspective of information need in the context of information retrieval. To show the differences between datasets, we perform both word-level and sentence-level analysis. We show that data collected from real users or content providers tend to have richer, more diverse, and more specific words than data annotated by crowdworkers. At the sentence level, data annotated by crowdworkers share similar dependency distributions and higher similarity in sentence structure, compared to data collected from content providers. We believe our findings could partially explain why some datasets are considered more challenging than others for similar tasks. Our findings may also help guide the construction of new datasets.
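
Word-level diversity can be quantified in several ways. One simple and common measure is the type-token ratio (distinct words divided by total words). The following is a minimal sketch assuming lowercase whitespace tokenization. The paper's exact word-level measures are not specified here, and the two toy corpora below are illustrative strings, not data from any benchmark.

```python
from collections import Counter

def word_stats(texts):
    """Token count, vocabulary size, and type-token ratio under whitespace tokenization."""
    tokens = [tok for text in texts for tok in text.lower().split()]
    counts = Counter(tokens)
    return {
        "tokens": len(tokens),
        "types": len(counts),
        "type_token_ratio": len(counts) / len(tokens) if tokens else 0.0,
    }

templated = ["what is the name of the river", "what is the name of the city"]
specific = ["which tributary feeds lake baikal", "population of ulaanbaatar in 2020"]
print(word_stats(templated))  # {'tokens': 14, 'types': 7, 'type_token_ratio': 0.5}
print(word_stats(specific))   # {'tokens': 10, 'types': 10, 'type_token_ratio': 1.0}
```

Type-token ratio falls as text length grows, so comparisons between datasets are fairer on equal-sized samples.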

* 17 pages, 5 figures

Syntactic Multi-view Learning for Open Information Extraction

Dec 05, 2022
Kuicai Dong, Aixin Sun, Jung-Jae Kim, Xiaoli Li

Open Information Extraction (OpenIE) aims to extract relational tuples from open-domain sentences. Traditional rule-based or statistical models have been developed based on the syntactic structures of sentences, as identified by syntactic parsers. However, previous neural OpenIE models underexplore this useful syntactic information. In this paper, we model both constituency and dependency trees as word-level graphs and enable neural OpenIE to learn from the syntactic structures. To better fuse heterogeneous information from both graphs, we adopt multi-view learning to capture multiple relationships from them. Finally, the fine-tuned constituency and dependency representations are aggregated with sentential semantic representations for tuple generation. Experiments show that the constituency information, the dependency information, and multi-view learning are all effective.
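
Turning a parse into a word-level graph can be shown on the dependency side. The following is a minimal sketch assuming CoNLL-U style head indices (1-based, 0 for the root), undirected edges plus self-loops, and a generic row-normalized graph-convolution layer. The constituency conversion and the multi-view fusion are not shown, and the GCN layer is an assumed encoder, not necessarily the paper's.

```python
import numpy as np

def dependency_adjacency(heads, self_loops=True):
    """Word-level adjacency from head indices: heads[i] is word i's 1-based head (0 = root)."""
    n = len(heads)
    adj = np.zeros((n, n), dtype=np.float32)
    for dep, head in enumerate(heads):
        if head > 0:
            adj[dep, head - 1] = adj[head - 1, dep] = 1.0  # undirected dependency edge
    if self_loops:
        adj += np.eye(n, dtype=np.float32)
    return adj

def gcn_layer(adj, h, w):
    """One graph-convolution step: average neighbours (row-normalized A), project, ReLU."""
    deg = adj.sum(axis=1, keepdims=True)
    return np.maximum((adj / deg) @ h @ w, 0.0)

# "John founded Acme": "founded" is the root; "John" and "Acme" attach to it.
adj = dependency_adjacency([2, 0, 2])
h = np.ones((3, 4))  # toy word features
w = np.ones((4, 2))  # toy projection weights
print(gcn_layer(adj, h, w).shape)  # (3, 2)
```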

* EMNLP 2022

Perplexity from PLM Is Unreliable for Evaluating Text Quality

Oct 12, 2022
Yequan Wang, Jiawen Deng, Aixin Sun, Xuying Meng

Recently, many works have used perplexity (PPL) to evaluate the quality of generated text. They assume that the smaller the PPL, the better the quality (i.e., fluency) of the text being evaluated. However, we find that PPL is an unqualified referee and cannot evaluate generated text fairly, for the following reasons: (i) the PPL of short text is larger than that of long text, which goes against common sense; (ii) repeated text spans can undermine the reliability of PPL; and (iii) punctuation marks can heavily affect PPL. Experiments show that PPL is unreliable for evaluating the quality of a given text. Lastly, we discuss the key problems with evaluating text quality using language models.
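
PPL is computed as the exponential of the negative mean token log-probability, so the score depends on how many tokens the average is taken over. The following is a minimal sketch computing PPL from per-token natural-log probabilities. The log-probability values are made-up illustrations, not outputs of any PLM. They show one plausible way that length can matter: a high-loss opening token weighs more heavily in a short text.

```python
import math
from typing import Sequence

def perplexity(token_logprobs: Sequence[float]) -> float:
    """PPL = exp(-(1/N) * sum_i log p(w_i | w_<i)), using natural-log probabilities."""
    if not token_logprobs:
        raise ValueError("need at least one token")
    return math.exp(-sum(token_logprobs) / len(token_logprobs))

short_text = [-6.0, -2.0, -1.0]                  # illustrative values only
long_text = [-6.0, -2.0, -1.0, -1.0, -1.0, -1.0]  # same prefix, more low-loss tokens
print(round(perplexity(short_text), 2))  # 20.09
print(round(perplexity(long_text), 2))   # 7.39
```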

From Counter-intuitive Observations to a Fresh Look at Recommender System

Oct 09, 2022
Aixin Sun

Recently, a few papers have reported counter-intuitive observations from experiments on recommender systems (RecSys). One observation is that users who spend more time with, or have many interactions with, a recommender system receive poorer recommendations. Another observation is that models trained using only the more recent parts of a dataset show significant performance improvement. In this opinion paper, we interpret these counter-intuitive observations from two perspectives. First, the observations are made with respect to the global timeline of user-item interactions. Second, the observations are considered counter-intuitive because they contradict our expectation of a recommender: the more interactions a user has, the higher the chance that the recommender learns the user's preferences well. For the first perspective, we discuss the importance of the global timeline, using the simplest baseline, Popularity, as a starting point. We answer two questions: (i) why is the simplest model, popularity, often ill-defined in academic research? and (ii) why is the popularity baseline evaluated in this way? These questions lead to a detailed discussion of the data leakage issue in many offline evaluations. As a result, the model accuracies reported in many academic papers are less meaningful and incomparable. For the second perspective, we try to answer two more questions: (i) why do models trained using only the more recent parts of the data perform better? and (ii) why do more interactions from users lead to poorer recommendations? The key to both questions is user preference modeling. We then propose to take a fresh look at RecSys. We discuss how to conduct more practical offline evaluations and possible ways to effectively model user preferences. The discussion and opinions in this paper concern top-N recommendation only, not rating prediction.
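
The data-leakage point can be illustrated with the popularity baseline. If item counts are computed over the whole dataset, a recommendation made at time t can benefit from interactions that happen after t. The following is a minimal sketch that respects the global timeline by counting only interactions before a cutoff. The tuple layout, toy log, and function name are hypothetical.

```python
from collections import Counter
from typing import Iterable, List, Tuple

Interaction = Tuple[str, str, int]  # (user_id, item_id, timestamp): hypothetical layout

def popularity_top_n(interactions: Iterable[Interaction], cutoff: int, n: int) -> List[str]:
    """Rank items by interaction count using only events strictly before `cutoff`."""
    counts = Counter(item for _, item, ts in interactions if ts < cutoff)
    return [item for item, _ in counts.most_common(n)]

log = [("u1", "A", 1), ("u2", "A", 2), ("u3", "B", 3),
       ("u1", "C", 5), ("u2", "C", 6), ("u3", "C", 7), ("u4", "C", 8)]

# Recommending at time 4: only the past is visible.
print(popularity_top_n(log, cutoff=4, n=2))    # ['A', 'B']
# Counting over the full log leaks future popularity of item C.
print(popularity_top_n(log, cutoff=100, n=2))  # ['C', 'A']
```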

* 11 pages, 5 figures
