Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Jinghao Deng

Clustering-Aware Negative Sampling for Unsupervised Sentence Representation

May 17, 2023

Jinghao Deng, Fanqi Wan, Tao Yang, Xiaojun Quan, Rui Wang

Abstract:Contrastive learning has been widely studied in sentence representation learning. However, earlier works mainly focus on the construction of positive examples, while in-batch samples are often simply treated as negative examples. This approach overlooks the importance of selecting appropriate negative examples, potentially leading to a scarcity of hard negatives and the inclusion of false negatives. To address these issues, we propose ClusterNS (Clustering-aware Negative Sampling), a novel method that incorporates cluster information into contrastive learning for unsupervised sentence representation learning. We apply a modified K-means clustering algorithm to supply hard negatives and recognize in-batch false negatives during training, aiming to solve the two issues in one unified framework. Experiments on semantic textual similarity (STS) tasks demonstrate that our proposed ClusterNS compares favorably with baselines in unsupervised sentence representation learning. Our code has been made publicly available.

* accepted to Finding of ACL2023, 16 pages

Via

Access Paper or Ask Questions

Orders Are Unwanted: Dynamic Deep Graph Convolutional Network for Personality Detection

Dec 06, 2022

Tao Yang, Jinghao Deng, Xiaojun Quan, Qifan Wang

Abstract:Predicting personality traits based on online posts has emerged as an important task in many fields such as social network analysis. One of the challenges of this task is assembling information from various posts into an overall profile for each user. While many previous solutions simply concatenate the posts into a long document and then encode the document by sequential or hierarchical models, they introduce unwarranted orders for the posts, which may mislead the models. In this paper, we propose a dynamic deep graph convolutional network (D-DGCN) to overcome the above limitation. Specifically, we design a learn-to-connect approach that adopts a dynamic multi-hop structure instead of a deterministic structure, and combine it with a DGCN module to automatically learn the connections between posts. The modules of post encoder, learn-to-connect, and DGCN are jointly trained in an end-to-end manner. Experimental results on the Kaggle and Pandora datasets show the superior performance of D-DGCN to state-of-the-art baselines. Our code is available at https://github.com/djz233/D-DGCN.

* The current version contains some errors. We will resubmit it after revision

Via

Access Paper or Ask Questions

AD-DROP: Attribution-Driven Dropout for Robust Language Model Fine-Tuning

Oct 12, 2022

Tao Yang, Jinghao Deng, Xiaojun Quan, Qifan Wang, Shaoliang Nie

Figure 1 for AD-DROP: Attribution-Driven Dropout for Robust Language Model Fine-Tuning

Figure 2 for AD-DROP: Attribution-Driven Dropout for Robust Language Model Fine-Tuning

Figure 3 for AD-DROP: Attribution-Driven Dropout for Robust Language Model Fine-Tuning

Figure 4 for AD-DROP: Attribution-Driven Dropout for Robust Language Model Fine-Tuning

Abstract:Fine-tuning large pre-trained language models on downstream tasks is apt to suffer from overfitting when limited training data is available. While dropout proves to be an effective antidote by randomly dropping a proportion of units, existing research has not examined its effect on the self-attention mechanism. In this paper, we investigate this problem through self-attention attribution and find that dropping attention positions with low attribution scores can accelerate training and increase the risk of overfitting. Motivated by this observation, we propose Attribution-Driven Dropout (AD-DROP), which randomly discards some high-attribution positions to encourage the model to make predictions by relying more on low-attribution positions to reduce overfitting. We also develop a cross-tuning strategy to alternate fine-tuning and AD-DROP to avoid dropping high-attribution positions excessively. Extensive experiments on various benchmarks show that AD-DROP yields consistent improvements over baselines. Analysis further confirms that AD-DROP serves as a strategic regularizer to prevent overfitting during fine-tuning.

* Accepted to NeurIPS 2022

Via

Access Paper or Ask Questions