Fangzhao Wu

One Teacher is Enough? Pre-trained Language Model Distillation from Multiple Teachers

Jun 02, 2021

Chuhan Wu, Fangzhao Wu, Yongfeng Huang

Abstract: Pre-trained language models (PLMs) achieve great success in NLP. However, their huge model sizes hinder their application in many practical systems. Knowledge distillation is a popular technique for compressing PLMs, in which a small student model is learned from a large teacher PLM. However, the knowledge learned from a single teacher may be limited and even biased, resulting in a low-quality student model. In this paper, we propose a multi-teacher knowledge distillation framework named MT-BERT for pre-trained language model compression, which can train a high-quality student model from multiple teacher PLMs. In MT-BERT, we design a multi-teacher co-finetuning method that jointly finetunes multiple teacher PLMs on downstream tasks with shared pooling and prediction layers to align their output spaces for better collaborative teaching. In addition, we propose a multi-teacher hidden loss and a multi-teacher distillation loss to transfer the useful knowledge in both hidden states and soft labels from multiple teacher PLMs to the student model. Experiments on three benchmark datasets validate the effectiveness of MT-BERT in compressing PLMs.
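
As a rough illustration of transferring soft labels from several teachers, the sketch below averages a temperature-softened cross-entropy between each teacher's output distribution and the student's. The uniform averaging, the `temperature` parameter, and all function names are assumptions made for this example; the abstract does not specify how MT-BERT weights its teachers, and the hidden-state loss is not shown.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax over a list of logits."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def soft_cross_entropy(teacher_probs, student_probs):
    """Cross-entropy of the student distribution against teacher soft labels."""
    return -sum(p * math.log(q) for p, q in zip(teacher_probs, student_probs))


def multi_teacher_distillation_loss(teacher_logits_list, student_logits,
                                    temperature=2.0):
    """Average soft-label loss over all teachers (uniform weights: an assumption)."""
    student_probs = softmax(student_logits, temperature)
    losses = [
        soft_cross_entropy(softmax(teacher_logits, temperature), student_probs)
        for teacher_logits in teacher_logits_list
    ]
    return sum(losses) / len(losses)
```

A student whose logits agree with the teachers receives a lower loss than one that disagrees, which is the signal a distillation objective of this kind provides.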

* Findings of ACL-IJCNLP 2021

Rethinking InfoNCE: How Many Negative Samples Do You Need?

May 27, 2021

Chuhan Wu, Fangzhao Wu, Yongfeng Huang

Abstract: InfoNCE loss is a widely used loss function for contrastive model training. It aims to estimate the mutual information between a pair of variables by discriminating between each positive pair and its associated $K$ negative pairs. It has been proved that when the sample labels are clean, the lower bound of mutual information estimation is tighter when more negative samples are incorporated, which usually yields better model performance. However, in many real-world tasks the labels often contain noise, and incorporating too many noisy negative samples for model training may be suboptimal. In this paper, we study how many negative samples are optimal for InfoNCE in different scenarios via a semi-quantitative theoretical framework. More specifically, we first propose a probabilistic model to analyze the influence of the negative sampling ratio $K$ on training sample informativeness. Then, we design a training effectiveness function to measure the overall influence of training samples on model learning based on their informativeness. We estimate the optimal negative sampling ratio using the $K$ value that maximizes the training effectiveness function. Based on our framework, we further propose an adaptive negative sampling method that can dynamically adjust the negative sampling ratio to improve InfoNCE-based model training. Extensive experiments on different real-world datasets show that our framework can accurately predict the optimal negative sampling ratio in different tasks, and that our proposed adaptive negative sampling method can achieve better performance than the commonly used fixed negative sampling ratio strategy.
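
For readers unfamiliar with the loss, here is a minimal sketch of the standard InfoNCE objective for one positive pair and its $K$ negative pairs, computed from raw similarity scores. The function name and the use of plain Python lists are choices made for this example; the paper's probabilistic model and adaptive sampling method are not reproduced here.

```python
import math


def info_nce_loss(pos_score, neg_scores):
    """InfoNCE loss for one positive score and K negative scores.

    Computes -log( exp(s_pos) / (exp(s_pos) + sum_k exp(s_neg_k)) )
    using the log-sum-exp trick for numerical stability.
    """
    scores = [pos_score] + list(neg_scores)
    peak = max(scores)
    log_denominator = peak + math.log(sum(math.exp(s - peak) for s in scores))
    return log_denominator - pos_score
```

When all scores are equal the loss is log(K + 1), so adding negatives raises the loss a model must overcome; this is the quantity whose dependence on $K$ the abstract discusses.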

Killing Two Birds with One Stone: Stealing Model and Inferring Attribute from BERT-based APIs

May 23, 2021

Lingjuan Lyu, Xuanli He, Fangzhao Wu, Lichao Sun

Abstract: Advances in pre-trained models (e.g., BERT, XLNet, etc.) have largely revolutionized the predictive performance of various modern natural language processing tasks. This allows corporations to provide machine learning as a service (MLaaS) by encapsulating fine-tuned BERT-based models as commercial APIs. However, previous works have discovered a series of vulnerabilities in BERT-based APIs. For example, BERT-based APIs are vulnerable to both model extraction attacks and adversarial example transferability attacks. Moreover, the high capacity of BERT-based APIs makes the fine-tuned model prone to overlearning, yet what kind of information can be leaked from the extracted model remains unknown. To bridge this gap, in this work, we first present an effective model extraction attack, where the adversary can practically steal a BERT-based API (the target/victim model) by issuing only a limited number of queries. We further develop an effective attribute inference attack to expose the sensitive attributes of the training data used by the BERT-based APIs. Our extensive experiments on benchmark datasets under various realistic settings demonstrate the potential vulnerabilities of BERT-based APIs.

* paper under review

Personalized News Recommendation with Knowledge-aware Interactive Matching

Apr 20, 2021

Tao Qi, Fangzhao Wu, Chuhan Wu, Yongfeng Huang

Abstract: The core of personalized news recommendation is accurate matching between candidate news and user interest. Most existing news recommendation methods model candidate news from its textual content and model users' interest from their clicked news independently. However, a news article may cover multiple aspects and entities, and a user may have multiple interests. Independent modeling of candidate news and user interest may lead to inferior matching between news and users. In this paper, we propose a knowledge-aware interactive matching framework for personalized news recommendation. Our method can interactively model candidate news and user interest to learn user-aware candidate news representation and candidate news-aware user interest representation, which can facilitate the accurate matching between user interest and candidate news. More specifically, we propose a knowledge co-encoder to interactively learn knowledge-based news representations for both clicked news and candidate news by capturing their relatedness in entities with the help of knowledge graphs. In addition, we propose a text co-encoder to interactively learn text-based news representations for clicked news and candidate news by modeling the semantic relatedness between their texts. Besides, we propose a user-news co-encoder to learn candidate news-aware user interest representation and user-aware candidate news representation from the knowledge- and text-based representations of candidate news and clicked news for better interest matching. Through extensive experiments on two real-world datasets, we demonstrate that our method can effectively improve the performance of news recommendation.

* SIGIR 2021

Empowering News Recommendation with Pre-trained Language Models

Apr 15, 2021

Chuhan Wu, Fangzhao Wu, Tao Qi, Yongfeng Huang

Abstract: Personalized news recommendation is an essential technique for online news services. News articles usually contain rich textual content, and accurate news modeling is important for personalized news recommendation. Existing news recommendation methods mainly model news texts based on traditional text modeling methods, which are not optimal for mining the deep semantic information in news texts. Pre-trained language models (PLMs) are powerful for natural language understanding and have the potential for better news modeling. However, there is no public report that shows PLMs have been applied to news recommendation. In this paper, we report our work on exploiting pre-trained language models to empower news recommendation. Offline experimental results on both monolingual and multilingual news recommendation datasets show that leveraging PLMs for news modeling can effectively improve the performance of news recommendation. Our PLM-empowered news recommendation models have been deployed to the Microsoft News platform, and achieved significant gains in terms of both clicks and pageviews in both English-speaking and global markets.

* To appear in SIGIR 2021

MM-Rec: Multimodal News Recommendation

Apr 15, 2021

Chuhan Wu, Fangzhao Wu, Tao Qi, Yongfeng Huang

Abstract: Accurate news representation is critical for news recommendation. Most existing news representation methods learn news representations only from news texts while ignoring the visual information in news, such as images. In fact, users may click news not only because of their interest in news titles but also due to the attraction of news images. Thus, images are useful for representing news and predicting user behaviors. In this paper, we propose a multimodal news recommendation method, which can incorporate both textual and visual information of news to learn multimodal news representations. We first extract regions of interest (ROIs) from news images via object detection. Then we use a pre-trained visiolinguistic model to encode both news texts and news image ROIs and model their inherent relatedness using co-attentional Transformers. In addition, we propose a crossmodal candidate-aware attention network to select relevant historical clicked news for accurate user modeling by measuring the crossmodal relatedness between clicked news and candidate news. Experiments validate that incorporating multimodal news information can effectively improve news recommendation.

Two Birds with One Stone: Unified Model Learning for Both Recall and Ranking in News Recommendation

Apr 15, 2021

Chuhan Wu, Fangzhao Wu, Tao Qi, Yongfeng Huang

Abstract: Recall and ranking are two critical steps in personalized news recommendation. Most existing news recommender systems conduct personalized news recall and ranking separately with different models. However, maintaining multiple models leads to high computational cost and poses a great challenge to meeting the online latency requirements of news recommender systems. To handle this problem, in this paper we propose UniRec, a unified method for recall and ranking in news recommendation. In our method, we first infer a user embedding for ranking from the historical news click behaviors of a user using a user encoder model. Then we derive the user embedding for recall from the obtained user embedding for ranking by using it as the attention query to select a set of basis user embeddings, which encode different general user interests, and synthesize them into a user embedding for recall. Extensive experiments on a benchmark dataset demonstrate that our method can improve both efficiency and effectiveness for recall and ranking in news recommendation.
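
The derivation of the recall embedding can be pictured with a small sketch: the ranking embedding acts as an attention query over a set of basis embeddings, the best-matching ones are selected, and they are combined with softmax weights. The dot-product scoring, the `top_k` selection rule, and the function name are assumptions for illustration only; the abstract does not state how UniRec scores or selects the basis embeddings.

```python
import math


def recall_embedding(ranking_embedding, basis_embeddings, top_k=2):
    """Synthesize a recall embedding from basis embeddings via attention.

    The ranking embedding is the query; dot products give attention scores
    (an assumed scoring function), the top_k bases are kept, and their
    softmax-weighted sum is returned.
    """
    scores = [
        sum(q * b for q, b in zip(ranking_embedding, basis))
        for basis in basis_embeddings
    ]
    # Indices of the top_k highest-scoring basis embeddings.
    selected = sorted(range(len(scores)), key=lambda i: scores[i],
                      reverse=True)[:top_k]
    peak = max(scores[i] for i in selected)
    weights = [math.exp(scores[i] - peak) for i in selected]
    total = sum(weights)
    dim = len(ranking_embedding)
    return [
        sum((w / total) * basis_embeddings[i][d]
            for w, i in zip(weights, selected))
        for d in range(dim)
    ]
```

Because the recall embedding is a combination of a fixed set of bases, it can be computed from the ranking embedding without running a second user encoder, which is the efficiency argument the abstract makes.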

DebiasedRec: Bias-aware User Modeling and Click Prediction for Personalized News Recommendation

Apr 15, 2021

Jingwei Yi, Fangzhao Wu, Chuhan Wu, Qifei Li, Guangzhong Sun, Xing Xie

Abstract: News recommendation is critical for personalized news access. Existing news recommendation methods usually infer users' personal interest based on their historical clicked news, and train the news recommendation models by predicting future news clicks. A core assumption behind these methods is that news click behaviors can indicate user interest. However, in practical scenarios, beyond the relevance between user interest and news content, news click behaviors may also be affected by other factors, such as the bias of news presentation in the online platform. For example, news displayed at higher positions and in larger sizes is usually more likely to be clicked. The bias of clicked news may introduce noise into user interest modeling and model training, which may hurt the performance of the news recommendation model. In this paper, we propose a bias-aware personalized news recommendation method named DebiasRec, which can handle the bias information for more accurate user interest inference and model training. The core of our method includes a bias representation module, a bias-aware user modeling module, and a bias-aware click prediction module. The bias representation module is used to model different kinds of news bias and their interactions to capture their joint effect on click behaviors. The bias-aware user modeling module aims to infer users' debiased interest from the clicked news articles by using their bias information to calibrate the interest model. The bias-aware click prediction module is used to train a debiased news recommendation model from the biased click behaviors, where the click score is decomposed into a preference score indicating the user's interest in the news content and a news bias score inferred from its different bias features. Experiments on two real-world datasets show that our method can effectively improve the performance of news recommendation.
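
The score decomposition in the click prediction module can be sketched as follows. The additive combination and the choice to rank candidates by the preference score alone at serving time are assumptions made for this example, as are the function names and the dictionary layout; the abstract only states that the click score is decomposed into a preference score and a bias score.

```python
def click_score(preference_score, bias_score):
    """Training-time click score: preference plus bias (additive form assumed)."""
    return preference_score + bias_score


def rank_by_preference(candidates):
    """Rank candidates by preference only, ignoring the bias score.

    Each candidate is a dict with hypothetical keys "id", "preference",
    and "bias".
    """
    return sorted(candidates, key=lambda c: c["preference"], reverse=True)
```

Under these assumptions, an article shown in a prominent slot can have the higher click score while still ranking below an article the user genuinely prefers.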

* 10 pages, 10 figures

FedGNN: Federated Graph Neural Network for Privacy-Preserving Recommendation

Mar 01, 2021

Chuhan Wu, Fangzhao Wu, Yang Cao, Yongfeng Huang, Xing Xie

Abstract: Graph neural networks (GNNs) are widely used in recommendation to model high-order interactions between users and items. Existing GNN-based recommendation methods rely on centralized storage of user-item graphs and centralized model learning. However, user data is privacy-sensitive, and the centralized storage of user-item graphs may raise privacy concerns and risks. In this paper, we propose a federated framework for privacy-preserving GNN-based recommendation, which can collectively train GNN models from decentralized user data while exploiting high-order user-item interaction information with privacy well protected. In our method, we locally train a GNN model in each user client based on the user-item graph inferred from the local user-item interaction data. Each client uploads the local gradients of the GNN to a server for aggregation, and the aggregated gradients are then sent back to user clients for updating local GNN models. Since local gradients may contain private information, we apply local differential privacy techniques to the local gradients to protect user privacy. In addition, in order to protect the items that users have interacted with, we propose to incorporate randomly sampled items as pseudo interacted items for anonymity. To incorporate high-order user-item interactions, we propose a user-item graph expansion method that can find neighboring users with co-interacted items and exchange their embeddings for expanding the local user-item graphs in a privacy-preserving way. Extensive experiments on six benchmark datasets validate that our approach can achieve competitive results with existing centralized GNN-based recommendation methods while effectively protecting user privacy.
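
Two of the client-side protections described above can be sketched in a few lines: perturbing local gradients before upload, and mixing randomly sampled pseudo items into the interacted set. Element-wise clipping followed by Laplace noise is one common local differential privacy mechanism and is assumed here; the abstract does not name the exact mechanism. All function and parameter names are hypothetical.

```python
import random


def laplace_noise(scale, rng):
    """Zero-mean Laplace sample: the difference of two i.i.d. exponentials."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def privatize_gradient(gradient, clip_threshold, noise_scale, rng):
    """Clip each gradient entry to [-clip_threshold, clip_threshold], then add noise."""
    clipped = [max(-clip_threshold, min(clip_threshold, g)) for g in gradient]
    return [g + laplace_noise(noise_scale, rng) for g in clipped]


def add_pseudo_items(interacted_items, all_items, num_pseudo, rng):
    """Return the interacted items plus randomly sampled non-interacted items."""
    candidates = [item for item in all_items if item not in interacted_items]
    pseudo = rng.sample(candidates, min(num_pseudo, len(candidates)))
    return set(interacted_items) | set(pseudo)
```

A client would call both functions with its own `random.Random` instance before uploading; a larger `noise_scale` gives stronger protection at the cost of noisier aggregated gradients.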

FeedRec: News Feed Recommendation with Various User Feedbacks

Feb 09, 2021

Chuhan Wu, Fangzhao Wu, Tao Qi, Yongfeng Huang

Abstract: Personalized news recommendation techniques are widely adopted by many online news feed platforms to target user interests. Learning accurate user interest models is important for news recommendation. Most existing methods for news recommendation rely on implicit feedbacks like click behaviors for inferring user interests and model training. However, click behaviors are implicit feedbacks and usually contain heavy noise. In addition, they cannot help infer complicated user interests such as dislike. Besides, feed recommendation models trained solely on click behaviors cannot optimize other objectives such as user engagement. In this paper, we present a news feed recommendation method that can exploit various kinds of user feedbacks to enhance both user interest modeling and recommendation model training. In our method, we propose a unified user modeling framework to incorporate various explicit and implicit user feedbacks to infer both positive and negative user interests. In addition, we propose a strong-to-weak attention network that uses the representations of stronger feedbacks to distill positive and negative user interests from implicit weak feedbacks for accurate user interest modeling. Besides, we propose a multi-feedback model training framework that jointly trains the model on the click, finish, and dwell time prediction tasks to learn an engagement-aware feed recommendation model. Extensive experiments on a real-world dataset show that our approach can effectively improve the model performance in terms of both news clicks and user engagement.
