Quanyu Dai

Semi-supervised Domain Adaptation on Graphs with Contrastive Learning and Minimax Entropy

Sep 14, 2023
Jiaren Xiao, Quanyu Dai, Xiao Shen, Xiaochen Xie, Jing Dai, James Lam, Ka-Wai Kwok

Label scarcity is frequently encountered in real-world graph applications due to the high cost of data labeling. To address it, semi-supervised domain adaptation (SSDA) on graphs aims to leverage the knowledge of a labeled source graph to aid node classification on a target graph with limited labels. SSDA tasks must overcome the domain gap between the source and target graphs, a challenge that existing approaches designed for cross-graph node classification have yet to formally consider. To tackle the SSDA problem on graphs, a novel method called SemiGCL is proposed, which benefits from graph contrastive learning and minimax entropy training. SemiGCL generates informative node representations by contrasting the representations learned from a graph's local and global views. Additionally, SemiGCL is adversarially optimized with the entropy loss of unlabeled target nodes to reduce domain divergence. Experimental results on benchmark datasets demonstrate that SemiGCL outperforms state-of-the-art baselines on the SSDA tasks.
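
For readers who want a concrete picture of the minimax entropy component, here is a minimal sketch of the standard adversarial entropy objective on unlabeled target nodes, in the spirit of minimax entropy domain adaptation. The encoder, classifier, and loss weight are hypothetical placeholders; SemiGCL's exact formulation may assign the min and max roles differently.

import torch
import torch.nn.functional as F

class GradReverse(torch.autograd.Function):
    # Identity in the forward pass; flips the gradient sign in the backward pass.
    @staticmethod
    def forward(ctx, x):
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_output):
        return -grad_output

def minimax_entropy_loss(encoder, classifier, x_tgt_unlabeled, lambd=0.1):
    # Entropy of the classifier's predictions on unlabeled target nodes.
    feats = GradReverse.apply(encoder(x_tgt_unlabeled))
    probs = F.softmax(classifier(feats), dim=-1)
    entropy = -(probs * torch.log(probs + 1e-8)).sum(dim=-1).mean()
    # Minimizing -entropy pushes the classifier to maximize target entropy,
    # while the reversed gradient pushes the encoder to minimize it.
    return -lambd * entropy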

Only Encode Once: Making Content-based News Recommender Greener

Aug 27, 2023
Qijiong Liu, Jieming Zhu, Quanyu Dai, Xiao-Ming Wu

Large pretrained language models (PLMs) have become the de facto news encoders in modern news recommender systems, owing to their strong ability to comprehend textual content. These huge Transformer-based architectures, when finetuned on recommendation tasks, can greatly improve news recommendation performance. However, the PLM-based pretrain-finetune framework incurs high computational cost and energy consumption, primarily due to the extensive redundant processing of news encoding during each training epoch. In this paper, we propose the "Only Encode Once" framework for news recommendation (OLEO), which decouples news representation learning from downstream recommendation task learning. The decoupled design makes content-based news recommenders as green and efficient as ID-based ones, leading to a great reduction in computational cost and training resources. Extensive experiments show that our OLEO framework can reduce carbon emissions by up to 13 times compared with the state-of-the-art pretrain-finetune framework while maintaining a competitive or even superior performance level. The source code is released for reproducibility.
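
A minimal sketch of the "encode once" idea follows, under the assumption that news texts are embedded a single time with a frozen pretrained encoder and cached, so the downstream recommender trains on lookup vectors much like an ID-based model. The encoder choice, [CLS] pooling, and dot-product scoring head below are illustrative assumptions, not OLEO's actual architecture.

import torch
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
plm = AutoModel.from_pretrained("bert-base-uncased").eval()

@torch.no_grad()
def encode_news_once(titles, batch_size=64):
    # Embed every news title exactly once and cache the resulting table.
    vecs = []
    for i in range(0, len(titles), batch_size):
        batch = tokenizer(titles[i:i + batch_size], padding=True,
                          truncation=True, return_tensors="pt")
        vecs.append(plm(**batch).last_hidden_state[:, 0])  # [CLS] pooling
    return torch.cat(vecs)

class CachedNewsRecommender(torch.nn.Module):
    def __init__(self, news_table, dim=128):
        super().__init__()
        # The cached news vectors behave like a frozen ID-embedding table.
        self.news_emb = torch.nn.Embedding.from_pretrained(news_table, freeze=True)
        self.proj = torch.nn.Linear(news_table.size(1), dim)

    def forward(self, clicked_ids, candidate_ids):
        user = self.proj(self.news_emb(clicked_ids)).mean(dim=1)   # mean-pooled user vector
        cand = self.proj(self.news_emb(candidate_ids))
        return (user.unsqueeze(1) * cand).sum(-1)                   # dot-product scores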

Out-of-distribution Detection with Implicit Outlier Transformation

Mar 09, 2023
Qizhou Wang, Junjie Ye, Feng Liu, Quanyu Dai, Marcus Kalander, Tongliang Liu, Jianye Hao, Bo Han

Outlier exposure (OE) is powerful for out-of-distribution (OOD) detection, enhancing detection capability via model fine-tuning with surrogate OOD data. However, surrogate data typically deviate from test OOD data, so the performance of OE can weaken when facing unseen OOD data. To address this issue, we propose a novel OE-based approach that enables the model to perform well even in unseen OOD situations. It leads to a min-max learning scheme: searching for synthesized OOD data that lead to the worst judgments, and learning from such data for uniform performance in OOD detection. In our realization, these worst-case OOD data are synthesized by transforming the original surrogate ones. Specifically, the associated transform functions are learned implicitly, based on our novel insight that model perturbation leads to data transformation. Our methodology thus offers an efficient way of synthesizing OOD data beyond the surrogate set, which can further benefit the detection model. We conduct extensive experiments under various OOD detection setups, demonstrating the effectiveness of our method against its advanced counterparts.
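
As a rough illustration of the min-max scheme, the sketch below uses a sharpness-aware-style weight perturbation to stand in for the "model perturbation implies data transformation" insight: the inner step perturbs the weights to worsen the outlier-exposure loss on surrogate OOD data, and the outer step trains against that worst case. The loss form, perturbation radius rho, and update rule are assumptions, not the authors' exact algorithm.

import torch
import torch.nn.functional as F

def oe_loss(logits):
    # Standard outlier-exposure term: push predictions on surrogate OOD
    # samples towards the uniform distribution over classes.
    return -F.log_softmax(logits, dim=-1).mean()

def minmax_oe_step(model, optimizer, x_in, y_in, x_ood, rho=0.05, beta=0.5):
    params = [p for p in model.parameters() if p.requires_grad]
    # Inner max: perturb the weights in the direction that worsens the OE loss.
    grads = torch.autograd.grad(oe_loss(model(x_ood)), params)
    with torch.no_grad():
        scale = rho / (torch.norm(torch.stack([g.norm() for g in grads])) + 1e-12)
        eps = [g * scale for g in grads]
        for p, e in zip(params, eps):
            p.add_(e)
    # Outer min: ID classification loss plus the OE term at the perturbed weights.
    loss = F.cross_entropy(model(x_in), y_in) + beta * oe_loss(model(x_ood))
    optimizer.zero_grad()
    loss.backward()
    with torch.no_grad():  # undo the perturbation before applying the update
        for p, e in zip(params, eps):
            p.sub_(e)
    optimizer.step()
    return loss.item()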

REASONER: An Explainable Recommendation Dataset with Multi-aspect Real User Labeled Ground Truths Towards more Measurable Explainable Recommendation

Mar 01, 2023
Xu Chen, Jingsen Zhang, Lei Wang, Quanyu Dai, Zhenhua Dong, Ruiming Tang, Rui Zhang, Li Chen, Ji-Rong Wen

Explainable recommendation has attracted much attention from the industry and academic communities, and has shown great potential for improving recommendation persuasiveness, informativeness, and user satisfaction. Although many promising explainable recommender models have been proposed in the past few years, the evaluation strategies of these models suffer from several limitations. For example, the explanation ground truths are not labeled by real users, the explanations are mostly evaluated on only one aspect, and the evaluation strategies are hard to unify. To alleviate these problems, we propose to build an explainable recommendation dataset with multi-aspect, real-user-labeled ground truths. Specifically, we first develop a video recommendation platform, where a series of questions around recommendation explainability are carefully designed. Then, we recruit about 3,000 users with different backgrounds to use the system, and collect their behaviors and their feedback on our questions. In this paper, we detail the construction process of our dataset and provide extensive analysis of its characteristics. In addition, we develop a library in which ten well-known explainable recommender models are implemented in a unified framework. Based on this library, we build several benchmarks for different explainable recommendation tasks. Finally, we present many new opportunities brought by our dataset, which we expect to shed new light on the explainable recommendation field. Our dataset, library, and the related documents have been released at https://reasoner2023.github.io/.

A Generalized Doubly Robust Learning Framework for Debiasing Post-Click Conversion Rate Prediction

Nov 12, 2022
Quanyu Dai, Haoxuan Li, Peng Wu, Zhenhua Dong, Xiao-Hua Zhou, Rui Zhang, Rui Zhang, Jie Sun

Post-click conversion rate (CVR) prediction is an essential task for discovering user interests and increasing platform revenues in a range of industrial applications. One of the most challenging problems in this task is the severe selection bias caused by users' inherent self-selection behavior and the system's item selection process. Currently, doubly robust (DR) learning approaches achieve state-of-the-art performance for debiasing CVR prediction. However, by theoretically analyzing the bias, variance, and generalization bounds of DR methods, we find that existing DR approaches may generalize poorly due to inaccurate propensity-score estimation and imputation errors, which often occur in practice. Motivated by this analysis, we propose a generalized learning framework that not only unifies existing DR methods, but also provides a valuable opportunity to develop a series of new debiasing techniques to accommodate different application scenarios. Based on the framework, we propose two new DR methods, namely DR-BIAS and DR-MSE. DR-BIAS directly controls the bias of the DR loss, while DR-MSE balances the bias and variance flexibly, which achieves better generalization performance. In addition, we propose a novel tri-level joint learning optimization method for DR-MSE in CVR prediction, together with a corresponding efficient training algorithm. We conduct extensive experiments on both real-world and semi-synthetic datasets, which validate the effectiveness of our proposed methods.
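
As background, here is a minimal sketch of the standard doubly robust (DR) estimator that this framework generalizes: the imputed error is corrected on observed samples by an inverse-propensity-weighted residual. The DR-BIAS and DR-MSE variants reshape this bias-variance trade-off and are not reproduced here; variable names are illustrative.

import torch

def dr_loss(error, error_hat, propensity_hat, observed):
    # error:          prediction error e_{u,i}, valid where the label is observed
    # error_hat:      imputed error \hat{e}_{u,i} from an imputation model
    # propensity_hat: estimated propensity \hat{p}_{u,i} of observing the pair
    # observed:       0/1 indicator o_{u,i} of whether the conversion label is observed
    correction = observed * (error - error_hat) / propensity_hat.clamp(min=1e-6)
    return (error_hat + correction).mean()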

Recommendation with User Active Disclosing Willingness

Oct 25, 2022
Lei Wang, Xu Chen, Quanyu Dai, Zhenhua Dong

Recommender systems have been deployed in a large number of real-world applications, profoundly influencing people's daily life and production. Traditional recommender models mostly collect user behaviors as comprehensively as possible for accurate preference estimation. However, due to privacy, preference shaping, and other concerns, users may not want to disclose all of their behaviors for model training. In this paper, we study a novel recommendation paradigm in which users are allowed to indicate their "willingness" to disclose different behaviors, and the model is optimized by trading off recommendation quality against the violation of user "willingness". More specifically, we formulate the recommendation problem as a multiplayer game, where the action is a selection vector representing whether each item is involved in model training. To solve this game efficiently, we design a tailored algorithm based on the influence function to lower the time cost of exploring recommendation quality, and further extend it with multiple anchor selection vectors. We conduct extensive experiments to demonstrate the effectiveness of our model in balancing recommendation quality and user disclosing willingness.
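
For intuition on the influence-function component, a textbook-style sketch follows: it estimates how excluding one training interaction would change a held-out quality loss without retraining the model. It inverts the Hessian explicitly, so it only makes sense for toy-sized models, and it is a stand-in for, not a reproduction of, the paper's tailored algorithm; all names are illustrative.

import torch

def influence_of_removal(loss_train_total, loss_train_z, loss_quality, params, n_train):
    # Gradients of the held-out quality loss and of the single interaction's loss.
    g_q = torch.cat([g.reshape(-1) for g in
                     torch.autograd.grad(loss_quality, params, retain_graph=True)])
    g_z = torch.cat([g.reshape(-1) for g in
                     torch.autograd.grad(loss_train_z, params, retain_graph=True)])
    # Hessian of the full training loss (explicit, hence toy-sized models only).
    g_tr = torch.cat([g.reshape(-1) for g in
                      torch.autograd.grad(loss_train_total, params, create_graph=True)])
    hessian = torch.stack([
        torch.cat([h.reshape(-1) for h in
                   torch.autograd.grad(g_tr[i], params, retain_graph=True)])
        for i in range(g_tr.numel())
    ])
    # Predicted change in the quality loss if the interaction z is dropped:
    # delta ~= (1/n) * grad(L_quality)^T H^{-1} grad(L(z)).
    return (g_q @ torch.linalg.solve(hessian, g_z)) / n_train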

Debiased Recommendation with Neural Stratification

Aug 15, 2022
Quanyu Dai, Zhenhua Dong, Xu Chen

Debiased recommender models have recently attracted increasing attention from the academic and industry communities. Existing models are mostly based on the technique of inverse propensity score (IPS). However, in the recommendation domain, IPS can be hard to estimate given the sparse and noisy nature of the observed user-item exposure data. To alleviate this problem, we assume that user preference is dominated by a small number of latent factors, and propose to cluster the users so that more accurate IPS can be computed from the increased exposure densities within each cluster. This method is similar in spirit to stratification models in applied statistics. However, unlike previous heuristic stratification strategies, we learn the clustering criterion by representing the users with low-rank embeddings, which are further shared with the user representations in the recommender model. Finally, we find that our model has strong connections with the two previous types of debiased recommender models. We conduct extensive experiments on real-world datasets to demonstrate the effectiveness of the proposed method.
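
A minimal sketch of the stratification idea is given below, assuming a fixed k-means clustering of user embeddings stands in for the learned low-rank stratification: within each stratum, an item's exposure rate approximates its propensity, and its inverse gives the IPS weight. Function and variable names are illustrative, not the paper's implementation.

import numpy as np
from sklearn.cluster import KMeans

def stratified_ips(user_emb, exposure, n_strata=10, eps=1e-6):
    # user_emb: (n_users, d) user embeddings; exposure: (n_users, n_items) 0/1 matrix.
    strata = KMeans(n_clusters=n_strata, n_init=10).fit_predict(user_emb)
    propensity = np.zeros(exposure.shape, dtype=float)
    for s in range(n_strata):
        members = strata == s
        # The within-stratum exposure rate of each item approximates its propensity.
        propensity[members] = exposure[members].mean(axis=0, keepdims=True)
    # Inverse-propensity weights for the observed (exposed) entries.
    weights = exposure / np.clip(propensity, eps, None)
    return propensity, weights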

BARS: Towards Open Benchmarking for Recommender Systems

Jun 01, 2022
Jieming Zhu, Quanyu Dai, Liangcai Su, Rong Ma, Jinyang Liu, Guohao Cai, Xi Xiao, Rui Zhang

The past two decades have witnessed the rapid development of personalized recommendation techniques. Despite the significant progress made in both research and practice of recommender systems, to date, the field lacks a widely recognized benchmarking standard. Many existing studies perform model evaluations and comparisons in an ad-hoc manner, for example, by employing their own private data splits or using different experimental settings. Such conventions not only increase the difficulty of reproducing existing studies, but also lead to inconsistent experimental results among them, which largely limits the credibility and practical value of research in this field. To tackle these issues, we present an initiative project aimed at open benchmarking for recommender systems. In contrast to some earlier attempts towards this goal, we take a further step by setting up a standardized benchmarking pipeline for reproducible research, which integrates all the details about datasets, source code, hyper-parameter settings, running logs, and evaluation results. The benchmark is designed with comprehensiveness and sustainability in mind. It spans both matching and ranking tasks, and allows anyone to easily follow and contribute. We believe that our benchmark can not only reduce the redundant effort of researchers in re-implementing or re-running existing baselines, but also drive more solid and reproducible research on recommender systems.

* Author's draft version. BARS benchmark available at https://openbenchmark.github.io 