Hongning Wang

Uncertainty-Aware Off-Policy Learning

Mar 11, 2023
Xiaoying Zhang, Junpu Chen, Hongning Wang, Hong Xie, Hang Li

Off-policy learning, referring to the procedure of policy optimization with access only to logged feedback data, has shown importance in various real-world applications, such as search engines and recommender systems. While the ground-truth logging policy, which generates the logged data, is usually unknown, previous work simply takes its estimated value in off-policy learning, ignoring both the high bias and high variance resulting from such an estimator, especially on samples with small and inaccurately estimated logging probabilities. In this work, we explicitly model the uncertainty in the estimated logging policy and propose an Uncertainty-aware Inverse Propensity Score estimator (UIPS) for improved off-policy learning. Experimental results on synthetic and three real-world recommendation datasets demonstrate the advantageous sample efficiency of the proposed UIPS estimator against an extensive list of state-of-the-art baselines.
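
To make the sensitivity to small logging probabilities concrete, here is a minimal sketch of the vanilla inverse propensity score (IPS) value estimate that this line of work builds on. It is not the UIPS estimator from the paper; the function name `ips_value`, the log layout, and the toy numbers are all assumptions chosen for illustration.

```python
def ips_value(logs, target_prob):
    """Vanilla IPS estimate of a target policy's value from logged data.

    logs: list of (context, action, reward, logging_prob) tuples, where
          logging_prob is the (estimated) probability that the logging
          policy chose `action` in `context`.
    target_prob: function (context, action) -> probability under the
          target policy being evaluated.
    """
    total = 0.0
    for context, action, reward, logging_prob in logs:
        # Reweight each logged reward by target / logging probability.
        weight = target_prob(context, action) / logging_prob
        total += weight * reward
    return total / len(logs)


def always_action_one(context, action):
    # Hypothetical target policy: deterministically plays action 1.
    return 1.0 if action == 1 else 0.0


# Toy log: action 0 (reward 0) was logged with probability 0.9,
# action 1 (reward 1) with probability 0.1.
true_logs = [(None, 0, 0.0, 0.9)] * 9 + [(None, 1, 1.0, 0.1)]
# Same log, but the rare action's propensity is misestimated as 0.05.
misestimated_logs = [(None, 0, 0.0, 0.9)] * 9 + [(None, 1, 1.0, 0.05)]

exact = ips_value(true_logs, always_action_one)
skewed = ips_value(misestimated_logs, always_action_one)
print(exact, skewed)
```

In this toy log, halving the estimated propensity of the single rare sample doubles the value estimate, which is the kind of error on small, inaccurately estimated logging probabilities that the abstract describes.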

Meta-Reinforcement Learning via Exploratory Task Clustering

Feb 15, 2023
Zhendong Chu, Hongning Wang

Meta-reinforcement learning (meta-RL) aims to quickly solve new tasks by leveraging knowledge from prior tasks. However, previous studies often assume a single-mode, homogeneous task distribution, ignoring possible structured heterogeneity among tasks. Leveraging such structures can better facilitate knowledge sharing among related tasks and thus improve sample efficiency. In this paper, we explore the structured heterogeneity among tasks via clustering to improve meta-RL. We develop a dedicated exploratory policy to discover task structures via divide-and-conquer. The knowledge of the identified clusters helps to narrow the search space of task-specific information, leading to more sample-efficient policy adaptation. Experiments on various MuJoCo tasks show that the proposed method can unravel cluster structures effectively in both rewards and state dynamics, demonstrating strong advantages against a set of state-of-the-art baselines.

* 22 pages 

Debiasing Recommendation by Learning Identifiable Latent Confounders

Feb 10, 2023
Qing Zhang, Xiaoying Zhang, Yang Liu, Hongning Wang, Min Gao, Jiheng Zhang, Ruocheng Guo

Recommendation systems aim to predict users' feedback on items not exposed to them. Confounding bias arises due to the presence of unmeasured variables (e.g., the socio-economic status of a user) that can affect both a user's exposure and feedback. Existing methods either (1) make untenable assumptions about these unmeasured variables or (2) directly infer latent confounders from users' exposure. However, they cannot guarantee the identification of counterfactual feedback, which can lead to biased predictions. In this work, we propose a novel method, i.e., identifiable deconfounder (iDCF), which leverages a set of proxy variables (e.g., observed user features) to resolve the aforementioned non-identification issue. The proposed iDCF is a general deconfounded recommendation framework that applies proximal causal inference to infer the unmeasured confounders and identify the counterfactual feedback with theoretical guarantees. Extensive experiments on various real-world and synthetic datasets verify the proposed method's effectiveness and robustness.

How Bad is Top-$K$ Recommendation under Competing Content Creators?

Feb 03, 2023
Fan Yao, Chuanhao Li, Denis Nekipelov, Hongning Wang, Haifeng Xu

Content creators compete for exposure on recommendation platforms, and such strategic behavior leads to a dynamic shift in the content distribution. However, how the creators' competition impacts user welfare and how relevance-driven recommendation influences the dynamics in the long run are still largely unknown. This work provides theoretical insights into these research questions. We model the creators' competition under the assumptions that: 1) the platform employs an innocuous top-$K$ recommendation policy; 2) user decisions follow the Random Utility model; 3) content creators compete for user engagement and, without knowing their utility function in hindsight, apply arbitrary no-regret learning algorithms to update their strategies. We study the user welfare guarantee through the lens of Price of Anarchy and show that the fraction of user welfare loss due to creator competition is always upper bounded by a small constant depending on $K$ and the randomness in user decisions; we also prove the tightness of this bound. Our result discloses an intrinsic merit of the myopic approach to recommendation, i.e., relevance-driven matching performs reasonably well in the long run, as long as users' decisions involve randomness and the platform provides reasonably many alternatives to its users.

Disentangled Representation for Diversified Recommendations

Jan 13, 2023
Xiaoying Zhang, Hongning Wang, Hang Li

Accuracy and diversity have long been considered to be two conflicting goals for recommendations. We point out, however, that because diversity is typically measured by certain pre-selected item attributes (e.g., category, the most commonly employed one), improved diversity can be achieved without sacrificing recommendation accuracy, as long as the diversification respects the user's preference about the pre-selected attributes. This calls for a fine-grained understanding of a user's preferences over items, where one needs to recognize whether the user's choice is driven by the quality of the item itself or by the pre-selected attributes of the item. In this work, we focus on diversity defined on item categories. We propose a general diversification framework agnostic to the choice of recommendation algorithms. Our solution disentangles the learnt user representation in the recommendation module into category-independent and category-dependent components to differentiate a user's preference over items from two orthogonal perspectives. Experimental results on three benchmark datasets and an online A/B test demonstrate the effectiveness of our solution in improving both recommendation accuracy and diversity. In-depth analysis suggests that the improvement is due to our improved modeling of users' categorical preferences and refined ranking within item categories.

* Accepted to WSDM2023 

MiddleGAN: Generate Domain Agnostic Samples for Unsupervised Domain Adaptation

Nov 06, 2022
Ye Gao, Zhendong Chu, Hongning Wang, John Stankovic

In recent years, machine learning has achieved impressive results across different application areas. However, machine learning algorithms do not necessarily perform well on a new domain with a different distribution than their training set. Domain Adaptation (DA) is used to mitigate this problem. One approach taken by existing DA algorithms is to find domain-invariant features whose distributions in the source domain are the same as their distributions in the target domain. In this paper, we propose to let the classifier that performs the final classification task on the target domain implicitly learn the invariant features needed for classification. This is achieved by feeding the classifier, during training, generated fake samples that are similar to samples from both the source and target domains. We call these generated samples domain-agnostic samples. To accomplish this, we propose a novel variation of generative adversarial networks (GAN), called MiddleGAN, that generates such samples using two discriminators and one generator. We extend the theory of GAN to show that there exist optimal solutions for the parameters of the two discriminators and one generator in MiddleGAN, and empirically show that the samples generated by MiddleGAN are similar to both samples from the source domain and samples from the target domain. We conducted extensive evaluations on 24 benchmarks, comparing MiddleGAN against various state-of-the-art algorithms and outperforming the state of the art by up to 20.1% on certain benchmarks.

Spectral Augmentation for Self-Supervised Learning on Graphs

Oct 02, 2022
Lu Lin, Jinghui Chen, Hongning Wang

Graph contrastive learning (GCL), as an emerging self-supervised learning technique on graphs, aims to learn representations via instance discrimination. Its performance heavily relies on graph augmentation to reflect invariant patterns that are robust to small perturbations; yet it remains unclear what graph invariance GCL should capture. Recent studies mainly perform topology augmentations in a uniformly random manner in the spatial domain, ignoring their influence on the intrinsic structural properties embedded in the spectral domain. In this work, we aim to find a principled way to perform topology augmentations by exploring the invariance of graphs from the spectral perspective. We develop spectral augmentation, which guides topology augmentations by maximizing the spectral change. Extensive experiments on both graph and node classification tasks demonstrate the effectiveness of our method in self-supervised representation learning. The proposed method also brings promising generalization capability in transfer learning, and exhibits an intriguing robustness property under adversarial attacks. Our study sheds light on a general principle for graph topology augmentation.

* 26 pages, 5 figures, 12 tables 

Rethinking Conversational Recommendations: Is Decision Tree All You Need?

Aug 31, 2022
A S M Ahsan-Ul Haque, Hongning Wang

Conversational recommender systems (CRS) dynamically obtain user preferences via multi-turn questions and answers. Existing CRS solutions are dominated by deep reinforcement learning algorithms. However, deep reinforcement learning methods are often criticised for lacking interpretability and requiring a large amount of training data to perform well. In this paper, we explore a simpler alternative and propose a decision tree based solution to CRS. The underlying challenge in CRS is that the same item can be described differently by different users. We show that decision trees are sufficient to characterize the interactions between users and items, and solve the key challenges in multi-turn CRS: namely which questions to ask, how to rank the candidate items, when to recommend, and how to handle negative feedback on the recommendations. Firstly, the training of decision trees enables us to find questions which effectively narrow down the search space. Secondly, by learning embeddings for each item and tree node, the candidate items can be ranked based on their similarity to the conversation context encoded by the tree nodes. Thirdly, the diversity of items associated with each tree node allows us to develop an early stopping strategy to decide when to make recommendations. Fourthly, when the user rejects a recommendation, we adaptively choose the next decision tree to improve subsequent questions and recommendations. Extensive experiments on three publicly available benchmark CRS datasets show that our approach provides significant improvement over state-of-the-art CRS methods.

* 19 pages, 5 figures 

Dynamic Global Sensitivity for Differentially Private Contextual Bandits

Aug 30, 2022
Huazheng Wang, David Zhao, Hongning Wang

Bandit algorithms have become a reference solution for interactive recommendation. However, as such algorithms directly interact with users for improved recommendations, serious privacy concerns have been raised regarding their practical use. In this work, we propose a differentially private linear contextual bandit algorithm, which uses a tree-based mechanism to add Laplace or Gaussian noise to model parameters. Our key insight is that as the model converges during online updates, the global sensitivity of its parameters shrinks over time (thus named dynamic global sensitivity). Compared with existing solutions, our dynamic global sensitivity analysis allows us to inject less noise to obtain $(\epsilon, \delta)$-differential privacy, with the added regret caused by noise injection in $\tilde O(\log{T}\sqrt{T}/\epsilon)$. We provide a rigorous theoretical analysis of the amount of noise added via dynamic global sensitivity and the corresponding regret upper bound of our proposed algorithm. Experimental results on both synthetic and real-world datasets confirm the algorithm's advantage over existing solutions.
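
The link between sensitivity and injected noise can be illustrated with the standard Laplace mechanism, where the noise scale is the sensitivity divided by $\epsilon$. The sketch below is only that textbook mechanism, not the paper's tree-based algorithm; the function names, the parameter vector, and the two sensitivity values are hypothetical.

```python
import random


def laplace_noise(scale, rng):
    # The difference of two i.i.d. exponentials with mean `scale`
    # follows a Laplace(0, scale) distribution.
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def privatize(params, sensitivity, epsilon, rng):
    """Standard Laplace mechanism: noise scale = L1 sensitivity / epsilon."""
    scale = sensitivity / epsilon
    return [p + laplace_noise(scale, rng) for p in params]


def mean_abs_noise(sensitivity, epsilon, rng, n=2000):
    # Empirical average magnitude of the injected noise.
    scale = sensitivity / epsilon
    return sum(abs(laplace_noise(scale, rng)) for _ in range(n)) / n


rng = random.Random(0)
params = [0.5, -1.2, 3.0]   # hypothetical model parameters
epsilon = 1.0

# Hypothetical sensitivities: large early in learning, small after convergence.
early = privatize(params, sensitivity=1.0, epsilon=epsilon, rng=rng)
late = privatize(params, sensitivity=0.01, epsilon=epsilon, rng=rng)

early_noise = mean_abs_noise(1.0, epsilon, rng)
late_noise = mean_abs_noise(0.01, epsilon, rng)
print(early_noise, late_noise)
```

At a fixed $\epsilon$, a smaller sensitivity gives proportionally smaller noise, which is why a sensitivity that shrinks as the model converges would allow less noise to be injected over time.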

* RecSys 2022 

Not Just Skipping. Understanding the Effect of Sponsored Content on Users' Decision-Making in Online Health Search

Jul 10, 2022
Anat Hashavit, Hongning Wang, Tamar Stern, Sarit Kraus

Advertisements (ads) are an innate part of search engine business models. Advertisers are willing to pay search engines to promote their content to a prominent position in the search result page (SERP). This raises concerns about the search engine manipulation effect (SEME): the opinions of users can be influenced by the way search results are presented. In this work, we investigate the connection between SEME and sponsored content in the health domain. We conduct a series of user studies in which participants need to evaluate the effectiveness of different non-prescription natural remedies for various medical conditions. We present participants with SERPs containing different intentionally created biases towards certain viewpoints, with or without sponsored content, and ask them to evaluate the effectiveness of the treatment based only on the information presented to them. We investigate two types of sponsored content: 1. Direct marketing ads that directly market the product without expressing an opinion about its effectiveness, and 2. Indirect marketing ads that explicitly advocate the product's effectiveness on the condition in the query. Our results reveal a significant difference between the influence on users from these two ad types. Though direct marketing ads are mostly skipped by users, they can tilt users' decision-making towards more positive viewpoints. Indirect marketing ads affect both the users' examination behaviour and their perception of the treatment's effectiveness. We further discover that the contrast between the indirect marketing ads and the viewpoint presented in the organic search results plays an important role in users' decision-making. When the contrast is high, users exhibit a strong preference towards a negative viewpoint, and when the contrast is low or absent, users exhibit a preference towards a more positive viewpoint.

* 10 pages, double column 