Alert button
Picture for Mengyue Yang

Mengyue Yang

Alert button

Invariant Learning via Probability of Sufficient and Necessary Causes

Sep 22, 2023
Mengyue Yang, Zhen Fang, Yonggang Zhang, Yali Du, Furui Liu, Jean-Francois Ton, Jun Wang

Out-of-distribution (OOD) generalization is indispensable for learning models in the wild, where testing distribution typically unknown and different from the training. Recent methods derived from causality have shown great potential in achieving OOD generalization. However, existing methods mainly focus on the invariance property of causes, while largely overlooking the property of \textit{sufficiency} and \textit{necessity} conditions. Namely, a necessary but insufficient cause (feature) is invariant to distribution shift, yet it may not have required accuracy. By contrast, a sufficient yet unnecessary cause (feature) tends to fit specific data well but may have a risk of adapting to a new domain. To capture the information of sufficient and necessary causes, we employ a classical concept, the probability of sufficiency and necessary causes (PNS), which indicates the probability of whether one is the necessary and sufficient cause. To associate PNS with OOD generalization, we propose PNS risk and formulate an algorithm to learn representation with a high PNS value. We theoretically analyze and prove the generalizability of the PNS risk. Experiments on both synthetic and real-world benchmarks demonstrate the effectiveness of the proposed method. The details of the implementation can be found at the GitHub repository: https://github.com/ymy4323460/CaSN.

Viaarxiv icon

Replace Scoring with Arrangement: A Contextual Set-to-Arrangement Framework for Learning-to-Rank

Aug 05, 2023
Jiarui Jin, Xianyu Chen, Weinan Zhang, Mengyue Yang, Yang Wang, Yali Du, Yong Yu, Jun Wang

Figure 1 for Replace Scoring with Arrangement: A Contextual Set-to-Arrangement Framework for Learning-to-Rank
Figure 2 for Replace Scoring with Arrangement: A Contextual Set-to-Arrangement Framework for Learning-to-Rank
Figure 3 for Replace Scoring with Arrangement: A Contextual Set-to-Arrangement Framework for Learning-to-Rank
Figure 4 for Replace Scoring with Arrangement: A Contextual Set-to-Arrangement Framework for Learning-to-Rank

Learning-to-rank is a core technique in the top-N recommendation task, where an ideal ranker would be a mapping from an item set to an arrangement (a.k.a. permutation). Most existing solutions fall in the paradigm of probabilistic ranking principle (PRP), i.e., first score each item in the candidate set and then perform a sort operation to generate the top ranking list. However, these approaches neglect the contextual dependence among candidate items during individual scoring, and the sort operation is non-differentiable. To bypass the above issues, we propose Set-To-Arrangement Ranking (STARank), a new framework directly generates the permutations of the candidate items without the need for individually scoring and sort operations; and is end-to-end differentiable. As a result, STARank can operate when only the ground-truth permutations are accessible without requiring access to the ground-truth relevance scores for items. For this purpose, STARank first reads the candidate items in the context of the user browsing history, whose representations are fed into a Plackett-Luce module to arrange the given items into a list. To effectively utilize the given ground-truth permutations for supervising STARank, we leverage the internal consistency property of Plackett-Luce models to derive a computationally efficient list-wise loss. Experimental comparisons against 9 the state-of-the-art methods on 2 learning-to-rank benchmark datasets and 3 top-N real-world recommendation datasets demonstrate the superiority of STARank in terms of conventional ranking metrics. Notice that these ranking metrics do not consider the effects of the contextual dependence among the items in the list, we design a new family of simulation-based ranking metrics, where existing metrics can be regarded as special cases. STARank can consistently achieve better performance in terms of PBM and UBM simulation-based metrics.

* CIKM 2023 
Viaarxiv icon

ChessGPT: Bridging Policy Learning and Language Modeling

Jun 15, 2023
Xidong Feng, Yicheng Luo, Ziyan Wang, Hongrui Tang, Mengyue Yang, Kun Shao, David Mguni, Yali Du, Jun Wang

When solving decision-making tasks, humans typically depend on information from two key sources: (1) Historical policy data, which provides interaction replay from the environment, and (2) Analytical insights in natural language form, exposing the invaluable thought process or strategic considerations. Despite this, the majority of preceding research focuses on only one source: they either use historical replay exclusively to directly learn policy or value functions, or engaged in language model training utilizing mere language corpus. In this paper, we argue that a powerful autonomous agent should cover both sources. Thus, we propose ChessGPT, a GPT model bridging policy learning and language modeling by integrating data from these two sources in Chess games. Specifically, we build a large-scale game and language dataset related to chess. Leveraging the dataset, we showcase two model examples ChessCLIP and ChessGPT, integrating policy learning and language modeling. Finally, we propose a full evaluation framework for evaluating language model's chess ability. Experimental results validate our model and dataset's effectiveness. We open source our code, model, and dataset at https://github.com/waterhorse1/ChessGPT.

Viaarxiv icon

Generalizable Information Theoretic Causal Representation

Feb 17, 2022
Mengyue Yang, Xinyu Cai, Furui Liu, Xu Chen, Zhitang Chen, Jianye Hao, Jun Wang

Figure 1 for Generalizable Information Theoretic Causal Representation
Figure 2 for Generalizable Information Theoretic Causal Representation
Figure 3 for Generalizable Information Theoretic Causal Representation
Figure 4 for Generalizable Information Theoretic Causal Representation

It is evidence that representation learning can improve model's performance over multiple downstream tasks in many real-world scenarios, such as image classification and recommender systems. Existing learning approaches rely on establishing the correlation (or its proxy) between features and the downstream task (labels), which typically results in a representation containing cause, effect and spurious correlated variables of the label. Its generalizability may deteriorate because of the unstability of the non-causal parts. In this paper, we propose to learn causal representation from observational data by regularizing the learning procedure with mutual information measures according to our hypothetical causal graph. The optimization involves a counterfactual loss, based on which we deduce a theoretical guarantee that the causality-inspired learning is with reduced sample complexity and better generalization ability. Extensive experiments show that the models trained on causal representations learned by our approach is robust under adversarial attacks and distribution shift.

Viaarxiv icon

Debiased Recommendation with User Feature Balancing

Jan 16, 2022
Mengyue Yang, Guohao Cai, Furui Liu, Zhenhua Dong, Xiuqiang He, Jianye Hao, Jun Wang, Xu Chen

Figure 1 for Debiased Recommendation with User Feature Balancing
Figure 2 for Debiased Recommendation with User Feature Balancing
Figure 3 for Debiased Recommendation with User Feature Balancing
Figure 4 for Debiased Recommendation with User Feature Balancing

Debiased recommendation has recently attracted increasing attention from both industry and academic communities. Traditional models mostly rely on the inverse propensity score (IPS), which can be hard to estimate and may suffer from the high variance issue. To alleviate these problems, in this paper, we propose a novel debiased recommendation framework based on user feature balancing. The general idea is to introduce a projection function to adjust user feature distributions, such that the ideal unbiased learning objective can be upper bounded by a solvable objective purely based on the offline dataset. In the upper bound, the projected user distributions are expected to be equal given different items. From the causal inference perspective, this requirement aims to remove the causal relation from the user to the item, which enables us to achieve unbiased recommendation, bypassing the computation of IPS. In order to efficiently balance the user distributions upon each item pair, we propose three strategies, including clipping, sampling and adversarial learning to improve the training process. For more robust optimization, we deploy an explicit model to capture the potential latent confounders in recommendation systems. To the best of our knowledge, this paper is the first work on debiased recommendation based on confounder balancing. In the experiments, we compare our framework with many state-of-the-art methods based on synthetic, semi-synthetic and real-world datasets. Extensive experiments demonstrate that our model is effective in promoting the recommendation performance.

Viaarxiv icon

Top-N Recommendation with Counterfactual User Preference Simulation

Sep 13, 2021
Mengyue Yang, Quanyu Dai, Zhenhua Dong, Xu Chen, Xiuqiang He, Jun Wang

Figure 1 for Top-N Recommendation with Counterfactual User Preference Simulation
Figure 2 for Top-N Recommendation with Counterfactual User Preference Simulation
Figure 3 for Top-N Recommendation with Counterfactual User Preference Simulation
Figure 4 for Top-N Recommendation with Counterfactual User Preference Simulation

Top-N recommendation, which aims to learn user ranking-based preference, has long been a fundamental problem in a wide range of applications. Traditional models usually motivate themselves by designing complex or tailored architectures based on different assumptions. However, the training data of recommender system can be extremely sparse and imbalanced, which poses great challenges for boosting the recommendation performance. To alleviate this problem, in this paper, we propose to reformulate the recommendation task within the causal inference framework, which enables us to counterfactually simulate user ranking-based preferences to handle the data scarce problem. The core of our model lies in the counterfactual question: "what would be the user's decision if the recommended items had been different?". To answer this question, we firstly formulate the recommendation process with a series of structural equation models (SEMs), whose parameters are optimized based on the observed data. Then, we actively indicate many recommendation lists (called intervention in the causal inference terminology) which are not recorded in the dataset, and simulate user feedback according to the learned SEMs for generating new training samples. Instead of randomly intervening on the recommendation list, we design a learning-based method to discover more informative training samples. Considering that the learned SEMs can be not perfect, we, at last, theoretically analyze the relation between the number of generated samples and the model prediction error, based on which a heuristic method is designed to control the negative effect brought by the prediction error. Extensive experiments are conducted based on both synthetic and real-world datasets to demonstrate the effectiveness of our framework.

Viaarxiv icon

Causal World Models by Unsupervised Deconfounding of Physical Dynamics

Dec 28, 2020
Minne Li, Mengyue Yang, Furui Liu, Xu Chen, Zhitang Chen, Jun Wang

Figure 1 for Causal World Models by Unsupervised Deconfounding of Physical Dynamics
Figure 2 for Causal World Models by Unsupervised Deconfounding of Physical Dynamics
Figure 3 for Causal World Models by Unsupervised Deconfounding of Physical Dynamics
Figure 4 for Causal World Models by Unsupervised Deconfounding of Physical Dynamics

The capability of imagining internally with a mental model of the world is vitally important for human cognition. If a machine intelligent agent can learn a world model to create a "dream" environment, it can then internally ask what-if questions -- simulate the alternative futures that haven't been experienced in the past yet -- and make optimal decisions accordingly. Existing world models are established typically by learning spatio-temporal regularities embedded from the past sensory signal without taking into account confounding factors that influence state transition dynamics. As such, they fail to answer the critical counterfactual questions about "what would have happened" if a certain action policy was taken. In this paper, we propose Causal World Models (CWMs) that allow unsupervised modeling of relationships between the intervened observations and the alternative futures by learning an estimator of the latent confounding factors. We empirically evaluate our method and demonstrate its effectiveness in a variety of physical reasoning environments. Specifically, we show reductions in sample complexity for reinforcement learning tasks and improvements in counterfactual physical reasoning.

Viaarxiv icon

CausalVAE: Structured Causal Disentanglement in Variational Autoencoder

Apr 18, 2020
Mengyue Yang, Furui Liu, Zhitang Chen, Xinwei Shen, Jianye Hao, Jun Wang

Figure 1 for CausalVAE: Structured Causal Disentanglement in Variational Autoencoder
Figure 2 for CausalVAE: Structured Causal Disentanglement in Variational Autoencoder
Figure 3 for CausalVAE: Structured Causal Disentanglement in Variational Autoencoder
Figure 4 for CausalVAE: Structured Causal Disentanglement in Variational Autoencoder

Learning disentanglement aims at finding a low dimensional representation, which consists of multiple explanatory and generative factors of the observational data. The framework of variational autoencoder is commonly used to disentangle independent factors from observations. However, in real scenarios, the factors with semantic meanings are not necessarily independent. Instead, there might be an underlying causal structure due to physics laws. We thus propose a new VAE based framework named CausalVAE, which includes causal layers to transform independent factors into causal factors that correspond to causally related concepts in data. We analyze the model identifiabitily of CausalVAE, showing that the generative model learned from the observational data recovers the true one up to a certain degree. Experiments are conducted on various datasets, including synthetic datasets consisting of pictures with multiple causally related objects abstracted from physical world, and a benchmark face dataset CelebA. The results show that the causal representations by CausalVAE are semantically interpretable, and lead to better results on downstream tasks. The new framework allows causal intervention, by which we can intervene any causal concepts to generate artificial data.

Viaarxiv icon

Hierarchical Adaptive Contextual Bandits for Resource Constraint based Recommendation

Apr 06, 2020
Mengyue Yang, Qingyang Li, Zhiwei Qin, Jieping Ye

Figure 1 for Hierarchical Adaptive Contextual Bandits for Resource Constraint based Recommendation
Figure 2 for Hierarchical Adaptive Contextual Bandits for Resource Constraint based Recommendation
Figure 3 for Hierarchical Adaptive Contextual Bandits for Resource Constraint based Recommendation
Figure 4 for Hierarchical Adaptive Contextual Bandits for Resource Constraint based Recommendation

Contextual multi-armed bandit (MAB) achieves cutting-edge performance on a variety of problems. When it comes to real-world scenarios such as recommendation system and online advertising, however, it is essential to consider the resource consumption of exploration. In practice, there is typically non-zero cost associated with executing a recommendation (arm) in the environment, and hence, the policy should be learned with a fixed exploration cost constraint. It is challenging to learn a global optimal policy directly, since it is a NP-hard problem and significantly complicates the exploration and exploitation trade-off of bandit algorithms. Existing approaches focus on solving the problems by adopting the greedy policy which estimates the expected rewards and costs and uses a greedy selection based on each arm's expected reward/cost ratio using historical observation until the exploration resource is exhausted. However, existing methods are hard to extend to infinite time horizon, since the learning process will be terminated when there is no more resource. In this paper, we propose a hierarchical adaptive contextual bandit method (HATCH) to conduct the policy learning of contextual bandits with a budget constraint. HATCH adopts an adaptive method to allocate the exploration resource based on the remaining resource/time and the estimation of reward distribution among different user contexts. In addition, we utilize full of contextual feature information to find the best personalized recommendation. Finally, in order to prove the theoretical guarantee, we present a regret bound analysis and prove that HATCH achieves a regret bound as low as $O(\sqrt{T})$. The experimental results demonstrate the effectiveness and efficiency of the proposed method on both synthetic data sets and the real-world applications.

* Accepted for publication at WWW (The Web Conference) 2020 
Viaarxiv icon