Recently, many pre-trained language models for source code have been proposed to model the context of code and serve as a basis for downstream code intelligence tasks such as code completion, code search, and code summarization. These models leverage masked pre-training and Transformer and have achieved promising results. However, currently there is still little progress regarding interpretability of existing pre-trained code models. It is not clear why these models work and what feature correlations they can capture. In this paper, we conduct a thorough structural analysis aiming to provide an interpretation of pre-trained language models for source code (e.g., CodeBERT, and GraphCodeBERT) from three distinctive perspectives: (1) attention analysis, (2) probing on the word embedding, and (3) syntax tree induction. Through comprehensive analysis, this paper reveals several insightful findings that may inspire future studies: (1) Attention aligns strongly with the syntax structure of code. (2) Pre-training language models of code can preserve the syntax structure of code in the intermediate representations of each Transformer layer. (3) The pre-trained models of code have the ability of inducing syntax trees of code. Theses findings suggest that it may be helpful to incorporate the syntax structure of code into the process of pre-training for better code representations.
Traditional recommendation models trained on observational interaction data have generated large impacts in a wide range of applications, it faces bias problems that cover users' true intent and thus deteriorate the recommendation effectiveness. Existing methods tracks this problem as eliminating bias for the robust recommendation, e.g., by re-weighting training samples or learning disentangled representation. The disentangled representation methods as the state-of-the-art eliminate bias through revealing cause-effect of the bias generation. However, how to design the semantics-aware and unbiased representation for users true intents is largely unexplored. To bridge the gap, we are the first to propose an unbiased and semantics-aware disentanglement learning called CaDSI (Causal Disentanglement for Semantics-Aware Intent Learning) from a causal perspective. Particularly, CaDSI explicitly models the causal relations underlying recommendation task, and thus produces semantics-aware representations via disentangling users true intents aware of specific item context. Moreover, the causal intervention mechanism is designed to eliminate confounding bias stemmed from context information, which further to align the semantics-aware representation with users true intent. Extensive experiments and case studies both validate the robustness and interpretability of our proposed model.
Graph contrastive learning is the state-of-the-art unsupervised graph representation learning framework and has shown comparable performance with supervised approaches. However, evaluating whether the graph contrastive learning is robust to adversarial attacks is still an open problem because most existing graph adversarial attacks are supervised models, which means they heavily rely on labels and can only be used to evaluate the graph contrastive learning in a specific scenario. For unsupervised graph representation methods such as graph contrastive learning, it is difficult to acquire labels in real-world scenarios, making traditional supervised graph attack methods difficult to be applied to test their robustness. In this paper, we propose a novel unsupervised gradient-based adversarial attack that does not rely on labels for graph contrastive learning. We compute the gradients of the adjacency matrices of the two views and flip the edges with gradient ascent to maximize the contrastive loss. In this way, we can fully use multiple views generated by the graph contrastive learning models and pick the most informative edges without knowing their labels, and therefore can promisingly support our model adapted to more kinds of downstream tasks. Extensive experiments show that our attack outperforms unsupervised baseline attacks and has comparable performance with supervised attacks in multiple downstream tasks including node classification and link prediction. We further show that our attack can be transferred to other graph representation models as well.
Binary-source code matching plays an important role in many security and software engineering related tasks such as malware detection, reverse engineering and vulnerability assessment. Currently, several approaches have been proposed for binary-source code matching by jointly learning the embeddings of binary code and source code in a common vector space. Despite much effort, existing approaches target on matching the binary code and source code written in a single programming language. However, in practice, software applications are often written in different programming languages to cater for different requirements and computing platforms. Matching binary and source code across programming languages introduces additional challenges when maintaining multi-language and multi-platform applications. To this end, this paper formulates the problem of cross-language binary-source code matching, and develops a new dataset for this new problem. We present a novel approach XLIR, which is a Transformer-based neural network by learning the intermediate representations for both binary and source code. To validate the effectiveness of XLIR, comprehensive experiments are conducted on two tasks of cross-language binary-source code matching, and cross-language source-source code matching, on top of our curated dataset. Experimental results and analysis show that our proposed XLIR with intermediate representations significantly outperforms other state-of-the-art models in both of the two tasks.
Unsupervised graph representation learning has emerged as a powerful tool to address real-world problems and achieves huge success in the graph learning domain. Graph contrastive learning is one of the unsupervised graph representation learning methods, which recently attracts attention from researchers and has achieved state-of-the-art performances on various tasks. The key to the success of graph contrastive learning is to construct proper contrasting pairs to acquire the underlying structural semantics of the graph. However, this key part is not fully explored currently, most of the ways generating contrasting pairs focus on augmenting or perturbating graph structures to obtain different views of the input graph. But such strategies could degrade the performances via adding noise into the graph, which may narrow down the field of the applications of graph contrastive learning. In this paper, we propose a novel graph contrastive learning method, namely \textbf{D}ual \textbf{S}pace \textbf{G}raph \textbf{C}ontrastive (DSGC) Learning, to conduct graph contrastive learning among views generated in different spaces including the hyperbolic space and the Euclidean space. Since both spaces have their own advantages to represent graph data in the embedding spaces, we hope to utilize graph contrastive learning to bridge the spaces and leverage advantages from both sides. The comparison experiment results show that DSGC achieves competitive or better performances among all the datasets. In addition, we conduct extensive experiments to analyze the impact of different graph encoders on DSGC, giving insights about how to better leverage the advantages of contrastive learning between different spaces.
Causal inference is capable of estimating the treatment effect (i.e., the causal effect of treatment on the outcome) to benefit the decision making in various domains. One fundamental challenge in this research is that the treatment assignment bias in observational data. To increase the validity of observational studies on causal inference, representation based methods as the state-of-the-art have demonstrated the superior performance of treatment effect estimation. Most representation based methods assume all observed covariates are pre-treatment (i.e., not affected by the treatment), and learn a balanced representation from these observed covariates for estimating treatment effect. Unfortunately, this assumption is often too strict a requirement in practice, as some covariates are changed by doing an intervention on treatment (i.e., post-treatment). By contrast, the balanced representation learned from unchanged covariates thus biases the treatment effect estimation.
Recent advances in path-based explainable recommendation systems have attracted increasing attention thanks to the rich information provided by knowledge graphs. Most existing explainable recommendations only utilize static knowledge graphs and ignore the dynamic user-item evolutions, leading to less convincing and inaccurate explanations. Although there are some works that realize that modelling user's temporal sequential behaviour could boost the performance and explainability of the recommender systems, most of them either only focus on modelling user's sequential interactions within a path or independently and separately of the recommendation mechanism. In this paper, we propose a novel Temporal Meta-path Guided Explainable Recommendation leveraging Reinforcement Learning (TMER-RL), which utilizes reinforcement item-item path modelling between consecutive items with attention mechanisms to sequentially model dynamic user-item evolutions on dynamic knowledge graph for explainable recommendation. Compared with existing works that use heavy recurrent neural networks to model temporal information, we propose simple but effective neural networks to capture users' historical item features and path-based context to characterize the next purchased item. Extensive evaluations of TMER on two real-world datasets show state-of-the-art performance compared against recent strong baselines.
User purchasing prediction with multi-behavior information remains a challenging problem for current recommendation systems. Various methods have been proposed to address it via leveraging the advantages of graph neural networks (GNNs) or multi-task learning. However, most existing works do not take the complex dependencies among different behaviors of users into consideration. They utilize simple and fixed schemes, like neighborhood information aggregation or mathematical calculation of vectors, to fuse the embeddings of different user behaviors to obtain a unified embedding to represent a user's behavioral patterns which will be used in downstream recommendation tasks. To tackle the challenge, in this paper, we first propose the concept of hyper meta-path to construct hyper meta-paths or hyper meta-graphs to explicitly illustrate the dependencies among different behaviors of a user. How to obtain a unified embedding for a user from hyper meta-paths and avoid the previously mentioned limitations simultaneously is critical. Thanks to the recent success of graph contrastive learning, we leverage it to learn embeddings of user behavior patterns adaptively instead of assigning a fixed scheme to understand the dependencies among different behaviors. A new graph contrastive learning based framework is proposed by coupling with hyper meta-paths, namely HMG-CR, which consistently and significantly outperforms all baselines in extensive comparison experiments.
Causal inference methods are widely applied in various decision-making domains such as precision medicine, optimal policy and economics. Central to causal inference is the treatment effect estimation of intervention strategies, such as changes in drug dosing and increases in financial aid. Existing methods are mostly restricted to the deterministic treatment and compare outcomes under different treatments. However, they are unable to address the substantial recent interest of treatment effect estimation under stochastic treatment, e.g., "how all units health status change if they adopt 50\% dose reduction". In other words, they lack the capability of providing fine-grained treatment effect estimation to support sound decision-making. In our study, we advance the causal inference research by proposing a new effective framework to estimate the treatment effect on stochastic intervention. Particularly, we develop a stochastic intervention effect estimator (SIE) based on nonparametric influence function, with the theoretical guarantees of robustness and fast convergence rates. Additionally, we construct a customised reinforcement learning algorithm based on the random search solver which can effectively find the optimal policy to produce the greatest expected outcomes for the decision-making process. Finally, we conduct an empirical study to justify that our framework can achieve significant performance in comparison with state-of-the-art baselines.