Link prediction is widely used in a variety of industrial applications, such as merchant recommendation, fraudulent transaction detection, and so on. However, it's a great challenge to train and deploy a link prediction model on industrial-scale graphs with billions of nodes and edges. In this work, we present a scalable and distributed framework for semi-supervised link prediction problem (named DSSLP), which is able to handle industrial-scale graphs. Instead of training model on the whole graph, DSSLP is proposed to train on the \emph{$k$-hops neighborhood} of nodes in a mini-batch setting, which helps reduce the scale of the input graph and distribute the training procedure. In order to generate negative examples effectively, DSSLP contains a distributed batched runtime sampling module. It implements uniform and dynamic sampling approaches, and is able to adaptively construct positive and negative examples to guide the training process. Moreover, DSSLP proposes a model-split strategy to accelerate the speed of inference process of the link prediction task. Experimental results demonstrate that the effectiveness and efficiency of DSSLP in serval public datasets as well as real-world datasets of industrial-scale graphs.
The insurance industry has been creating innovative products around the emerging online shopping activities. Such e-commerce insurance is designed to protect buyers from potential risks such as impulse purchases and counterfeits. Fraudulent claims towards online insurance typically involve multiple parties such as buyers, sellers, and express companies, and they could lead to heavy financial losses. In order to uncover the relations behind organized fraudsters and detect fraudulent claims, we developed a large-scale insurance fraud detection system, i.e., InfDetect, which provides interfaces for commonly used graphs, standard data processing procedures, and a uniform graph learning platform. InfDetect is able to process big graphs containing up to 100 millions of nodes and billions of edges. In this paper, we investigate different graphs to facilitate fraudster mining, such as a device-sharing graph, a transaction graph, a friendship graph, and a buyer-seller graph. These graphs are fed to a uniform graph learning platform containing supervised and unsupervised graph learning algorithms. Cases on widely applied e-commerce insurance are described to demonstrate the usage and capability of our system. InfDetect has successfully detected thousands of fraudulent claims and saved over tens of thousands of dollars daily.
Many researchers have identified robotics as a potential solution to the aging population faced by many developed and developing countries. If so, how should we address the cognitive acceptance and ambient control of elderly assistive robots through design? In this paper, we proposed an explorative design of an ambient SuperLimb (Supernumerary Robotic Limb) system that involves a pneumatically-driven robotic cane for at-home motion assistance, an inflatable vest for compliant human-robot interaction, and a depth sensor for ambient intention detection. The proposed system aims at providing active assistance during the sit-to-stand transition for at-home usage by the elderly at the bedside, in the chair, and on the toilet. We proposed a modified biomechanical model with a linear cane robot for closed-loop control implementation. We validated the design feasibility of the proposed ambient SuperLimb system including the biomechanical model, our result showed the advantages in reducing lower limb efforts and elderly fall risks, yet the detection accuracy using depth sensing and adjustments on the model still require further research in the future. Nevertheless, we summarized empirical guidelines to support the ambient design of elderly-assistive SuperLimb systems for lower limb functional augmentation.
Mobile payment such as Alipay has been widely used in our daily lives. To further promote the mobile payment activities, it is important to run marketing campaigns under a limited budget by providing incentives such as coupons, commissions to merchants. As a result, incentive optimization is the key to maximizing the commercial objective of the marketing campaign. With the analyses of online experiments, we found that the transaction network can subtly describe the similarity of merchants' responses to different incentives, which is of great use in the incentive optimization problem. In this paper, we present a graph representation learning method atop of transaction networks for merchant incentive optimization in mobile payment marketing. With limited samples collected from online experiments, our end-to-end method first learns merchant representations based on an attributed transaction networks, then effectively models the correlations between the commercial objectives each merchant may achieve and the incentives under varying treatments. Thus we are able to model the sensitivity to incentive for each merchant, and spend the most budgets on those merchants that show strong sensitivities in the marketing campaign. Extensive offline and online experimental results at Alipay demonstrate the effectiveness of our proposed approach.
We present, GEM, the first heterogeneous graph neural network approach for detecting malicious accounts at Alipay, one of the world's leading mobile cashless payment platform. Our approach, inspired from a connected subgraph approach, adaptively learns discriminative embeddings from heterogeneous account-device graphs based on two fundamental weaknesses of attackers, i.e. device aggregation and activity aggregation. For the heterogeneous graph consists of various types of nodes, we propose an attention mechanism to learn the importance of different types of nodes, while using the sum operator for modeling the aggregation patterns of nodes in each type. Experiments show that our approaches consistently perform promising results compared with competitive methods over time.
Fraudulent claim detection is one of the greatest challenges the insurance industry faces. Alibaba's return-freight insurance, providing return-shipping postage compensations over product return on the e-commerce platform, receives thousands of potentially fraudulent claims every day. Such deliberate abuse of the insurance policy could lead to heavy financial losses. In order to detect and prevent fraudulent insurance claims, we developed a novel data-driven procedure to identify groups of organized fraudsters, one of the major contributions to financial losses, by learning network information. In this paper, we introduce a device-sharing network among claimants, followed by developing an automated solution for fraud detection based on graph learning algorithms, to separate fraudsters from regular customers and uncover groups of organized fraudsters. This solution applied at Alibaba achieves more than 80% precision while covering 44% more suspicious accounts compared with a previously deployed rule-based classifier after human expert investigations. Our approach can easily and effectively generalizes to other types of insurance.
Time-series forecasting is an important task in both academic and industry, which can be applied to solve many real forecasting problems like stock, water-supply, and sales predictions. In this paper, we study the case of retailers' sales forecasting on Tmall|the world's leading online B2C platform. By analyzing the data, we have two main observations, i.e., sales seasonality after we group different groups of retails and a Tweedie distribution after we transform the sales (target to forecast). Based on our observations, we design two mechanisms for sales forecasting, i.e., seasonality extraction and distribution transformation. First, we adopt Fourier decomposition to automatically extract the seasonalities for different categories of retailers, which can further be used as additional features for any established regression algorithms. Second, we propose to optimize the Tweedie loss of sales after logarithmic transformations. We apply these two mechanisms to classic regression models, i.e., neural network and Gradient Boosting Decision Tree, and the experimental results on Tmall dataset show that both mechanisms can significantly improve the forecasting results.
With online payment platforms being ubiquitous and important, fraud transaction detection has become the key for such platforms, to ensure user account safety and platform security. In this work, we present a novel method for detecting fraud transactions by leveraging patterns from both users' static profiles and users' dynamic behaviors in a unified framework. To address and explore the information of users' behaviors in continuous time spaces, we propose to use \emph{time attention based recurrent layers} to embed the detailed information of the time interval, such as the durations of specific actions, time differences between different actions and sequential behavior patterns,etc., in the same latent space. We further combine the learned embeddings and users' static profiles altogether in a unified framework. Extensive experiments validate the effectiveness of our proposed methods over state-of-the-art methods on various evaluation metrics, especially on \emph{recall at top percent} which is an important metric for measuring the balance between service experiences and risk of potential losses.
We present, GeniePath, a scalable approach for learning adaptive receptive fields of neural networks defined on permutation invariant graph data. In GeniePath, we propose an adaptive path layer consists of two complementary functions designed for breadth and depth exploration respectively, where the former learns the importance of different sized neighborhoods, while the latter extracts and filters signals aggregated from neighbors of different hops away. Our method works in both transductive and inductive settings, and extensive experiments compared with competitive methods show that our approaches yield state-of-the-art results on large graphs.