Risk scoring systems have been widely deployed in many applications, which assign risk scores to users according to their behavior sequences. Though many deep learning methods with sophisticated designs have achieved promising results, the black-box nature hinders their applications due to fairness, explainability, and compliance consideration. Rule-based systems are considered reliable in these sensitive scenarios. However, building a rule system is labor-intensive. Experts need to find informative statistics from user behavior sequences, design rules based on statistics and assign weights to each rule. In this paper, we bridge the gap between effective but black-box models and transparent rule models. We propose a two-stage method, RuDi, that distills the knowledge of black-box teacher models into rule-based student models. We design a Monte Carlo tree search-based statistics generation method that can provide a set of informative statistics in the first stage. Then statistics are composed into logical rules with our proposed neural logical networks by mimicking the outputs of teacher models. We evaluate RuDi on three real-world public datasets and an industrial dataset to demonstrate its effectiveness.
Real-world electricity consumption prediction may involve different tasks, e.g., prediction for different time steps ahead or different geo-locations. These tasks are often solved independently without utilizing some common problem-solving knowledge that could be extracted and shared among these tasks to augment the performance of solving each task. In this work, we propose a multi-task optimization (MTO) based co-training (MTO-CT) framework, where the models for solving different tasks are co-trained via an MTO paradigm in which solving each task may benefit from the knowledge gained from when solving some other tasks to help its solving process. MTO-CT leverages long short-term memory (LSTM) based model as the predictor where the knowledge is represented via connection weights and biases. In MTO-CT, an inter-task knowledge transfer module is designed to transfer knowledge between different tasks, where the most helpful source tasks are selected by using the probability matching and stochastic universal selection, and evolutionary operations like mutation and crossover are performed for reusing the knowledge from selected source tasks in a target task. We use electricity consumption data from five states in Australia to design two sets of tasks at different scales: a) one-step ahead prediction for each state (five tasks) and b) 6-step, 12-step, 18-step, and 24-step ahead prediction for each state (20 tasks). The performance of MTO-CT is evaluated on solving each of these two sets of tasks in comparison to solving each task in the set independently without knowledge sharing under the same settings, which demonstrates the superiority of MTO-CT in terms of prediction accuracy.
Multivariate time series (MTS) prediction plays a key role in many fields such as finance, energy and transport, where each individual time series corresponds to the data collected from a certain data source, so-called channel. A typical pipeline of building an MTS prediction model (PM) consists of selecting a subset of channels among all available ones, extracting features from the selected channels, and building a PM based on the extracted features, where each component involves certain optimization tasks, i.e., selection of channels, feature extraction (FE) methods, and PMs as well as configuration of the selected FE method and PM. Accordingly, pursuing the best prediction performance corresponds to optimizing the pipeline by solving all of its involved optimization problems. This is a non-trivial task due to the vastness of the solution space. Different from most of the existing works which target at optimizing certain components of the pipeline, we propose a novel evolutionary ensemble learning framework to optimize the entire pipeline in a holistic manner. In this framework, a specific pipeline is encoded as a candidate solution and a multi-objective evolutionary algorithm is applied under different population sizes to produce multiple Pareto optimal sets (POSs). Finally, selective ensemble learning is designed to choose the optimal subset of solutions from the POSs and combine them to yield final prediction by using greedy sequential selection and least square methods. We implement the proposed framework and evaluate our implementation on two real-world applications, i.e., electricity consumption prediction and air quality prediction. The performance comparison with state-of-the-art techniques demonstrates the superiority of the proposed approach.
In this paper we present DELTA, a deep learning based language technology platform. DELTA is an end-to-end platform designed to solve industry level natural language and speech processing problems. It integrates most popular neural network models for training as well as comprehensive deployment tools for production. DELTA aims to provide easy and fast experiences for using, deploying, and developing natural language processing and speech models for both academia and industry use cases. We demonstrate the reliable performance with DELTA on several natural language processing and speech tasks, including text classification, named entity recognition, natural language inference, speech recognition, speaker verification, etc. DELTA has been used for developing several state-of-the-art algorithms for publications and delivering real production to serve millions of users.