We present an algorithmic framework that generalizes quantum-inspired polylogarithmic-time algorithms on low-rank matrices. Our work follows the line of research started by Tang's breakthrough classical algorithm for recommendation systems [STOC'19]. The main result of this work is an algorithm for singular value transformation on low-rank inputs in the quantum-inspired regime, where singular value transformation is a framework proposed by Gilyén et al. [STOC'19] to study various quantum speedups. Since singular value transformation encompasses a vast range of matrix arithmetic, this result, combined with simple sampling lemmas from previous work, suffices to generalize all results dequantizing quantum machine learning algorithms that the authors are aware of. Via simple black-box applications of our singular value transformation framework, we recover the dequantization results on recommendation systems, principal component analysis, supervised clustering, low-rank matrix inversion, low-rank semidefinite programming, and support vector machines. We also give additional dequantization results on low-rank Hamiltonian simulation and discriminant analysis.
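For orientation, the odd-function form of singular value transformation in the sense of Gilyén et al. can be written as below. This is only the defining formula: a polylogarithmic-time algorithm cannot afford to compute the decomposition of A explicitly, so this is not how the quantum-inspired algorithm proceeds. The second line shows how low-rank matrix inversion fits the framework, with r the rank of A and all σ_i > 0.

$$
A = \sum_{i=1}^{r} \sigma_i\, u_i v_i^{\dagger}
\qquad\Longrightarrow\qquad
f^{(\mathrm{SV})}(A) = \sum_{i=1}^{r} f(\sigma_i)\, u_i v_i^{\dagger}
\quad \text{for odd } f,
$$

$$
A^{+} = \sum_{i=1}^{r} \sigma_i^{-1}\, v_i u_i^{\dagger} = f^{(\mathrm{SV})}\!\left(A^{\dagger}\right) \quad \text{with } f(x) = 1/x .
$$

The second identity holds because A^† = Σ σ_i v_i u_i^† has left singular vectors v_i and right singular vectors u_i.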
Recurrent neural networks are nowadays successfully used in an abundance of applications, ranging from text, speech and image processing to recommender systems. Backpropagation through time is the algorithm commonly used to train these networks on specific tasks. Many deep learning frameworks have their own implementation of training and sampling procedures for recurrent neural networks, although there are in fact multiple other possibilities to choose from and other parameters to tune. The existing literature very often overlooks or ignores this. In this paper we therefore give an overview of possible training and sampling schemes for character-level recurrent neural networks that solve the task of predicting the next token in a given sequence. We test these different schemes on a variety of datasets, neural network architectures and parameter settings, and formulate a number of take-home recommendations. The choice of training and sampling scheme turns out to be subject to a number of trade-offs, such as training stability, sampling time, model performance and implementation effort, but is largely independent of the data. Perhaps the most surprising result is that transferring hidden states to correctly initialize the model on subsequences often leads to unstable training, depending on the dataset.
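As a minimal sketch of what "subsequences" means for next-character prediction, the hypothetical helper below splits a text into consecutive, non-overlapping (input, target) pairs. It is an illustration under our own assumptions (fixed length, no overlap), not one of the specific schemes compared in the paper.

```python
def make_subsequences(text, seq_len):
    """Split `text` into consecutive, non-overlapping (input, target) pairs for
    next-character prediction; each target is its input shifted by one character."""
    pairs = []
    # Stop before len(text) - 1 so every chunk has at least one input and one target character.
    for start in range(0, len(text) - 1, seq_len):
        chunk = text[start:start + seq_len + 1]
        pairs.append((chunk[:-1], chunk[1:]))
    return pairs
```

Because pair k+1 starts exactly where pair k ends, a stateful scheme can pass the final hidden state of one pair on as the initial state of the next, whereas a stateless scheme resets the state (e.g. to zeros) for every pair.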
Virtual support agents have grown in popularity as a way for businesses to provide better and more accessible customer service. Challenges in this domain include ambiguous user queries as well as changing support topics and user behavior (non-stationarity). We do, however, have access to partial feedback provided by the user (clicks, surveys, and other events), which can be leveraged to improve the user experience. Adaptable learning techniques, like contextual bandits, are a natural fit for this problem setting. In this paper, we discuss real-world implementations of contextual bandits (CB) for the Microsoft virtual agent. These include intent disambiguation based on neural-linear bandits (NLB) and contextual recommendations based on a collection of multi-armed bandits (MAB). Our solutions have been deployed to production and, as confirmed by A/B experiments, have improved key business metrics of the Microsoft virtual agent. Results include a relative increase of over 12% in problem resolution rate and a relative decrease of over 4% in escalations to a human operator. While our current use cases focus on intent disambiguation and contextual recommendation for support bots, we believe our methods can be extended to other domains.
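The abstract does not say which algorithm drives each multi-armed bandit, so the sketch below swaps in Beta-Bernoulli Thompson sampling as one plausible choice. It keeps one independent bandit per context key (for example a support topic) and treats a binary signal such as a click as the reward. The class name, context keys and reward definition are all hypothetical.

```python
import random
from collections import defaultdict


class PerContextThompson:
    """A collection of Beta-Bernoulli Thompson-sampling bandits, one per context key.
    Hypothetical illustration; not the deployed Microsoft implementation."""

    def __init__(self, arms, seed=None):
        self.arms = list(arms)
        self.rng = random.Random(seed)
        # (context, arm) -> [alpha, beta]; starts at the uniform Beta(1, 1) prior.
        self.params = defaultdict(lambda: [1, 1])

    def choose(self, context):
        # Draw one plausible success rate per arm and play the arm with the largest draw.
        samples = {arm: self.rng.betavariate(*self.params[(context, arm)]) for arm in self.arms}
        return max(samples, key=samples.get)

    def update(self, context, arm, reward):
        # reward is 1 (e.g. the user clicked the suggestion) or 0 (no click).
        alpha_beta = self.params[(context, arm)]
        if reward:
            alpha_beta[0] += 1
        else:
            alpha_beta[1] += 1
```

Keeping the statistics per (context, arm) pair means feedback in one topic never changes recommendations in another; a contextual model such as a neural-linear bandit would instead share information across contexts.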
In the current competitive environment, it is crucial for manufacturers to make the best decisions in the shortest time in order to optimize the efficiency and effectiveness of their manufacturing systems. These decisions range from the strategic level to tactical and operational production planning and control. In this context, developing intelligent decision support systems (DSS) that can integrate a wide variety of models along with data and knowledge resources has become a promising direction. This paper proposes an intelligent DSS for quality control planning. The DSS is a recommender system (RS) that helps the decision maker select the best control scenario using two different approaches. The first is a manual choice using a multi-criteria decision making method. The second is an automatic recommendation based on the case-based reasoning (CBR) technique. Furthermore, the proposed RS makes it possible to continuously update the control plans so that they adapt to the actual process quality situation. To this end, CBR is used to learn the required knowledge and thereby improve decision quality. A numerical application on a real case study illustrates the feasibility and practicability of the proposed DSS.
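The retrieve and retain steps of a generic case-based reasoning cycle can be sketched as follows. The case representation (a numeric feature vector describing the process quality situation plus the control scenario that was applied), the weighted Euclidean distance and both function names are assumptions for illustration; the paper's actual similarity measure is not given in the abstract.

```python
import math


def retrieve_case(case_base, query, weights):
    """Return the stored case closest to `query` under a weighted Euclidean distance.
    Each case is a dict with hypothetical keys "features" (list of floats) and "scenario"."""

    def distance(features):
        return math.sqrt(sum(w * (f - q) ** 2 for w, f, q in zip(weights, features, query)))

    return min(case_base, key=lambda case: distance(case["features"]))


def retain_case(case_base, features, scenario):
    """Store a newly solved situation so later retrievals can reuse it (the 'retain' step)."""
    case_base.append({"features": list(features), "scenario": scenario})
```

The weights let a decision maker emphasize the quality indicators that matter most for choosing a control plan; retaining each solved case is what lets the recommendations track the evolving process quality situation.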
Fashion preference is a fuzzy concept that depends on customer taste, prevailing norms in fashion product or style (we use the two terms interchangeably), and a customer's perception of utility or fashionability. Yet fashion e-retail relies on algorithmically generated search and recommendation systems that process structured data and images to best match customer preference. Retailers study tastes solely as a function of what sold versus what did not, and take this to represent customer preference. Such explicit modeling, however, belies the underlying user preference, which arises from a complicated interplay of taste and commercial factors such as brand, price point, promotions, other sale events, and competitor push/marketing. It is therefore hard to infer a notion of utility, or even customer preference, from sales data alone. In search and recommendation systems for fashion e-retail, customer preference is implicitly derived from user-user or item-item similarity. In this work, we aim to derive a metric that separates the buying preferences of users from the commercials of the merchandise (price, promotions, etc.). We extend our earlier work, which gauged sellability or preference from explicit signals, with implicit signals from user behaviour.
User response prediction makes a crucial contribution to the rapid development of online advertising and recommendation systems. The importance of learning feature interactions has been emphasized by many works, and many deep models have been proposed to learn high-order feature interactions automatically. Since most features in advertising and recommendation systems are high-dimensional and sparse, deep models usually learn a low-dimensional distributed representation for each feature in the bottom layer. Besides traditional fully-connected architectures, new operations such as convolutional operations and product operations have been proposed to learn feature interactions better. In these models, the representation is shared among the different operations. However, the best representation for different operations may be different. In this paper, we propose a new neural model named Operation-aware Neural Networks (ONN), which learns different representations for different operations. Our experimental results on two large-scale real-world ad click/conversion datasets demonstrate that ONN consistently outperforms the state-of-the-art models in both offline and online training environments.
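The central idea, one embedding per (feature, operation) pair instead of one shared embedding per feature, can be sketched with plain Python lists. In this hypothetical minimal version the operations are a "copy" fed directly to the network and one inner product per field pair, in the spirit of field-aware factorization machines; the class, function and key names are ours, not ONN's published code.

```python
import random


class OperationAwareEmbedding:
    """Keeps a separate embedding vector for every (feature, operation) key.
    Hypothetical minimal sketch; vectors are created lazily with small random values."""

    def __init__(self, dim, seed=0):
        self.dim = dim
        self.rng = random.Random(seed)
        self.table = {}

    def lookup(self, feature, operation):
        key = (feature, operation)
        if key not in self.table:
            self.table[key] = [self.rng.gauss(0.0, 0.01) for _ in range(self.dim)]
        return self.table[key]


def interaction_features(emb, features):
    """Build a network input from one active feature per field (features[i] belongs to field i):
    a 'copy' embedding per feature followed by one inner product per field pair, where feature i
    uses the embedding reserved for its product with field j."""
    copies = [x for f in features for x in emb.lookup(f, "copy")]
    products = []
    for i in range(len(features)):
        for j in range(i + 1, len(features)):
            v_i = emb.lookup(features[i], ("product", j))
            v_j = emb.lookup(features[j], ("product", i))
            products.append(sum(a * b for a, b in zip(v_i, v_j)))
    return copies + products
```

With n fields and dimension d, the input has n*d copy values plus n(n-1)/2 products, and each feature owns n embeddings (one copy plus n-1 product-specific ones), which is the extra memory cost of not sharing representations across operations.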
Collaborative filtering (CF) has achieved great success in the field of recommender systems. In recent years, many novel CF models, particularly those based on deep learning or graph techniques, have been proposed for a variety of recommendation tasks, such as rating prediction and item ranking. These newly published models usually demonstrate their performance through accuracy improvements over baselines or existing models. However, others have pointed out that many newly proposed models are not as strong as expected and are outperformed by very simple baselines. This paper proposes a simple linear model based on Matrix Factorization (MF), called UserReg, which regularizes users' latent representations with explicit feedback information for rating prediction. We compare the effectiveness of UserReg with three linear CF models that are widely used as baselines, and with a set of recently proposed complex models based on deep learning or graph techniques. Experimental results show that UserReg achieves better overall performance than the fine-tuned baselines considered and is highly competitive with the other recently proposed models. We conclude that UserReg can be used as a strong baseline for future CF research.
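For readers unfamiliar with the MF starting point, the sketch below fits a plain biased matrix factorization, r_ui ≈ μ + b_u + b_i + p_u·q_i, by stochastic gradient descent with L2 penalties. This is only the generic linear MF baseline that UserReg builds on; UserReg's additional regularizer on user latent representations is defined in the paper and is deliberately not reproduced here. The function name and hyperparameter values are illustrative assumptions.

```python
import random


def train_biased_mf(ratings, n_users, n_items, k=8, lr=0.02, reg=0.05, epochs=200, seed=0):
    """Fit r_ui ~ mu + b_u + b_i + p_u . q_i by SGD on squared error with L2 penalties.
    `ratings` is a list of (user_index, item_index, rating) triples; returns predict(u, i)."""
    rng = random.Random(seed)
    mu = sum(r for _, _, r in ratings) / len(ratings)  # global mean rating
    b_user = [0.0] * n_users
    b_item = [0.0] * n_items
    P = [[rng.gauss(0.0, 0.1) for _ in range(k)] for _ in range(n_users)]  # user factors
    Q = [[rng.gauss(0.0, 0.1) for _ in range(k)] for _ in range(n_items)]  # item factors

    def predict(u, i):
        return mu + b_user[u] + b_item[i] + sum(p * q for p, q in zip(P[u], Q[i]))

    data = list(ratings)
    for _ in range(epochs):
        rng.shuffle(data)
        for u, i, r in data:
            err = r - predict(u, i)
            b_user[u] += lr * (err - reg * b_user[u])
            b_item[i] += lr * (err - reg * b_item[i])
            for f in range(k):
                p_uf, q_if = P[u][f], Q[i][f]  # use pre-update values for both gradients
                P[u][f] += lr * (err * q_if - reg * p_uf)
                Q[i][f] += lr * (err * p_uf - reg * q_if)
    return predict
```

A user-side regularizer of the kind the abstract describes would add a further penalty term on each p_u to this objective, which keeps the model linear and cheap to train.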
In recent years, knowledge graphs have been widely applied to organize data in a uniform way and to enhance many tasks that require knowledge, such as online shopping, which has greatly facilitated people's lives. As a backbone for online shopping platforms, we built a billion-scale e-commerce product knowledge graph for various item knowledge services such as item recommendation. However, such knowledge services usually involve tedious data selection and model design for knowledge infusion, which may lead to inappropriate results. To avoid this problem, we propose a Pre-trained Knowledge Graph Model (PKGM) for our billion-scale e-commerce product knowledge graph, which provides item knowledge services in a uniform way for embedding-based models without accessing triple data in the knowledge graph. Notably, PKGM can also complete the knowledge graph while serving, thereby overcoming the common incompleteness issue of knowledge graphs. We test PKGM on three knowledge-related tasks: item classification, same item identification, and recommendation. Experimental results show that PKGM improves the performance of each task.
Decisions made by machine learning systems have increasing influence on the world, yet it is common for machine learning algorithms to assume that no such influence exists. An example is the use of the i.i.d. assumption in content recommendation. In fact, the (choice of) content displayed can change users' perceptions and preferences, or even drive them away, causing a shift in the distribution of users. We introduce the term auto-induced distributional shift (ADS) to describe the phenomenon of an algorithm causing a change in the distribution of its own inputs. Our goal is to ensure that machine learning systems do not leverage ADS to increase performance when doing so could be undesirable. We demonstrate that changes to the learning algorithm, such as the introduction of meta-learning, can cause hidden incentives for auto-induced distributional shift (HI-ADS) to be revealed. To address this issue, we introduce 'unit tests' and a mitigation strategy for HI-ADS, as well as a toy environment for modelling real-world issues with HI-ADS in content recommendation, where we demonstrate that strong meta-learners achieve gains in performance via ADS. We show that meta-learning and Q-learning both sometimes fail the unit tests, but pass them when using our mitigation strategy.
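A tiny deterministic simulation makes the ADS phenomenon concrete. It is our own hypothetical toy, not the paper's environment: two user types of equal initial size each like one item, the policy always shows the same item, and users who dislike it leave at a fixed rate. The measured click rate then rises even though no individual user's preference changes, purely because the policy reshapes its own input distribution.

```python
def simulate_ads(always_show, steps=10, churn=0.2):
    """Return the click rate observed at each step when a fixed item is always shown.
    Hypothetical toy: user type "A" likes item "x", user type "B" likes item "y"."""
    pop = {"A": 0.5, "B": 0.5}  # current mass of each user type
    likes = {"A": "x", "B": "y"}
    history = []
    for _ in range(steps):
        total = sum(pop.values())
        # Fraction of the current users who click: the metric a learner would optimize.
        click_rate = sum(mass for t, mass in pop.items() if likes[t] == always_show) / total
        history.append(click_rate)
        # Users who dislike the shown item churn, shifting the distribution of future inputs.
        for t in pop:
            if likes[t] != always_show:
                pop[t] *= 1 - churn
    return history
```

Starting from a click rate of 0.5, showing "x" for ten steps drives the observed rate to about 0.88. A learner rewarded on this metric therefore has an incentive to exploit the shift, which is exactly the kind of behaviour the unit tests are meant to detect.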
Click-Through Rate (CTR) prediction is one of the most important machine learning tasks in recommender systems, driving personalized experiences for billions of consumers. Neural architecture search (NAS), as an emerging field, has demonstrated its ability to discover powerful neural network architectures, which motivates us to explore its potential for CTR prediction. Due to 1) diverse unstructured feature interactions, 2) a heterogeneous feature space, and 3) high data volume and intrinsic data randomness, it is challenging to construct, search, and compare different architectures effectively for recommendation models. To address these challenges, we propose an automated interaction architecture discovery framework for CTR prediction named AutoCTR. By modularizing simple yet representative interactions as virtual building blocks and wiring them into a space of directed acyclic graphs, AutoCTR performs evolutionary architecture exploration with learning-to-rank guidance at the architecture level and achieves acceleration using low-fidelity models. Empirical analysis demonstrates the effectiveness of AutoCTR on different datasets compared with human-crafted architectures. The discovered architectures also generalize and transfer across different datasets.