Decisions made by machine learning systems have increasing influence on the world, yet it is common for machine learning algorithms to assume that no such influence exists. An example is the use of the i.i.d. assumption in content recommendation. In fact, the (choice of) content displayed can change users' perceptions and preferences, or even drive them away, causing a shift in the distribution of users. We introduce the term auto-induced distributional shift (ADS) to describe the phenomenon of an algorithm causing a change in the distribution of its own inputs. Our goal is to ensure that machine learning systems do not leverage ADS to increase performance when doing so could be undesirable. We demonstrate that changes to the learning algorithm, such as the introduction of meta-learning, can cause hidden incentives for auto-induced distributional shift (HI-ADS) to be revealed. To address this issue, we introduce `unit tests' and a mitigation strategy for HI-ADS, as well as a toy environment for modelling real-world issues with HI-ADS in content recommendation, where we demonstrate that strong meta-learners achieve gains in performance via ADS. We show meta-learning and Q-learning both sometimes fail unit tests, but pass when using our mitigation strategy.
Click-Through Rate (CTR) prediction is one of the most important machine learning tasks in recommender systems, driving personalized experience for billions of consumers. Neural architecture search (NAS), as an emerging field, has demonstrated its capabilities in discovering powerful neural network architectures, which motivates us to explore its potential for CTR predictions. Due to 1) diverse unstructured feature interactions, 2) heterogeneous feature space, and 3) high data volume and intrinsic data randomness, it is challenging to construct, search, and compare different architectures effectively for recommendation models. To address these challenges, we propose an automated interaction architecture discovering framework for CTR prediction named AutoCTR. Via modularizing simple yet representative interactions as virtual building blocks and wiring them into a space of direct acyclic graphs, AutoCTR performs evolutionary architecture exploration with learning-to-rank guidance at the architecture level and achieves acceleration using low-fidelity model. Empirical analysis demonstrates the effectiveness of AutoCTR on different datasets comparing to human-crafted architectures. The discovered architecture also enjoys generalizability and transferability among different datasets.
In today's large enterprises there is a significant increasing trend in the amount of data that has to be stored and processed. To complicate this scenario the complexity of organizing and managing a large collection of data, structured according to a single, unified schema, makes so that there is almost never a single place where to look to satisfy an information need. The Ontology-Based Data Access (OBDA) paradigm aims at mitigating this phenomenon by providing to the users of the system a unified and shared conceptual view of the domain of interest (ontology), while still enabling the data to be stored in different data sources, which are managed by a relational database. In an OBDA system the link between the data stored at the sources and the ontology is provided through a declarative specification given in terms of a set of mappings. In this work we focus on comparing two of the available systems for OBDA, namely, Mastro and Ontop, by adopting OBDA specifications based on W3C recommendations. We first show how support for R2RML mappings has been integrated in Mastro, which was the last feature missing in order to enable the system to use specifications based solely on W3C recommendations relevant to OBDA. We then proceed in performing a comparison between these systems over two OBDA specifications, the NPD Benchmark and the ACI specification.
We consider methods for learning vector representations of SQL queries to support generalized workload analytics tasks, including workload summarization for index selection and predicting queries that will trigger memory errors. We consider vector representations of both raw SQL text and optimized query plans, and evaluate these methods on synthetic and real SQL workloads. We find that general algorithms based on vector representations can outperform existing approaches that rely on specialized features. For index recommendation, we cluster the vector representations to compress large workloads with no loss in performance from the recommended index. For error prediction, we train a classifier over learned vectors that can automatically relate subtle syntactic patterns with specific errors raised during query execution. Surprisingly, we also find that these methods enable transfer learning, where a model trained on one SQL corpus can be applied to an unrelated corpus and still enable good performance. We find that these general approaches, when trained on a large corpus of SQL queries, provides a robust foundation for a variety of workload analysis tasks and database features, without requiring application-specific feature engineering.
Matrix factorization is a key component of collaborative filtering-based recommendation systems because it allows us to complete sparse user-by-item ratings matrices under a low-rank assumption that encodes the belief that similar users give similar ratings and that similar items garner similar ratings. This paradigm has had immeasurable practical success, but it is not the complete story for understanding and inferring the preferences of people. First, peoples' preferences and their observable manifestations as ratings evolve over time along general patterns of trajectories. Second, an individual person's preferences evolve over time through influence of their social connections. In this paper, we develop a unified process model for both types of dynamics within a state space approach, together with an efficient optimization scheme for estimation within that model. The model combines elements from recent developments in dynamic matrix factorization, opinion dynamics and social learning, and trust-based recommendation. The estimation builds upon recent advances in numerical nonlinear optimization. Empirical results on a large-scale data set from the Epinions website demonstrate consistent reduction in root mean squared error by consideration of the two types of dynamics.
Approximate nearest neighbor search (ANNS) constitutes an important operation in a multitude of applications, including recommendation systems, information retrieval, and pattern recognition. In the past decade, graph-based ANNS algorithms have been the leading paradigm in this domain, with dozens of graph-based ANNS algorithms proposed. Such algorithms aim to provide effective, efficient solutions for retrieving the nearest neighbors for a given query. Nevertheless, these efforts focus on developing and optimizing algorithms with different approaches, so there is a real need for a comprehensive survey about the approaches' relative performance, strengths, and pitfalls. Thus here we provide a thorough comparative analysis and experimental evaluation of 13 representative graph-based ANNS algorithms via a new taxonomy and fine-grained pipeline. We compared each algorithm in a uniform test environment on eight real-world datasets and 12 synthetic datasets with varying sizes and characteristics. Our study yields novel discoveries, offerings several useful principles to improve algorithms. This effort also helped us pinpoint algorithms' working portions, along with rule-of-thumb recommendations about promising research directions and suitable algorithms for practitioners in different fields.
Are Graph Neural Networks (GNNs) fair? In many real world graphs, the formation of edges is related to certain node attributes (e.g. gender, community, reputation). In this case, standard GNNs using these edges will be biased by this information, as it is encoded in the structure of the adjacency matrix itself. In this paper, we show that when metadata is correlated with the formation of node neighborhoods, unsupervised node embedding dimensions learn this metadata. This bias implies an inability to control for important covariates in real-world applications, such as recommendation systems. To solve these issues, we introduce the Metadata-Orthogonal Node Embedding Training (MONET) unit, a general model for debiasing embeddings of nodes in a graph. MONET achieves this by ensuring that the node embeddings are trained on a hyperplane orthogonal to that of the node metadata. This effectively organizes unstructured embedding dimensions into an interpretable topology-only, metadata-only division with no linear interactions. We illustrate the effectiveness of MONET though our experiments on a variety of real world graphs, which shows that our method can learn and remove the effect of arbitrary covariates in tasks such as preventing the leakage of political party affiliation in a blog network, and thwarting the gaming of embedding-based recommendation systems.
The Defense Advanced Research Projects Agency (DARPA) Real-time Adversarial Intelligence and Decision-making (RAID) program is investigating the feasibility of "reading the mind of the enemy" - to estimate and anticipate, in real-time, the enemy's likely goals, deceptions, actions, movements and positions. This program focuses specifically on urban battles at echelons of battalion and below. The RAID program leverages approximate game-theoretic and deception-sensitive algorithms to provide real-time enemy estimates to a tactical commander. A key hypothesis of the program is that these predictions and recommendations will make the commander more effective, i.e. he should be able to achieve his operational goals safer, faster, and more efficiently. Realistic experimentation and evaluation drive the development process using human-in-the-loop wargames to compare humans and the RAID system. Two experiments were conducted in 2005 as part of Phase I to determine if the RAID software could make predictions and recommendations as effectively and accurately as a 4-person experienced staff. This report discusses the intriguing and encouraging results of these first two experiments conducted by the RAID program. It also provides details about the experiment environment and methodology that were used to demonstrate and prove the research goals.
Similarity networks are important abstractions in many information management applications such as recommender systems, corpora analysis, and medical informatics. For instance, by inducing similarity networks between movies rated similarly by users, or between documents containing common terms, and or between clinical trials involving the same themes, we can aim to find the global structure of connectivities underlying the data, and use the network as a basis to make connections between seemingly disparate entities. In the above applications, composing similarities between objects of interest finds uses in serendipitous recommendation, in storytelling, and in clinical diagnosis, respectively. We present an algorithmic framework for traversing similarity paths using the notion of `hammock' paths which are generalization of traditional paths. Our framework is exploratory in nature so that, given starting and ending objects of interest, it explores candidate objects for path following, and heuristics to admissibly estimate the potential for paths to lead to a desired destination. We present three diverse applications: exploring movie similarities in the Netflix dataset, exploring abstract similarities across the PubMed corpus, and exploring description similarities in a database of clinical trials. Experimental results demonstrate the potential of our approach for unstructured knowledge discovery in similarity networks.
Temporal collaborative filtering (TCF) methods aim at modelling non-static aspects behind recommender systems, such as the dynamics in users' preferences and social trends around items. State-of-the-art TCF methods employ recurrent neural networks (RNNs) to model such aspects. These methods deploy matrix-factorization-based (MF-based) approaches to learn the user and item representations. Recently, graph-neural-network-based (GNN-based) approaches have shown improved performance in providing accurate recommendations over traditional MF-based approaches in non-temporal CF settings. Motivated by this, we propose a novel TCF method that leverages GNNs to learn user and item representations, and RNNs to model their temporal dynamics. A challenge with this method lies in the increased data sparsity, which negatively impacts obtaining meaningful quality representations with GNNs. To overcome this challenge, we train a GNN model at each time step using a set of observed interactions accumulated time-wise. Comprehensive experiments on real-world data show the improved performance obtained by our method over several state-of-the-art temporal and non-temporal CF models.