The possible impact of algorithmic recommendation on the autonomy and free choice of Internet users is being increasingly discussed, especially in terms of the rendering of information and the structuring of interactions. This paper aims at reviewing and framing this issue along a double dichotomy. The first one addresses the discrepancy between users' intentions and actions (1) under some algorithmic influence and (2) without it. The second one distinguishes algorithmic biases on (1) prior information rearrangement and (2) posterior information arrangement. In all cases, we focus on and differentiate situations where algorithms empirically appear to expand the cognitive and social horizon of users, from those where they seem to limit that horizon. We additionally suggest that these biases may not be properly appraised without taking into account the underlying social processes which algorithms are building upon.
We address the problem of constructing a knowledge base of entity-oriented search intents. Search intents are defined at the level of entity types, each comprising a high-level intent category (property, website, service, or other), along with a cluster of query terms used to express that intent. These machine-readable statements can be leveraged in various applications, e.g., for generating entity cards or query recommendations. By structuring service-oriented search intents, we take one step towards making entities actionable. The main contribution of this paper is a pipeline of components we develop to construct a knowledge base of entity intents. We evaluate performance both component-wise and end-to-end, and demonstrate that our approach is able to generate high-quality data.
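As an illustration of what such a machine-readable intent statement might look like, the sketch below models one record as an entity type, an intent category, and a cluster of query terms. The class names, the `intents_for` helper, and the example entity types and terms are all hypothetical; the abstract does not specify a schema.

```python
from dataclasses import dataclass
from enum import Enum
from typing import FrozenSet, List, Optional


class IntentCategory(Enum):
    # The four high-level categories named in the abstract.
    PROPERTY = "property"
    WEBSITE = "website"
    SERVICE = "service"
    OTHER = "other"


@dataclass(frozen=True)
class EntityIntent:
    entity_type: str             # hypothetical type label, e.g. "Restaurant"
    category: IntentCategory
    query_terms: FrozenSet[str]  # cluster of terms expressing the same intent


def intents_for(kb: List[EntityIntent], entity_type: str,
                category: Optional[IntentCategory] = None) -> List[EntityIntent]:
    """Return the intents stored for an entity type, optionally filtered by category."""
    return [
        intent for intent in kb
        if intent.entity_type == entity_type
        and (category is None or intent.category == category)
    ]


# Invented example records, for illustration only.
EXAMPLE_KB = [
    EntityIntent("Restaurant", IntentCategory.SERVICE, frozenset({"reservation", "book a table"})),
    EntityIntent("Restaurant", IntentCategory.PROPERTY, frozenset({"opening hours", "hours"})),
    EntityIntent("Airline", IntentCategory.SERVICE, frozenset({"check in", "online check-in"})),
]
```

A lookup such as `intents_for(EXAMPLE_KB, "Restaurant", IntentCategory.SERVICE)` returns the service-oriented intents that an application could turn into actions on an entity card.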
This study presents an eye tracking experiment on high-resolution satellite (HRS) images. The experiment explores Area of Interest (AOI) based analysis of eye fixation data for complex HRS images. The study is motivated by the need for reference data for bottom-up saliency-based segmentation and by the difficulty of analyzing eye tracking data for complex satellite images. The fixation data analysis is intended to support the creation of such reference data for high-resolution satellite images. The outcome of this experimental study is a solution for AOI-based analysis of fixation data in the complex environment of satellite images, together with recommendations for reference data construction, which is already an ongoing effort.
We investigate crowdsourcing algorithms for finding the top-quality item within a large collection of objects with unknown intrinsic quality values. This is an important problem with many relevant applications, for example in networked recommendation systems. The core of the algorithms is that objects are distributed to crowd workers, who return a noisy and biased evaluation. All received evaluations are then combined to identify the top-quality object. We first present a simple probabilistic model for the system under investigation. Then, we devise and study a class of efficient adaptive algorithms that assign objects to workers effectively. We compare the performance of several algorithms, which correspond to different choices of the design parameters/metrics. In simulations, we show that some of the algorithms achieve near-optimal performance for a suitable setting of the system parameters.
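The general idea of adaptively assigning objects to noisy, biased workers can be sketched as a simple elimination scheme. This is not the class of algorithms studied in the paper: the Gaussian bias and noise model, the halving rule, and all function and parameter names below are assumptions made for illustration.

```python
import numpy as np


def noisy_evaluation(quality, rng, bias_std, noise_std):
    """One worker's score: true quality plus a worker bias plus noise (both assumed Gaussian)."""
    return quality + rng.normal(0.0, bias_std) + rng.normal(0.0, noise_std)


def find_top_item(qualities, evals_per_round=5, keep_fraction=0.5,
                  bias_std=0.05, noise_std=0.1, seed=0):
    """Return the index estimated to have the highest quality.

    Each round, every surviving object is sent to `evals_per_round` fresh workers.
    Objects are ranked by the running mean of all scores received so far, and only
    the top `keep_fraction` survive, so later rounds spend the evaluation budget
    on the most promising objects.
    """
    rng = np.random.default_rng(seed)
    n = len(qualities)
    totals = np.zeros(n)
    counts = np.zeros(n)
    survivors = list(range(n))
    while len(survivors) > 1:
        for i in survivors:
            for _ in range(evals_per_round):
                totals[i] += noisy_evaluation(qualities[i], rng, bias_std, noise_std)
                counts[i] += 1
        means = totals[survivors] / counts[survivors]
        n_keep = max(1, int(len(survivors) * keep_fraction))
        best_first = np.argsort(means)[::-1][:n_keep]
        survivors = [survivors[j] for j in best_first]
    return survivors[0]
```

Raising `evals_per_round` or lowering `keep_fraction` trades evaluation cost against the risk of discarding the true best object early, which is the kind of design-parameter choice the abstract refers to.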
Motivated by the reconstruction and the prediction of electricity consumption, we extend Nonnegative Matrix Factorization (NMF) to take into account side information (column or row features). We consider general linear measurement settings, and propose a framework which models non-linear relationships between features and the response variables. We extend previous theoretical results to obtain a sufficient condition on the identifiability of the NMF in this setting. Based on the classical Hierarchical Alternating Least Squares (HALS) algorithm, we propose a new algorithm (HALSX, or Hierarchical Alternating Least Squares with eXogeneous variables) which estimates the factorization model. The algorithm is validated on both simulated and real electricity consumption datasets as well as a recommendation dataset, to show its performance in matrix recovery and prediction for new rows and columns.
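For readers unfamiliar with the classical HALS algorithm mentioned above, a minimal NumPy sketch of plain HALS for a fully observed matrix follows. It does not include side information or general linear measurements, so it is not HALSX; the function name, the random initialization, and the iteration count are assumptions.

```python
import numpy as np


def hals_nmf(V, rank, n_iter=200, eps=1e-10, seed=0):
    """Approximate a nonnegative matrix V (m x n) as W @ H with W, H >= 0.

    HALS cycles over the rank-one components; each row of H and each column
    of W is updated by its exact nonnegative least-squares solution while all
    other components are held fixed.
    """
    rng = np.random.default_rng(seed)
    m, n = V.shape
    W = rng.random((m, rank))
    H = rng.random((rank, n))
    for _ in range(n_iter):
        # Update the rows of H with W fixed.
        WtV = W.T @ V
        WtW = W.T @ W
        for k in range(rank):
            numerator = WtV[k] - WtW[k] @ H + WtW[k, k] * H[k]
            H[k] = np.maximum(numerator / max(WtW[k, k], eps), eps)
        # Update the columns of W with H fixed.
        VHt = V @ H.T
        HHt = H @ H.T
        for k in range(rank):
            numerator = VHt[:, k] - W @ HHt[:, k] + HHt[k, k] * W[:, k]
            W[:, k] = np.maximum(numerator / max(HHt[k, k], eps), eps)
    return W, H
```

Clipping at a small `eps` instead of zero is a common safeguard against components collapsing to all zeros, which would make the next division ill-defined.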
The outbreak of the COVID-19 pandemic, caused by the SARS-CoV-2 virus, demands empowering existing medical, economic, and social emergency backend systems with data analytics capabilities. An impediment to taking advantage of data analytics in these systems is the lack of a unified framework or reference model. Ontologies are highlighted as a promising solution to bridge this gap by providing a formal representation of COVID-19 concepts such as symptoms, infection rates, contact tracing, and drug modelling. Ontology-based solutions enable the integration of diverse data sources, which leads to a better understanding of pandemic data, management of smart lockdowns by identifying pandemic hotspots, and knowledge-driven inference, reasoning, and recommendations to tackle surrounding issues.
"Art is the lie that enables us to realize the truth." - Pablo Picasso. For centuries, humans have dedicated themselves to producing art to convey their imagination. The advancement of technology, and of deep learning in particular, has caught the attention of many researchers investigating whether computers and algorithms can generate art. Using generative adversarial networks (GANs), applications such as synthesizing photorealistic human faces and creating captions automatically from images have been realized. This survey takes a comprehensive look at recent work using GANs for generating visual arts, music, and literary text. A performance comparison and description of the various GAN architectures are also presented. Finally, some of the key challenges in art generation using GANs are highlighted along with recommendations for future work.
Submodular maximization has become established as the method of choice for the task of selecting representative and diverse summaries of data. However, if datapoints have sensitive attributes such as gender or age, such machine learning algorithms, left unchecked, are known to exhibit bias: under- or over-representation of particular groups. This has made the design of fair machine learning algorithms increasingly important. In this work we address the question: Is it possible to create fair summaries for massive datasets? To this end, we develop the first streaming approximation algorithms for submodular maximization under fairness constraints, for both monotone and non-monotone functions. We validate our findings empirically on exemplar-based clustering, movie recommendation, DPP-based summarization, and maximum coverage in social networks, showing that fairness constraints do not significantly impact utility.
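To make the notion of a fairness-constrained summary concrete, the sketch below runs an offline greedy selection for maximum coverage with per-group lower and upper bounds on the number of selected items. It is not the streaming algorithm developed in the paper; the function names, the form of the bounds, and the toy data are assumptions for illustration.

```python
def can_extend(counts, lower, upper, k):
    """Check that a partial selection can still be completed to a fair one.

    Assumes `lower` and `upper` share the same group keys. The selection must
    respect every upper bound, and the budget k must leave room to reach every
    group's lower bound.
    """
    within_upper = all(counts.get(g, 0) <= upper[g] for g in upper)
    needed = sum(max(counts.get(g, 0), lower[g]) for g in lower)
    return within_upper and needed <= k


def fair_greedy_coverage(sets, groups, lower, upper, k):
    """Greedily pick up to k sets maximizing coverage under group bounds.

    sets[i] is the set of elements covered by item i; groups[i] is its group.
    """
    selected, counts, covered = [], {}, set()
    while len(selected) < k:
        best, best_gain = None, -1
        for i, candidate in enumerate(sets):
            if i in selected:
                continue
            trial = dict(counts)
            trial[groups[i]] = trial.get(groups[i], 0) + 1
            if not can_extend(trial, lower, upper, k):
                continue
            gain = len(candidate - covered)  # marginal coverage gain
            if gain > best_gain:
                best, best_gain = i, gain
        if best is None:  # no item can be added without breaking a bound
            break
        selected.append(best)
        counts[groups[best]] = counts.get(groups[best], 0) + 1
        covered |= sets[best]
    return selected


# Toy instance (invented): three items in group "a", two in group "b".
SETS = [{1, 2, 3, 4}, {3, 4, 5, 6}, {7, 8}, {1}, {9}]
GROUPS = ["a", "a", "a", "b", "b"]
```

On this toy instance with `k=3` and bounds of at least one and at most two items per group, the greedy must give up item 2 from group "a" in favor of an item from group "b", which illustrates how the constraint can cost some coverage.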
Podcast summary, an important factor affecting end-users' listening decisions, has often been considered a critical feature in podcast recommendation systems, as well as many downstream applications. Existing abstractive summarization approaches are mainly built on models fine-tuned on professionally edited texts such as CNN and DailyMail news. Unlike news, podcasts are often longer, more colloquial and conversational, and noisier, with content such as commercials and sponsorships, which makes automatic podcast summarization extremely challenging. This paper presents a baseline analysis of podcast summarization using the Spotify Podcast Dataset provided by TREC 2020. It aims to help researchers understand current state-of-the-art pre-trained models and hence build a foundation for creating better models.