Navid Rekabsaz

Unlearning Protected User Attributes in Recommendations with Adversarial Training

Jun 09, 2022

Christian Ganhör, David Penz, Navid Rekabsaz, Oleg Lesota, Markus Schedl

Abstract:Collaborative filtering algorithms capture underlying consumption patterns, including the ones specific to particular demographics or protected information of users, e.g., gender, race, and location. These encoded biases can influence the decision of a recommendation system (RS) towards further separation of the contents provided to various demographic subgroups, and raise privacy concerns regarding the disclosure of users' protected attributes. In this work, we investigate the possibility and challenges of removing specific protected information of users from the learned interaction representations of an RS algorithm, while maintaining its effectiveness. Specifically, we incorporate adversarial training into the state-of-the-art MultVAE architecture, resulting in a novel model, Adversarial Variational Auto-Encoder with Multinomial Likelihood (Adv-MultVAE), which aims at removing the implicit information of protected attributes while preserving recommendation performance. We conduct experiments on the MovieLens-1M and LFM-2b-DemoBias datasets, and evaluate the effectiveness of the bias mitigation method based on the inability of external attackers to reveal the users' gender information from the model. Compared with the baseline MultVAE, the results show that Adv-MultVAE, with marginal deterioration in performance (w.r.t. NDCG and recall), largely mitigates inherent biases in the model on both datasets.

* Accepted at SIGIR 2022

Parameter Efficient Diff Pruning for Bias Mitigation

May 30, 2022

Lukas Hauzenberger, Navid Rekabsaz

Abstract:In recent years, language models have achieved state-of-the-art performance on a wide variety of natural language processing tasks. As these models are continuously growing in size, it becomes increasingly important to explore methods to make them more storage efficient. At the same time, their increasing cognitive abilities increase the danger that societal biases existing in datasets are implicitly encoded in the model weights. We propose an architecture which deals with these two challenges at the same time using two techniques: DiffPruning and Adversarial Training. The result is a modular architecture which extends the original DiffPruning setup with an additional sparse subnetwork applied as a mask to diminish the effects of a predefined protected attribute at inference time.

Do Perceived Gender Biases in Retrieval Results Affect Relevance Judgements?

Mar 03, 2022

Klara Krieg, Emilia Parada-Cabaleiro, Markus Schedl, Navid Rekabsaz

Abstract:This work investigates the effect of gender-stereotypical biases in the content of retrieved results on the relevance judgement of users/annotators. In particular, since relevance in information retrieval (IR) is a multi-dimensional concept, we study whether the value and quality of the retrieved documents for some bias-sensitive queries can be judged differently when the content of the documents represents different genders. To this aim, we conduct a set of experiments where the genders of the participants are known as well as experiments where the participants' genders are not specified. The set of experiments comprises retrieval tasks, where participants perform a rated relevance judgement for different search query and search result document compilations. The shown documents contain different gender indications and are either relevant or non-relevant to the query. The results show the differences between the average judged relevance scores among documents with various gender contents. Our work initiates further research on the connection of the perception of gender stereotypes in users with their judgements and effects on IR systems, and aims to raise awareness about the possible biases in this domain.

* Accepted at workshop on Algorithmic Bias in Search and Recommendation at ECIR 2022

Grep-BiasIR: A Dataset for Investigating Gender Representation-Bias in Information Retrieval Results

Jan 19, 2022

Klara Krieg, Emilia Parada-Cabaleiro, Gertraud Medicus, Oleg Lesota, Markus Schedl, Navid Rekabsaz

Abstract:The results of information retrieval (IR) systems on specific queries can reflect the existing societal biases and stereotypes, which will be further propagated and strengthened through interactions of the users with the systems. We introduce Grep-BiasIR, a novel thoroughly-audited dataset which aims to facilitate the studies of gender bias in the retrieved results of IR systems. The Grep-BiasIR dataset offers 105 bias-sensitive neutral search queries, where each query is accompanied by a set of relevant and non-relevant documents with contents indicating various genders. The dataset is available at https://github.com/KlaraKrieg/GrepBiasIR.

CODER: An efficient framework for improving retrieval through COntextualized Document Embedding Reranking

Dec 16, 2021

George Zerveas, Navid Rekabsaz, Daniel Cohen, Carsten Eickhoff

Abstract:We present a framework for improving the performance of a wide class of retrieval models at minimal computational cost. It utilizes precomputed document representations extracted by a base dense retrieval method and involves training a model to jointly score a large set of retrieved candidate documents for each query, while potentially transforming on the fly the representation of each document in the context of the other candidates as well as the query itself. When scoring a document representation based on its similarity to a query, the model is thus aware of the representation of its "peer" documents. We show that our approach leads to substantial improvement in retrieval performance over the base method and over scoring candidate documents in isolation from one another, as in a pair-wise training setting. Crucially, unlike term-interaction rerankers based on BERT-like encoders, it incurs a negligible computational overhead on top of any first-stage method at run time, allowing it to be easily combined with any state-of-the-art dense retrieval method. Finally, concurrently considering a set of candidate documents for a given query enables additional valuable capabilities in retrieval, such as score calibration and mitigating societal biases in ranking.

WECHSEL: Effective initialization of subword embeddings for cross-lingual transfer of monolingual language models

Dec 13, 2021

Benjamin Minixhofer, Fabian Paischer, Navid Rekabsaz

Abstract:Recently, large pretrained language models (LMs) have gained popularity. Training these models requires ever more computational resources and most of the existing models are trained on English text only. It is exceedingly expensive to train these models in other languages. To alleviate this problem, we introduce a method -- called WECHSEL -- to transfer English models to new languages. We exchange the tokenizer of the English model with a tokenizer in the target language and initialize token embeddings such that they are close to semantically similar English tokens by utilizing multilingual static word embeddings covering English and the target language. We use WECHSEL to transfer GPT-2 and RoBERTa models to 4 other languages (French, German, Chinese and Swahili). WECHSEL improves over a previously proposed method for cross-lingual parameter transfer and outperforms models of comparable size trained from scratch in the target language with up to 64x less training effort. Our method makes training large language models for new languages more accessible and less damaging to the environment. We make our code and models publicly available.
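
The initialization idea in the abstract above can be illustrated with a minimal sketch. It assumes the static word embeddings of both languages already live in one aligned space, and it sets each target-token embedding to a similarity-weighted average of the English model's embeddings for the nearest English tokens. The function name `init_target_embeddings`, the parameters `k` and `temperature`, the softmax weighting, and the toy vectors are all illustrative assumptions, not the authors' released code.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def init_target_embeddings(src_static, tgt_static, src_model_emb, k=2, temperature=0.1):
    """Hypothetical helper: initialise target-token embeddings from source ones.

    src_static / tgt_static: token -> vector in a shared (aligned) static space.
    src_model_emb: source token -> embedding taken from the pretrained model.
    k and temperature are assumed hyperparameters of this sketch.
    """
    dim = len(next(iter(src_model_emb.values())))
    tgt_model_emb = {}
    for tgt_tok, tgt_vec in tgt_static.items():
        # Rank source tokens by similarity in the shared static space.
        neighbours = sorted(
            ((cosine(tgt_vec, src_vec), src_tok) for src_tok, src_vec in src_static.items()),
            reverse=True,
        )[:k]
        # Softmax over similarities gives the mixing weights.
        weights = [math.exp(sim / temperature) for sim, _ in neighbours]
        total = sum(weights)
        vec = [0.0] * dim
        for weight, (_, src_tok) in zip(weights, neighbours):
            for i, x in enumerate(src_model_emb[src_tok]):
                vec[i] += (weight / total) * x
        tgt_model_emb[tgt_tok] = vec
    return tgt_model_emb


# Toy data (made up for illustration only).
src_static = {"dog": [1.0, 0.0], "house": [0.0, 1.0]}
tgt_static = {"Hund": [0.9, 0.1], "Haus": [0.1, 0.9]}
src_model_emb = {"dog": [1.0, 2.0, 3.0], "house": [-1.0, 0.0, 1.0]}

nearest_only = init_target_embeddings(src_static, tgt_static, src_model_emb, k=1)
blended = init_target_embeddings(src_static, tgt_static, src_model_emb, k=2)
```

With `k=1` each target token simply copies the embedding of its single nearest English token; larger `k` blends several neighbours, so the new embedding stays close to semantically similar English tokens without being tied to exactly one of them.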

Analyzing Item Popularity Bias of Music Recommender Systems: Are Different Genders Equally Affected?

Aug 16, 2021

Oleg Lesota, Alessandro B. Melchiorre, Navid Rekabsaz, Stefan Brandl, Dominik Kowald, Elisabeth Lex, Markus Schedl

Abstract:Several studies have identified discrepancies between the popularity of items in user profiles and the corresponding recommendation lists. Such behavior, which concerns a variety of recommendation algorithms, is referred to as popularity bias. Existing work predominantly adopts simple statistical measures, such as the difference of mean or median popularity, to quantify popularity bias. Moreover, it does so irrespective of user characteristics other than the inclination to popular content. In this work, in contrast, we propose to investigate popularity differences (between the user profile and recommendation list) in terms of median, a variety of statistical moments, as well as similarity measures that consider the entire popularity distributions (Kullback-Leibler divergence and Kendall's tau rank-order correlation). This results in a more detailed picture of the characteristics of popularity bias. Furthermore, we investigate whether such algorithmic popularity bias affects users of different genders in the same way. We focus on music recommendation and conduct experiments on the recently released standardized LFM-2b dataset, containing listening profiles of Last.fm users. We investigate the algorithmic popularity bias of seven common recommendation algorithms (five collaborative filtering and two baselines). Our experiments show that (1) the studied metrics provide novel insights into popularity bias in comparison with only using average differences, (2) algorithms less inclined towards popularity bias amplification do not necessarily perform worse in terms of utility (NDCG), (3) the majority of the investigated recommenders intensify the popularity bias of the female users.

* RecSys 2021 - LBR
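
To make two of the measures named in the abstract concrete, here is a minimal sketch that compares the popularity of items in a user profile with the popularity of recommended items using the median difference and the Kullback-Leibler divergence. The bin edges, the add-one smoothing, the direction of the divergence (profile relative to recommendations), and the toy popularity values are assumptions of this sketch and are not taken from the paper.

```python
import math
from statistics import median


def histogram(values, edges):
    """Normalised histogram; add-one smoothing keeps the KL divergence finite."""
    n_bins = len(edges) - 1
    counts = [1.0] * n_bins
    for v in values:
        for i in range(n_bins):
            is_last = i == n_bins - 1
            # The last bin is closed on the right so the maximum value is counted.
            if edges[i] <= v < edges[i + 1] or (is_last and v == edges[-1]):
                counts[i] += 1.0
                break
    total = sum(counts)
    return [c / total for c in counts]


def kl_divergence(p, q):
    """KL(p || q) for two discrete distributions over the same bins."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def popularity_shift(profile_pop, rec_pop, edges):
    """Hypothetical helper: summarise how recommendations shift item popularity."""
    p = histogram(profile_pop, edges)
    q = histogram(rec_pop, edges)
    return {
        "median_delta": median(rec_pop) - median(profile_pop),
        "kl": kl_divergence(p, q),
    }


# Toy normalised popularity scores in [0, 1] (made up for illustration only).
profile = [0.1, 0.2, 0.3, 0.4, 0.8]
recommended = [0.6, 0.7, 0.8, 0.9, 0.9]
edges = [0.0, 0.25, 0.5, 0.75, 1.0]

shift = popularity_shift(profile, recommended, edges)
no_shift = popularity_shift(profile, profile, edges)
```

A positive `median_delta` indicates that the recommendations lean towards more popular items than the profile, while the divergence also reacts to changes in the shape of the distribution that a single summary statistic would miss.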

A Modern Perspective on Query Likelihood with Deep Generative Retrieval Models

Jun 25, 2021

Oleg Lesota, Navid Rekabsaz, Daniel Cohen, Klaus Antonius Grasserbauer, Carsten Eickhoff, Markus Schedl

Abstract:Existing neural ranking models follow the text matching paradigm, where document-to-query relevance is estimated through predicting the matching score. Drawing from the rich literature of classical generative retrieval models, we introduce and formalize the paradigm of deep generative retrieval models defined via the cumulative probabilities of generating query terms. This paradigm offers a grounded probabilistic view on relevance estimation while still enabling the use of modern neural architectures. In contrast to the matching paradigm, the probabilistic nature of generative rankers readily offers a fine-grained measure of uncertainty. We adopt several current neural generative models in our framework and introduce a novel generative ranker (T-PGN), which combines the encoding capacity of Transformers with the Pointer Generator Network model. We conduct an extensive set of evaluation experiments on passage retrieval, leveraging the MS MARCO Passage Re-ranking and TREC Deep Learning 2019 Passage Re-ranking collections. Our results show the significantly higher performance of the T-PGN model when compared with other generative models. Lastly, we demonstrate that exploiting the uncertainty information of deep generative rankers opens new perspectives to query/collection understanding, and significantly improves the cut-off prediction task.

* ICTIR'21
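
The query likelihood view described in the abstract can be sketched as follows: a document is scored by the accumulated log-probability of generating each query term given that document. In this sketch the per-term probabilities are plain numbers supplied by hand; in practice they would come from a generative model, which is not shown. The names `query_log_likelihood` and `rank_by_query_likelihood` and all values are hypothetical.

```python
import math


def query_log_likelihood(term_probs):
    """Sum of log-probabilities of the query terms, given one document."""
    return sum(math.log(p) for p in term_probs)


def rank_by_query_likelihood(candidates):
    """Order document ids by descending query log-likelihood.

    candidates: doc_id -> list of per-term generation probabilities, assumed
    to be produced by some generative ranker for a fixed query.
    """
    return sorted(
        candidates,
        key=lambda doc_id: query_log_likelihood(candidates[doc_id]),
        reverse=True,
    )


# Toy probabilities for a three-term query (made up for illustration only).
candidates = {
    "d1": [0.5, 0.4, 0.2],
    "d2": [0.1, 0.3, 0.05],
}
ranking = rank_by_query_likelihood(candidates)
```

Working in log space avoids numerical underflow for long queries, and keeping the per-term probabilities available is what makes term-level inspection of the model's confidence possible.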

Societal Biases in Retrieved Contents: Measurement Framework and Adversarial Mitigation for BERT Rankers

May 11, 2021

Navid Rekabsaz, Simone Kopeinik, Markus Schedl

Abstract:Societal biases resonate in the retrieved contents of information retrieval (IR) systems, reinforcing existing stereotypes. Approaching this issue requires established measures of fairness with respect to the representation of various social groups in retrieval results, as well as methods to mitigate such biases, particularly in the light of the advances in deep ranking models. In this work, we first provide a novel framework to measure the fairness in the retrieved text contents of ranking models. Introducing a ranker-agnostic measurement, the framework also enables the disentanglement of the effect on fairness of collection from that of rankers. To mitigate these biases, we propose AdvBert, a ranking model achieved by adapting adversarial bias mitigation for IR, which jointly learns to predict relevance and remove protected attributes. We conduct experiments on two passage retrieval collections (MSMARCO Passage Re-ranking and TREC Deep Learning 2019 Passage Re-ranking), which we extend by fairness annotations of a selected subset of queries regarding gender attributes. Our results on the MSMARCO benchmark show that (1) all ranking models are less fair in comparison with ranker-agnostic baselines, and (2) the fairness of BERT rankers significantly improves when using the proposed AdvBert models. Lastly, we investigate the trade-off between fairness and utility, showing that we can maintain the significant improvements in fairness without any significant loss in utility.

* Accepted at SIGIR 2021

Not All Relevance Scores are Equal: Efficient Uncertainty and Calibration Modeling for Deep Retrieval Models

May 10, 2021

Daniel Cohen, Bhaskar Mitra, Oleg Lesota, Navid Rekabsaz, Carsten Eickhoff

Abstract:In any ranking system, the retrieval model outputs a single score for a document based on its belief on how relevant it is to a given search query. While retrieval models have continued to improve with the introduction of increasingly complex architectures, few works have investigated a retrieval model's belief in the score beyond the scope of a single value. We argue that capturing the model's uncertainty with respect to its own scoring of a document is a critical aspect of retrieval that allows for greater use of current models across new document distributions, collections, or even improving effectiveness for down-stream tasks. In this paper, we address this problem via an efficient Bayesian framework for retrieval models which captures the model's belief in the relevance score through a stochastic process while adding only negligible computational overhead. We evaluate this belief via a ranking based calibration metric showing that our approximate Bayesian framework significantly improves a retrieval model's ranking effectiveness through a risk aware reranking as well as its confidence calibration. Lastly, we demonstrate that this additional uncertainty information is actionable and reliable on down-stream tasks represented via cutoff prediction.

* ACM SIGIR preprint
