Fabian Haak

REANIMATOR: Reanimate Retrieval Test Collections with Extracted and Synthetic Resources

Apr 10, 2025

Björn Engelmann, Fabian Haak, Philipp Schaer, Mani Erfanian Abdoust, Linus Netze, Meik Bittkowski

Abstract: Retrieval test collections are essential for evaluating information retrieval systems, yet they often lack generalizability across tasks. To overcome this limitation, we introduce REANIMATOR, a versatile framework designed to enable the repurposing of existing test collections by enriching them with extracted and synthetic resources. REANIMATOR enhances test collections from PDF files by parsing full texts and machine-readable tables, as well as related contextual information. It then employs state-of-the-art large language models to produce synthetic relevance labels. Including an optional human-in-the-loop step can help validate the resources that have been extracted and generated. We demonstrate its potential with a revitalized version of the TREC-COVID test collection, showcasing the development of a retrieval-augmented generation system and evaluating the impact of tables on retrieval-augmented generation. REANIMATOR enables the reuse of test collections for new applications, lowering costs and broadening the utility of legacy resources.

Investigating Bias in Political Search Query Suggestions by Relative Comparison with LLMs

Oct 31, 2024

Fabian Haak, Björn Engelmann, Christin Katharina Kreutz, Philipp Schaer

Abstract: Search query suggestions affect users' interactions with search engines, which then influences the information they encounter. Thus, bias in search query suggestions can lead to exposure to biased search results and can impact opinion formation. This is especially critical in the political domain. Detecting and quantifying bias in web search engines is difficult due to its topic dependency, complexity, and subjectivity. The lack of context and phrasality of query suggestions emphasizes this problem. In a multi-step approach, we combine the benefits of large language models, pairwise comparison, and Elo-based scoring to identify and quantify bias in English search query suggestions. We apply our approach to the U.S. political news domain and compare bias in Google and Bing.
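The combination of pairwise comparison and Elo-based scoring mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the K-factor of 32, the starting rating of 1000, and the hard-coded judgments are all assumptions for illustration. In the paper's setting the pairwise judgments would come from a large language model; here an outcome of 1.0 simply means the first suggestion was judged more biased than the second.

```python
from typing import Dict, Iterable, Tuple


def expected_score(r_a: float, r_b: float) -> float:
    """Standard Elo expectation that item A 'wins' against item B."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400.0))


def elo_ratings(
    comparisons: Iterable[Tuple[str, str, float]],
    k: float = 32.0,        # assumed K-factor, not taken from the paper
    initial: float = 1000.0,  # assumed starting rating
) -> Dict[str, float]:
    """Turn pairwise judgments into one score per item.

    Each comparison is (item_a, item_b, outcome), where outcome is
    1.0 if item_a was judged more biased, 0.0 if item_b was,
    and 0.5 for a tie.
    """
    ratings: Dict[str, float] = {}
    for a, b, outcome in comparisons:
        r_a = ratings.setdefault(a, initial)
        r_b = ratings.setdefault(b, initial)
        e_a = expected_score(r_a, r_b)
        # Zero-sum update: what A gains, B loses.
        ratings[a] = r_a + k * (outcome - e_a)
        ratings[b] = r_b - k * (outcome - e_a)
    return ratings


if __name__ == "__main__":
    # Hypothetical judgments over three placeholder query suggestions.
    judgments = [
        ("suggestion A", "suggestion B", 1.0),
        ("suggestion B", "suggestion C", 0.5),
        ("suggestion A", "suggestion C", 1.0),
    ]
    for item, score in sorted(elo_ratings(judgments).items(),
                              key=lambda kv: kv[1], reverse=True):
        print(f"{item}: {score:.1f}")
```

A higher rating marks a suggestion that was more often judged the more biased of a pair, which yields a relative ranking without requiring an absolute bias label for any single suggestion.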

The Media Bias Taxonomy: A Systematic Literature Review on the Forms and Automated Detection of Media Bias

Jan 10, 2024

Timo Spinde, Smi Hinterreiter, Fabian Haak, Terry Ruas, Helge Giese, Norman Meuschke, Bela Gipp

Abstract: The way the media presents events can significantly affect public perception, which in turn can alter people's beliefs and views. Media bias describes a one-sided or polarizing perspective on a topic. This article summarizes the research on computational methods to detect media bias by systematically reviewing 3140 research papers published between 2019 and 2022. To structure our review and support a mutual understanding of bias across research domains, we introduce the Media Bias Taxonomy, which provides a coherent overview of the current state of research on media bias from different perspectives. We show that media bias detection is a highly active research field, in which transformer-based classification approaches have led to significant improvements in recent years. These improvements include higher classification accuracy and the ability to detect more fine-granular types of bias. However, we have identified a lack of interdisciplinarity in existing projects, and a need for more awareness of the various types of media bias to support methodologically thorough performance evaluations of media bias detection systems. Concluding from our analysis, we see the integration of recent machine learning advancements with reliable and diverse bias assessment strategies from other research areas as the most promising area for future research contributions in the field.

$Q_{bias}$ -- A Dataset on Media Bias in Search Queries and Query Suggestions

Nov 29, 2023

Fabian Haak, Philipp Schaer

Abstract: This publication describes the motivation and generation of $Q_{bias}$, a large dataset of Google and Bing search queries, a scraping tool and dataset for biased news articles, as well as language models for the investigation of bias in online search. Web search engines are a major factor and trusted source in information search, especially in the political domain. However, biased information can influence opinion formation and lead to biased opinions. To interact with search engines, users formulate search queries and interact with search query suggestions provided by the search engines. A lack of datasets on search queries inhibits research on the subject. We use $Q_{bias}$ to evaluate different approaches to fine-tuning transformer-based language models with the goal of producing models capable of biasing text with left and right political stance. In addition to this work, we provided datasets and language models for biasing texts that allow further research on bias in online information search.

* Paper accepted at ACM Web Science Conference 2023. 6 pages

Text Simplification of Scientific Texts for Non-Expert Readers

Jul 07, 2023

Björn Engelmann, Fabian Haak, Christin Katharina Kreutz, Narjes Nikzad Khasmakhi, Philipp Schaer

Abstract: Reading levels are highly individual and can depend on a text's language, a person's cognitive abilities, or knowledge on a topic. Text simplification is the task of rephrasing a text to better cater to the abilities of a specific target reader group. Simplification of scientific abstracts helps non-experts to access the core information by bypassing formulations that require domain or expert knowledge. This is especially relevant for, e.g., cancer patients reading about novel treatment options. The SimpleText lab hosts the simplification of scientific abstracts for non-experts (Task 3) to advance this field. We contribute three runs employing out-of-the-box summarization models (two based on T5, one based on PEGASUS) and one run using ChatGPT with complex phrase identification.

* Paper accepted at SimpleText@CLEF'23, 12 pages, 1 Figure, 4 Tables
