Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Meik Bittkowski

REANIMATOR: Reanimate Retrieval Test Collections with Extracted and Synthetic Resources

Apr 10, 2025

Björn Engelmann, Fabian Haak, Philipp Schaer, Mani Erfanian Abdoust, Linus Netze, Meik Bittkowski

Figure 1 for REANIMATOR: Reanimate Retrieval Test Collections with Extracted and Synthetic Resources

Figure 2 for REANIMATOR: Reanimate Retrieval Test Collections with Extracted and Synthetic Resources

Figure 3 for REANIMATOR: Reanimate Retrieval Test Collections with Extracted and Synthetic Resources

Figure 4 for REANIMATOR: Reanimate Retrieval Test Collections with Extracted and Synthetic Resources

Abstract:Retrieval test collections are essential for evaluating information retrieval systems, yet they often lack generalizability across tasks. To overcome this limitation, we introduce REANIMATOR, a versatile framework designed to enable the repurposing of existing test collections by enriching them with extracted and synthetic resources. REANIMATOR enhances test collections from PDF files by parsing full texts and machine-readable tables, as well as related contextual information. It then employs state-of-the-art large language models to produce synthetic relevance labels. Including an optional human-in-the-loop step can help validate the resources that have been extracted and generated. We demonstrate its potential with a revitalized version of the TREC-COVID test collection, showcasing the development of a retrieval-augmented generation system and evaluating the impact of tables on retrieval-augmented generation. REANIMATOR enables the reuse of test collections for new applications, lowering costs and broadening the utility of legacy resources.

Via

Access Paper or Ask Questions

Automated Statement Extraction from Press Briefings

Feb 24, 2023

Jüri Keller, Meik Bittkowski, Philipp Schaer

Abstract:Scientific press briefings are a valuable information source. They consist of alternating expert speeches, questions from the audience and their answers. Therefore, they can contribute to scientific and fact-based media coverage. Even though press briefings are highly informative, extracting statements relevant to individual journalistic tasks is challenging and time-consuming. To support this task, an automated statement extraction system is proposed. Claims are used as the main feature to identify statements in press briefing transcripts. The statement extraction task is formulated as a four-step procedure. First, the press briefings are split into sentences and passages, then claim sentences are identified through sequence classification. Subsequently, topics are detected, and the sentences are filtered to improve the coherence and assess the length of the statements. The results indicate that claim detection can be used to identify statements in press briefings. While many statements can be extracted automatically with this system, they are not always as coherent as needed to be understood without context and may need further review by knowledgeable persons.

* Datenbanksysteme f\"ur Business, Technologie und Web (BTW 2023)

Via

Access Paper or Ask Questions