Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Adam Jatowt

University of Innsbruck, Austria

It's High Time: A Survey of Temporal Information Retrieval and Question Answering

May 26, 2025

Bhawna Piryani, Abdelrahman Abdullah, Jamshid Mozafari, Avishek Anand, Adam Jatowt

Abstract:Time plays a critical role in how information is generated, retrieved, and interpreted. In this survey, we provide a comprehensive overview of Temporal Information Retrieval and Temporal Question Answering, two research areas aimed at handling and understanding time-sensitive information. As the amount of time-stamped content from sources like news articles, web archives, and knowledge bases increases, systems must address challenges such as detecting temporal intent, normalizing time expressions, ordering events, and reasoning over evolving or ambiguous facts. These challenges are critical across many dynamic and time-sensitive domains, from news and encyclopedias to science, history, and social media. We review both traditional approaches and modern neural methods, including those that use transformer models and Large Language Models (LLMs). We also review recent advances in temporal language modeling, multi-hop reasoning, and retrieval-augmented generation (RAG), alongside benchmark datasets and evaluation strategies that test temporal robustness, recency awareness, and generalization.

Via

Access Paper or Ask Questions

The Impact of International Collaborations with Highly Publishing Countries in Computer Science

May 14, 2025

Alberto Gomez Espes, Michael Faerber, Adam Jatowt

Abstract:This paper analyzes international collaborations in Computer Science, focusing on three major players: China, the European Union, and the United States. Drawing from a comprehensive literature review, we examine collaboration patterns, research impact, retraction rates, and the role of the Development Index in shaping research outcomes. Our findings show that while China, the EU, and the US lead global research efforts, other regions are narrowing the gap in publication volume. Collaborations involving these key regions tend to have lower retraction rates, reflecting stronger adherence to scientific standards. We also find that countries with a Very High Development Index contribute to research with higher citation rates and fewer retractions. Overall, this study highlights the value of international collaboration and the importance of inclusive, ethical practices in advancing global research in Computer Science.

Via

Access Paper or Ask Questions

Evaluating Answer Reranking Strategies in Time-sensitive Question Answering

Mar 06, 2025

Mehmet Kardan, Bhawna Piryani, Adam Jatowt

Abstract:Despite advancements in state-of-the-art models and information retrieval techniques, current systems still struggle to handle temporal information and to correctly answer detailed questions about past events. In this paper, we investigate the impact of temporal characteristics of answers in Question Answering (QA) by exploring several simple answer selection techniques. Our findings emphasize the role of temporal features in selecting the most relevant answers from diachronic document collections and highlight differences between explicit and implicit temporal questions.

Via

Access Paper or Ask Questions

Extending Dense Passage Retrieval with Temporal Information

Feb 28, 2025

Abdelrahman Abdallah, Bhawna Piryani, Jonas Wallat, Avishek Anand, Adam Jatowt

Abstract:Temporal awareness is crucial in many information retrieval tasks, particularly in scenarios where the relevance of documents depends on their alignment with the query's temporal context. Traditional retrieval methods such as BM25 and Dense Passage Retrieval (DPR) excel at capturing lexical and semantic relevance but fall short in addressing time-sensitive queries. To bridge this gap, we introduce the temporal retrieval model that integrates explicit temporal signals by incorporating query timestamps and document dates into the representation space. Our approach ensures that retrieved passages are not only topically relevant but also temporally aligned with user intent. We evaluate our approach on two large-scale benchmark datasets, ArchivalQA and ChroniclingAmericaQA, achieving substantial performance gains over standard retrieval baselines. In particular, our model improves Top-1 retrieval accuracy by 6.63% and NDCG@10 by 3.79% on ArchivalQA, while yielding a 9.56% boost in Top-1 retrieval accuracy and 4.68% in NDCG@10 on ChroniclingAmericaQA. Additionally, we introduce a time-sensitive negative sampling strategy, which refines the model's ability to distinguish between temporally relevant and irrelevant documents during training. Our findings highlight the importance of explicitly modeling time in retrieval systems and set a new standard for handling temporally grounded queries.

Via

Access Paper or Ask Questions

From Retrieval to Generation: Comparing Different Approaches

Feb 27, 2025

Abdelrahman Abdallah, Jamshid Mozafari, Bhawna Piryani, Mohammed Ali, Adam Jatowt

Abstract:Knowledge-intensive tasks, particularly open-domain question answering (ODQA), document reranking, and retrieval-augmented language modeling, require a balance between retrieval accuracy and generative flexibility. Traditional retrieval models such as BM25 and Dense Passage Retrieval (DPR), efficiently retrieve from large corpora but often lack semantic depth. Generative models like GPT-4-o provide richer contextual understanding but face challenges in maintaining factual consistency. In this work, we conduct a systematic evaluation of retrieval-based, generation-based, and hybrid models, with a primary focus on their performance in ODQA and related retrieval-augmented tasks. Our results show that dense retrievers, particularly DPR, achieve strong performance in ODQA with a top-1 accuracy of 50.17\% on NQ, while hybrid models improve nDCG@10 scores on BEIR from 43.42 (BM25) to 52.59, demonstrating their strength in document reranking. Additionally, we analyze language modeling tasks using WikiText-103, showing that retrieval-based approaches like BM25 achieve lower perplexity compared to generative and hybrid methods, highlighting their utility in retrieval-augmented generation. By providing detailed comparisons and practical insights into the conditions where each approach excels, we aim to facilitate future optimizations in retrieval, reranking, and generative models for ODQA and related knowledge-intensive applications.

* work on progress

Via

Access Paper or Ask Questions

MultiOCR-QA: Dataset for Evaluating Robustness of LLMs in Question Answering on Multilingual OCR Texts

Feb 24, 2025

Bhawna Piryani, Jamshid Mozafari, Abdelrahman Abdallah, Antoine Doucet, Adam Jatowt

Abstract:Optical Character Recognition (OCR) plays a crucial role in digitizing historical and multilingual documents, yet OCR errors -- imperfect extraction of the text, including character insertion, deletion and permutation -- can significantly impact downstream tasks like question-answering (QA). In this work, we introduce a multilingual QA dataset MultiOCR-QA, designed to analyze the effects of OCR noise on QA systems' performance. The MultiOCR-QA dataset comprises 60K question-answer pairs covering three languages, English, French, and German. The dataset is curated from OCR-ed old documents, allowing for the evaluation of OCR-induced challenges on question answering. We evaluate MultiOCR-QA on various levels and types of OCR errors to access the robustness of LLMs in handling real-world digitization errors. Our findings show that QA systems are highly prone to OCR induced errors and exhibit performance degradation on noisy OCR text.

Via

Access Paper or Ask Questions

Wrong Answers Can Also Be Useful: PlausibleQA -- A Large-Scale QA Dataset with Answer Plausibility Scores

Feb 22, 2025

Jamshid Mozafari, Abdelrahman Abdallah, Bhawna Piryani, Adam Jatowt

Abstract:Large Language Models (LLMs) are revolutionizing information retrieval, with chatbots becoming an important source for answering user queries. As by their design, LLMs prioritize generating correct answers, the value of highly plausible yet incorrect answers (candidate answers) tends to be overlooked. However, such answers can still prove useful, for example, they can play a crucial role in tasks like Multiple-Choice Question Answering (MCQA) and QA Robustness Assessment (QARA). Existing QA datasets primarily focus on correct answers without explicit consideration of the plausibility of other candidate answers, limiting opportunity for more nuanced evaluations of models. To address this gap, we introduce PlausibleQA, a large-scale dataset comprising 10,000 questions and 100,000 candidate answers, each annotated with plausibility scores and justifications for their selection. Additionally, the dataset includes 900,000 justifications for pairwise comparisons between candidate answers, further refining plausibility assessments. We evaluate PlausibleQA through human assessments and empirical experiments, demonstrating its utility in MCQA and QARA analysis. Our findings show that plausibility-aware approaches are effective for MCQA distractor generation and QARA. We release PlausibleQA as a resource for advancing QA research and enhancing LLM performance in distinguishing plausible distractors from correct answers.

* Submitted to SIGIR 2025

Via

Access Paper or Ask Questions

Detecting Future-related Contexts of Entity Mentions

Feb 21, 2025

Puneet Prashar, Krishna Mohan Shukla, Adam Jatowt

Figure 1 for Detecting Future-related Contexts of Entity Mentions

Figure 2 for Detecting Future-related Contexts of Entity Mentions

Figure 3 for Detecting Future-related Contexts of Entity Mentions

Figure 4 for Detecting Future-related Contexts of Entity Mentions

Abstract:The ability to automatically identify whether an entity is referenced in a future context can have multiple applications including decision making, planning and trend forecasting. This paper focuses on detecting implicit future references in entity-centric texts, addressing the growing need for automated temporal analysis in information processing. We first present a novel dataset of 19,540 sentences built around popular entities sourced from Wikipedia, which consists of future-related and non-future-related contexts in which those entities appear. As a second contribution, we evaluate the performance of several Language Models including also Large Language Models (LLMs) on the task of distinguishing future-oriented content in the absence of explicit temporal references.

Via

Access Paper or Ask Questions

Wisdom of the Crowds in Forecasting: Forecast Summarization for Supporting Future Event Prediction

Feb 12, 2025

Anisha Saha, Adam Jatowt

Abstract:Future Event Prediction (FEP) is an essential activity whose demand and application range across multiple domains. While traditional methods like simulations, predictive and time-series forecasting have demonstrated promising outcomes, their application in forecasting complex events is not entirely reliable due to the inability of numerical data to accurately capture the semantic information related to events. One forecasting way is to gather and aggregate collective opinions on the future to make predictions as cumulative perspectives carry the potential to help estimating the likelihood of upcoming events. In this work, we organize the existing research and frameworks that aim to support future event prediction based on crowd wisdom through aggregating individual forecasts. We discuss the challenges involved, available datasets, as well as the scope of improvement and future research directions for this task. We also introduce a novel data model to represent individual forecast statements.

Via

Access Paper or Ask Questions

Rankify: A Comprehensive Python Toolkit for Retrieval, Re-Ranking, and Retrieval-Augmented Generation

Feb 04, 2025

Abdelrahman Abdallah, Jamshid Mozafari, Bhawna Piryani, Mohammed Ali, Adam Jatowt

Figure 1 for Rankify: A Comprehensive Python Toolkit for Retrieval, Re-Ranking, and Retrieval-Augmented Generation

Figure 2 for Rankify: A Comprehensive Python Toolkit for Retrieval, Re-Ranking, and Retrieval-Augmented Generation

Figure 3 for Rankify: A Comprehensive Python Toolkit for Retrieval, Re-Ranking, and Retrieval-Augmented Generation

Figure 4 for Rankify: A Comprehensive Python Toolkit for Retrieval, Re-Ranking, and Retrieval-Augmented Generation

Abstract:Retrieval, re-ranking, and retrieval-augmented generation (RAG) are critical components of modern natural language processing (NLP) applications in information retrieval, question answering, and knowledge-based text generation. However, existing solutions are often fragmented, lacking a unified framework that easily integrates these essential processes. The absence of a standardized implementation, coupled with the complexity of retrieval and re-ranking workflows, makes it challenging for researchers to compare and evaluate different approaches in a consistent environment. While existing toolkits such as Rerankers and RankLLM provide general-purpose reranking pipelines, they often lack the flexibility required for fine-grained experimentation and benchmarking. In response to these challenges, we introduce \textbf{Rankify}, a powerful and modular open-source toolkit designed to unify retrieval, re-ranking, and RAG within a cohesive framework. Rankify supports a wide range of retrieval techniques, including dense and sparse retrievers, while incorporating state-of-the-art re-ranking models to enhance retrieval quality. Additionally, Rankify includes a collection of pre-retrieved datasets to facilitate benchmarking, available at Huggingface (https://huggingface.co/datasets/abdoelsayed/reranking-datasets). To encourage adoption and ease of integration, we provide comprehensive documentation (http://rankify.readthedocs.io/), an open-source implementation on GitHub(https://github.com/DataScienceUIBK/rankify), and a PyPI package for effortless installation(https://pypi.org/project/rankify/). By providing a unified and lightweight framework, Rankify allows researchers and practitioners to advance retrieval and re-ranking methodologies while ensuring consistency, scalability, and ease of use.

* Work in Progress

Via

Access Paper or Ask Questions