Gareth J. F. Jones

Report on the Workshop on Simulations for Information Access (Sim4IA 2024) at SIGIR 2024

Sep 26, 2024

Timo Breuer, Christin Katharina Kreutz, Norbert Fuhr, Krisztian Balog, Philipp Schaer, Nolwenn Bernard, Ingo Frommholz, Marcel Gohsen, Kaixin Ji, Gareth J. F. Jones (+8 more)

Abstract: This paper reports on the Workshop on Simulations for Information Access (Sim4IA) held at SIGIR 2024. The workshop had two keynotes, a panel discussion, nine lightning talks, and two breakout sessions. Key takeaways were user simulation's importance in academia and industry, the possible bridging of online and offline evaluation, and the issues of organizing a companion shared task around user simulations for information access. We report on how we organized the workshop, provide a brief overview of what happened at the workshop, and summarize the main topics and findings of the workshop and future work.

* Preprint of a SIGIR Forum submission for Vol. 58 No. 2 - December 2024

Examining the Potential for Conversational Exploratory Search using a Smart Speaker Digital Assistant

Mar 18, 2023

Abhishek Kaushik, Gareth J. F. Jones

Abstract: Online Digital Assistants, such as Amazon Alexa, Google Assistant, and Apple Siri, are very popular and provide a range of services to their users. A key function is their ability to satisfy user information needs from the sources available to them. Users may often regard these applications as providing search services similar to Google type search engines. However, while it is clear that they are in general able to answer factoid questions effectively, it is much less obvious how well they support less specific or exploratory type search tasks. We describe an investigation examining the behaviour of the standard Amazon Alexa for exploratory search tasks. The results of our study show that it is not effective in addressing these types of information needs. We propose extensions to Alexa designed to overcome these shortcomings. Our Custom Alexa application extends Alexa's conversational functionality for exploratory search. A user study shows that our extended Alexa application both enables users to more successfully complete exploratory search tasks and is well accepted by our test users.

* Proceedings of the 18th International Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications - HUCAPP, ISBN 978-989-758-634-7; ISSN 2184-4321, SciTePress, pages 305-317, 2023

Achieving Reliable Human Assessment of Open-Domain Dialogue Systems

Mar 11, 2022

Tianbo Ji, Yvette Graham, Gareth J. F. Jones, Chenyang Lyu, Qun Liu

Abstract: Evaluation of open-domain dialogue systems is highly challenging and development of better techniques is highlighted time and again as desperately needed. Despite substantial efforts to carry out reliable live evaluation of systems in recent competitions, annotations have been abandoned and reported as too unreliable to yield sensible results. This is a serious problem since automatic metrics are not known to provide a good indication of what may or may not be a high-quality conversation. Answering the distress call of competitions that have emphasized the urgent need for better evaluation techniques in dialogue, we present the successful development of human evaluation that is highly reliable while still remaining feasible and low cost. Self-replication experiments reveal almost perfectly repeatable results with a correlation of $r=0.969$. Furthermore, due to the lack of appropriate methods of statistical significance testing, the likelihood of potential improvements to systems occurring due to chance is rarely taken into account in dialogue evaluation, and the evaluation we propose facilitates application of standard tests. Since we have developed a highly reliable evaluation method, new insights into system performance can be revealed. We therefore include a comparison of state-of-the-art models (i) with and without personas, to measure the contribution of personas to conversation quality, as well as (ii) prescribed versus freely chosen topics. Interestingly with respect to personas, results indicate that personas do not positively contribute to conversation quality as expected.

* to appear at ACL 2022 main conference

Translation Quality Assessment: A Brief Survey on Manual and Automatic Methods

May 05, 2021

Lifeng Han, Gareth J. F. Jones, Alan F. Smeaton

Abstract: To facilitate effective translation modeling and translation studies, one of the crucial questions to address is how to assess translation quality. From the perspectives of accuracy, reliability, repeatability and cost, translation quality assessment (TQA) itself is a rich and challenging task. In this work, we present a high-level and concise survey of TQA methods, including both manual judgement criteria and automated evaluation metrics, which we classify into further detailed sub-categories. We hope that this work will be an asset for both translation model researchers and quality assessment researchers. In addition, we hope that it will enable practitioners to quickly develop a better understanding of the conventional TQA field, and to find corresponding closely relevant evaluation solutions for their own needs. This work may also serve to inspire further development of quality assessment and evaluation methodologies for other natural language processing (NLP) tasks in addition to machine translation (MT), such as automatic text summarization (ATS), natural language understanding (NLU) and natural language generation (NLG).

* Accepted to 23rd Nordic Conference on Computational Linguistics (NoDaLiDa 2021): Workshop on Modelling Translation: Translatology in the Digital Age (MoTra21). arXiv admin note: substantial text overlap with arXiv:1605.04515

TRECVID 2020: A comprehensive campaign for evaluating video retrieval tasks across multiple application domains

Apr 27, 2021

George Awad, Asad A. Butt, Keith Curtis, Jonathan Fiscus, Afzal Godil, Yooyoung Lee, Andrew Delgado, Jesse Zhang, Eliot Godard, Baptiste Chocot (+7 more)

Abstract: The TREC Video Retrieval Evaluation (TRECVID) is a TREC-style video analysis and retrieval evaluation with the goal of promoting progress in research and development of content-based exploitation and retrieval of information from digital video via open, metrics-based evaluation. Over the last twenty years this effort has yielded a better understanding of how systems can effectively accomplish such processing and how one can reliably benchmark their performance. TRECVID has been funded by NIST (National Institute of Standards and Technology) and other US government agencies. In addition, many organizations and individuals worldwide contribute significant time and effort. TRECVID 2020 represented a continuation of four tasks and the addition of two new tasks. In total, 29 teams from various research organizations worldwide completed one or more of the following six tasks: 1. Ad-hoc Video Search (AVS), 2. Instance Search (INS), 3. Disaster Scene Description and Indexing (DSDI), 4. Video to Text Description (VTT), 5. Activities in Extended Video (ActEV), 6. Video Summarization (VSUM). This paper is an introduction to the evaluation framework, tasks, data, and measures used in the evaluation campaign.

* TRECVID 2020 Workshop Overview Paper. arXiv admin note: substantial text overlap with arXiv:2009.09984

Exploring Current User Web Search Behaviours in Analysis Tasks to be Supported in Conversational Search

Apr 09, 2021

Abhishek Kaushik, Gareth J. F. Jones

Abstract: Conversational search presents opportunities to support users in their search activities to improve the effectiveness and efficiency of search while reducing their cognitive load. Limitations of the potential competency of conversational agents restrict the situations for which conversational search agents can replace human intermediaries. It is thus more interesting, initially at least, to investigate opportunities for conversational interaction to support less complex information retrieval tasks, such as typical web search, which do not require human-level intelligence in the conversational agent. In order to move towards the development of a system to enable conversational search of this type, we need to understand their required capabilities. To progress our understanding of these, we report a study examining the behaviour of users when using a standard web search engine, designed to enable us to identify opportunities to support their search activities using a conversational agent.

* Accepted in SIGIR 2018 Second International Workshop on Conversational Approaches to Information Retrieval (CAIR 18), July 12, 2018, Ann Arbor, Michigan, USA

Chinese Character Decomposition for Neural MT with Multi-Word Expressions

Apr 09, 2021

Lifeng Han, Gareth J. F. Jones, Alan F. Smeaton, Paolo Bolzoni

Abstract: Chinese character decomposition has been used as a feature to enhance Machine Translation (MT) models, combining radicals into character and word level models. Recent work has investigated ideograph or stroke level embedding. However, questions remain about which decomposition levels of Chinese character representations, radicals or strokes, are best suited for MT. To investigate the impact of Chinese decomposition embedding in detail, i.e., radical, stroke, and intermediate levels, and how well these decompositions represent the meaning of the original character sequences, we carry out analysis with both automated and human evaluation of MT. Furthermore, we investigate whether the combination of decomposed Multiword Expressions (MWEs) can enhance the model learning. MWE integration into MT has seen more than a decade of exploration. However, decomposed MWEs have not previously been explored.

* Accepted for publication at NoDaLiDa 2021

A Conceptual Framework for Implicit Evaluation of Conversational Search Interfaces

Apr 08, 2021

Abhishek Kaushik, Gareth J. F. Jones

Abstract: Conversational search (CS) has recently become a significant focus of the information retrieval (IR) research community. Multiple studies have been conducted which explore the concept of conversational search. Understanding and advancing research in CS requires careful and detailed evaluation. Existing CS studies have been limited to evaluation based on simple user feedback on task completion. We propose a CS evaluation framework which includes multiple dimensions: search experience, knowledge gain, software usability, cognitive load and user experience, based on studies of conversational systems and IR. We introduce these evaluation criteria and propose their use in a framework for the evaluation of CS systems.

* Accepted in MICROS (Mixed-Initiative ConveRsatiOnal Systems) Workshop at 43rd European Conference on Information Retrieval

TREC 2020 Podcasts Track Overview

Mar 29, 2021

Rosie Jones, Ben Carterette, Ann Clifton, Maria Eskevich, Gareth J. F. Jones, Jussi Karlgren, Aasish Pappu, Sravana Reddy, Yongze Yu

Abstract: The Podcast Track is new at the Text Retrieval Conference (TREC) in 2020. The Podcast Track was designed to encourage research into podcasts in the information retrieval and NLP research communities. The track consisted of two shared tasks: segment retrieval and summarization, both based on a dataset of over 100,000 podcast episodes (metadata, audio, and automatic transcripts) which was released concurrently with the track. The track generated considerable interest and attracted hundreds of new registrations to TREC; fifteen teams, mostly disjoint between search and summarization, made final submissions for assessment. Deep learning was the dominant experimental approach for both search experiments and summarization. This paper gives an overview of the tasks and the results of the participants' experiments. The track will return to TREC 2021 with the same two tasks, incorporating slight modifications in response to participant feedback.

* Proceedings of the Twenty-Ninth Text REtrieval Conference (TREC 2020)

Response to LiveBot: Generating Live Video Comments Based on Visual and Textual Contexts

Jun 04, 2020

Hao Wu, Gareth J. F. Jones, Francois Pitie

Abstract: Live video commenting systems are an emerging feature of online video sites. Recently the Chinese video sharing platform Bilibili has popularised a novel captioning system where user comments are displayed as streams of moving subtitles overlaid on the video playback screen and broadcast to all viewers in real-time. LiveBot was recently introduced as a novel Automatic Live Video Commenting (ALVC) application. This enables the automatic generation of live video comments from both the existing video stream and existing viewers' comments. In seeking to reproduce the baseline results reported in the original LiveBot paper, we found differences between the reproduced results using the project codebase and the numbers reported in the paper. Further examination of this situation suggests that this may be caused by a number of small issues in the project code, including a non-obvious overlap between the training and test sets. In this paper, we study these discrepancies in detail and propose an alternative baseline implementation as a reference for other researchers in this field.

* 4 pages, 2 figures
