Gareth J. F. Jones

Examining the Potential for Conversational Exploratory Search using a Smart Speaker Digital Assistant

Mar 18, 2023
Abhishek Kaushik, Gareth J. F. Jones

Online digital assistants, such as Amazon Alexa, Google Assistant, and Apple Siri, are very popular and provide a range of services to their users. A key function is their ability to satisfy user information needs from the sources available to them. Users may often regard these applications as providing search services similar to web search engines such as Google. However, while it is clear that they are in general able to answer factoid questions effectively, it is much less obvious how well they support less specific or exploratory search tasks. We describe an investigation examining the behaviour of the standard Amazon Alexa for exploratory search tasks. The results of our study show that it is not effective in addressing these types of information needs. We propose extensions to Alexa designed to overcome these shortcomings. Our custom Alexa application extends Alexa's conversational functionality for exploratory search. A user study shows that our extended Alexa application both enables users to complete exploratory search tasks more successfully and is well accepted by our test users.
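
The paper itself contains no code, but as a rough illustration of the kind of extension described, a custom Alexa skill routes spoken requests to intent handlers that can forward a query to a back-end retrieval service. The sketch below assumes the ask-sdk-core Python SDK; the "SearchIntent" name, the "query" slot and the search_backend function are hypothetical and are not the authors' implementation.

```python
# Minimal sketch of a custom Alexa skill handler that forwards a spoken
# query to a back-end search service. "SearchIntent", the "query" slot
# and search_backend() are hypothetical; they are not from the paper.
from ask_sdk_core.skill_builder import SkillBuilder
from ask_sdk_core.dispatch_components import AbstractRequestHandler
from ask_sdk_core.utils import is_intent_name


def search_backend(query):
    """Placeholder for a call to an exploratory search service."""
    return f"Here is a summary of results for {query}."


class SearchIntentHandler(AbstractRequestHandler):
    def can_handle(self, handler_input):
        return is_intent_name("SearchIntent")(handler_input)

    def handle(self, handler_input):
        slots = handler_input.request_envelope.request.intent.slots
        query = slots["query"].value or "your topic"
        speech = search_backend(query)
        # Keep the session open so the user can refine the search.
        return (handler_input.response_builder
                .speak(speech)
                .ask("Would you like to explore this topic further?")
                .response)


sb = SkillBuilder()
sb.add_request_handler(SearchIntentHandler())
lambda_handler = sb.lambda_handler()
```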

* Proceedings of the 18th International Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications - HUCAPP, ISBN 978-989-758-634-7; ISSN 2184-4321, SciTePress, pages 305-317, 2023  

Achieving Reliable Human Assessment of Open-Domain Dialogue Systems

Mar 11, 2022
Tianbo Ji, Yvette Graham, Gareth J. F. Jones, Chenyang Lyu, Qun Liu

Evaluation of open-domain dialogue systems is highly challenging and development of better techniques is highlighted time and again as desperately needed. Despite substantial efforts to carry out reliable live evaluation of systems in recent competitions, annotations have been abandoned and reported as too unreliable to yield sensible results. This is a serious problem since automatic metrics are not known to provide a good indication of what may or may not be a high-quality conversation. Answering the distress call of competitions that have emphasized the urgent need for better evaluation techniques in dialogue, we present the successful development of human evaluation that is highly reliable while still remaining feasible and low cost. Self-replication experiments reveal almost perfectly repeatable results with a correlation of $r=0.969$. Furthermore, due to the lack of appropriate methods of statistical significance testing, the likelihood of potential improvements to systems occurring due to chance is rarely taken into account in dialogue evaluation, and the evaluation we propose facilitates application of standard tests. Since we have developed a highly reliable evaluation method, new insights into system performance can be revealed. We therefore include a comparison of state-of-the-art models (i) with and without personas, to measure the contribution of personas to conversation quality, as well as (ii) prescribed versus freely chosen topics. Interestingly with respect to personas, results indicate that personas do not positively contribute to conversation quality as expected.
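
As a rough illustration of the kind of self-replication check reported above (not the paper's exact procedure), crowd ratings can be standardised per worker, averaged per system in each replication run, and the two runs correlated with Pearson's r. The rating values below are invented placeholders.

```python
# Sketch of a self-replication reliability check: z-score ratings per
# worker, average per system in each run, then correlate the two runs.
# The numbers below are invented placeholders, not data from the paper.
import numpy as np
from scipy.stats import pearsonr

def system_scores(ratings):
    """ratings: {worker: {system: [raw scores]}} -> {system: mean z-score}."""
    per_system = {}
    for worker, by_system in ratings.items():
        raw = np.concatenate([np.asarray(v, float) for v in by_system.values()])
        mu, sd = raw.mean(), raw.std() or 1.0
        for system, vals in by_system.items():
            z = (np.asarray(vals, float) - mu) / sd
            per_system.setdefault(system, []).extend(z.tolist())
    return {s: float(np.mean(v)) for s, v in per_system.items()}

run_a = system_scores({"w1": {"sysA": [78, 82], "sysB": [55, 60], "sysC": [65, 68]},
                       "w2": {"sysA": [70, 75], "sysB": [48, 52], "sysC": [61, 63]}})
run_b = system_scores({"w3": {"sysA": [80, 77], "sysB": [58, 54], "sysC": [66, 64]},
                       "w4": {"sysA": [72, 74], "sysB": [50, 49], "sysC": [60, 62]}})

systems = sorted(run_a)
r, p = pearsonr([run_a[s] for s in systems], [run_b[s] for s in systems])
print(f"self-replication Pearson r = {r:.3f} (p = {p:.3f})")
```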

* to appear at ACL 2022 main conference 

Translation Quality Assessment: A Brief Survey on Manual and Automatic Methods

May 05, 2021
Lifeng Han, Gareth J. F. Jones, Alan F. Smeaton

To facilitate effective translation modeling and translation studies, one of the crucial questions to address is how to assess translation quality. From the perspectives of accuracy, reliability, repeatability and cost, translation quality assessment (TQA) itself is a rich and challenging task. In this work, we present a high-level and concise survey of TQA methods, including both manual judgement criteria and automated evaluation metrics, which we classify into further detailed sub-categories. We hope that this work will be an asset for both translation model researchers and quality assessment researchers. In addition, we hope that it will enable practitioners to quickly develop a better understanding of the conventional TQA field and to find evaluation solutions closely relevant to their own needs. This work may also serve to inspire further development of quality assessment and evaluation methodologies for other natural language processing (NLP) tasks beyond machine translation (MT), such as automatic text summarization (ATS), natural language understanding (NLU) and natural language generation (NLG).
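
For readers wanting a concrete starting point on the automated side of TQA, the widely used sacreBLEU package computes corpus-level BLEU and chrF, two automatic metrics of the kind the survey classifies; the toy sentences below are placeholders.

```python
# Corpus-level BLEU and chrF with sacreBLEU, two automatic TQA metrics
# of the kind surveyed above. The toy sentences are placeholders.
import sacrebleu

hypotheses = ["the cat sat on the mat", "he read the book yesterday"]
references = [["the cat is sitting on the mat", "he read a book yesterday"]]

bleu = sacrebleu.corpus_bleu(hypotheses, references)
chrf = sacrebleu.corpus_chrf(hypotheses, references)
print(f"BLEU = {bleu.score:.1f}, chrF = {chrf.score:.1f}")
```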

* Accepted to 23rd Nordic Conference on Computational Linguistics (NoDaLiDa 2021): Workshop on Modelling Translation: Translatology in the Digital Age (MoTra21). arXiv admin note: substantial text overlap with arXiv:1605.04515 

TRECVID 2020: A comprehensive campaign for evaluating video retrieval tasks across multiple application domains

Apr 27, 2021
George Awad, Asad A. Butt, Keith Curtis, Jonathan Fiscus, Afzal Godil, Yooyoung Lee, Andrew Delgado, Jesse Zhang, Eliot Godard, Baptiste Chocot, Lukas Diduch, Jeffrey Liu, Alan F. Smeaton, Yvette Graham, Gareth J. F. Jones, Wessel Kraaij, Georges Quenot

The TREC Video Retrieval Evaluation (TRECVID) is a TREC-style video analysis and retrieval evaluation with the goal of promoting progress in research and development of content-based exploitation and retrieval of information from digital video via open, metrics-based evaluation. Over the last twenty years this effort has yielded a better understanding of how systems can effectively accomplish such processing and how one can reliably benchmark their performance. TRECVID has been funded by NIST (National Institute of Standards and Technology) and other US government agencies. In addition, many organizations and individuals worldwide contribute significant time and effort. TRECVID 2020 represented a continuation of four tasks and the addition of two new tasks. In total, 29 teams from various research organizations worldwide completed one or more of the following six tasks: 1. Ad-hoc Video Search (AVS), 2. Instance Search (INS), 3. Disaster Scene Description and Indexing (DSDI), 4. Video to Text Description (VTT), 5. Activities in Extended Video (ActEV), 6. Video Summarization (VSUM). This paper is an introduction to the evaluation framework, tasks, data, and measures used in the evaluation campaign.

* TRECVID 2020 Workshop Overview Paper. arXiv admin note: substantial text overlap with arXiv:2009.09984 

Exploring Current User Web Search Behaviours in Analysis Tasks to be Supported in Conversational Search

Apr 09, 2021
Abhishek Kaushik, Gareth J. F. Jones

Conversational search presents opportunities to support users in their search activities, improving the effectiveness and efficiency of search while reducing their cognitive load. Limitations of the potential competency of conversational agents restrict the situations in which conversational search agents can replace human intermediaries. It is thus more interesting, initially at least, to investigate opportunities for conversational interaction to support less complex information retrieval tasks, such as typical web search, which do not require human-level intelligence in the conversational agent. In order to move towards the development of a system to enable conversational search of this type, we need to understand the capabilities required of such a system. To progress this understanding, we report a study examining the behaviour of users of a standard web search engine, designed to enable us to identify opportunities to support their search activities using a conversational agent.

* Accepted in SIGIR 2018 Second International Workshop on Conversational Approaches to Information Retrieval (CAIR 18), July 12, 2018, Ann Arbor, Michigan, USA 

Chinese Character Decomposition for Neural MT with Multi-Word Expressions

Apr 09, 2021
Lifeng Han, Gareth J. F. Jones, Alan F. Smeaton, Paolo Bolzoni

Chinese character decomposition has been used as a feature to enhance Machine Translation (MT) models, combining radicals into character- and word-level models. Recent work has investigated ideograph- or stroke-level embedding. However, questions remain about which decomposition level of Chinese character representation, radicals or strokes, is best suited for MT. To investigate the impact of Chinese decomposition embedding in detail, i.e., at the radical, stroke, and intermediate levels, and how well these decompositions represent the meaning of the original character sequences, we carry out analysis with both automated and human evaluation of MT. Furthermore, we investigate whether the combination of decomposed Multiword Expressions (MWEs) can enhance model learning. MWE integration into MT has seen more than a decade of exploration. However, decomposed MWEs have not previously been explored.
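
As a toy illustration of what radical-level decomposition means in practice (using a small hand-written mapping, not the decomposition data or model from the paper), each character in a source sequence can be replaced by, or augmented with, its component radicals before the sequence is passed to an MT model.

```python
# Toy radical-level decomposition of a Chinese source sequence.
# The mapping below is a tiny hand-written example, not the paper's data.
RADICALS = {
    "好": ["女", "子"],   # "good" = woman + child
    "明": ["日", "月"],   # "bright" = sun + moon
    "林": ["木", "木"],   # "woods" = tree + tree
}

def decompose(sentence, keep_original=True):
    """Replace (or augment) each character with its component radicals."""
    tokens = []
    for ch in sentence:
        if keep_original and ch in RADICALS:
            tokens.append(ch)              # keep the character itself
        tokens.extend(RADICALS.get(ch, [ch]))  # add radical-level units
    return " ".join(tokens)

print(decompose("明天好"))   # -> "明 日 月 天 好 女 子"
```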

* Accepted to publish in NoDaLiDa2021 

A Conceptual Framework for Implicit Evaluation of Conversational Search Interfaces

Apr 08, 2021
Abhishek Kaushik, Gareth J. F. Jones

Conversational search (CS) has recently become a significant focus of the information retrieval (IR) research community. Multiple studies have been conducted which explore the concept of conversational search. Understanding and advancing research in CS requires careful and detailed evaluation. Existing CS studies have been limited to evaluation based on simple user feedback on task completion. We propose a CS evaluation framework which includes multiple dimensions: search experience, knowledge gain, software usability, cognitive load and user experience, based on studies of conversational systems and IR. We introduce these evaluation criteria and propose their use in a framework for the evaluation of CS systems.
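
A minimal sketch of how the proposed dimensions might be recorded per study session is given below; the field names, scales and example instruments are illustrative assumptions, not prescribed by the framework.

```python
# Sketch of a per-session record covering the five evaluation dimensions
# named in the abstract; field names and scales are illustrative only.
from dataclasses import dataclass, asdict

@dataclass
class CSEvaluationRecord:
    session_id: str
    search_experience: float   # e.g. post-task satisfaction rating
    knowledge_gain: float      # e.g. post- minus pre-task quiz score
    software_usability: float  # e.g. SUS score (0-100)
    cognitive_load: float      # e.g. NASA-TLX weighted score
    user_experience: float     # e.g. UEQ overall scale

record = CSEvaluationRecord("p01-task2", 4.2, 0.35, 82.5, 38.0, 1.4)
print(asdict(record))
```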

* Accepted in MICROS (Mixed-Initiative ConveRsatiOnal Systems) Workshop at 43rd European Conference on Information Retrieval 

TREC 2020 Podcasts Track Overview

Mar 29, 2021
Rosie Jones, Ben Carterette, Ann Clifton, Maria Eskevich, Gareth J. F. Jones, Jussi Karlgren, Aasish Pappu, Sravana Reddy, Yongze Yu

The Podcast Track is new at the Text Retrieval Conference (TREC) in 2020. The track was designed to encourage research into podcasts in the information retrieval and NLP research communities. It consisted of two shared tasks, segment retrieval and summarization, both based on a dataset of over 100,000 podcast episodes (metadata, audio, and automatic transcripts) which was released concurrently with the track. The track generated considerable interest and attracted hundreds of new registrations to TREC; fifteen teams, mostly disjoint between search and summarization, made final submissions for assessment. Deep learning was the dominant experimental approach for both the search and summarization experiments. This paper gives an overview of the tasks and the results of the participants' experiments. The track will return to TREC 2021 with the same two tasks, incorporating slight modifications in response to participant feedback.
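
As a rough sketch of the segment retrieval task (not a participant system, and using fixed word windows rather than the track's time-based segments), transcripts can be split into segments and ranked against a topic with BM25; the rank_bm25 package and the toy transcript below are assumptions for illustration.

```python
# Toy sketch of podcast segment retrieval: split a transcript into
# fixed-size word windows and rank them against a query with BM25.
# Uses the rank_bm25 package; the transcript text is a placeholder.
from rank_bm25 import BM25Okapi

def segment(words, size=10):
    return [words[i:i + size] for i in range(0, len(words), size)]

transcript = ("in this episode we talk about training for a first marathon "
              "including nutrition shoes and a weekly running plan").split()
segments = segment(transcript, size=10)

bm25 = BM25Okapi(segments)
query = "marathon training plan".split()
scores = bm25.get_scores(query)

best = max(range(len(segments)), key=lambda i: scores[i])
print("top segment:", " ".join(segments[best]))
```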

* The Proceedings of the Twenty-Ninth Text REtrieval Conference Proceedings (TREC 2020)  

Response to LiveBot: Generating Live Video Comments Based on Visual and Textual Contexts

Jun 04, 2020
Hao Wu, Gareth J. F. Jones, Francois Pitie

Live video commenting systems are an emerging feature of online video sites. Recently, the Chinese video sharing platform Bilibili has popularised a novel captioning system in which user comments are displayed as streams of moving subtitles overlaid on the video playback screen and broadcast to all viewers in real time. LiveBot was recently introduced as a novel Automatic Live Video Commenting (ALVC) application, which enables the automatic generation of live video comments from both the existing video stream and existing viewers' comments. In seeking to reproduce the baseline results reported in the original LiveBot paper, we found differences between the results reproduced using the project codebase and the numbers reported in the paper. Further examination suggests that this may be caused by a number of small issues in the project code, including a non-obvious overlap between the training and test sets. In this paper, we study these discrepancies in detail and propose an alternative baseline implementation as a reference for other researchers in this field.
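
A check of the kind described, locating items that appear in both splits, can be as simple as intersecting identifiers; the JSON-lines layout and the field names below are hypothetical, not the LiveBot project's actual format.

```python
# Simple train/test leakage check: report items whose identifier and
# exact comment text appear in both splits. The JSON-lines layout and
# the "video_id" / "comment" field names are hypothetical.
import json

def load_keys(path):
    keys = set()
    with open(path, encoding="utf-8") as f:
        for line in f:
            item = json.loads(line)
            keys.add((item["video_id"], item["comment"].strip()))
    return keys

train_keys = load_keys("train.jsonl")
test_keys = load_keys("test.jsonl")

overlap = train_keys & test_keys
print(f"{len(overlap)} of {len(test_keys)} test items also occur in train "
      f"({100 * len(overlap) / max(len(test_keys), 1):.1f}%)")
```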

* 4 pages, 2 figures 

MultiMWE: Building a Multi-lingual Multi-Word Expression (MWE) Parallel Corpora

May 21, 2020
Lifeng Han, Gareth J. F. Jones, Alan F. Smeaton

Multi-word expressions (MWEs) are a hot topic in natural language processing (NLP) research, including topics such as MWE detection, MWE decomposition, and research investigating the exploitation of MWEs in other NLP fields such as Machine Translation (MT). However, the availability of bilingual or multi-lingual MWE corpora is very limited. The only bilingual MWE corpus that we are aware of is from the PARSEME (PARSing and Multi-word Expressions) EU Project. This is a small collection of only 871 pairs of English-German MWEs. In this paper, we present multi-lingual and bilingual MWE corpora that we have extracted from root parallel corpora. Our collections contain 3,159,226 and 143,042 bilingual MWE pairs for German-English and Chinese-English respectively after filtering. We examine the quality of these extracted bilingual MWEs in MT experiments. Our initial experiments applying MWEs in MT show improved translation performance on MWE terms in qualitative analysis and better general evaluation scores in quantitative analysis, on both German-English and Chinese-English language pairs. We follow a standard experimental pipeline to create our MultiMWE corpora, which are available online. Researchers can use these free corpora for their own models or use them as model features in a knowledge base.
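
As a simplified sketch of the filtering stage of such a pipeline (the candidate pairs, thresholds and tokenisation below are assumptions, not the MultiMWE pipeline's exact settings), candidate bilingual MWE pairs can be counted across the aligned corpus and kept only if they are multi-word on the source side and frequent enough.

```python
# Simplified sketch of frequency filtering for candidate bilingual MWE
# pairs. The candidates and thresholds are illustrative; they are not
# the MultiMWE pipeline's exact settings.
from collections import Counter

# (source MWE, target MWE) candidates, e.g. produced upstream from
# word-aligned parallel sentences.
candidates = [
    ("kick the bucket", "den Löffel abgeben"),
    ("kick the bucket", "den Löffel abgeben"),
    ("red tape", "Bürokratie"),
    ("strong tea", "starker Tee"),
]

def filter_pairs(pairs, min_count=2, min_src_tokens=2):
    counts = Counter(pairs)
    return {(src, tgt): n for (src, tgt), n in counts.items()
            if n >= min_count and len(src.split()) >= min_src_tokens}

for (src, tgt), n in filter_pairs(candidates).items():
    print(f"{src}  |||  {tgt}   (count={n})")
```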

* Accepted to LREC2020 