José Lopes

Going for GOAL: A Resource for Grounded Football Commentaries

Nov 08, 2022

Alessandro Suglia, José Lopes, Emanuele Bastianelli, Andrea Vanzo, Shubham Agarwal, Malvina Nikandrou, Lu Yu, Ioannis Konstas, Verena Rieser

Abstract:Recent video+language datasets cover domains where the interaction is highly structured, such as instructional videos, or where the interaction is scripted, such as TV shows. Both of these properties can introduce spurious cues that models exploit instead of learning to ground language. In this paper, we present GrOunded footbAlL commentaries (GOAL), a novel dataset of football (or 'soccer') highlights videos with transcribed live commentaries in English. As the course of a game is unpredictable, so are commentaries, which makes them a unique resource to investigate dynamic language grounding. We also provide state-of-the-art baselines for the following tasks: frame reordering, moment retrieval, live commentary retrieval and play-by-play live commentary generation. Results show that SOTA models perform reasonably well in most tasks. We discuss the implications of these results and suggest new tasks for which GOAL can be used. Our codebase is available at: https://gitlab.com/grounded-sport-convai/goal-baselines.

* Preprint formatted using the ACM Multimedia template (8 pages + appendix)

Exploring Multi-Modal Representations for Ambiguity Detection & Coreference Resolution in the SIMMC 2.0 Challenge

Feb 25, 2022

Francisco Javier Chiyah-Garcia, Alessandro Suglia, José Lopes, Arash Eshghi, Helen Hastie

Abstract:Anaphoric expressions, such as pronouns and referential descriptions, are situated with respect to the linguistic context of prior turns, as well as the immediate visual environment. However, a speaker's referential descriptions do not always uniquely identify the referent, leading to ambiguities in need of resolution through subsequent clarificational exchanges. Thus, effective Ambiguity Detection and Coreference Resolution are key to task success in Conversational AI. In this paper, we present models for these two tasks as part of the SIMMC 2.0 Challenge (Kottur et al. 2021). Specifically, we use models based on TOD-BERT and LXMERT, compare them to a number of baselines and provide ablation experiments. Our results show that (1) language models are able to exploit correlations in the data to detect ambiguity; and (2) unimodal coreference resolution models can avoid the need for a vision component, through the use of smart object representations.

* Accepted to AAAI 2022 DSTC10 Workshop

Domain Adaptation in Dialogue Systems using Transfer and Meta-Learning

Feb 22, 2021

Rui Ribeiro, Alberto Abad, José Lopes

Abstract:Current generative-based dialogue systems are data-hungry and fail to adapt to new unseen domains when only a small amount of target data is available. Additionally, in real-world applications, most domains are underrepresented, so there is a need to create a system capable of generalizing to these domains using minimal data. In this paper, we propose a method that adapts to unseen domains by combining both transfer and meta-learning (DATML). DATML improves the previous state-of-the-art dialogue model, DiKTNet, by introducing a different learning technique: meta-learning. We use Reptile, a first-order optimization-based meta-learning algorithm, as our improved training method. We evaluated our model on the MultiWOZ dataset and outperformed DiKTNet in both BLEU and Entity F1 scores when the same amount of data is available.

* 5 pages, 2 figures, accepted at IberSPEECH 2020
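
The abstract above names Reptile as the training method. As a rough illustration of the general Reptile update only, here is a minimal sketch on a toy problem. It is not the DATML or DiKTNet code: the scalar model, the toy regression tasks, and the names `inner_sgd` and `reptile` are all assumptions made for this example.

```python
import random


def inner_sgd(weight, task_slope, rng, steps=5, lr=0.05):
    """Adapt to one toy task: fit y = task_slope * x with the model y_hat = weight * x."""
    w = weight
    for _ in range(steps):
        x = rng.uniform(-1.0, 1.0)
        error = w * x - task_slope * x
        grad = 2.0 * error * x  # derivative of the squared error with respect to w
        w -= lr * grad
    return w


def reptile(task_slopes, meta_steps=2000, meta_lr=0.1, seed=0):
    """First-order meta-learning: nudge the shared weight toward each task-adapted weight."""
    rng = random.Random(seed)
    theta = 0.0  # shared initialisation being meta-learned
    for _ in range(meta_steps):
        slope = rng.choice(task_slopes)      # sample a task (hypothetical toy tasks)
        phi = inner_sgd(theta, slope, rng)   # a few SGD steps on that task
        theta += meta_lr * (phi - theta)     # Reptile update: move theta toward phi
    return theta


if __name__ == "__main__":
    print(reptile([1.0, 3.0]))
```

The outer update uses only the difference between the adapted and the shared parameters, so no second-order gradients are needed. In this toy setting the learned initialisation ends up between the task optima, from which a few inner steps can reach either task.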

The Lab vs The Crowd: An Investigation into Data Quality for Neural Dialogue Models

Dec 07, 2020

José Lopes, Francisco J. Chiyah Garcia, Helen Hastie

Abstract:Challenges around collecting and processing quality data have hampered progress in data-driven dialogue models. Previous approaches are moving away from costly, resource-intensive lab settings, where collection is slow but where the data is deemed of high quality. The advent of crowd-sourcing platforms, such as Amazon Mechanical Turk, has provided researchers with an alternative cost-effective and rapid way to collect data. However, the collection of fluid, natural spoken or textual interaction can be challenging, particularly between two crowd-sourced workers. In this study, we compare the performance of dialogue models for the same interaction task but collected in two different settings: in the lab vs. crowd-sourced. We find that fewer lab dialogues are needed to reach similar accuracy: less than half as much lab data as crowd-sourced data. We discuss the advantages and disadvantages of each data collection method.

* Accepted at Human in the Loop Dialogue Systems Workshop @NeurIPS 2020

CRWIZ: A Framework for Crowdsourcing Real-Time Wizard-of-Oz Dialogues

Mar 12, 2020

Francisco J. Chiyah Garcia, José Lopes, Xingkun Liu, Helen Hastie

Abstract:Large corpora of task-based and open-domain conversational dialogues are hugely valuable in the field of data-driven dialogue systems. Crowdsourcing platforms, such as Amazon Mechanical Turk, have been an effective method for collecting such large amounts of data. However, difficulties arise when task-based dialogues require expert domain knowledge or rapid access to domain-relevant information, such as databases for tourism. This will become even more prevalent as dialogue systems become increasingly ambitious, expanding into tasks with high levels of complexity that require collaboration and forward planning, such as in our domain of emergency response. In this paper, we propose CRWIZ: a framework for collecting real-time Wizard of Oz dialogues through crowdsourcing for collaborative, complex tasks. This framework uses semi-guided dialogue to avoid interactions that breach procedures and processes only known to experts, while enabling the capture of a wide variety of interactions. The framework is available at https://github.com/JChiyah/crwiz

* 10 pages, 5 figures. To Appear in LREC 2020

Natural Language Interaction to Facilitate Mental Models of Remote Robots

Mar 12, 2020

Francisco J. Chiyah Garcia, José Lopes, Helen Hastie

Abstract:Increasingly complex and autonomous robots are being deployed in real-world environments with far-reaching consequences. High-stakes scenarios, such as emergency response or offshore energy platform and nuclear inspections, require robot operators to have clear mental models of what the robots can and can't do. However, operators are often not the original designers of the robots and thus, they do not necessarily have such clear mental models, especially if they are novice users. This lack of mental model clarity can slow adoption and can negatively impact human-machine teaming. We propose that interaction with a conversational assistant, who acts as a mediator, can help the user with understanding the functionality of remote robots and increase transparency through natural language explanations, as well as facilitate the evaluation of operators' mental models.

* In Workshop on Mental Models of Robots at HRI 2020

Challenges in Collaborative HRI for Remote Robot Teams

May 17, 2019

Helen Hastie, David A. Robb, José Lopes, Muneeb Ahmad, Pierre Le Bras, Xingkun Liu, Ronald P. A. Petrick, Katrin Lohan, Mike J. Chantler

Abstract:Collaboration between human supervisors and remote teams of robots is highly challenging, particularly in high-stakes, distant, hazardous locations, such as off-shore energy platforms. In order for these teams of robots to truly be beneficial, they need to be trusted to operate autonomously, performing tasks such as inspection and emergency response, thus reducing the number of personnel placed in harm's way. As remote robots are generally trusted less than robots in close proximity, we present a solution to instil trust in the operator through a 'mediator robot' that can exhibit social skills, alongside sophisticated visualisation techniques. In this position paper, we present general challenges and then take a closer look at one challenge in particular, discussing an initial study, which investigates the relationship between the level of control the supervisor hands over to the mediator robot and how this affects their trust. We show that the supervisor is more likely to have higher trust overall if their initial experience involves handing over control of the emergency situation to the robotic assistant. We discuss this result here, as well as other challenges and interaction techniques for human-robot collaboration.

* 9 pages. Peer reviewed position paper accepted in the CHI 2019 Workshop: The Challenges of Working on Social Robots that Collaborate with People (SIRCHI2019), ACM CHI Conference on Human Factors in Computing Systems, May 2019, Glasgow, UK

The Spot the Difference corpus: a multi-modal corpus of spontaneous task oriented spoken interactions

May 14, 2018

José Lopes, Nils Hemmingsson, Oliver Åstrand

Abstract:This paper describes the Spot the Difference Corpus, which contains 54 interactions between pairs of subjects interacting to find differences in two very similar scenes. The setup used, the participants' metadata and details about collection are described. We are releasing this corpus of task-oriented spontaneous dialogues. This release includes rich transcriptions, annotations, audio and video. We believe that this dataset constitutes a valuable resource to study several dimensions of human communication that go from turn-taking to the study of referring expressions. In our preliminary analyses we have looked at task success (how many differences were found out of the total number of differences) and how it evolves over time. In addition, we have looked at scene complexity provided by the RGB components' entropy and how it could relate to speech overlaps, interruptions and the expression of uncertainty. We found a tendency for more complex scenes to have more competitive interruptions.

* Proceedings of the Language Resources and Evaluation Conference (LREC) 2018
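
The abstract above measures scene complexity through the entropy of the RGB components but does not say how that entropy is computed. One plausible reading, sketched here purely as an assumption, is the Shannon entropy of each colour channel's value distribution. The function names and the pixel representation (a list of `(r, g, b)` tuples) are hypothetical choices for this example.

```python
import math
from collections import Counter


def channel_entropy(values):
    """Shannon entropy (in bits) of the empirical distribution of one channel's values."""
    counts = Counter(values)
    total = len(values)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def rgb_entropy(pixels):
    """Return (red, green, blue) entropies for an image given as (r, g, b) tuples."""
    if not pixels:
        raise ValueError("pixels must contain at least one (r, g, b) tuple")
    reds, greens, blues = zip(*pixels)
    return tuple(channel_entropy(channel) for channel in (reds, greens, blues))


if __name__ == "__main__":
    flat_scene = [(10, 10, 10)] * 4              # a single colour: no variation
    two_tone_scene = [(0, 0, 0), (255, 0, 0)]    # red takes two equally likely values
    print(rgb_entropy(flat_scene))
    print(rgb_entropy(two_tone_scene))
```

Under this reading, a uniformly coloured scene scores zero on every channel, and scores rise as the colour values become more varied. How the three channel scores would be combined into a single complexity value is left open here, since the abstract does not specify it.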

Assessing User Expertise in Spoken Dialog System Interactions

Jan 18, 2017

Eugénio Ribeiro, Fernando Batista, Isabel Trancoso, José Lopes, Ricardo Ribeiro, David Martins de Matos

Abstract:Identifying the level of expertise of its users is important for a system since it can lead to a better interaction through adaptation techniques. Furthermore, this information can be used in offline processes of root cause analysis. However, not much effort has been put into automatically identifying the level of expertise of a user, especially in dialog-based interactions. In this paper we present an approach based on a specific set of task-related features. Based on the distribution of the features among the two classes - Novice and Expert - we used Random Forests as a classification approach. Furthermore, we used a Support Vector Machine classifier, in order to perform a result comparison. By applying these approaches on data from a real system, Let's Go, we obtained preliminary results that we consider positive, given the difficulty of the task and the lack of competing approaches for comparison.

* Advances in Speech and Language Technologies for Iberian Languages: Third International Conference, IberSPEECH 2016, Lisbon, Portugal, November 23-25, pp. 245-254
* 10 pages
