Seokhwan Kim

Beyond Domain APIs: Task-oriented Conversational Modeling with Unstructured Knowledge Access

Jun 05, 2020

Seokhwan Kim, Mihail Eric, Karthik Gopalakrishnan, Behnam Hedayatnia, Yang Liu, Dilek Hakkani-Tur

Abstract: Most prior work on task-oriented dialogue systems is restricted to a limited coverage of domain APIs, while users often have domain-related requests that are not covered by the APIs. In this paper, we propose to expand coverage of task-oriented dialogue systems by incorporating external unstructured knowledge sources. We define three sub-tasks: knowledge-seeking turn detection, knowledge selection, and knowledge-grounded response generation, which can be modeled individually or jointly. We introduce an augmented version of MultiWOZ 2.1, which includes new out-of-API-coverage turns and responses grounded on external knowledge sources. We present baselines for each sub-task using both conventional and neural approaches. Our experimental results demonstrate the need for further research in this direction to enable more informative conversational systems.

* To be presented at SIGDIAL 2020
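
To make the three sub-tasks named in the abstract above concrete, here is a minimal sketch of how they could be chained. It is not the paper's baseline: the word-overlap scoring, the stopword list, the sample snippets, and every function name are assumptions chosen only for illustration.

```python
# Hypothetical knowledge snippets; the real dataset's schema is not shown here.
KNOWLEDGE = [
    "Pets are not allowed at the hotel.",
    "Free wifi is available in all rooms.",
    "Parking costs 10 pounds per day.",
]

# Tiny illustrative stopword list (an assumption, not taken from the paper).
STOPWORDS = {"a", "an", "the", "is", "are", "at", "in", "i", "do", "you",
             "to", "of", "my", "can", "for", "not", "all", "per"}


def content_words(text):
    """Lowercase the text, strip basic punctuation, and drop stopwords."""
    words = (w.strip(".,?!").lower() for w in text.split())
    return {w for w in words if w and w not in STOPWORDS}


def overlap(turn, snippet):
    """Count the content words shared by a user turn and a snippet."""
    return len(content_words(turn) & content_words(snippet))


def detect_knowledge_seeking(turn, knowledge, min_overlap=1):
    """Sub-task 1: does this turn need knowledge beyond the domain APIs?"""
    return any(overlap(turn, s) >= min_overlap for s in knowledge)


def select_knowledge(turn, knowledge):
    """Sub-task 2: pick the snippet with the highest word overlap."""
    return max(knowledge, key=lambda s: overlap(turn, s))


def generate_response(snippet):
    """Sub-task 3: ground the reply on the selected snippet (template only)."""
    return f"According to our information: {snippet}"


def respond(turn, knowledge=KNOWLEDGE):
    """Run the three sub-tasks in sequence; None means 'use the API pipeline'."""
    if not detect_knowledge_seeking(turn, knowledge):
        return None
    return generate_response(select_knowledge(turn, knowledge))


print(respond("Is wifi available?"))
print(respond("Book a table for two"))
```

Keeping the three steps as separate functions mirrors the "modeled individually" option; a joint model would replace them with a single component.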

TutorialVQA: Question Answering Dataset for Tutorial Videos

Dec 02, 2019

Anthony Colas, Seokhwan Kim, Franck Dernoncourt, Siddhesh Gupte, Daisy Zhe Wang, Doo Soon Kim

Abstract: Despite the number of currently available datasets on video question answering, there still remains a need for a dataset involving multi-step and non-factoid answers. Moreover, relying on video transcripts remains an under-explored topic. To adequately address this, we propose a new question answering task on instructional videos, because of their verbose and narrative nature. While previous studies on video question answering have focused on generating a short text as an answer, given a question and video clip, our task aims to identify a span of a video segment as an answer which contains instructional details with various granularities. This work focuses on screencast tutorial videos pertaining to an image editing program. We introduce a dataset, TutorialVQA, consisting of about 6,000 manually collected triples of (video, question, answer span). We also provide experimental results with several baseline algorithms using the video transcripts. The results indicate that the task is challenging and call for the investigation of new algorithms.

Just Ask: An Interactive Learning Framework for Vision and Language Navigation

Dec 02, 2019

Ta-Chung Chi, Mihail Eric, Seokhwan Kim, Minmin Shen, Dilek Hakkani-Tur

Abstract: In the vision and language navigation task, the agent may encounter ambiguous situations that are hard to interpret by just relying on visual information and natural language instructions. We propose an interactive learning framework to endow the agent with the ability to ask for users' help in such situations. As part of this framework, we investigate multiple learning approaches for the agent with different levels of complexity. The simplest model-confusion-based method lets the agent ask questions based on its confusion, relying on the predefined confidence threshold of a next action prediction model. To build on this confusion-based method, the agent is expected to demonstrate more sophisticated reasoning such that it discovers the timing and locations to interact with a human. We achieve this goal using reinforcement learning (RL) with a proposed reward shaping term, which enables the agent to ask questions only when necessary. The success rate can be boosted by at least 15% with only one question asked on average during the navigation. Furthermore, we show that the RL agent is capable of adjusting dynamically to noisy human responses. Finally, we design a continual learning strategy, which can be viewed as a data augmentation method, for the agent to improve further by utilizing its interaction history with a human. We demonstrate that the proposed strategy is substantially more realistic and data-efficient compared to previously proposed pre-exploration techniques.

* 8 pages, accepted to AAAI 2020
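
The confusion-based method described in the abstract above (ask when the next-action model falls below a predefined confidence threshold) can be sketched roughly as follows. The threshold value of 0.5, the function names, and the assumption that the human answers with the correct next action are all illustrative choices, not details taken from the paper.

```python
def should_ask(action_probs, threshold=0.5):
    """Return True when the model's most likely action is below the threshold.

    `threshold` is a hypothetical value; the paper's setting is not given here.
    """
    return max(action_probs) < threshold


def choose_action(action_probs, human_action=None, threshold=0.5):
    """Pick the next action index and report whether the agent asked for help.

    Assumption: when asked, the human supplies the index of the next action.
    If no human answer is available, the agent falls back to its own argmax.
    """
    if should_ask(action_probs, threshold) and human_action is not None:
        return human_action, True
    best = max(range(len(action_probs)), key=action_probs.__getitem__)
    return best, False


# Confused agent (top probability 0.40): it asks and follows the human.
print(choose_action([0.40, 0.35, 0.25], human_action=2))
# Confident agent (top probability 0.80): it acts on its own prediction.
print(choose_action([0.80, 0.10, 0.10], human_action=2))
```

A fixed threshold asks whenever the model is uncertain, which is the limitation the abstract's RL variant is meant to address by learning when a question is actually necessary.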

The Eighth Dialog System Technology Challenge

Nov 14, 2019

Seokhwan Kim, Michel Galley, Chulaka Gunasekara, Sungjin Lee, Adam Atkinson, Baolin Peng, Hannes Schulz, Jianfeng Gao, Jinchao Li, Mahmoud Adada (+11 more)

Abstract: This paper introduces the Eighth Dialog System Technology Challenge. In line with recent challenges, the eighth edition focuses on applying end-to-end dialog technologies in a pragmatic way for multi-domain task-completion, noetic response selection, audio visual scene-aware dialog, and schema-guided dialog state tracking tasks. This paper describes the task definition, provided datasets, and evaluation set-up for each track. We also summarize the results of the submitted systems to highlight the overall trends of the state-of-the-art technologies for the tasks.

* Submitted to NeurIPS 2019 3rd Conversational AI Workshop

Analyzing Sentence Fusion in Abstractive Summarization

Oct 01, 2019

Logan Lebanoff, John Muchovej, Franck Dernoncourt, Doo Soon Kim, Seokhwan Kim, Walter Chang, Fei Liu

Abstract: While recent work in abstractive summarization has resulted in higher scores in automatic metrics, there is little understanding of how these systems combine information taken from multiple document sentences. In this paper, we analyze the outputs of five state-of-the-art abstractive summarizers, focusing on summary sentences that are formed by sentence fusion. We ask assessors to judge the grammaticality, faithfulness, and method of fusion for summary sentences. Our analysis reveals that system sentences are mostly grammatical, but often fail to remain faithful to the original article.

Scoring Sentence Singletons and Pairs for Abstractive Summarization

May 31, 2019

Logan Lebanoff, Kaiqiang Song, Franck Dernoncourt, Doo Soon Kim, Seokhwan Kim, Walter Chang, Fei Liu

Abstract: When writing a summary, humans tend to choose content from one or two sentences and merge them into a single summary sentence. However, the mechanisms behind the selection of one or multiple source sentences remain poorly understood. Sentence fusion assumes multi-sentence input; yet sentence selection methods only work with single sentences and not combinations of them. There is thus a crucial gap between sentence selection and fusion to support summarizing by both compressing single sentences and fusing pairs. This paper attempts to bridge the gap by ranking sentence singletons and pairs together in a unified space. Our proposed framework attempts to model human methodology by selecting either a single sentence or a pair of sentences, then compressing or fusing the sentence(s) to produce a summary sentence. We conduct extensive experiments on both single- and multi-document summarization datasets and report findings on sentence selection and abstraction.

* ACL 2019 (Long Paper)
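
The idea in the abstract above of ranking sentence singletons and pairs together can be illustrated with a small sketch. The scoring function here (coverage of a hand-picked set of salient words, minus a small penalty for pairs) is a stand-in assumption; the paper's actual scorer is not described on this page, and all names below are hypothetical.

```python
from itertools import combinations


def candidates(sentences):
    """Enumerate every singleton (i,) and every pair (i, j) of sentence indices."""
    idx = range(len(sentences))
    return [(i,) for i in idx] + list(combinations(idx, 2))


def score(candidate, sentences, salient_words, pair_penalty=0.1):
    """Hypothetical scorer shared by singletons and pairs.

    Coverage of the salient words, minus a small penalty for each extra
    sentence so that a pair must add information to outrank a singleton.
    """
    words = set()
    for i in candidate:
        words |= set(sentences[i].lower().split())
    coverage = len(words & salient_words) / len(salient_words)
    return coverage - pair_penalty * (len(candidate) - 1)


def rank(sentences, salient_words):
    """Rank singletons and pairs together, best candidate first."""
    return sorted(candidates(sentences),
                  key=lambda c: score(c, sentences, salient_words),
                  reverse=True)


sentences = [
    "the storm hit the coast",
    "thousands lost power",
    "officials opened shelters",
]
ranking = rank(sentences, salient_words={"storm", "power"})
print(ranking[0])
```

Because singletons and pairs receive scores on the same scale, one sorted list decides whether the next summary sentence should come from compressing one sentence or fusing two.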

Neural Sentence Embedding using Only In-domain Sentences for Out-of-domain Sentence Detection in Dialog Systems

Jul 27, 2018

Seonghan Ryu, Seokhwan Kim, Junhwi Choi, Hwanjo Yu, Gary Geunbae Lee

Abstract: To ensure a satisfactory user experience, dialog systems must be able to determine whether an input sentence is in-domain (ID) or out-of-domain (OOD). We assume that only ID sentences are available as training data because collecting enough OOD sentences in an unbiased way is a laborious and time-consuming job. This paper proposes a novel neural sentence embedding method that represents sentences in a low-dimensional continuous vector space that emphasizes aspects that distinguish ID cases from OOD cases. We first used a large set of unlabeled text to pre-train word representations that are used to initialize neural sentence embedding. Then we used domain-category analysis as an auxiliary task to train neural sentence embedding for OOD sentence detection. After the sentence representations were learned, we used them to train an autoencoder aimed at OOD sentence detection. We evaluated our method by experimentally comparing it to the state-of-the-art methods in an eight-domain dialog system; our proposed method achieved the highest accuracy in all tests.

* Published in Pattern Recognition Letters, 88:26-32, 2017

A Discourse-Aware Attention Model for Abstractive Summarization of Long Documents

May 22, 2018

Arman Cohan, Franck Dernoncourt, Doo Soon Kim, Trung Bui, Seokhwan Kim, Walter Chang, Nazli Goharian

Abstract: Neural abstractive summarization models have led to promising results in summarizing relatively short documents. We propose the first model for abstractive summarization of single, longer-form documents (e.g., research papers). Our approach consists of a new hierarchical encoder that models the discourse structure of a document, and an attentive discourse-aware decoder to generate the summary. Empirical results on two large-scale datasets of scientific papers show that our model significantly outperforms state-of-the-art models.

* NAACL HLT 2018

Truly Multi-modal YouTube-8M Video Classification with Video, Audio, and Text

Jul 10, 2017

Zhe Wang, Kingsley Kuan, Mathieu Ravaut, Gaurav Manek, Sibo Song, Yuan Fang, Seokhwan Kim, Nancy Chen, Luis Fernando D'Haro, Luu Anh Tuan (+6 more)

Abstract: The YouTube-8M video classification challenge requires teams to classify 0.7 million videos into one or more of 4,716 classes. In this Kaggle competition, we placed in the top 3% out of 650 participants using released video and audio features. Beyond that, we extend the original competition by including text information in the classification, making this a truly multi-modal approach with vision, audio and text. The newly introduced text data is termed YouTube-8M-Text. We present a classification framework for the joint use of text, visual and audio features, and conduct an extensive set of experiments to quantify the benefit that this additional mode brings. The inclusion of text yields state-of-the-art results, e.g. 86.7% GAP on the YouTube-8M-Text validation dataset.

* 8 pages, Accepted to CVPR'17 Workshop on YouTube-8M Large-Scale Video Understanding
