Claudiu Musat

Swisscom AG: Data Analytics & AI

Embedding Individual Table Columns for Resilient SQL Chatbots

Nov 01, 2018

Bojan Petrovski, Ignacio Aguado, Andreea Hossmann, Michael Baeriswyl, Claudiu Musat

Abstract: Most of the world's data is stored in relational databases. Accessing these requires specialized knowledge of the Structured Query Language (SQL), putting them out of the reach of many people. A recent research thread in Natural Language Processing (NLP) aims to alleviate this problem by automatically translating natural language questions into SQL queries. While the proposed solutions are a great start, they lack robustness and do not easily generalize: the methods require high-quality descriptions of the database table columns, and the most widely used training dataset, WikiSQL, is heavily biased towards using those descriptions as part of the questions. In this work, we propose solutions to both problems: we entirely eliminate the need for column descriptions by relying solely on column contents, and we augment the WikiSQL dataset by paraphrasing column names to reduce bias. We show that the accuracy of existing methods drops when trained on our augmented, column-agnostic dataset, and that our own method reaches state-of-the-art accuracy while relying on column contents only.

* SCAI, 2018
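
The central idea above, describing a column by what it contains rather than by its header, can be illustrated with a toy sketch. Everything here is an assumption for illustration: character-trigram count vectors stand in for learned embeddings, the column ids `col_0`/`col_1` are deliberately opaque, and `best_column` is a hypothetical helper, not the authors' model.

```python
import math
from collections import Counter


def trigrams(text: str) -> Counter:
    """Character-trigram counts; a toy stand-in for a learned value embedding."""
    padded = f"  {text.lower()} "  # padding gives word boundaries their own trigrams
    return Counter(padded[i:i + 3] for i in range(len(padded) - 2))


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(count * b[key] for key, count in a.items())  # missing keys count as 0
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def column_vector(values: list[str]) -> Counter:
    """Represent a column only by its cell values; the header is never read."""
    total: Counter = Counter()
    for value in values:
        total.update(trigrams(value))
    return total


def best_column(question: str, columns: dict[str, list[str]]) -> str:
    """Return the (opaque) column id whose contents best match the question."""
    q = trigrams(question)
    return max(columns, key=lambda cid: cosine(q, column_vector(columns[cid])))
```

With `cols = {"col_0": ["Berlin", "Paris", "Rome"], "col_1": ["2001", "2002", "2003"]}`, the question "Which team played in Paris?" is matched to `col_0` because it shares trigrams with the city names, even though no column description is ever consulted.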

Simple Unsupervised Keyphrase Extraction using Sentence Embeddings

Sep 05, 2018

Kamil Bennani-Smires, Claudiu Musat, Andreea Hossmann, Michael Baeriswyl, Martin Jaggi

Abstract: Keyphrase extraction is the task of automatically selecting a small set of phrases that best describe a given free-text document. Supervised keyphrase extraction requires large amounts of labeled training data and generalizes very poorly outside the domain of the training data. At the same time, unsupervised systems have poor accuracy and often do not generalize well, as they require the input document to belong to a larger corpus also given as input. Addressing these drawbacks, in this paper we tackle keyphrase extraction from single documents with EmbedRank, a novel unsupervised method that leverages sentence embeddings. EmbedRank achieves higher F-scores than graph-based state-of-the-art systems on standard datasets and is suitable for real-time processing of large amounts of Web data. With EmbedRank, we also explicitly increase coverage and diversity among the selected keyphrases by introducing an embedding-based maximal marginal relevance (MMR) criterion for selecting new phrases. A user study including over 200 votes showed that, although reducing the phrases' semantic overlap leads to no gains in F-score, our high-diversity selection is preferred by humans.
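
The MMR-based selection mentioned above can be sketched as a greedy loop over precomputed phrase embeddings. This uses the plain MMR formula, score = λ·cos(phrase, doc) − (1 − λ)·max cos(phrase, already selected), which may differ in detail (e.g. normalization) from the paper's variant; the 3-d vectors and phrases are invented, and candidate extraction and the sentence-embedding model are not shown.

```python
import math


def cos(u, v):
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return sum(a * b for a, b in zip(u, v)) / norm if norm else 0.0


def mmr_select(doc, candidates, k, lam=0.5):
    """Greedy maximal marginal relevance over precomputed embeddings.

    doc: document embedding; candidates: dict phrase -> embedding.
    lam trades relevance to the document (lam=1) against diversity (lam=0).
    """
    selected = []
    remaining = list(candidates)
    while remaining and len(selected) < k:
        scores = {
            p: lam * cos(candidates[p], doc)
            - (1 - lam) * max((cos(candidates[p], candidates[s]) for s in selected), default=0.0)
            for p in remaining
        }
        best = max(scores, key=scores.get)
        selected.append(best)
        remaining.remove(best)
    return selected
```

With `doc = [1, 1, 1]` and candidates "neural network" `[1, 1, 0.05]`, "neural networks" `[1, 1, 0]` and "keyphrase extraction" `[0, 1, 1]`, λ = 0.5 picks "neural network" first and then skips its near-duplicate in favour of "keyphrase extraction". Setting `lam=1.0` reduces the loop to ranking by document similarity alone.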

Churn Intent Detection in Multilingual Chatbot Conversations and Social Media

Aug 25, 2018

Christian Abbet, Meryem M'hamdi, Athanasios Giannakopoulos, Robert West, Andreea Hossmann, Michael Baeriswyl, Claudiu Musat

Abstract: We propose a new method to detect when users express the intent to leave a service, also known as churn. While previous work focuses solely on social media, we show that this intent can also be detected in chatbot conversations. As companies increasingly rely on chatbots, they need an overview of users who are likely to churn. To this end, we crowdsource and publish a dataset of churn intent expressions in chatbot interactions in German and English. We show that classifiers trained on social media data can detect the same intent in the context of chatbots. We introduce a classification architecture that outperforms existing work on churn intent detection in social media. Moreover, we show that, using bilingual word embeddings, a system trained on combined English and German data outperforms monolingual approaches. As the only existing dataset is in English, we crowdsource and publish a novel dataset of German tweets. We thus underline the universal aspect of the problem, as examples of churn intent in English help us identify churn in German tweets and chatbot conversations.

* 10 pages
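
The cross-lingual claim (English examples helping to classify German text) rests on both languages sharing one embedding space. A minimal sketch of that mechanism, assuming made-up 2-d bilingual vectors and a nearest-centroid classifier in place of the paper's architecture:

```python
import math

# Hypothetical 2-d bilingual embeddings: English and German words share one space.
BILINGUAL_EMB = {
    "cancel": (1.0, 0.0), "kündigen": (0.9, 0.1),
    "contract": (0.8, 0.2), "vertrag": (0.8, 0.2),
    "great": (0.0, 1.0), "super": (0.1, 0.9),
    "service": (0.3, 0.7), "dienst": (0.3, 0.7),
}


def embed(text):
    """Average the shared-space vectors of known words (unknown words are skipped)."""
    vecs = [BILINGUAL_EMB[w] for w in text.lower().split() if w in BILINGUAL_EMB]
    if not vecs:
        return (0.0, 0.0)
    return tuple(sum(dim) / len(vecs) for dim in zip(*vecs))


def cosine(u, v):
    norm = math.hypot(*u) * math.hypot(*v)
    return sum(a * b for a, b in zip(u, v)) / norm if norm else 0.0


def train(examples):
    """examples: label -> list of training texts; returns one centroid per label."""
    model = {}
    for label, texts in examples.items():
        vecs = [embed(t) for t in texts]
        model[label] = tuple(sum(dim) / len(vecs) for dim in zip(*vecs))
    return model


def predict(model, text):
    v = embed(text)
    return max(model, key=lambda label: cosine(v, model[label]))
```

Trained on English sentences only (`"cancel contract"` as churn, `"great service"` as no churn), the classifier labels "Vertrag kündigen" as churn because its words sit near "cancel" and "contract" in the shared space.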

Goal-Oriented Chatbot Dialog Management Bootstrapping with Transfer Learning

Jul 24, 2018

Vladimir Ilievski, Claudiu Musat, Andreea Hossmann, Michael Baeriswyl

Abstract: Goal-Oriented (GO) Dialogue Systems, colloquially known as goal-oriented chatbots, help users achieve a predefined goal (e.g., booking a movie ticket) within a closed domain. A first step is to understand the user's goal by using natural language understanding techniques. Once the goal is known, the bot must manage a dialogue to achieve it, following a learnt policy. The success of the dialogue system depends on the quality of the policy, which in turn relies on the availability of high-quality training data for the policy learning method, for instance Deep Reinforcement Learning. Due to the domain specificity, the amount of available data is typically too low to allow the training of good dialogue policies. In this paper we introduce a transfer learning method to mitigate the effects of the low in-domain data availability. Compared to the model without transfer learning, our transfer-learning-based approach improves the bot's success rate by 20% in relative terms for distant domains and more than doubles it for close domains. Moreover, the transfer learning chatbots learn the policy up to 5 to 10 times faster. Finally, as the transfer learning approach is complementary to additional processing such as warm-starting, we show that their joint application gives the best outcomes.

* 7 pages (6 pages plus 1 page of references), 5 figures, 1 pseudocode figure
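
Transfer learning of this kind is often realized by initializing the target-domain model with the source model's parameters wherever the two domains overlap. The sketch below assumes, hypothetically, an input layer stored as one weight row per slot: rows for slots shared between domains are copied, and the rest are randomly initialized. The slot names and the `transfer_weights` helper are illustrative, not the paper's code.

```python
import random


def transfer_weights(source, target_slots, dim, seed=0):
    """Initialize a target-domain weight table from a source-domain one.

    source: dict slot -> list[float], one input-layer row per slot (hypothetical layout).
    Rows for slots shared with the source domain are copied; rows for
    target-only slots get a small uniform random initialization.
    """
    rng = random.Random(seed)
    target = {}
    for slot in target_slots:
        if slot in source:
            target[slot] = list(source[slot])  # copy so source and target do not alias
        else:
            target[slot] = [rng.uniform(-0.1, 0.1) for _ in range(dim)]
    return target
```

Source-only slots are simply dropped, so the target model starts with whatever the source policy learned about the shared slots and learns the rest from in-domain data.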

Submodularity-Inspired Data Selection for Goal-Oriented Chatbot Training Based on Sentence Embeddings

Jul 08, 2018

Mladen Dimovski, Claudiu Musat, Vladimir Ilievski, Andreea Hossmann, Michael Baeriswyl

Abstract: Spoken language understanding (SLU) systems, such as goal-oriented chatbots or personal assistants, rely on an initial natural language understanding (NLU) module to determine the intent and to extract the relevant information from the user queries they take as input. SLU systems usually help users solve problems in relatively narrow domains and require a large amount of in-domain training data. This leads to significant data availability issues that inhibit the development of successful systems. To alleviate this problem, we propose a data selection technique for the low-data regime that enables us to train with fewer labeled sentences, thus reducing labelling costs. We propose a submodularity-inspired data ranking function, the ratio-penalty marginal gain, for selecting data points to label based only on the information extracted from the textual embedding space. We show that distances in the embedding space are a viable source of information for data selection. Our method outperforms two known active learning techniques and enables cost-efficient training of the NLU unit. Moreover, our proposed selection technique does not need the model to be retrained between selection steps, making it time-efficient as well.
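
The ratio-penalty marginal gain itself is not spelled out in the abstract, so the sketch below swaps in a standard submodular objective, facility location f(S) = Σ_i max_{j∈S} sim(i, j), maximized greedily over sentence embeddings. It shares the property stated above: selection uses only embedding-space distances, so no model is retrained between steps. The similarity 1/(1 + distance) and the 2-d points are assumptions.

```python
import math


def similarity(p, q):
    """Non-negative similarity derived from Euclidean distance in embedding space."""
    return 1.0 / (1.0 + math.dist(p, q))


def greedy_select(points, k):
    """Greedily maximize f(S) = sum_i max_{j in S} sim(i, j) (facility location).

    Returns the indices of the k points to send for labelling, in selection order.
    """
    n = len(points)
    sim = [[similarity(points[i], points[j]) for j in range(n)] for i in range(n)]
    coverage = [0.0] * n  # best similarity of each point to the selected set so far
    selected = []
    for _ in range(min(k, n)):
        best_j, best_gain = None, -1.0
        for j in range(n):
            if j in selected:
                continue
            # Marginal gain: how much adding j improves every point's coverage.
            gain = sum(max(sim[i][j] - coverage[i], 0.0) for i in range(n))
            if gain > best_gain:
                best_j, best_gain = j, gain
        selected.append(best_j)
        coverage = [max(coverage[i], sim[i][best_j]) for i in range(n)]
    return selected
```

On two well-separated clusters, asking for two points to label yields one point from each cluster, since a second point from the same cluster adds almost no coverage.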

DataBright: Towards a Global Exchange for Decentralized Data Ownership and Trusted Computation

Feb 13, 2018

David Dao, Dan Alistarh, Claudiu Musat, Ce Zhang

Abstract: It is safe to assume that, for the foreseeable future, machine learning, especially deep learning, will remain both data- and computation-hungry. In this paper, we ask: can we build a global exchange where everyone can contribute computation and data to train the next generation of machine learning applications? We present an early but running prototype of DataBright, a system that turns the creation of training examples and the sharing of computation into an investment mechanism. Unlike most crowdsourcing platforms, where contributors get paid when they submit their data, DataBright pays dividends whenever a contributor's data or hardware is used by someone to train a machine learning model. The contributor becomes a shareholder in the dataset they created. To enable the measurement of usage, a computation platform that contributors can trust is also necessary. DataBright thus merges a data market and a trusted computation market. We illustrate that trusted computation can enable the creation of an AI market, where each data point has an exact value that should be paid to its creator. DataBright allows data creators to retain ownership of their contribution and attaches a measurable value to it. The value of the data is given by its utility in subsequent distributed computation done on the DataBright computation market. The computation market allocates tasks and subsequent payments to pooled hardware, which leads to the creation of a decentralized AI cloud. Our experiments show that trusted hardware such as Intel SGX can be added to the usual ML pipeline with no additional costs. We use this setting to orchestrate distributed computation that enables the creation of a computation market. DataBright is available for download at https://github.com/ds3lab/databright.
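
To make the dividend idea concrete, here is a hypothetical payout rule: the payment for one training job is split among contributors in proportion to how many of their data points the job used. Equal value per data point is an assumption for illustration; the abstract says a data point's value comes from its utility in subsequent computation, which this sketch does not model, and `split_dividend` is not part of DataBright.

```python
from collections import Counter


def split_dividend(payment, used_points):
    """Split one training job's payment among data contributors.

    used_points: iterable of contributor ids, one entry per data point the job consumed.
    Assumes (hypothetically) equal value per data point, so each contributor's
    share is proportional to how many of their points were used.
    """
    counts = Counter(used_points)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {who: payment * n / total for who, n in counts.items()}
```

For example, a job paying 100 that used three of Alice's points and one of Bob's pays out 75 and 25.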

Machine Translation of Low-Resource Spoken Dialects: Strategies for Normalizing Swiss German

Feb 06, 2018

Pierre-Edouard Honnet, Andrei Popescu-Belis, Claudiu Musat, Michael Baeriswyl

Abstract: The goal of this work is to design a machine translation (MT) system for a low-resource family of dialects, collectively known as Swiss German, which are widely spoken in Switzerland but seldom written. We collected a significant number of parallel written resources to start with, up to a total of about 60k words. Moreover, we identified several other promising data sources for Swiss German. Then, we designed and compared three strategies for normalizing Swiss German input in order to address the regional diversity. We found that character-based neural MT was the best solution for text normalization. In combination with phrase-based statistical MT, our solution reached a BLEU score of 36% when translating from the Bernese dialect. This value, however, decreases as the test data becomes more remote from the training data, geographically and topically. These resources and normalization techniques are a first step towards full MT of Swiss German dialects.

* 11th Language Resources and Evaluation Conference (LREC), 7-12 May 2018, Miyazaki (Japan)

Diverse Beam Search for Increased Novelty in Abstractive Summarization

Feb 05, 2018

André Cibils, Claudiu Musat, Andreea Hossman, Michael Baeriswyl

Abstract: Text summarization condenses a text to a shorter version while retaining the important information. Abstractive summarization is a recent development that generates new phrases, rather than simply copying or rephrasing sentences within the original text. Recently, neural sequence-to-sequence models have achieved good results in abstractive summarization, which opens new possibilities and applications for industrial purposes. However, most practitioners observe that these models still use large parts of the original text in the output summaries, which often makes them similar to extractive frameworks. To address this drawback, we first introduce a new metric to measure how much of a summary is extracted from the input text. Secondly, to improve the diversity of the summaries generated by any neural abstractive model implementing beam search, we present a novel method that relies on a diversity factor in computing the neural network loss. Finally, we show that this method not only makes the system less extractive, but also improves the overall ROUGE score of state-of-the-art methods by at least 2 points.
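
The abstract does not define its extractiveness metric, so as an assumption the sketch below uses a common proxy: the fraction of the summary's n-grams that also appear in the source (1.0 means every n-gram was copied). It is not necessarily the metric proposed in the paper, and whitespace tokenization is a simplification.

```python
def ngrams(tokens, n):
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def extractive_fraction(source, summary, n=2):
    """Fraction of summary n-grams that also occur in the source (1.0 = fully copied)."""
    src = set(ngrams(source.lower().split(), n))
    summ = ngrams(summary.lower().split(), n)
    if not summ:
        return 0.0
    return sum(g in src for g in summ) / len(summ)
```

For source "the cat sat on the mat" and summary "the cat sat on a rug", three of the five summary bigrams are copied, giving 0.6; a less extractive summarizer drives this number down.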

GitGraph - Architecture Search Space Creation through Frequent Computational Subgraph Mining

Jan 16, 2018

Kamil Bennani-Smires, Claudiu Musat, Andreea Hossmann, Michael Baeriswyl

Abstract: The dramatic success of deep neural networks across multiple application areas often relies on experts painstakingly designing a network architecture specific to each task. To simplify this process and make it more accessible, an emerging research effort seeks to automate the design of neural network architectures, using, e.g., evolutionary algorithms, reinforcement learning, or simple search in a constrained space of neural modules. Considering the typical size of the search space (e.g., $10^{10}$ candidates for a $10$-layer network) and the cost of evaluating a single candidate, current architecture search methods are very restricted. They either rely on static pre-built modules to be recombined for the task at hand, or they define a static hand-crafted framework within which they can generate new architectures from the simplest possible operations. In this paper, we relax these restrictions by capitalizing on the collective wisdom contained in the plethora of neural networks published in online code repositories. Concretely, we (a) extract and publish GitGraph, a corpus of neural architectures and their descriptions; (b) create problem-specific neural architecture search spaces, implemented as a textual search mechanism over GitGraph; and (c) propose a method of identifying unique common subgraphs within the architectures solving each problem (e.g., image processing, reinforcement learning), which can then serve as modules in the newly created problem-specific neural search space.
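
Step (c) relies on mining subgraphs that recur across many architectures. A much-simplified sketch, assuming each architecture is flattened to a linear sequence of operation names and mining only contiguous chains of a fixed length (general frequent subgraph mining, e.g. gSpan, handles arbitrary graphs). Support counts each chain at most once per architecture; the op names are illustrative.

```python
from collections import Counter


def frequent_chains(architectures, length, min_support):
    """Return op chains of the given length found in at least min_support architectures.

    architectures: list of op-name sequences (a simplification of real computation graphs).
    """
    support = Counter()
    for ops in architectures:
        # A set, so a chain repeated inside one architecture still counts once.
        chains = {tuple(ops[i:i + length]) for i in range(len(ops) - length + 1)}
        support.update(chains)
    return {chain: count for chain, count in support.items() if count >= min_support}
```

For the three toy architectures in the test, only `("conv", "relu", "pool")` occurs in two of them, so it is the single candidate module at length 3.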

Dataset Construction via Attention for Aspect Term Extraction with Distant Supervision

Sep 26, 2017

Athanasios Giannakopoulos, Diego Antognini, Claudiu Musat, Andreea Hossmann, Michael Baeriswyl

Abstract: Aspect Term Extraction (ATE) detects opinionated aspect terms in sentences or text spans, with the end goal of performing aspect-based sentiment analysis. The small number of available datasets for supervised ATE, and the fact that they cover only a few domains, raise the need for exploiting other data sources in new and creative ways. Publicly available review corpora contain a plethora of opinionated aspect terms and cover a larger domain spectrum. In this paper, we first propose a method for using such review corpora to create a new dataset for ATE. Our method relies on an attention mechanism to select sentences that have a high likelihood of containing actual opinionated aspects, which improves the quality of the extracted aspects. We then use the constructed dataset to train a model and perform ATE with distant supervision. By evaluating on human-annotated datasets, we show that our method significantly outperforms various unsupervised and supervised baselines. Finally, we show that sentence selection matters when it comes to creating new datasets for ATE: using a set of selected sentences leads to higher ATE performance than using the whole sentence set.
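
The attention-based sentence selection can be sketched as a filter: turn an attention model's raw per-token scores into weights with a softmax, keep a sentence only if one token receives most of the weight, and use that token as a distant aspect label. The scores below are invented, the peak-weight threshold is an assumed selection rule rather than the paper's exact criterion, and the attention model itself is not shown.

```python
import math


def softmax(scores):
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def select_sentences(scored_sentences, threshold):
    """scored_sentences: list of (tokens, raw attention scores) pairs.

    Keeps sentences whose attention concentrates on one token (weight >= threshold)
    and returns (sentence, most-attended token) pairs as distant aspect labels.
    """
    kept = []
    for tokens, scores in scored_sentences:
        weights = softmax(scores)
        best = max(range(len(tokens)), key=weights.__getitem__)
        if weights[best] >= threshold:
            kept.append((" ".join(tokens), tokens[best]))
    return kept
```

With a threshold of 0.5, a review sentence whose attention peaks on "battery" is kept, while a sentence with nearly uniform attention is discarded.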
