
Xiaolan Wang


Snippext: Semi-supervised Opinion Mining with Augmented Data

Feb 07, 2020
Zhengjie Miao, Yuliang Li, Xiaolan Wang, Wang-Chiew Tan

Figures 1-4 for Snippext: Semi-supervised Opinion Mining with Augmented Data

Online services are interested in solutions to opinion mining, which is the problem of extracting aspects, opinions, and sentiments from text. One method to mine opinions is to leverage the recent success of pre-trained language models, which can be fine-tuned to obtain high-quality extractions from reviews. However, fine-tuning language models still requires a non-trivial amount of training data. In this paper, we study the problem of how to significantly reduce the amount of labeled training data required to fine-tune language models for opinion mining. We describe Snippext, an opinion mining system developed over a language model that is fine-tuned through semi-supervised learning with augmented data. A novelty of Snippext is its clever use of a two-pronged approach to achieve state-of-the-art (SOTA) performance with little labeled training data: (1) data augmentation to automatically generate more labeled training data from existing labeled data, and (2) a semi-supervised learning technique to leverage the massive amount of unlabeled data in addition to the (limited amount of) labeled data. We show with extensive experiments that Snippext performs comparably to, and can even exceed, previous SOTA results on several opinion mining tasks with only half the training data required. Furthermore, it achieves new SOTA results when all training data are leveraged. Compared to a baseline pipeline, Snippext extracts significantly more fine-grained opinions, which opens new opportunities for downstream applications.
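The first prong, data augmentation, can be illustrated with a minimal sketch: generate extra labeled examples from an existing one by label-preserving token substitution. This is an illustrative toy, not Snippext's actual augmentation operators; the function name, tag scheme, and synonym table are all hypothetical.

```python
import random

def augment_tagged_sentence(tokens, tags, synonyms, n_aug=2, seed=0):
    """Generate extra labeled examples by swapping tokens for synonyms.

    Only tokens tagged 'O' (outside any aspect/opinion span) are eligible
    for replacement, so the span labels remain valid in every copy.
    """
    rng = random.Random(seed)
    augmented = []
    for _ in range(n_aug):
        new_tokens = []
        for tok, tag in zip(tokens, tags):
            if tag == "O" and tok in synonyms:
                new_tokens.append(rng.choice(synonyms[tok]))
            else:
                new_tokens.append(tok)
        # each augmented copy reuses the original tag sequence unchanged
        augmented.append((new_tokens, list(tags)))
    return augmented

tokens = ["the", "staff", "was", "very", "friendly"]
tags   = ["O",   "B-ASP", "O",  "O",    "B-OPN"]
syn = {"very": ["really", "quite"], "the": ["this"]}
out = augment_tagged_sentence(tokens, tags, syn)
```

Each augmented sentence keeps the aspect ("staff") and opinion ("friendly") tokens intact, so the new examples can be added directly to the fine-tuning set.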

* Accepted to WWW 2020 

Voyageur: An Experiential Travel Search Engine

Mar 04, 2019
Sara Evensen, Aaron Feng, Alon Halevy, Jinfeng Li, Vivian Li, Yuliang Li, Huining Liu, George Mihaila, John Morales, Natalie Nuno, Ekaterina Pavlovic, Wang-Chiew Tan, Xiaolan Wang

Figures 1-2 for Voyageur: An Experiential Travel Search Engine

We describe Voyageur, an application of experiential search to the domain of travel. Unlike traditional search engines for online services, experiential search focuses on the experiential aspects of the service under consideration. In particular, Voyageur needs to handle queries for subjective aspects of the service (e.g., quiet hotel, friendly staff) and combine these with objective attributes, such as price and location. Voyageur also highlights interesting facts and tips about the services the user is considering, providing further insight into the available choices.
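The core query pattern the abstract describes — filtering on objective attributes, then ranking by subjective aspects — can be sketched as follows. This is a simplified illustration, not Voyageur's actual ranking function; the field names and aspect scores are hypothetical (in practice the scores would come from an opinion-mining pipeline over reviews).

```python
def rank_hotels(hotels, wanted_aspects, max_price):
    """Filter on an objective attribute (price), then rank by how strongly
    mined review opinions support the requested subjective aspects."""
    eligible = [h for h in hotels if h["price"] <= max_price]

    def support(h):
        # sum of per-aspect support scores in [0, 1]; missing aspects count as 0
        return sum(h["aspect_scores"].get(a, 0.0) for a in wanted_aspects)

    return sorted(eligible, key=support, reverse=True)

hotels = [
    {"name": "A", "price": 120, "aspect_scores": {"quiet": 0.9, "friendly staff": 0.2}},
    {"name": "B", "price": 90,  "aspect_scores": {"quiet": 0.4, "friendly staff": 0.8}},
    {"name": "C", "price": 300, "aspect_scores": {"quiet": 1.0, "friendly staff": 1.0}},
]
ranked = rank_hotels(hotels, ["quiet", "friendly staff"], max_price=150)
```

Hotel C best matches the subjective query but fails the objective price filter, which is exactly the interplay between the two kinds of conditions the system must handle.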

* Demo paper accepted to the Web Conference 

Scalable Semantic Querying of Text

May 03, 2018
Xiaolan Wang, Aaron Feng, Behzad Golshan, Alon Halevy, George Mihaila, Hidekazu Oiwa, Wang-Chiew Tan

Figures 1-4 for Scalable Semantic Querying of Text

We present the KOKO system, which takes declarative information extraction to a new level by incorporating advances in natural language processing techniques into its extraction language. KOKO is novel in that its extraction language simultaneously supports conditions on the surface of the text and on the structure of the dependency parse tree of sentences, thereby allowing for more refined extractions. KOKO also supports conditions that are forgiving to linguistic variation in expressing concepts, and it allows evidence to be aggregated from the entire document in order to filter extractions. To scale up, KOKO exploits a multi-indexing scheme and heuristics for efficient extraction. We extensively evaluate KOKO over publicly available text corpora, showing that KOKO indices take up the smallest amount of space and are notably faster and more effective than a number of prior indexing schemes. Finally, we demonstrate KOKO's scalability on a corpus of 5 million Wikipedia articles.
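The key idea of combining surface-text conditions with dependency-parse conditions in one extraction rule can be sketched in a few lines. This is not KOKO's syntax or implementation — just an illustrative toy over a hand-built parse, where each token records its head index and dependency label.

```python
def extract(sent, surface_pred, dep_pred):
    """Return tokens satisfying both a surface-text condition and a
    condition on the dependency parse (head word and relation)."""
    hits = []
    for tok in sent:
        head = sent[tok["head"]]  # head index points into the same token list
        if surface_pred(tok) and dep_pred(tok, head):
            hits.append(tok["text"])
    return hits

# toy parse of "quiet hotel near the beach"
sent = [
    {"text": "quiet", "head": 1, "dep": "amod"},
    {"text": "hotel", "head": 1, "dep": "ROOT"},
    {"text": "near",  "head": 1, "dep": "prep"},
    {"text": "the",   "head": 4, "dep": "det"},
    {"text": "beach", "head": 2, "dep": "pobj"},
]

# rule: alphabetic tokens that modify "hotel" as adjectives
adjs = extract(
    sent,
    surface_pred=lambda t: t["text"].isalpha(),
    dep_pred=lambda t, h: t["dep"] == "amod" and h["text"] == "hotel",
)
# adjs == ["quiet"]
```

A rule over surface text alone ("word before hotel") would be brittle under reordering; adding the parse condition anchors the extraction to the grammatical relation instead.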
