We present Deep Search DocQA. This application enables information extraction from documents via a question-answering conversational assistant. The system integrates several technologies from different AI disciplines consisting of document conversion to machine-readable format (via computer vision), finding relevant data (via natural language processing), and formulating an eloquent response (via large language models). Users can explore over 10,000 Environmental, Social, and Governance (ESG) disclosure reports from over 2000 corporations. The Deep Search platform can be accessed at: https://ds4sd.github.io.
Extracting dense representations for terms and phrases is a task of great importance for knowledge discovery platforms targeting highly-technical fields. Dense representations are used as features for downstream components and have multiple applications ranging from ranking results in search to summarization. Common approaches to create dense representations include training domain-specific embeddings with self-supervised setups or using sentence encoder models trained over similarity tasks. In contrast to static embeddings, sentence encoders do not suffer from the out-of-vocabulary (OOV) problem, but impose significant computational costs. In this paper, we propose a fully unsupervised approach to text encoding that consists of training small character-based models with the objective of reconstructing large pre-trained embedding matrices. Models trained with this approach can not only match the quality of sentence encoders in technical domains, but are 5 times smaller and up to 10 times faster, even on high-end GPUs.
Term extraction is an information extraction task at the root of knowledge discovery platforms. Developing term extractors that are able to generalize across very diverse and potentially highly technical domains is challenging, as annotations for domains requiring in-depth expertise are scarce and expensive to obtain. In this paper, we describe the term extraction subsystem of a commercial knowledge discovery platform that targets highly technical fields such as pharma, medical, and material science. To be able to generalize across domains, we introduce a fully unsupervised annotator (UA). It extracts terms by combining novel morphological signals from sub-word tokenization with term-to-topic and intra-term similarity metrics, computed using general-domain pre-trained sentence-encoders. The annotator is used to implement a weakly-supervised setup, where transformer-models are fine-tuned (or pre-trained) over the training data generated by running the UA over large unlabeled corpora. Our experiments demonstrate that our setup can improve the predictive performance while decreasing the inference latency on both CPUs and GPUs. Our annotators provide a very competitive baseline for all the cases where annotations are not available.
Aging civil infrastructures are closely monitored by engineers for damage and critical defects. As the manual inspection of such large structures is costly and time-consuming, we are working towards fully automating the visual inspections to support the prioritization of maintenance activities. To that end we combine recent advances in drone technology and deep learning. Unfortunately, annotation costs are incredibly high as our proprietary civil engineering dataset must be annotated by highly trained engineers. Active learning is, therefore, a valuable tool to optimize the trade-off between model performance and annotation costs. Our use-case differs from the classical active learning setting as our dataset suffers from heavy class imbalance and consists of a much larger already labeled data pool than other active learning research. We present a novel method capable of operating in this challenging setting by replacing the traditional active learning acquisition function with an auxiliary binary discriminator. We experimentally show that our novel method outperforms the best-performing traditional active learning method (BALD) by 5% and 38% accuracy on CIFAR-10 and our proprietary dataset respectively.
Artificial intelligence and machine learning algorithms have become ubiquitous. Although they offer a wide range of benefits, their adoption in decision-critical fields is limited by their lack of interpretability, particularly with textual data. Moreover, with more data available than ever before, it has become increasingly important to explain automated predictions. Generally, users find it difficult to understand the underlying computational processes and interact with the models, especially when the models fail to generate the outcomes or explanations, or both, correctly. This problem highlights the growing need for users to better understand the models' inner workings and gain control over their actions. This dissertation focuses on two fundamental challenges of addressing this need. The first involves explanation generation: inferring high-quality explanations from text documents in a scalable and data-driven manner. The second challenge consists in making explanations actionable, and we refer to it as critiquing. This dissertation examines two important applications in natural language processing and recommendation tasks. Overall, we demonstrate that interpretability does not come at the cost of reduced performance in two consequential applications. Our framework is applicable to other fields as well. This dissertation presents an effective means of closing the gap between promise and practice in artificial intelligence.
Explanation is important for text classification tasks. One prevalent type of explanation is rationales, which are text snippets of input text that suffice to yield the prediction and are meaningful to humans. A lot of research on rationalization has been based on the selective rationalization framework, which has recently been shown to be problematic due to the interlocking dynamics. In this paper, we show that we address the interlocking problem in the multi-aspect setting, where we aim to generate multiple rationales for multiple outputs. More specifically, we propose a multi-stage training method incorporating an additional self-supervised contrastive loss that helps to generate more semantically diverse rationales. Empirical results on the beer review dataset show that our method improves significantly the rationalization performance.
There has recently been growing interest in the automatic generation of cooking recipes that satisfy some form of dietary restrictions, thanks in part to the availability of online recipe data. Prior studies have used pre-trained language models, or relied on small paired recipe data (e.g., a recipe paired with a similar one that satisfies a dietary constraint). However, pre-trained language models generate inconsistent or incoherent recipes, and paired datasets are not available at scale. We address these deficiencies with RecipeCrit, a hierarchical denoising auto-encoder that edits recipes given ingredient-level critiques. The model is trained for recipe completion to learn semantic relationships within recipes. Our work's main innovation is our unsupervised critiquing module that allows users to edit recipes by interacting with the predicted ingredients; the system iteratively rewrites recipes to satisfy users' feedback. Experiments on the Recipe1M recipe dataset show that our model can more effectively edit recipes compared to strong language-modeling baselines, creating recipes that satisfy user constraints and are more correct, serendipitous, coherent, and relevant as measured by human judges.
Providing explanations for recommended items allows users to refine the recommendations by critiquing parts of the explanations. As a result of revisiting critiquing from the perspective of multimodal generative models, recent work has proposed M&Ms-VAE, which achieves state-of-the-art performance in terms of recommendation, explanation, and critiquing. M&Ms-VAE and similar models allow users to negatively critique (i.e., explicitly disagree). However, they share a significant drawback: users cannot positively critique (i.e., highlight a desired feature). We address this deficiency with M&Ms-VAE+, an extension of M&Ms-VAE that enables positive and negative critiquing. In addition to modeling users' interactions and keyphrase-usage preferences, we model their keyphrase-usage dislikes. Moreover, we design a novel critiquing module that is trained in a self-supervised fashion. Our experiments on two datasets show that M&Ms-VAE+ matches or exceeds M&Ms-VAE in recommendation and explanation performance. Furthermore, our results demonstrate that representing positive and negative critiques differently enables M&Ms-VAE+ to significantly outperform M&Ms-VAE and other models in positive and negative multi-step critiquing.
Recommendations with personalized explanations have been shown to increase user trust and perceived quality and help users make better decisions. Moreover, such explanations allow users to provide feedback by critiquing them. Several algorithms for recommender systems with multi-step critiquing have therefore been developed. However, providing a user-friendly interface based on personalized explanations and critiquing has not been addressed in the last decade. In this paper, we introduce four different web interfaces (available under https://lia.epfl.ch/critiquing/) helping users making decisions and finding their ideal item. We have chosen the hotel recommendation domain as a use case even though our approach is trivially adaptable for other domains. Moreover, our system is model-agnostic (for both recommender systems and critiquing models) allowing a great flexibility and further extensions. Our interfaces are above all a useful tool to help research in recommendation with critiquing. They allow to test such systems on a real use case and also to highlight some limitations of these approaches to find solutions to overcome them.