We describe a method for proactive information retrieval targeted at retrieving relevant information during a writing task. In our method, the current task and the needs of the user are estimated, and the potential next steps are unobtrusively predicted based on the user's past actions. We focus on the task of writing, in which the user is coalescing previously collected information into a text. Our proactive system automatically recommends the user relevant background information. The proposed system incorporates text input prediction using a long short-term memory (LSTM) network. We present simulations, which show that the system is able to reach higher precision values in an exploratory search setting compared to both a baseline and a comparison system.
Communication is a cornerstone of social interactions, be it with human or artificial intelligence (AI). Yet it can be harmful, depending on the honesty of the exchanged information. To study this, an agent based sociological simulation framework is presented, the reputation game. This illustrates the impact of different communication strategies on the agents' reputation. The game focuses on the trustworthiness of the participating agents, their honesty as perceived by others. In the game, each agent exchanges statements with the others about their own and each other's honesty, which lets their judgments evolve. Various sender and receiver strategies are studied, like sycophant, egocentricity, pathological lying, and aggressiveness for senders as well as awareness and lack thereof for receivers. Minimalist malicious strategies are identified, like being manipulative, dominant, or destructive, which significantly increase reputation at others' costs. Phenomena such as echo chambers, self-deception, deception symbiosis, clique formation, freezing of group opinions emerge from the dynamics. This indicates that the reputation game can be studied for complex group phenomena, to test behavioral hypothesis, and to analyze AI influenced social media. With refined rules it may help to understand social interactions, and to safeguard the design of non-abusive AI systems.
Deep complex convolution recurrent network (DCCRN), which extends CRN with complex structure, has achieved superior performance in MOS evaluation in Interspeech 2020 deep noise suppression challenge (DNS2020). This paper further extends DCCRN with the following significant revisions. We first extend the model to sub-band processing where the bands are split and merged by learnable neural network filters instead of engineered FIR filters, leading to a faster noise suppressor trained in an end-to-end manner. Then the LSTM is further substituted with a complex TF-LSTM to better model temporal dependencies along both time and frequency axes. Moreover, instead of simply concatenating the output of each encoder layer to the input of the corresponding decoder layer, we use convolution blocks to first aggregate essential information from the encoder output before feeding it to the decoder layers. We specifically formulate the decoder with an extra a priori SNR estimation module to maintain good speech quality while removing noise. Finally a post-processing module is adopted to further suppress the unnatural residual noise. The new model, named DCCRN+, has surpassed the original DCCRN as well as several competitive models in terms of PESQ and DNSMOS, and has achieved superior performance in the new Interspeech 2021 DNS challenge
Powerful deep learning algorithms open an opportunity for solving non-image Machine Learning (ML) problems by transforming these problems to into the image recognition problems. The CPC-R algorithm presented in this chapter converts non-image data into images by visualizing non-image data. Then deep learning CNN algorithms solve the learning problems on these images. The design of the CPC-R algorithm allows preserving all high-dimensional information in 2-D images. The use of pair values mapping instead of single value mapping used in the alternative approaches allows encoding each n-D point with 2 times fewer visual elements. The attributes of an n-D point are divided into pairs of its values and each pair is visualized as 2-D points in the same 2-D Cartesian coordinates. Next, grey scale or color intensity values are assigned to each pair to encode the order of pairs. This is resulted in the heatmap image. The computational experiments with CPC-R are conducted for different CNN architectures, and methods to optimize the CPC-R images showing that the combined CPC-R and deep learning CNN algorithms are able to solve non-image ML problems reaching high accuracy on the benchmark datasets. This chapter expands our prior work by adding more experiments to test accuracy of classification, exploring saliency and informativeness of discovered features to test their interpretability, and generalizing the approach.
Wide field small aperture telescopes (WFSATs) are mainly used to obtain scientific information of point--like and streak--like celestial objects. However, qualities of images obtained by WFSATs are seriously affected by the background noise and variable point spread functions. Developing high speed and high efficiency data processing method is of great importance for further scientific research. In recent years, deep neural networks have been proposed for detection and classification of celestial objects and have shown better performance than classical methods. In this paper, we further extend abilities of the deep neural network based astronomical target detection framework to make it suitable for photometry and astrometry. We add new branches into the deep neural network to obtain types, magnitudes and positions of different celestial objects at the same time. Tested with simulated data, we find that our neural network has better performance in photometry than classical methods. Because photometry and astrometry are regression algorithms, which would obtain high accuracy measurements instead of rough classification results, the accuracy of photometry and astrometry results would be affected by different observation conditions. To solve this problem, we further propose to use reference stars to train our deep neural network with transfer learning strategy when observation conditions change. The photometry framework proposed in this paper could be used as an end--to--end quick data processing framework for WFSATs, which can further increase response speed and scientific outputs of WFSATs.
The COVID-19 pandemic has driven ever-greater demand for tools which enable efficient exploration of biomedical literature. Although semi-structured information resulting from concept recognition and detection of the defining elements of clinical trials (e.g. PICO criteria) has been commonly used to support literature search, the contributions of this abstraction remain poorly understood, especially in relation to text-based retrieval. In this study, we compare the results retrieved by a standard search engine with those filtered using clinically-relevant concepts and their relations. With analysis based on the annotations from the TREC-COVID shared task, we obtain quantitative as well as qualitative insights into characteristics of relational and concept-based literature exploration. Most importantly, we find that the relational concept selection filters the original retrieved collection in a way that decreases the proportion of unjudged documents and increases the precision, which means that the user is likely to be exposed to a larger number of relevant documents.
Digital twins are emerging in many industries, typically consisting of simulation models and data associated with a specific physical system. One of the main reasons for developing a digital twin, is to enable the simulation of possible consequences of a given action, without the need to interfere with the physical system itself. Physical systems of interest, and the environments they operate in, do not always behave deterministically. Moreover, information about the system and its environment is typically incomplete or imperfect. Probabilistic representations of systems and environments may therefore be called for, especially to support decisions in application areas where actions may have severe consequences. In this paper we introduce the probabilistic digital twin (PDT). We will start by discussing how epistemic uncertainty can be treated using measure theory, by modelling epistemic information via $\sigma$-algebras. Based on this, we give a formal definition of how epistemic uncertainty can be updated in a PDT. We then study the problem of optimal sequential decision making. That is, we consider the case where the outcome of each decision may inform the next. Within the PDT framework, we formulate this optimization problem. We discuss how this problem may be solved (at least in theory) via the maximum principle method or the dynamic programming principle. However, due to the curse of dimensionality, these methods are often not tractable in practice. To mend this, we propose a generic approximate solution using deep reinforcement learning together with neural networks defined on sets. We illustrate the method on a practical problem, considering optimal information gathering for the estimation of a failure probability.
How much information do NLP tasks really need from a transformer's attention mechanism at application-time (inference)? From recent work, we know that there is sparsity in transformers and that the floating-points within its computation can be discretized to fewer values with minimal loss to task accuracies. However, this requires retraining or even creating entirely new models, both of which can be expensive and carbon-emitting. Focused on optimizations that do not require training, we systematically study the full range of typical attention values necessary. This informs the design of an inference-time quantization technique using both pruning and log-scaled mapping which produces only a few (e.g. $2^3$) unique values. Over the tasks of question answering and sentiment analysis, we find nearly 80% of attention values can be pruned to zeros with minimal ($< 1.0\%$) relative loss in accuracy. We use this pruning technique in conjunction with quantizing the attention values to only a 3-bit format, without retraining, resulting in only a 0.8% accuracy reduction on question answering with fine-tuned RoBERTa.
Short textual descriptions of entities provide summaries of their key attributes and have been shown to be useful sources of background knowledge for tasks such as entity linking and question answering. However, generating entity descriptions, especially for new and long-tail entities, can be challenging since relevant information is often scattered across multiple sources with varied content and style. We introduce DESCGEN: given mentions spread over multiple documents, the goal is to generate an entity summary description. DESCGEN consists of 37K entity descriptions from Wikipedia and Fandom, each paired with nine evidence documents on average. The documents were collected using a combination of entity linking and hyperlinks to the Wikipedia and Fandom entity pages, which together provide high-quality distant supervision. The resulting summaries are more abstractive than those found in existing datasets and provide a better proxy for the challenge of describing new and emerging entities. We also propose a two-stage extract-then-generate baseline and show that there exists a large gap (19.9% in ROUGE-L) between state-of-the-art models and human performance, suggesting that the data will support significant future work.
In recent, deep learning-based feature representation methods have shown a promising impact in electroencephalography (EEG)-based brain-computer interface (BCI). Nonetheless, due to high intra- and inter-subject variabilities, many studies on decoding EEG were designed in a subject-specific manner by using calibration samples, with no much concern on its less practical, time-consuming, and data-hungry process. To tackle this problem, recent studies took advantage of transfer learning, especially using domain adaptation techniques. However, there still remain two challenging limitations; i) most domain adaptation methods are designed for labeled source and unlabeled target domain whereas BCI tasks generally have multiple annotated domains. ii) most of the methods do not consider negatively transferable to disrupt generalization ability. In this paper, we propose a novel network architecture to tackle those limitations by estimating mutual information in high-level representation and low-level representation, separately. Specifically, our proposed method extracts domain-invariant and class-relevant features, thereby enhancing generalizability in classification across. It is also noteworthy that our method can be applicable to a new subject with a small amount of data via a fine-tuning, step only, reducing calibration time for practical uses. We validated our proposed method on a big motor imagery EEG dataset by showing promising results, compared to competing methods considered in our experiments.