Neural networks are one of the most investigated and widely used techniques in Machine Learning. In spite of their success, they still find limited application in safety- and security-related contexts, wherein assurance about networks' performances must be provided. In the recent past, automated reasoning techniques have been proposed by several researchers to close the gap between neural networks and applications requiring formal guarantees about their behavior. In this work, we propose a primer of such techniques and a comprehensive categorization of existing approaches for the automated verification of neural networks. A discussion about current limitations and directions for future investigation is provided to foster research on this topic at the crossroads of Machine Learning and Automated Reasoning.
We conducted an eye-tracking study where 30 participants performed searches on the web. We measured their topical knowledge before and after each task. Their eye-fixations were labelled as "reading" or "scanning". The series of reading fixations in a line, called "reading-sequences", were characterized by their length in pixels, fixation duration, and the number of fixations making up the sequence. We hypothesize that differences in knowledge-change of participants are reflected in their eye-tracking measures related to reading. Our results show that the participants with higher change in knowledge differ significantly in terms of their total reading-sequence-length, reading-sequence-duration, and number of reading fixations, when compared to participants with lower knowledge-change.
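The three reading-sequence measures named above can be illustrated with a minimal sketch. It is not the study's pipeline: the `Fixation` fields, the precomputed `line_id`, and the choice of horizontal extent as "length in pixels" are all assumptions made for illustration.

```python
from dataclasses import dataclass
from itertools import groupby
from typing import Dict, List


@dataclass
class Fixation:
    x: float            # horizontal screen position in pixels (assumed field)
    line_id: int        # text line the fixation falls on (assumed to be precomputed)
    duration_ms: float  # fixation duration in milliseconds
    label: str          # "reading" or "scanning"


def reading_sequences(fixations: List[Fixation]) -> List[Dict[str, float]]:
    """Summarize each run of consecutive reading fixations on one line."""
    sequences = []
    # groupby only merges adjacent items, so a scanning fixation or a
    # change of line ends the current reading-sequence.
    for (label, _line), group in groupby(
        fixations, key=lambda f: (f.label, f.line_id)
    ):
        if label != "reading":
            continue
        run = list(group)
        xs = [f.x for f in run]
        sequences.append(
            {
                # Assumption: length is the horizontal extent of the run.
                "length_px": max(xs) - min(xs),
                "duration_ms": sum(f.duration_ms for f in run),
                "n_fixations": len(run),
            }
        )
    return sequences


if __name__ == "__main__":
    demo = [
        Fixation(100.0, 1, 200.0, "reading"),
        Fixation(180.0, 1, 220.0, "reading"),
        Fixation(400.0, 2, 90.0, "scanning"),
        Fixation(50.0, 2, 210.0, "reading"),
    ]
    for seq in reading_sequences(demo):
        print(seq)
```

Per-participant totals, such as total reading-sequence-length, would then be sums over the returned list.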
This working note discusses the topic of story generation, with a view to identifying the knowledge required to understand aviation incident narratives (which have structural similarities to stories), following the premise that to understand aviation incidents, one should at least be able to generate examples of them. We give a brief overview of aviation incidents and their relation to stories, and then describe two of our earlier attempts at incident generation (using 'scripts' and 'story grammars'), neither of which proved promising. Following this, we describe a simple incident generator which did work (at a 'toy' level), using a 'world simulation' approach. This generator is based on Meehan's TALE-SPIN story generator (1977). We conclude with a critique of the approach.
Chinese poetry generation is a very challenging task in natural language processing. In this paper, we propose a novel two-stage poetry generating method which first plans the sub-topics of the poem according to the user's writing intent, and then generates each line of the poem sequentially, using a modified recurrent neural network encoder-decoder framework. The proposed planning-based method can ensure that the generated poem is coherent and semantically consistent with the user's intent. A comprehensive evaluation with human judgments demonstrates that our proposed approach outperforms the state-of-the-art poetry generating methods and that the quality of the generated poems is somewhat comparable to that of poems written by human poets.
Stochastic variational inference for collapsed models has recently been successfully applied to large scale topic modelling. In this paper, we propose a stochastic collapsed variational inference algorithm for hidden Markov models, in a sequential data setting. Given a collapsed hidden Markov model, we break its long Markov chain into a set of short subchains. We propose a novel sum-product algorithm to update the posteriors of the subchains, taking into account their boundary transitions due to the sequential dependencies. Our experiments on two discrete datasets show that our collapsed algorithm is scalable to very large datasets, memory efficient and significantly more accurate than the existing uncollapsed algorithm.
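The chain-splitting step described above can be sketched as follows. This shows only the bookkeeping under assumed conventions: the `Subchain` type, the fixed subchain length, and the boundary indices are hypothetical. The sum-product update itself is not reproduced, since the abstract does not specify it.

```python
from typing import List, NamedTuple, Optional


class Subchain(NamedTuple):
    start: int                     # index of the first position in the subchain
    end: int                       # one past the last position in the subchain
    left_boundary: Optional[int]   # position just before the subchain, if any
    right_boundary: Optional[int]  # position just after the subchain, if any


def split_chain(length: int, sub_len: int) -> List[Subchain]:
    """Split positions 0..length-1 into consecutive subchains of at most sub_len."""
    if length < 0 or sub_len <= 0:
        raise ValueError("length must be >= 0 and sub_len must be > 0")
    chains = []
    for start in range(0, length, sub_len):
        end = min(start + sub_len, length)
        # Neighbouring positions are recorded so that a later update can
        # account for the transitions crossing each subchain boundary.
        left = start - 1 if start > 0 else None
        right = end if end < length else None
        chains.append(Subchain(start, end, left, right))
    return chains


if __name__ == "__main__":
    for chain in split_chain(10, 4):
        print(chain)
```

A stochastic scheme would then sample subchains from this list at each iteration rather than sweeping the full chain.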
This study is a first, exploratory attempt to use quantitative semantics techniques and topological analysis to analyze systemic patterns arising in a complex political system. In particular, we use a rich data set covering all speeches and debates in the UK House of Commons between 1975 and 2014. By the use of dynamic topic modeling (DTM) and topological data analysis (TDA) we show that both members and parties feature specific roles within the system, consistent over time, and extract global patterns indicating levels of political cohesion. Our results provide a wide array of novel hypotheses about the complex dynamics of political systems, with valuable policy applications.
Standard OCR is a well-researched topic of computer vision and can be considered solved for machine-printed text. However, when applied to unconstrained images, the recognition rates drop drastically. Therefore, the employment of object recognition-based techniques has become state of the art in scene text recognition applications. This paper presents a scene text recognition method tailored to ancient coin legends and compares the results achieved in character and word recognition experiments to a standard OCR engine. The conducted experiments show that the proposed method outperforms the standard OCR engine on a set of 180 cropped coin legend words.
Metaheuristic algorithms are becoming an important part of modern optimization. A wide range of metaheuristic algorithms have emerged over the last two decades, and many metaheuristics such as particle swarm optimization are becoming increasingly popular. Despite their popularity, mathematical analysis of these algorithms lags behind. Convergence analysis remains unsolved for the majority of metaheuristic algorithms, while efficiency analysis is equally challenging. In this paper, we intend to provide an overview of convergence and efficiency studies of metaheuristics, and try to provide a framework for analyzing metaheuristics in terms of convergence and efficiency. This can form a basis for analyzing other algorithms. We also outline some open questions as further research topics.
Although machine learning (ML) has been successful in automating various software engineering needs, software testing still remains a highly challenging topic. In this paper, we aim to improve the generative testing of software by directly augmenting the random number generator (RNG) with a deep reinforcement learning (RL) agent using an efficient, automatically extractable state representation of the software under test. Using the Cosmos SDK as the testbed, we show that the proposed DeepRNG framework provides a statistically significant improvement to the testing of the highly complex software library with over 350,000 lines of code. The source code of the DeepRNG framework is publicly available online.
In eDiscovery, a party to a lawsuit or similar action must search through available information to identify those documents and files that are relevant to the suit. Search efforts tend to identify less than 100% of the relevant documents and courts are frequently asked to adjudicate whether the search effort has been reasonable, or whether additional effort to find more of the relevant documents is justified. This article provides a method for estimating the probability that significant additional information will be found from extended effort. Modeling and two data sets indicate that, even at moderate levels of Recall, there is a low probability that the so-far unidentified documents contain facts/topics that have not already been observed in the identified documents.
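To give a rough sense of why this probability can be low, the sketch below uses a deliberately simple independence model. It is an illustrative assumption, not the article's method: each relevant document is taken to be found independently with probability equal to Recall, so a topic that appears in k relevant documents is missed entirely with probability (1 - Recall)^k. The function name and parameters are hypothetical.

```python
def prob_topic_missed(recall: float, docs_with_topic: int) -> float:
    """Probability that no document mentioning a topic was identified.

    Assumption: each relevant document is identified independently with
    probability `recall`; real review processes need not behave this way.
    """
    if not 0.0 <= recall <= 1.0:
        raise ValueError("recall must lie in [0, 1]")
    if docs_with_topic < 0:
        raise ValueError("docs_with_topic must be non-negative")
    return (1.0 - recall) ** docs_with_topic


if __name__ == "__main__":
    # How the miss probability shrinks as a topic appears in more documents.
    for k in (1, 3, 5, 10):
        print(k, prob_topic_missed(0.6, k))
```

Under these assumptions, the miss probability falls geometrically with the number of documents that mention the topic, so only topics confined to very few documents are likely to remain unseen.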