Ingemar J. Cox

E-NER -- An Annotated Named Entity Recognition Corpus of Legal Text

Dec 19, 2022
Ting Wai Terence Au, Ingemar J. Cox, Vasileios Lampos

Identifying named entities, such as a person, location or organization, in documents can highlight key information to readers. Training Named Entity Recognition (NER) models requires an annotated data set, and producing one can be a time-consuming, labour-intensive task. Nevertheless, there are publicly available NER data sets for general English. Recently there has been interest in developing NER for legal text. However, prior work and experimental results reported here indicate that there is a significant degradation in performance when NER methods trained on a general English data set are applied to legal text. We describe a publicly available legal NER data set, called E-NER, based on legal company filings available from the US Securities and Exchange Commission's EDGAR data set. Training a number of different NER algorithms on the general English CoNLL-2003 corpus but testing on our test collection confirmed significant degradations in accuracy, as measured by the F1-score, of between 29.4% and 60.4%, compared to training and testing on the E-NER collection.

* 5 pages, 3 figures, submitted to NLLP workshop in EMNLP 2022 
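
The abstract reports accuracy as an F1-score. As a rough illustration only, the sketch below computes an entity-level F1 by exact match of entity spans. The span representation `(start, end, label)` and the function name `entity_f1` are assumptions made for this example; the abstract does not say which matching scheme or tooling the authors used.

```python
def entity_f1(gold, predicted):
    """Entity-level F1 with exact span-and-label matching.

    Each entity is a hypothetical (start, end, label) tuple; an entity
    counts as correct only if all three fields match a gold entity.
    """
    gold, predicted = set(gold), set(predicted)
    true_positives = len(gold & predicted)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(predicted)
    recall = true_positives / len(gold)
    # F1 is the harmonic mean of precision and recall.
    return 2 * precision * recall / (precision + recall)


# Illustrative (made-up) annotations for one sentence.
gold_entities = [(0, 2, "PER"), (5, 6, "ORG"), (9, 10, "LOC")]
predicted_entities = [(0, 2, "PER"), (5, 6, "LOC")]
print(entity_f1(gold_entities, predicted_entities))
```

Computing this score once for a model trained on a general English corpus and once for a model trained on the legal corpus, both evaluated on the same legal test set, gives the kind of comparison the abstract describes.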

Estimating the Uncertainty of Neural Network Forecasts for Influenza Prevalence Using Web Search Activity

May 26, 2021
Michael Morris, Peter Hayes, Ingemar J. Cox, Vasileios Lampos

Influenza is an infectious disease with the potential to become a pandemic, and hence, forecasting its prevalence is an important undertaking for planning an effective response. Research has found that web search activity can be used to improve influenza models. Neural networks (NNs) can provide state-of-the-art forecasting accuracy but do not commonly incorporate uncertainty in their estimates, something essential for using them effectively during decision making. In this paper, we demonstrate how Bayesian Neural Networks (BNNs) can be used to provide both a forecast and a corresponding uncertainty without significant loss in forecasting accuracy compared to traditional NNs. Our method accounts for two sources of uncertainty: data and model uncertainty, arising due to measurement noise and model specification, respectively. Experiments are conducted using 14 years of data for England, assessing the model's accuracy over the last 4 flu seasons in this dataset. We evaluate the performance of different models, including competitive baselines, with conventional metrics as well as error functions that incorporate uncertainty estimates. Our empirical analysis indicates that considering both sources of uncertainty simultaneously is superior to considering either one separately. We also show that a BNN with recurrent layers that models both sources of uncertainty yields superior accuracy for these metrics for forecasting horizons greater than 7 days.
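
One common way to combine data and model uncertainty is the law of total variance, applied to several stochastic forward passes of a network that outputs a mean and a variance. The sketch below illustrates that generic decomposition only; it is not taken from the paper, and the function name `decompose_uncertainty` and the sample values are assumptions made for the example.

```python
from statistics import fmean, pvariance


def decompose_uncertainty(means, variances):
    """Split predictive variance into data and model components.

    `means` and `variances` hold the predicted mean and predicted noise
    variance from each sampled set of network weights (hypothetical
    inputs). By the law of total variance:
      data uncertainty  = average of the predicted noise variances
      model uncertainty = variance of the predicted means across samples
    """
    forecast = fmean(means)
    data_var = fmean(variances)
    model_var = pvariance(means)
    return forecast, data_var, model_var, data_var + model_var


# Made-up outputs of three weight samples for a single forecast date.
forecast, data_var, model_var, total_var = decompose_uncertainty(
    means=[1.0, 2.0, 3.0], variances=[0.5, 0.5, 0.5]
)
print(forecast, data_var, model_var, total_var)
```

Dropping either term from the sum corresponds to modelling only one of the two sources of uncertainty.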

Multi-Dueling Bandits and Their Application to Online Ranker Evaluation

Aug 22, 2016
Brian Brost, Yevgeny Seldin, Ingemar J. Cox, Christina Lioma

New ranking algorithms are continually being developed and refined, necessitating the development of efficient methods for evaluating these rankers. Online ranker evaluation focuses on the challenge of efficiently determining, from implicit user feedback, which ranker out of a finite set of rankers is the best. Online ranker evaluation can be modeled by dueling bandits, a mathematical model for online learning under limited feedback from pairwise comparisons. Comparison of pairs of rankers is performed by interleaving their result sets and examining which documents users click on. The dueling bandits model addresses the key issue of which pair of rankers to compare at each iteration, thereby providing a solution to the exploration-exploitation trade-off. Recently, methods for simultaneously comparing more than two rankers have been developed. However, the question of which rankers to compare at each iteration was left open. We address this question by proposing a generalization of the dueling bandits model that uses simultaneous comparisons of an unrestricted number of rankers. We evaluate our algorithm on synthetic data and several standard large-scale online ranker evaluation datasets. Our experimental results show that the algorithm yields orders of magnitude improvement in performance compared to state-of-the-art dueling bandit algorithms.
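
The pairwise comparison step described above, interleaving two result lists and crediting clicks, can be sketched with team-draft interleaving, one widely used interleaving scheme. This is an illustration of a single two-ranker comparison, not the multi-dueling algorithm the paper proposes; the function names and document identifiers are assumptions made for the example, and each ranking is assumed to contain no duplicates.

```python
import random


def team_draft_interleave(ranking_a, ranking_b, rng):
    """Interleave two rankings, recording which ranker supplied each doc."""
    rankings = {"A": ranking_a, "B": ranking_b}
    picks = {"A": 0, "B": 0}
    interleaved, team = [], {}
    total_docs = len(set(ranking_a) | set(ranking_b))
    while len(interleaved) < total_docs:
        # The ranker with fewer picks goes next; ties are broken at random.
        if picks["A"] < picks["B"]:
            side = "A"
        elif picks["B"] < picks["A"]:
            side = "B"
        else:
            side = rng.choice("AB")
        # Take that ranker's highest-ranked document not yet shown.
        doc = next((d for d in rankings[side] if d not in team), None)
        if doc is None:
            # This ranker has nothing left, so the other one contributes.
            side = "B" if side == "A" else "A"
            doc = next(d for d in rankings[side] if d not in team)
        interleaved.append(doc)
        team[doc] = side
        picks[side] += 1
    return interleaved, team


def duel_outcome(team, clicked_docs):
    """Return the ranker credited with more clicks, or None on a tie."""
    wins = {"A": 0, "B": 0}
    for doc in clicked_docs:
        if doc in team:
            wins[team[doc]] += 1
    if wins["A"] == wins["B"]:
        return None
    return "A" if wins["A"] > wins["B"] else "B"


shown, team = team_draft_interleave(
    ["d1", "d2", "d3"], ["d3", "d1", "d4"], random.Random(0)
)
print(shown, duel_outcome(team, clicked_docs=[shown[0]]))
```

A dueling bandit algorithm would sit on top of a routine like this, deciding at each iteration which rankers to pass in and updating its estimates from the returned outcome.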
