



Abstract: The generation of political event data has remained much the same since the mid-1990s, both in terms of data acquisition and the process of coding text into data. Since the 1990s, however, there have been significant improvements in open-source natural language processing software and in the availability of digitized news content. This paper presents a new, next-generation event dataset, named Phoenix, that builds on these and other advances. The dataset includes improvements in the underlying news collection process and event coding software, along with the creation of a general processing pipeline necessary to produce daily-updated data. The paper provides face validity checks by briefly examining the data for the conflict in Syria and by comparing Phoenix with the Integrated Crisis Early Warning System data.




Abstract: Automatically generated political event data is an important part of the social science data ecosystem. The approaches for generating this data, though, have remained largely the same for two decades. During this time, the field of computational linguistics has progressed tremendously. This paper presents an overview of political event data, including methods and ontologies, and a set of experiments to determine the applicability of deep neural networks to the extraction of political events from news text.