Gender imbalance in Wikipedia content is a known challenge which the editor community is actively addressing. The aim of this paper is to provide the Wikipedia community with instruments to estimate the magnitude of the problem for different entity types (also known as classes) in Wikipedia. To this end, we apply class completeness estimation methods based on the gender attribute. Our results show not only which gender for different sub-classes of Person is more prevalent in Wikipedia, but also an idea of how complete the coverage is for difference genders and sub-classes of Person.
The problem of event extraction is a relatively difficult task for low resource languages due to the non-availability of sufficient annotated data. Moreover, the task becomes complex for tail (rarely occurring) labels wherein extremely less data is available. In this paper, we present a new dataset (InDEE-2019) in the disaster domain for multiple Indic languages, collected from news websites. Using this dataset, we evaluate several rule-based mechanisms to augment deep learning based models. We formulate our problem of event extraction as a sequence labeling task and perform extensive experiments to study and understand the effectiveness of different approaches. We further show that tail labels can be easily incorporated by creating new rules without the requirement of large annotated data.