Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Muhammad Tayyab Zamir

EDU-NER-2025: Named Entity Recognition in Urdu Educational Texts using XLM-RoBERTa with X (formerly Twitter)

Apr 25, 2025

Fida Ullah, Muhammad Ahmad, Muhammad Tayyab Zamir, Muhammad Arif, Grigori sidorov, Edgardo Manuel Felipe Riverón, Alexander Gelbukh

Abstract:Named Entity Recognition (NER) plays a pivotal role in various Natural Language Processing (NLP) tasks by identifying and classifying named entities (NEs) from unstructured data into predefined categories such as person, organization, location, date, and time. While extensive research exists for high-resource languages and general domains, NER in Urdu particularly within domain-specific contexts like education remains significantly underexplored. This is Due to lack of annotated datasets for educational content which limits the ability of existing models to accurately identify entities such as academic roles, course names, and institutional terms, underscoring the urgent need for targeted resources in this domain. To the best of our knowledge, no dataset exists in the domain of the Urdu language for this purpose. To achieve this objective this study makes three key contributions. Firstly, we created a manually annotated dataset in the education domain, named EDU-NER-2025, which contains 13 unique most important entities related to education domain. Second, we describe our annotation process and guidelines in detail and discuss the challenges of labelling EDU-NER-2025 dataset. Third, we addressed and analyzed key linguistic challenges, such as morphological complexity and ambiguity, which are prevalent in formal Urdu texts.

Via

Access Paper or Ask Questions

A Named Entity Recognition and Topic Modeling-based Solution for Locating and Better Assessment of Natural Disasters in Social Media

May 01, 2024

Ayaz Mehmood, Muhammad Tayyab Zamir, Muhammad Asif Ayub, Nasir Ahmad, Kashif Ahmad

Figure 1 for A Named Entity Recognition and Topic Modeling-based Solution for Locating and Better Assessment of Natural Disasters in Social Media

Figure 2 for A Named Entity Recognition and Topic Modeling-based Solution for Locating and Better Assessment of Natural Disasters in Social Media

Figure 3 for A Named Entity Recognition and Topic Modeling-based Solution for Locating and Better Assessment of Natural Disasters in Social Media

Figure 4 for A Named Entity Recognition and Topic Modeling-based Solution for Locating and Better Assessment of Natural Disasters in Social Media

Abstract:Over the last decade, similar to other application domains, social media content has been proven very effective in disaster informatics. However, due to the unstructured nature of the data, several challenges are associated with disaster analysis in social media content. To fully explore the potential of social media content in disaster informatics, access to relevant content and the correct geo-location information is very critical. In this paper, we propose a three-step solution to tackling these challenges. Firstly, the proposed solution aims to classify social media posts into relevant and irrelevant posts followed by the automatic extraction of location information from the posts' text through Named Entity Recognition (NER) analysis. Finally, to quickly analyze the topics covered in large volumes of social media posts, we perform topic modeling resulting in a list of top keywords, that highlight the issues discussed in the tweet. For the Relevant Classification of Twitter Posts (RCTP), we proposed a merit-based fusion framework combining the capabilities of four different models namely BERT, RoBERTa, Distil BERT, and ALBERT obtaining the highest F1-score of 0.933 on a benchmark dataset. For the Location Extraction from Twitter Text (LETT), we evaluated four models namely BERT, RoBERTa, Distil BERTA, and Electra in an NER framework obtaining the highest F1-score of 0.960. For topic modeling, we used the BERTopic library to discover the hidden topic patterns in the relevant tweets. The experimental results of all the components of the proposed end-to-end solution are very encouraging and hint at the potential of social media content and NLP in disaster management.

* 15 pages; 4 tables; 4 figures

Via

Access Paper or Ask Questions

Social Media and Artificial Intelligence for Sustainable Cities and Societies: A Water Quality Analysis Use-case

Apr 23, 2024

Muhammad Asif Auyb, Muhammad Tayyab Zamir, Imran Khan, Hannia Naseem, Nasir Ahmad, Kashif Ahmad

Figure 1 for Social Media and Artificial Intelligence for Sustainable Cities and Societies: A Water Quality Analysis Use-case

Figure 2 for Social Media and Artificial Intelligence for Sustainable Cities and Societies: A Water Quality Analysis Use-case

Figure 3 for Social Media and Artificial Intelligence for Sustainable Cities and Societies: A Water Quality Analysis Use-case

Figure 4 for Social Media and Artificial Intelligence for Sustainable Cities and Societies: A Water Quality Analysis Use-case

Abstract:This paper focuses on a very important societal challenge of water quality analysis. Being one of the key factors in the economic and social development of society, the provision of water and ensuring its quality has always remained one of the top priorities of public authorities. To ensure the quality of water, different methods for monitoring and assessing the water networks, such as offline and online surveys, are used. However, these surveys have several limitations, such as the limited number of participants and low frequency due to the labor involved in conducting such surveys. In this paper, we propose a Natural Language Processing (NLP) framework to automatically collect and analyze water-related posts from social media for data-driven decisions. The proposed framework is composed of two components, namely (i) text classification, and (ii) topic modeling. For text classification, we propose a merit-fusion-based framework incorporating several Large Language Models (LLMs) where different weight selection and optimization methods are employed to assign weights to the LLMs. In topic modeling, we employed the BERTopic library to discover the hidden topic patterns in the water-related tweets. We also analyzed relevant tweets originating from different regions and countries to explore global, regional, and country-specific issues and water-related concerns. We also collected and manually annotated a large-scale dataset, which is expected to facilitate future research on the topic.

* 11 pages, 6 figures, and 3 tables

Via

Access Paper or Ask Questions

Stylometry Analysis of Multi-authored Documents for Authorship and Author Style Change Detection

Jan 12, 2024

Muhammad Tayyab Zamir, Muhammad Asif Ayub, Asma Gul, Nasir Ahmad, Kashif Ahmad

Figure 1 for Stylometry Analysis of Multi-authored Documents for Authorship and Author Style Change Detection

Figure 2 for Stylometry Analysis of Multi-authored Documents for Authorship and Author Style Change Detection

Figure 3 for Stylometry Analysis of Multi-authored Documents for Authorship and Author Style Change Detection

Figure 4 for Stylometry Analysis of Multi-authored Documents for Authorship and Author Style Change Detection

Abstract:In recent years, the increasing use of Artificial Intelligence based text generation tools has posed new challenges in document provenance, authentication, and authorship detection. However, advancements in stylometry have provided opportunities for automatic authorship and author change detection in multi-authored documents using style analysis techniques. Style analysis can serve as a primary step toward document provenance and authentication through authorship detection. This paper investigates three key tasks of style analysis: (i) classification of single and multi-authored documents, (ii) single change detection, which involves identifying the point where the author switches, and (iii) multiple author-switching detection in multi-authored documents. We formulate all three tasks as classification problems and propose a merit-based fusion framework that integrates several state-of-the-art natural language processing (NLP) algorithms and weight optimization techniques. We also explore the potential of special characters, which are typically removed during pre-processing in NLP applications, on the performance of the proposed methods for these tasks by conducting extensive experiments on both cleaned and raw datasets. Experimental results demonstrate significant improvements over existing solutions for all three tasks on a benchmark dataset.

* Tables 7, pages 4;

Via

Access Paper or Ask Questions

Document Provenance and Authentication through Authorship Classification

Mar 02, 2023

Muhammad Tayyab Zamir, Muhammad Asif Ayub, Jebran Khan, Muhammad Jawad Ikram, Nasir Ahmad, Kashif Ahmad

Figure 1 for Document Provenance and Authentication through Authorship Classification

Figure 2 for Document Provenance and Authentication through Authorship Classification

Figure 3 for Document Provenance and Authentication through Authorship Classification

Figure 4 for Document Provenance and Authentication through Authorship Classification

Abstract:Style analysis, which is relatively a less explored topic, enables several interesting applications. For instance, it allows authors to adjust their writing style to produce a more coherent document in collaboration. Similarly, style analysis can also be used for document provenance and authentication as a primary step. In this paper, we propose an ensemble-based text-processing framework for the classification of single and multi-authored documents, which is one of the key tasks in style analysis. The proposed framework incorporates several state-of-the-art text classification algorithms including classical Machine Learning (ML) algorithms, transformers, and deep learning algorithms both individually and in merit-based late fusion. For the merit-based late fusion, we employed several weight optimization and selection methods to assign merit-based weights to the individual text classification algorithms. We also analyze the impact of the characters on the task that are usually excluded in NLP applications during pre-processing by conducting experiments on both clean and un-clean data. The proposed framework is evaluated on a large-scale benchmark dataset, significantly improving performance over the existing solutions.

* IEEE ICAISC 2023
* 7 pages; 3 tables; 1 figure

Via

Access Paper or Ask Questions