Younes Karimi

Automated Detection of Doxing on Twitter

Feb 02, 2022
Younes Karimi, Anna Squicciarini, Shomir Wilson

Doxing refers to the practice of disclosing sensitive personal information about a person without their consent. This form of cyberbullying is an unpleasant and sometimes dangerous phenomenon for online social networks. Although prior work exists on automated identification of other types of cyberbullying, a need exists for methods capable of detecting doxing on Twitter specifically. We propose and evaluate a set of approaches for automatically detecting second- and third-party disclosures on Twitter of sensitive private information, a subset of which constitutes doxing. We summarize our findings of common intentions behind doxing episodes and compare nine different approaches for automated detection based on string-matching and one-hot encoded heuristics, as well as word and contextualized string embedding representations of tweets. We identify an approach providing 96.86% accuracy and 97.37% recall using contextualized string embeddings and conclude by discussing the practicality of our proposed methods.

* 24 pages, 1 figure. Accepted at the 25th ACM Conference on Computer-Supported Cooperative Work and Social Computing (ACM CSCW 2022)

A Longitudinal Dataset of Twitter ISIS Users

Feb 02, 2022
Younes Karimi, Anna Squicciarini, Peter K. Forster, Kira M. Leavitt

We present a large longitudinal dataset of tweets from two sets of users that are suspected to be affiliated with ISIS. These sets of users are identified based on a prior study and a campaign aimed at shutting down ISIS Twitter accounts. These users engaged with known ISIS accounts at least once during 2014-2015 and were still active as of 2021. Some of them have directly supported ISIS users and their tweets by retweeting them, while others, who have quoted ISIS tweets, have uncertain connections to ISIS seed accounts. This study and the dataset represent a unique approach to analyzing ISIS data. Although much research exists on ISIS online activities, few studies have focused on individual accounts. Our approach to validating accounts, as well as developing a framework for differentiating accounts' functionality (e.g., propaganda versus operational planning), offers a foundation for future research. We compute descriptive statistics and perform preliminary analyses on our collected data to provide deeper insight and highlight the significance and practicality of such analyses. We further discuss several potential cross-disciplinary use cases and research directions.

* 10 pages, 7 figures; Submitted to the 16th International Conference on Web and Social Media (AAAI ICWSM-2022) 

The Panacea Threat Intelligence and Active Defense Platform

Apr 20, 2020
Adam Dalton, Ehsan Aghaei, Ehab Al-Shaer, Archna Bhatia, Esteban Castillo, Zhuo Cheng, Sreekar Dhaduvai, Qi Duan, Md Mazharul Islam, Younes Karimi, Amir Masoumzadeh, Brodie Mather, Sashank Santhanam, Samira Shaikh, Tomek Strzalkowski, Bonnie J. Dorr

We describe Panacea, a system that supports natural language processing (NLP) components for active defenses against social engineering attacks. We deploy a pipeline of human language technology, including Ask and Framing Detection, Named Entity Recognition, Dialogue Engineering, and Stylometry. Panacea processes modern message formats through a plug-in architecture to accommodate innovative approaches for message analysis, knowledge representation, and dialogue generation. The novelty of the Panacea system is that it uses NLP for cyber defense and engages the attacker using bots to elicit evidence to attribute to the attacker and to waste the attacker's time and resources.

* Accepted at STOC 