Ahmed El-Kishky

NTULM: Enriching Social Media Text Representations with Non-Textual Units

Oct 29, 2022
Jinning Li, Shubhanshu Mishra, Ahmed El-Kishky, Sneha Mehta, Vivek Kulkarni

On social media, additional context is often present in the form of annotations and metadata such as the post's author, mentions, hashtags, and hyperlinks. We refer to these annotations as Non-Textual Units (NTUs). We posit that NTUs provide social context beyond their textual semantics and that leveraging these units can enrich social media text representations. In this work, we construct an NTU-centric social heterogeneous network to co-embed NTUs. We then integrate these NTU embeddings into a large pretrained language model in a principled way by fine-tuning with these additional units, adding context to noisy, short-text social media. Experiments show that NTU-augmented text representations significantly outperform existing text-only baselines by 2-5% relative points on many downstream tasks, highlighting the importance of context in social media NLP. We also show that including NTU context in the initial layers of the language model, alongside the text, is better than using it after the text embedding is generated. Our work leads to holistic, general-purpose social media content embeddings.
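
Below is a minimal, hedged sketch of the core idea of injecting non-textual-unit context into the early layers of a transformer: pretrained NTU embeddings are projected into the language model's hidden space and prepended to the token embeddings, so self-attention mixes social and textual context from the first layer. The module, dimensions, and projection are illustrative assumptions, not the authors' released implementation.

    import torch
    import torch.nn as nn

    class NTUAugmentedEncoder(nn.Module):
        """Toy sketch: prepend projected NTU embeddings to token embeddings so the
        transformer attends over text and non-textual units from the first layer."""
        def __init__(self, vocab_size=30522, d_model=768, ntu_dim=128, n_layers=2, n_heads=8):
            super().__init__()
            self.tok_emb = nn.Embedding(vocab_size, d_model)
            self.ntu_proj = nn.Linear(ntu_dim, d_model)  # map graph-embedding space into the LM space
            layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
            self.encoder = nn.TransformerEncoder(layer, n_layers)

        def forward(self, token_ids, ntu_embs):
            # token_ids: (B, T); ntu_embs: (B, K, ntu_dim) pretrained NTU vectors (assumed given)
            x = torch.cat([self.ntu_proj(ntu_embs), self.tok_emb(token_ids)], dim=1)
            return self.encoder(x)

    model = NTUAugmentedEncoder()
    out = model(torch.randint(0, 30522, (2, 16)), torch.randn(2, 3, 128))
    print(out.shape)  # torch.Size([2, 19, 768])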

* 14 pages, 5 figures, Accepted to the Proceedings of the Eighth Workshop on Noisy User-generated Text (W-NUT 2022). URL: https://aclanthology.org/2022.wnut-1.7/ 

MiCRO: Multi-interest Candidate Retrieval Online

Oct 28, 2022
Frank Portman, Stephen Ragain, Ahmed El-Kishky

Providing personalized recommendations in an environment where items exhibit ephemerality and temporal relevancy (e.g., in social media) presents a few unique challenges: (1) inductively understanding ephemeral appeal for items in a setting where new items are created frequently, (2) adapting to trends within engagement patterns where items may undergo temporal shifts in relevance, and (3) accurately modeling user preferences over this item space where users may express multiple interests. In this work we introduce MiCRO, a generative statistical framework that models multi-interest user preferences and temporal multi-interest item representations. Our framework is specifically formulated to adapt to both new items and temporal patterns of engagement. MiCRO demonstrates strong empirical performance in candidate retrieval experiments on two large-scale user-item datasets: (1) an open-source temporal dataset of (User, User) follow interactions and (2) a temporal dataset of (User, Tweet) favorite interactions, which we will open-source as an additional contribution to the community.
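
As a purely illustrative companion (not the MiCRO generative model itself), the sketch below scores items against several user interest vectors and adds a recency-weighted engagement prior, capturing the two ingredients emphasized above: multi-interest preferences and temporal relevance. The function names and the scoring form are assumptions.

    import numpy as np

    def micro_style_scores(interest_embs, interest_weights, item_embs, item_recent_engagement):
        """Purely illustrative: combine a user's multiple interest vectors with a
        recency-weighted engagement prior so fresh, currently relevant items can
        surface for whichever interest they match best."""
        sims = interest_embs @ item_embs.T                    # (K, N) interest-item affinity
        prior = np.log1p(item_recent_engagement)              # temporal popularity prior, shape (N,)
        return (np.asarray(interest_weights)[:, None] * sims + prior).max(axis=0)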

* Preprint 

TwHIN-BERT: A Socially-Enriched Pre-trained Language Model for Multilingual Tweet Representations

Sep 15, 2022
Xinyang Zhang, Yury Malkov, Omar Florez, Serim Park, Brian McWilliams, Jiawei Han, Ahmed El-Kishky

We present TwHIN-BERT, a multilingual language model trained on in-domain data from the popular social network Twitter. TwHIN-BERT differs from prior pre-trained language models in that it is trained not only with text-based self-supervision but also with a social objective based on the rich social engagements within a Twitter heterogeneous information network (TwHIN). Our model is trained on 7 billion tweets covering over 100 distinct languages, providing a valuable representation for modeling short, noisy, user-generated text. We evaluate our model on a variety of multilingual social recommendation and semantic understanding tasks and demonstrate significant metric improvements over established pre-trained language models. We will freely open-source TwHIN-BERT and our curated hashtag prediction and social engagement benchmark datasets to the research community.
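
A minimal sketch of the training idea, assuming batches of tweet pairs (emb_a[i], emb_b[i]) that share a social engagement in TwHIN-style data: a standard masked-language-modeling loss is combined with an in-batch contrastive loss over socially linked pairs. The loss weighting, temperature, and pairing scheme are illustrative, not the paper's exact recipe.

    import torch
    import torch.nn.functional as F

    def joint_loss(mlm_logits, mlm_labels, emb_a, emb_b, temperature=0.05, alpha=0.5):
        """Toy combination of an MLM loss with an in-batch contrastive loss that pulls
        together tweet pairs (emb_a[i], emb_b[i]) linked by social engagement."""
        lm = F.cross_entropy(mlm_logits.view(-1, mlm_logits.size(-1)),
                             mlm_labels.view(-1), ignore_index=-100)
        sim = F.normalize(emb_a, dim=-1) @ F.normalize(emb_b, dim=-1).T / temperature
        social = F.cross_entropy(sim, torch.arange(sim.size(0), device=sim.device))
        return alpha * lm + (1 - alpha) * social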


Non-Parametric Temporal Adaptation for Social Media Topic Classification

Sep 13, 2022
Fatemehsadat Mireshghallah, Nikolai Vogler, Junxian He, Omar Florez, Ahmed El-Kishky, Taylor Berg-Kirkpatrick

User-generated social media data is constantly changing as new trends influence online discussion, causing distribution shift in test data for social media NLP applications. In addition, training data is often subject to change as user data is deleted. Most current NLP systems are static and rely on fixed training data. As a result, they are unable to adapt to temporal change -- both test distribution shift and deleted training data -- without frequent, costly re-training. In this paper, we study temporal adaptation through the task of longitudinal hashtag prediction and propose a non-parametric technique as a simple but effective solution: non-parametric classifiers use datastores which can be updated, either to adapt to test distribution shift or to training data deletion, without re-training. We release a new benchmark dataset comprising 7.13M Tweets from 2021, along with their hashtags, broken into consecutive temporal buckets. We compare parametric neural hashtag classification and hashtag generation models, which need re-training for adaptation, with a non-parametric, training-free dense retrieval method that returns the nearest neighbor's hashtags based on text embedding distance. In experiments on our longitudinal Twitter dataset, we find that dense nearest neighbor retrieval has a relative performance gain of 64.12% over the best parametric baseline on test sets that exhibit distribution shift, without requiring gradient-based re-training. Furthermore, we show that our datastore approach is particularly well-suited to dynamically deleted user data, with negligible computational cost and performance loss. Our novel benchmark dataset and empirical analysis can support future inquiry into the important challenges presented by temporality in the deployment of AI systems on real-world user data.
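
The non-parametric idea lends itself to a short sketch: an updatable datastore of (tweet embedding, hashtags) pairs supports appending new data for fresh trends, deleting user data on request, and predicting by copying the nearest neighbor's hashtags, all without gradient-based re-training. This toy uses brute-force cosine similarity; the paper's dense retrieval setup is not reproduced here.

    import numpy as np

    class HashtagDatastore:
        """Toy nearest-neighbor hashtag predictor over an updatable datastore of
        (tweet embedding, hashtags) pairs; no gradient-based re-training needed."""
        def __init__(self, dim):
            self.embs = np.empty((0, dim), dtype=np.float32)
            self.tags = []

        def add(self, emb, hashtags):        # adapt to new trends by appending
            self.embs = np.vstack([self.embs, emb[None, :]])
            self.tags.append(hashtags)

        def delete(self, idx):               # honor deletions by dropping rows
            self.embs = np.delete(self.embs, idx, axis=0)
            self.tags.pop(idx)

        def predict(self, query):            # copy the nearest neighbor's hashtags
            sims = self.embs @ query / (np.linalg.norm(self.embs, axis=1) * np.linalg.norm(query) + 1e-9)
            return self.tags[int(np.argmax(sims))]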


kNN-Embed: Locally Smoothed Embedding Mixtures For Multi-interest Candidate Retrieval

May 13, 2022
Ahmed El-Kishky, Thomas Markovich, Kenny Leung, Frank Portman, Aria Haghighi, Ying Xiao

Candidate generation is the first stage in recommendation systems, where a lightweight system is used to retrieve potentially relevant items for an input user. These candidate items are then ranked and pruned in later stages of the recommender system using a more complex ranking model. Since candidate generation sits at the top of the recommendation funnel, it is important to retrieve a high-recall candidate set to feed into downstream ranking models. A common approach for candidate generation is to leverage approximate nearest neighbor (ANN) search from a single dense query embedding; however, this approach can yield a low-diversity result set with many near duplicates. As users often have multiple interests, candidate retrieval should ideally return a diverse set of candidates reflective of the user's multiple interests. To this end, we introduce kNN-Embed, a general approach to improving diversity in dense ANN-based retrieval. kNN-Embed represents each user as a smoothed mixture over learned item clusters that represent distinct 'interests' of the user. By querying each of a user's mixture components in proportion to its mixture weight, we retrieve a high-diversity set of candidates reflecting elements from each of the user's interests. We experimentally compare kNN-Embed to standard ANN candidate retrieval and show significant improvements in overall recall and diversity across three datasets. Accompanying this work, we open-source a large Twitter follow-graph dataset to spur further research in graph mining and representation learning for recommender systems.
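
A minimal sketch of the retrieval step described above, assuming a user's mixture weights and interest (cluster) embeddings are already learned: the candidate budget is split across components in proportion to their weights, and each component issues its own nearest-neighbor query. Brute-force search stands in for ANN, and the helper name is hypothetical.

    import numpy as np

    def knn_embed_retrieve(mixture_weights, interest_embs, item_embs, k=100):
        """Toy multi-interest retrieval: split the candidate budget across a user's
        interest components in proportion to their mixture weights, then take the
        nearest items to each component (brute force here; ANN search in practice)."""
        budgets = np.round(k * np.asarray(mixture_weights)).astype(int)
        candidates = []
        for n_k, q in zip(budgets, interest_embs):
            if n_k == 0:
                continue
            sims = item_embs @ q / (np.linalg.norm(item_embs, axis=1) * np.linalg.norm(q) + 1e-9)
            candidates.extend(np.argsort(-sims)[:n_k].tolist())
        return list(dict.fromkeys(candidates))  # de-duplicate, preserve rank order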


Learning Stance Embeddings from Signed Social Graphs

Jan 27, 2022
John Pougué-Biyong, Akshay Gupta, Aria Haghighi, Ahmed El-Kishky

A key challenge in social network analysis is understanding the position, or stance, of people in the graph on a large set of topics. While past work has modeled (dis)agreement in social networks using signed graphs, these approaches have not modeled agreement patterns across a range of correlated topics. For instance, disagreement on one topic may make disagreement (or agreement) more likely on related topics. We propose the Stance Embeddings Model (SEM), which jointly learns embeddings for each user and topic in signed social graphs with distinct edge types for each topic. By jointly learning user and topic embeddings, SEM is able to perform cold-start topic stance detection, predicting the stance of a user on topics for which we have not observed their engagement. We demonstrate the effectiveness of SEM using two large-scale Twitter signed graph datasets that we open-source. One dataset, TwitterSG, labels (dis)agreements using engagements between users via tweets to derive topic-informed, signed edges. The other, BirdwatchSG, leverages community reports on misinformation and misleading content. On TwitterSG and BirdwatchSG, SEM shows a 39% and 26% error reduction, respectively, against strong baselines.
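
To make the topic-aware signed-edge setup concrete, here is a toy training objective: score a user pair on a given topic and fit the observed agreement sign with a logistic loss. The topic-shifted dot-product scoring function is an illustrative choice, not necessarily SEM's exact formulation.

    import torch
    import torch.nn.functional as F

    def signed_topic_edge_loss(u_emb, v_emb, t_emb, sign):
        """Toy objective: score a (user u, user v, topic t) edge and fit the observed
        (dis)agreement sign with a logistic loss. The topic-shifted dot product is an
        illustrative scoring choice, not necessarily SEM's exact formulation."""
        score = ((u_emb + t_emb) * (v_emb + t_emb)).sum(-1)
        target = (sign > 0).float()          # +1 = agreement, -1 = disagreement
        return F.binary_cross_entropy_with_logits(score, target)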


Classification-based Quality Estimation: Small and Efficient Models for Real-world Applications

Sep 17, 2021
Shuo Sun, Ahmed El-Kishky, Vishrav Chaudhary, James Cross, Francisco Guzmán, Lucia Specia

Sentence-level quality estimation (QE) of machine translation is traditionally formulated as a regression task, and the performance of QE models is typically measured by Pearson correlation with human labels. Recent QE models have achieved previously unseen levels of correlation with human judgments, but they rely on large multilingual contextualized language models that are computationally expensive, making them infeasible for real-world applications. In this work, we evaluate several model compression techniques for QE and find that, despite their popularity in other NLP tasks, they lead to poor performance in this regression setting. We observe that a full model parameterization is required to achieve SoTA results in the regression task. However, we argue that this level of expressiveness over a continuous range is unnecessary given the downstream applications of QE, and show that reframing QE as a classification problem and evaluating QE models using classification metrics would better reflect their actual performance in real-world applications.
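
A tiny sketch of the reframing, under the assumption that continuous quality scores are binned into a few discrete classes and evaluated with classification metrics; the thresholds and three-class scheme are illustrative, not the paper's.

    import numpy as np
    from sklearn.metrics import f1_score

    def scores_to_classes(scores, thresholds=(0.3, 0.7)):
        """Toy reframing: bin continuous QE scores into discrete classes
        (e.g. bad / ok / good). Thresholds are illustrative, not the paper's."""
        return np.digitize(scores, thresholds)

    y_true = scores_to_classes(np.array([0.1, 0.5, 0.9]))
    y_pred = scores_to_classes(np.array([0.2, 0.8, 0.95]))
    print(f1_score(y_true, y_pred, average="macro"))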

* EMNLP 2021 

As Easy as 1, 2, 3: Behavioural Testing of NMT Systems for Numerical Translation

Jul 18, 2021
Jun Wang, Chang Xu, Francisco Guzman, Ahmed El-Kishky, Benjamin I. P. Rubinstein, Trevor Cohn

Mistranslated numbers have the potential to cause serious harm, such as financial loss or medical misinformation. In this work we develop comprehensive assessments of the robustness of neural machine translation systems to numerical text via behavioural testing. We explore a variety of numerical translation capabilities a system is expected to exhibit and design effective test examples to expose system underperformance. We find that numerical mistranslation is a general issue: major commercial systems and state-of-the-art research models fail on many of our test examples, for both high- and low-resource languages. Our tests reveal novel errors that, to the best of our knowledge, have not previously been reported for NMT systems. Lastly, we discuss strategies to mitigate numerical mistranslation.
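
In the spirit of the behavioural tests described above, here is a toy check that a translation preserves the numeral in a templated source sentence. The template, the stand-in translate() function, and the pass criterion are all assumptions for illustration.

    import random
    import re

    def number_preservation_test(translate, n_cases=100):
        """Toy behavioural test: generate sentences containing a number and check
        that the same number appears in the translation. The template, the stand-in
        translate() function, and the pass criterion are illustrative assumptions."""
        failures = []
        for _ in range(n_cases):
            value = random.randint(1, 10**6)
            src = f"The invoice total is {value} dollars."
            hyp = translate(src)
            if str(value) not in re.findall(r"\d+", hyp):
                failures.append((src, hyp))
        return failures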

* Findings of ACL, to appear 

Putting words into the system's mouth: A targeted attack on neural machine translation using monolingual data poisoning

Jul 12, 2021
Jun Wang, Chang Xu, Francisco Guzman, Ahmed El-Kishky, Yuqing Tang, Benjamin I. P. Rubinstein, Trevor Cohn

Neural machine translation systems are known to be vulnerable to adversarial test inputs; however, as we show in this paper, these systems are also vulnerable to training attacks. Specifically, we propose a poisoning attack in which a malicious adversary inserts a small poisoned sample of monolingual text into the training set of a system trained using back-translation. This sample is designed to induce a specific, targeted translation behaviour, such as peddling misinformation. We present two methods for crafting poisoned examples and show that a tiny handful of instances, amounting to only 0.02% of the training set, is sufficient to enact a successful attack. We outline a defence method against these attacks, which partly ameliorates the problem. However, we stress that this is a blind spot in modern NMT, demanding immediate attention.

* Findings of ACL, to appear 

Adapting High-resource NMT Models to Translate Low-resource Related Languages without Parallel Data

Jun 02, 2021
Wei-Jen Ko, Ahmed El-Kishky, Adithya Renduchintala, Vishrav Chaudhary, Naman Goyal, Francisco Guzmán, Pascale Fung, Philipp Koehn, Mona Diab

The scarcity of parallel data is a major obstacle for training high-quality machine translation systems for low-resource languages. Fortunately, some low-resource languages are linguistically related or similar to high-resource languages; these related languages may share many lexical or syntactic structures. In this work, we exploit this linguistic overlap to facilitate translating to and from a low-resource language using only monolingual data, in addition to any parallel data in the related high-resource language. Our method, NMT-Adapt, combines denoising autoencoding, back-translation, and adversarial objectives to utilize monolingual data for low-resource adaptation. We experiment on seven languages from three different language families and show that our technique significantly improves translation into the low-resource languages compared to other translation baselines.
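
As a rough sketch of how the three ingredients might be combined, the snippet below shows a common denoising corruption (word dropout plus a bounded local shuffle) and a weighted sum of the denoising, back-translation, and adversarial losses. The corruption details and loss weights are assumptions, not NMT-Adapt's exact recipe.

    import random

    def corrupt(tokens, drop_p=0.1, k=3):
        """Denoising-autoencoder corruption on monolingual text: word dropout plus a
        bounded local shuffle (a common recipe; the exact details here are assumed)."""
        kept = [t for t in tokens if random.random() > drop_p] or tokens[:1]
        order = sorted(range(len(kept)), key=lambda i: i + random.uniform(0, k))
        return [kept[i] for i in order]

    def adaptation_loss(dae_loss, bt_loss, adv_loss, weights=(1.0, 1.0, 0.1)):
        """Weighted sum of the three NMT-Adapt ingredients; the weights are
        illustrative assumptions, not the paper's training schedule."""
        return sum(w * l for w, l in zip(weights, (dae_loss, bt_loss, adv_loss)))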

* ACL 2021 