Advancements in sign language processing have been hindered by a lack of sufficient data, impeding progress in recognition, translation, and production tasks. The absence of comprehensive sign language datasets across the world's sign languages has widened the gap in this field, resulting in a few sign languages being studied more than others, making this research area extremely skewed mostly towards sign languages from high-income countries. In this work we introduce a new large and highly multilingual dataset for sign language translation: JWSign. The dataset consists of 2,530 hours of Bible translations in 98 sign languages, featuring more than 1,500 individual signers. On this dataset, we report neural machine translation experiments. Apart from bilingual baseline systems, we also train multilingual systems, including some that take into account the typological relatedness of signed or spoken languages. Our experiments highlight that multilingual systems are superior to bilingual baselines, and that in higher-resource scenarios, clustering language pairs that are related improves translation quality.
On December 7, 2020, Ghanaians participated in the polls to determine their president for the next four years. To gain insights from this presidential election, we conducted stance analysis (which is not always equivalent to sentiment analysis) to understand how Twitter, a popular social media platform, reflected the opinions of its users regarding the two main presidential candidates. We collected a total of 99,356 tweets using the Twitter API (Tweepy) and manually annotated 3,090 tweets into three classes: Against, Neutral, and Support. We then performed preprocessing on the tweets. The resulting dataset was evaluated using two lexicon-based approaches, VADER and TextBlob, as well as five supervised machine learning-based approaches: Support Vector Machine (SVM), Logistic Regression (LR), Multinomial Na\"ive Bayes (MNB), Stochastic Gradient Descent (SGD), and Random Forest (RF), based on metrics such as accuracy, precision, recall, and F1-score. The best performance was achieved by Logistic Regression with an accuracy of 71.13%. We utilized Logistic Regression to classify all the extracted tweets and subsequently conducted an analysis and discussion of the results. For access to our data and code, please visit: https://github.com/ShesterG/Stance-Detection-Ghana-2020-Elections.git