Jugal Kalita

University of Colorado at Colorado Springs

Explaining Math Word Problem Solvers

Jul 24, 2023

Abby Newcomb, Jugal Kalita

Abstract: Automated math word problem solvers based on neural networks have reached 70-80% accuracy in solving arithmetic word problems. However, it has been shown that these solvers may rely on superficial patterns to obtain their equations. In order to determine what information math word problem solvers use to generate solutions, we remove parts of the input and measure the model's performance on the perturbed dataset. Our results show that the model is not sensitive to the removal of many words from the input and can still manage to find a correct answer when given a nonsense question. This indicates that automatic solvers do not follow the semantic logic of math word problems, and may be overfitting to the presence of specific words.

* Published in 6th International Conference on Natural Language Processing and Information Retrieval (NLPIR 2022)
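
The perturbation experiment described in this abstract (remove parts of the input, then re-measure performance) can be illustrated with a small sketch. This is not the authors' code: the random word-removal strategy, the function names, the two example problems, and the `toy_solver` stand-in are all assumptions made for illustration. The toy solver deliberately relies on a superficial pattern (it adds every number it sees), so it shows how a solver can stay correct on an input that no longer reads as a question.

```python
import random


def remove_words(question, fraction, seed=0):
    """Return the question with int(len(words) * fraction) randomly chosen words deleted."""
    words = question.split()
    rng = random.Random(seed)  # fixed seed keeps the perturbation reproducible
    n_remove = int(len(words) * fraction)
    drop = set(rng.sample(range(len(words)), n_remove))
    return " ".join(w for i, w in enumerate(words) if i not in drop)


def toy_solver(question):
    """Hypothetical stand-in for a neural solver: sums every integer token."""
    return sum(int(tok) for tok in question.split() if tok.isdigit())


def accuracy(solver, problems, fraction):
    """Fraction of problems answered correctly after perturbing each question."""
    correct = 0
    for question, answer in problems:
        if solver(remove_words(question, fraction)) == answer:
            correct += 1
    return correct / len(problems)


# Hypothetical problems, pre-tokenized with spaces around punctuation.
problems = [
    ("Tom has 3 apples and buys 4 more . How many apples does he have ?", 7),
    ("A box holds 5 pens and 6 pencils . How many items are in the box ?", 11),
]

baseline = accuracy(toy_solver, problems, 0.0)   # unperturbed input
perturbed = accuracy(toy_solver, problems, 0.5)  # half of the words removed
print(f"baseline={baseline:.2f} perturbed={perturbed:.2f}")
```

Comparing `baseline` with `perturbed` across several removal fractions gives a sensitivity curve; a solver whose accuracy barely drops is probably not using the removed words.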

Training-free Neural Architecture Search for RNNs and Transformers

Jun 01, 2023

Aaron Serianni, Jugal Kalita

Abstract: Neural architecture search (NAS) has allowed for the automatic creation of new and effective neural network architectures, offering an alternative to the laborious process of manually designing complex architectures. However, traditional NAS algorithms are slow and require immense amounts of computing power. Recent research has investigated training-free NAS metrics for image classification architectures, drastically speeding up search algorithms. In this paper, we investigate training-free NAS metrics for recurrent neural network (RNN) and BERT-based transformer architectures, targeted towards language modeling tasks. First, we develop a new training-free metric, named hidden covariance, that predicts the trained performance of an RNN architecture and significantly outperforms existing training-free metrics. We experimentally evaluate the effectiveness of the hidden covariance metric on the NAS-Bench-NLP benchmark. Second, we find that the current search space paradigm for transformer architectures is not optimized for training-free neural architecture search. Instead, a simple qualitative analysis can effectively shrink the search space to the best performing architectures. This conclusion is based on our investigation of existing training-free metrics and new metrics developed from recent transformer pruning literature, evaluated on our own benchmark of trained BERT architectures. Ultimately, our analysis shows that the architecture search space and the training-free metric must be developed together in order to achieve effective results.

* Code is available at https://github.com/aaronserianni/training-free-nas

Enhancing Translation for Indigenous Languages: Experiments with Multilingual Models

May 27, 2023

Atnafu Lambebo Tonja, Hellina Hailu Nigatu, Olga Kolesnikova, Grigori Sidorov, Alexander Gelbukh, Jugal Kalita

Abstract: This paper describes CIC NLP's submission to the AmericasNLP 2023 Shared Task on machine translation systems for indigenous languages of the Americas. We present the system descriptions for three methods. We used two multilingual models, M2M-100 and mBART50, and one bilingual (one-to-one) model, the Helsinki NLP Spanish-English translation model, and experimented with different transfer learning setups. We experimented with 11 languages of the Americas and report the setups we used as well as the results we achieved. Overall, the mBART setup was able to improve upon the baseline for three out of the eleven languages.

* Accepted to Third Workshop on NLP for Indigenous Languages of the Americas

Abstractive Text Summarization Using the BRIO Training Paradigm

May 23, 2023

Khang Nhut Lam, Thieu Gia Doan, Khang Thua Pham, Jugal Kalita

Abstract: Summary sentences produced by abstractive summarization models may be coherent and comprehensive, but they lack control and rely heavily on reference summaries. The BRIO training paradigm assumes a non-deterministic distribution to reduce the model's dependence on reference summaries and improve model performance during inference. This paper presents a straightforward but effective technique to improve abstractive summaries by fine-tuning pre-trained language models and training them with the BRIO paradigm. We build a text summarization dataset for Vietnamese, called VieSum. We perform experiments with abstractive summarization models trained with the BRIO paradigm on the CNNDM and the VieSum datasets. The results show that the models, trained on basic hardware, outperform all existing abstractive summarization models, especially for Vietnamese.

* 6 pages, Findings of the Association for Computational Linguistics: ACL 2023

Spatiotemporal Transformer for Stock Movement Prediction

May 05, 2023

Daniel Boyle, Jugal Kalita

Abstract: Financial markets are an intriguing place that offers investors the potential to gain large profits if trades are timed correctly. Unfortunately, the dynamic, non-linear nature of financial markets makes it extremely hard to predict future price movements. Within the US stock exchange, countless factors play a role in the price of a company's stock, including but not limited to financial statements, social and news sentiment, overall market sentiment, political happenings and trading psychology. Correlating these factors is virtually impossible for a human. Therefore, we propose STST, a novel approach using a Spatiotemporal Transformer-LSTM model for stock movement prediction. Our model obtains accuracies of 63.707% and 56.879% on the ACL18 and KDD17 datasets, respectively. In addition, our model was used in simulation to determine its real-life applicability. It obtained a minimum of 10.41% higher profit than the S&P500 stock index, with a minimum annualized return of 31.24%.

Utilizing Priming to Identify Optimal Class Ordering to Alleviate Catastrophic Forgetting

Dec 24, 2022

Gabriel Mantione-Holmes, Justin Leo, Jugal Kalita

Abstract: In order for artificial neural networks to begin accurately mimicking biological ones, they must be able to adapt to new exigencies without forgetting what they have learned from previous training. Lifelong learning approaches to artificial neural networks strive towards this goal, yet have not progressed far enough to be realistically deployed for natural language processing tasks. The proverbial roadblock of catastrophic forgetting still gate-keeps researchers from an adequate lifelong learning model. While efforts are being made to quell catastrophic forgetting, there is a lack of research that looks into the importance of class ordering when training on new classes for incremental learning. This is surprising as the ordering of "classes" that humans learn is heavily monitored and incredibly important. While heuristics to develop an ideal class order have been researched, this paper examines class ordering as it relates to priming as a scheme for incremental class learning. By examining the connections between various methods of priming found in humans and how those are mimicked yet remain unexplained in lifelong machine learning, this paper provides a better understanding of the similarities between our biological systems and the synthetic systems while simultaneously improving current practices to combat catastrophic forgetting. Through the merging of psychological priming practices with class ordering, this paper is able to identify a generalizable method for class ordering in NLP incremental learning tasks that consistently outperforms random class ordering.

* Accepted to IEEE International Conference on Semantic Computing (ICSC) 2023

CAMeMBERT: Cascading Assistant-Mediated Multilingual BERT

Dec 22, 2022

Dan DeGenaro, Jugal Kalita

Abstract: Large language models having hundreds of millions, and even billions, of parameters have performed extremely well on a variety of natural language processing (NLP) tasks. Their widespread use and adoption, however, is hindered by the lack of availability and portability of sufficiently large computational resources. This paper proposes a knowledge distillation (KD) technique building on the work of LightMBERT, a student model of multilingual BERT (mBERT). By repeatedly distilling mBERT through increasingly compressed top-layer distilled teacher assistant networks, CAMeMBERT aims to improve upon the time and space complexities of mBERT while keeping loss of accuracy beneath an acceptable threshold. At present, CAMeMBERT has an average accuracy of around 60.1%, which is subject to change after future improvements to the hyperparameters used in fine-tuning.

* 4 pages, 2 figures, 3 tables
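
The cascading idea in this abstract (teacher, then teacher assistant, then student) can be sketched with a generic soft-target distillation loss. This is only an illustration under stated assumptions: the temperature-scaled KL divergence below is the standard textbook formulation of knowledge distillation, not necessarily the objective CAMeMBERT or LightMBERT uses, and the logit values and stage names are hypothetical.

```python
import math


def softmax(logits, temperature=1.0):
    """Temperature-softened softmax, shifted by the max for numerical stability."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """KL(teacher || student) on softened distributions, scaled by T^2."""
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)
    return kl * temperature ** 2


# Hypothetical logits for one example from each stage of a cascade.
# Each stage is distilled from the stage directly before it.
cascade = [
    ("teacher", [4.0, 1.0, 0.5]),
    ("assistant", [3.5, 1.2, 0.6]),
    ("student", [3.0, 1.5, 0.8]),
]

stage_losses = []
for (t_name, t_logits), (s_name, s_logits) in zip(cascade, cascade[1:]):
    loss = distillation_loss(s_logits, t_logits)
    stage_losses.append((t_name, s_name, loss))
    print(f"{t_name} -> {s_name}: loss={loss:.4f}")
```

In an actual training loop each loss would be minimized with respect to the smaller network's parameters before that network becomes the teacher for the next, more compressed stage.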

Facial Expression Recognition and Image Description Generation in Vietnamese

Aug 12, 2022

Khang Nhut Lam, Kim-Ngoc Thi Nguyen, Loc Huu Nguy, Jugal Kalita

Abstract: This paper discusses a facial expression recognition model and a description generation model to build descriptive sentences for images and facial expressions of people in images. Our study shows that YOLOv5 achieves better results than a traditional CNN for all emotions on the KDEF dataset. In particular, the accuracies of the CNN and YOLOv5 models for emotion recognition are 0.853 and 0.938, respectively. A model for generating descriptions for images based on a merged architecture is proposed using VGG16 with the descriptions encoded over an LSTM model. YOLOv5 is also used to recognize dominant colors of objects in the images and correct the color words in the generated descriptions if necessary. If the description contains words referring to a person, we recognize the emotion of the person in the image. Finally, we combine the results of all models to create sentences that describe the visual content and the human emotions in the images. Experimental results on the Flickr8k dataset in Vietnamese achieve BLEU-1, BLEU-2, BLEU-3, and BLEU-4 scores of 0.628, 0.425, 0.280, and 0.174, respectively.

* Fuzzy Systems and Data Mining VII: Proceedings of FSDM 2021 340 (2021): 63
* 7 pages

Automatically Creating a Large Number of New Bilingual Dictionaries

Aug 12, 2022

Khang Nhut Lam, Feras Al Tarouti, Jugal Kalita

Abstract: This paper proposes approaches to automatically create a large number of new bilingual dictionaries for low-resource languages, especially resource-poor and endangered languages, from a single input bilingual dictionary. Our algorithms produce translations of words in a source language to many target languages using available Wordnets and a machine translator (MT). Since our approaches rely on just one input dictionary, available Wordnets and an MT, they are applicable to any bilingual dictionary as long as one of the two languages is English or has a Wordnet linked to the Princeton Wordnet. Starting with 5 available bilingual dictionaries, we create 48 new bilingual dictionaries. Of these, 30 pairs of languages are not supported by the popular MTs: Google and Bing.

* Proceedings of the AAAI Conference on Artificial Intelligence, vol. 29, no. 1. 2015
* 7 pages
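
One way to picture how a single input dictionary plus linked Wordnets can yield a new dictionary is a pivot through shared synset identifiers. The sketch below is an assumption-heavy illustration, not the authors' algorithm: the toy data, the synset ID strings, and the `pivot_translate` function are hypothetical, and the machine-translation step mentioned in the abstract is omitted.

```python
from collections import defaultdict

# Hypothetical toy resources.
# Input bilingual dictionary: source-language word -> English translations.
input_dict = {"perro": ["dog"], "gato": ["cat"]}
# English Wordnet: English lemma -> synset IDs.
english_wordnet = {"dog": ["synset:dog.n.01"], "cat": ["synset:cat.n.01"]}
# Target-language Wordnet linked to the same synset IDs: synset ID -> lemmas.
target_wordnet = {"synset:dog.n.01": ["chien"], "synset:cat.n.01": ["chat"]}


def pivot_translate(input_dict, english_wordnet, target_wordnet):
    """Build a source -> target dictionary by pivoting through shared synsets."""
    new_dict = defaultdict(set)
    for source_word, english_words in input_dict.items():
        for english in english_words:
            for synset in english_wordnet.get(english, []):
                # Every target lemma in the same synset is a candidate translation.
                new_dict[source_word].update(target_wordnet.get(synset, []))
    return {word: sorted(targets) for word, targets in new_dict.items()}


new_dictionary = pivot_translate(input_dict, english_wordnet, target_wordnet)
print(new_dictionary)
```

Repeating the pivot with a different target Wordnet produces another dictionary from the same input, which is why one input dictionary can fan out into many language pairs.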

Using Artificial Intelligence and IoT for Constructing a Smart Trash Bin

Aug 12, 2022

Khang Nhut Lam, Nguyen Hoang Huynh, Nguyen Bao Ngoc, To Thi Huynh Nhu, Nguyen Thanh Thao, Pham Hoang Hao, Vo Van Kiet, Bui Xuan Huynh, Jugal Kalita

Abstract: The research reported in this paper transforms a normal trash bin into a smarter one by applying computer vision technology. With the support of sensors and actuator devices, the trash bin can automatically classify garbage. In particular, a camera on the trash bin takes pictures of trash, then the central processing unit analyzes them and decides which bin to drop the trash into. Our trash bin system achieves 90% accuracy. In addition, our model is connected to the Internet to update the bin status for further management. A mobile application is developed for managing the bin.

* International Conference on Future Data and Security Engineering, pp. 427-435. Springer, Singapore, 2021
* 8 pages
