The surge of e-commerce reviews has presented a challenge in manually annotating the vast volume of reviews to comprehend their underlying aspects and sentiments. This research focused on leveraging weakly supervised learning to tackle aspect category learning and the sentiment classification of reviews. Our approach involves the generation of labels for both aspects and sentiments, employing the Snorkel framework of WSL, which incorporates aspect terms, review sentiment scores, and review ratings as sources of weak signals. This innovative strategy significantly reduces the laborious labeling efforts required for processing such extensive datasets. In this study, we deployed hybrid models, namely BiLSTM, CNN-BiLSTM, and CNN-LSTM, which harness multiple inputs, including review text, aspect terms, and ratings. Our proposed model employs two distinct loss functions: Binary Cross Entropy with Sigmoid Activation for Multi-Label Classification, enabling us to learn aspect Labels such as Quality, Usability, Service, Size, and Price, and Categorical Cross Entropy with Softmax Activations for Multi-Class Classification. Subsequently, we meticulously evaluate the performance metrics of these three implemented models, including Macro F1 score and Macro Precision. CNN & Bi-LSTM model attained 0.78 and 0.79 F1 scores on aspect and sentiment identification, respectively. The outcomes of this research are poised to make a substantial contribution to e-commerce platforms, offering an efficient and automated means to label and analyze vast troves of user reviews.
Time series forecasting has seen many methods attempted over the past few decades, including traditional technical analysis, algorithmic statistical models, and more recent machine learning and artificial intelligence approaches. Recently, neural networks have been incorporated into the forecasting scenario, such as the LSTM and conventional RNN approaches, which utilize short-term and long-term dependencies. This study evaluates traditional forecasting methods, such as ARIMA, SARIMA, and SARIMAX, and newer neural network approaches, such as DF-RNN, DSSM, and Deep AR, built using RNNs. The standard NIFTY-50 dataset from Kaggle is used to assess these models using metrics such as MSE, RMSE, MAPE, POCID, and Theil's U. Results show that Deep AR outperformed all other conventional deep learning and traditional approaches, with the lowest MAPE of 0.01 and RMSE of 189. Additionally, the performance of Deep AR and GRU did not degrade when the amount of training data was reduced, suggesting that these models may not require a large amount of data to achieve consistent and reliable performance. The study demonstrates that incorporating deep learning approaches in a forecasting scenario significantly outperforms conventional approaches and can handle complex datasets, with potential applications in various domains, such as weather predictions and other time series applications in a real-world scenario.
The present study aimed to address the issue of imbalanced data in classification tasks and evaluated the suitability of SMOTE, ADASYN, and GAN techniques in generating synthetic data to address the class imbalance and improve the performance of classification models in low-resource settings. The study employed the Generalised Linear Model (GLM) algorithm for class balancing experiments and the Random Forest (RF) algorithm for low-resource setting experiments to assess model performance under varying training data. The recall metric was the primary evaluation metric for all classification models. The results of the class balancing experiments showed that the GLM model trained on GAN-balanced data achieved the highest recall value. Similarly, in low-resource experiments, models trained on data enhanced with GAN-synthesized data exhibited better recall values than original data. These findings demonstrate the potential of GAN-generated synthetic data for addressing the challenge of imbalanced data in classification tasks and improving model performance in low-resource settings.
One of the most pressing challenges prevalent in the steel manufacturing industry is the identification of surface defects. Early identification of casting defects can help boost performance, including streamlining production processes. Though, deep learning models have helped bridge this gap and automate most of these processes, there is a dire need to come up with lightweight models that can be deployed easily with faster inference times. This research proposes a lightweight architecture that is efficient in terms of accuracy and inference time compared with sophisticated pre-trained CNN architectures like MobileNet, Inception, and ResNet, including vision transformers. Methodologies to minimize computational requirements such as depth-wise separable convolution and global average pooling (GAP) layer, including techniques that improve architectural efficiencies and augmentations, have been experimented. Our results indicate that a custom model of 590K parameters with depth-wise separable convolutions outperformed pretrained architectures such as Resnet and Vision transformers in terms of accuracy (81.87%) and comfortably outdid architectures such as Resnet, Inception, and Vision transformers in terms of faster inference times (12 ms). Blurpool fared outperformed other techniques, with an accuracy of 83.98%. Augmentations had a paradoxical effect on the model performance. No direct correlation between depth-wise and 3x3 convolutions on inference time, they, however, they played a direct role in improving model efficiency by enabling the networks to go deeper and by decreasing the number of trainable parameters. Our work sheds light on the fact that custom networks with efficient architectures and faster inference times can be built without the need of relying on pre-trained architectures.
Deep Learning has revolutionized the fields of computer vision, natural language understanding, speech recognition, information retrieval and more. Many techniques have evolved over the past decade that made models lighter, faster, and robust with better generalization. However, many deep learning practitioners persist with pre-trained models and architectures trained mostly on standard datasets such as Imagenet, MS-COCO, IMDB-Wiki Dataset, and Kinetics-700 and are either hesitant or unaware of redesigning the architecture from scratch that will lead to better performance. This scenario leads to inefficient models that are not suitable on various devices such as mobile, edge, and fog. In addition, these conventional training methods are of concern as they consume a lot of computing power. In this paper, we revisit various SOTA techniques that deal with architecture efficiency (Global Average Pooling, depth-wise convolutions & squeeze and excitation, Blurpool), learning rate (Cyclical Learning Rate), data augmentation (Mixup, Cutout), label manipulation (label smoothing), weight space manipulation (stochastic weight averaging), and optimizer (sharpness aware minimization). We demonstrate how an efficient deep convolution network can be built in a phased manner by sequentially reducing the number of training parameters and using the techniques mentioned above. We achieved a SOTA accuracy of 99.2% on MNIST data with just 1500 parameters and an accuracy of 86.01% with just over 140K parameters on the CIFAR-10 dataset.
In this paper, the fourth version the Sloan Digital Sky Survey (SDSS-4), Data Release 16 dataset was used to classify the SDSS dataset into galaxies, stars, and quasars using machine learning and deep learning architectures. We efficiently utilize both image and metadata in tabular format to build a novel multi-modal architecture and achieve state-of-the-art results. In addition, our experiments on transfer learning using Imagenet weights on five different architectures (Resnet-50, DenseNet-121 VGG-16, Xception, and EfficientNet) reveal that freezing all layers and adding a final trainable layer may not be an optimal solution for transfer learning. It is hypothesized that higher the number of trainable layers, higher will be the training time and accuracy of predictions. It is also hypothesized that any subsequent increase in the number of training layers towards the base layers will not increase in accuracy as the pre trained lower layers only help in low level feature extraction which would be quite similar in all the datasets. Hence the ideal level of trainable layers needs to be identified for each model in respect to the number of parameters. For the tabular data, we compared classical machine learning algorithms (Logistic Regression, Random Forest, Decision Trees, Adaboost, LightGBM etc.,) with artificial neural networks. Our works shed new light on transfer learning and multi-modal deep learning architectures. The multi-modal architecture not only resulted in higher metrics (accuracy, precision, recall, F1 score) than models using only image data or tabular data. Furthermore, multi-modal architecture achieved the best metrics in lesser training epochs and improved the metrics on all classes.
Foreign Exchange (FOREX) is a decentralised global market for exchanging currencies. The Forex market is enormous, and it operates 24 hours a day. Along with country-specific factors, Forex trading is influenced by cross-country ties and a variety of global events. Recent pandemic scenarios such as COVID19 and local elections can also have a significant impact on market pricing. We tested and compared various predictions with external elements such as news items in this work. Additionally, we compared classical machine learning methods to deep learning algorithms. We also added sentiment features from news headlines using NLP-based word embeddings and compared the performance. Our results indicate that simple regression model like linear, SGD, and Bagged performed better than deep learning models such as LSTM and RNN for single-step forecasting like the next two hours, the next day, and seven days. Surprisingly, news articles failed to improve the predictions indicating domain-based and relevant information only adds value. Among the text vectorization techniques, Word2Vec and SentenceBERT perform better.
Customers' reviews and comments are important for businesses to understand users' sentiment about the products and services. However, this data needs to be analyzed to assess the sentiment associated with topics/aspects to provide efficient customer assistance. LDA and LSA fail to capture the semantic relationship and are not specific to any domain. In this study, we evaluate BERTopic, a novel method that generates topics using sentence embeddings on Consumer Financial Protection Bureau (CFPB) data. Our work shows that BERTopic is flexible and yet provides meaningful and diverse topics compared to LDA and LSA. Furthermore, domain-specific pre-trained embeddings (FinBERT) yield even better topics. We evaluated the topics on coherence score (c_v) and UMass.
The SDSS-IV dataset contains information about various astronomical bodies such as Galaxies, Stars, and Quasars captured by observatories. Inspired by our work on deep multimodal learning, which utilized transfer learning to classify the SDSS-IV dataset, we further extended our research in the fine tuning of these architectures to study the effect in the classification scenario. Architectures such as Resnet-50, DenseNet-121 VGG-16, Xception, EfficientNetB2, MobileNetV2 and NasnetMobile have been built using layer wise fine tuning at different levels. Our findings suggest that freezing all layers with Imagenet weights and adding a final trainable layer may not be the optimal solution. Further, baseline models and models that have higher number of trainable layers performed similarly in certain architectures. Model need to be fine tuned at different levels and a specific training ratio is required for a model to be termed ideal. Different architectures had different responses to the change in the number of trainable layers w.r.t accuracies. While models such as DenseNet-121, Xception, EfficientNetB2 achieved peak accuracies that were relatively consistent with near perfect training curves, models such as Resnet-50,VGG-16, MobileNetV2 and NasnetMobile had lower, delayed peak accuracies with poorly fitting training curves. It was also found that though mobile neural networks have lesser parameters and model size, they may not always be ideal for deployment on a low computational device as they had consistently lower validation accuracies. Customized evaluation metrics such as Tuning Parameter Ratio and Tuning Layer Ratio are used for model evaluation.
Facial landmark detection is a widely researched field of deep learning as this has a wide range of applications in many fields. These key points are distinguishing characteristic points on the face, such as the eyes center, the eye's inner and outer corners, the mouth center, and the nose tip from which human emotions and intent can be explained. The focus of our work has been evaluating transfer learning models such as MobileNetV2 and NasNetMobile, including custom CNN architectures. The objective of the research has been to develop efficient deep learning models in terms of model size, parameters, and inference time and to study the effect of augmentation imputation and fine-tuning on these models. It was found that while augmentation techniques produced lower RMSE scores than imputation techniques, they did not affect the inference time. MobileNetV2 architecture produced the lowest RMSE and inference time. Moreover, our results indicate that manually optimized CNN architectures performed similarly to Auto Keras tuned architecture. However, manually optimized architectures yielded better inference time and training curves.