Abstract:Managing the configurations of a database system poses significant challenges due to the multitude of configuration knobs that impact various system aspects.The lack of standardization, independence, and universality among these knobs further complicates the task of determining the optimal settings.To address this issue, an automated solution leveraging supervised and unsupervised machine learning techniques was developed.This solution aims to identify influential knobs, analyze previously unseen workloads, and provide recommendations for knob settings.The effectiveness of this approach is demonstrated through the evaluation of a new tool called OtterTune [1] on three different database management systems (DBMSs).The results indicate that OtterTune's recommendations are comparable to or even surpass the configurations generated by existing tools or human experts.In this study, we build upon the automated technique introduced in the original OtterTune paper, utilizing previously collected training data to optimize new DBMS deployments.By employing supervised and unsupervised machine learning methods, we focus on improving latency prediction.Our approach expands upon the methods proposed in the original paper by incorporating GMM clustering to streamline metrics selection and combining ensemble models (such as RandomForest) with non-linear models (like neural networks) for more accurate prediction modeling.
Abstract:Lung cancer poses a significant global public health challenge, emphasizing the importance of early detection for improved patient outcomes. Recent advancements in deep learning algorithms have shown promising results in medical image analysis. This study aims to explore the application of object detection particularly YOLOv5, an advanced object identification system, in medical imaging for lung cancer identification. To train and evaluate the algorithm, a dataset comprising chest X-rays and corresponding annotations was obtained from Kaggle. The YOLOv5 model was employed to train an algorithm capable of detecting cancerous lung lesions. The training process involved optimizing hyperparameters and utilizing augmentation techniques to enhance the model's performance. The trained YOLOv5 model exhibited exceptional proficiency in identifying lung cancer lesions, displaying high accuracy and recall rates. It successfully pinpointed malignant areas in chest radiographs, as validated by a separate test set where it outperformed previous techniques. Additionally, the YOLOv5 model demonstrated computational efficiency, enabling real-time detection and making it suitable for integration into clinical procedures. This proposed approach holds promise in assisting radiologists in the early discovery and diagnosis of lung cancer, ultimately leading to prompt treatment and improved patient outcomes.
Abstract:Sentiment analysis (SA) is the automated process of detecting and understanding the emotions conveyed through written text. Over the past decade, SA has gained significant popularity in the field of Natural Language Processing (NLP). With the widespread use of social media and online platforms, SA has become crucial for companies to gather customer feedback and shape their marketing strategies. Additionally, researchers rely on SA to analyze public sentiment on various topics. In this particular research study, a comprehensive survey was conducted to explore the latest trends and techniques in SA. The survey encompassed a wide range of methods, including lexicon-based, graph-based, network-based, machine learning, deep learning, ensemble-based, rule-based, and hybrid techniques. The paper also addresses the challenges and opportunities in SA, such as dealing with sarcasm and irony, analyzing multi-lingual data, and addressing ethical concerns. To provide a practical case study, Twitter was chosen as one of the largest online social media platforms. Furthermore, the researchers shed light on the diverse application areas of SA, including social media, healthcare, marketing, finance, and politics. The paper also presents a comparative and comprehensive analysis of existing trends and techniques, datasets, and evaluation metrics. The ultimate goal is to offer researchers and practitioners a systematic review of SA techniques, identify existing gaps, and suggest possible improvements. This study aims to enhance the efficiency and accuracy of SA processes, leading to smoother and error-free outcomes.
Abstract:The management of database system configurations is a challenging task, as there are hundreds of configuration knobs that control every aspect of the system. This is complicated by the fact that these knobs are not standardized, independent, or universal, making it difficult to determine optimal settings. An automated approach to address this problem using supervised and unsupervised machine learning methods to select impactful knobs, map unseen workloads, and recommend knob settings was implemented in a new tool called OtterTune and is being evaluated on three DBMSs, with results demonstrating that it recommends configurations as good as or better than those generated by existing tools or a human expert.In this work, we extend an automated technique based on Ottertune [1] to reuse training data gathered from previous sessions to tune new DBMS deployments with the help of supervised and unsupervised machine learning methods to improve latency prediction. Our approach involves the expansion of the methods proposed in the original paper. We use GMM clustering to prune metrics and combine ensemble models, such as RandomForest, with non-linear models, like neural networks, for prediction modeling.
Abstract:Occlusions of objects is one of the indispensable problems in Computer vision. While Convolutional Neural Net-works (CNNs) provide various state of the art approaches for regular image classification, they however, prove to be not as effective for the classification of images with partial occlusions. Partial occlusion is scenario where an object is occluded partially by some other object/space. This problem when solved,holds tremendous potential to facilitate various scenarios. We in particular are interested in autonomous driving scenario and its implications in the same. Autonomous vehicle research is one of the hot topics of this decade, there are ample situations of partial occlusions of a driving sign or a person or other objects at different angles. Considering its prime importance in situations which can be further extended to video analytics of traffic data to handle crimes, anticipate income levels of various groups etc.,this holds the potential to be exploited in many ways. In this paper, we introduce our own synthetically created dataset by utilising Stanford Car Dataset and adding occlusions of various sizes and nature to it. On this created dataset, we conducted a comprehensive analysis using various state of the art CNN models such as VGG-19, ResNet 50/101, GoogleNet, DenseNet 121. We further in depth study the effect of varying occlusion proportions and nature on the performance of these models by fine tuning and training these from scratch on dataset and how is it likely to perform when trained in different scenarios, i.e., performance when training with occluded images and unoccluded images, which model is more robust to partial occlusions and soon.
Abstract:The task of predicting the publication period of text documents, such as news articles, is an important but less studied problem in the field of natural language processing. Predicting the year of a news article can be useful in various contexts, such as historical research, sentiment analysis, and media monitoring. In this work, we investigate the problem of predicting the publication period of a text document, specifically a news article, based on its textual content. In order to do so, we created our own extensive labeled dataset of over 350,000 news articles published by The New York Times over six decades. In our approach, we use a pretrained BERT model fine-tuned for the task of text classification, specifically for time period prediction.This model exceeds our expectations and provides some very impressive results in terms of accurately classifying news articles into their respective publication decades. The results beat the performance of the baseline model for this relatively unexplored task of time prediction from text.
Abstract:For years, Single Image Super Resolution (SISR) has been an interesting and ill-posed problem in computer vision. The traditional super-resolution (SR) imaging approaches involve interpolation, reconstruction, and learning-based methods. Interpolation methods are fast and uncomplicated to compute, but they are not so accurate and reliable. Reconstruction-based methods are better compared with interpolation methods, but they are time-consuming and the quality degrades as the scaling increases. Even though learning-based methods like Markov random chains are far better than all the previous ones, they are unable to match the performance of deep learning models for SISR. This study examines the Residual Dense Networks architecture proposed by Yhang et al. [17] and analyzes the importance of its components. By leveraging hierarchical features from original low-resolution (LR) images, this architecture achieves superior performance, with a network structure comprising four main blocks, including the residual dense block (RDB) as the core. Through investigations of each block and analyses using various loss metrics, the study evaluates the effectiveness of the architecture and compares it to other state-of-the-art models that differ in both architecture and components.