Visual Question Answering (VQA) is a multimodal task that combines elements of Computer Vision (CV) and Natural Language Processing (NLP) and aims to generate answers to questions about any visual input. Over time, the scope of VQA has expanded from datasets built on extensive collections of natural images to datasets featuring synthetic images, video, 3D environments, and various other visual inputs. The emergence of large pre-trained networks has shifted the field from early VQA approaches that relied on feature extraction and fusion schemes to vision-language pre-training (VLP) techniques. However, comprehensive surveys that cover both traditional VQA architectures and contemporary VLP-based methods are lacking. Furthermore, the challenges of VLP through the lens of VQA have not been thoroughly explored, leaving room for open problems to emerge. Our work presents a survey of VQA that examines VQA datasets and methods over the field's history, introduces a detailed taxonomy to categorize the facets of VQA, and highlights recent trends, challenges, and scope for improvement. We further generalize VQA to multimodal question answering, explore tasks related to VQA, and present a set of open problems for future investigation. The work aims to guide both beginners and experts by shedding light on potential avenues of research and expanding the boundaries of the field.
Pneumonia is one of the foremost lung diseases, and untreated pneumonia poses a serious threat to all age groups. The proposed work aims to extract and evaluate pneumonia infection caused by Coronavirus disease (COVID-19) in the lung using CT scans. We propose an image-assisted system to extract COVID-19 infected sections from lung CT scans (coronal view). It includes the following steps: (i) a threshold filter to extract the lung region by eliminating possible artifacts; (ii) image enhancement using Harmony-Search-Optimization and Otsu thresholding; (iii) image segmentation to extract the infected region(s); and (iv) region-of-interest (ROI) feature extraction from the binary image to compute the level of severity. The features extracted from the ROI are then used to compute the pixel ratio between the lung and infection sections, which identifies the severity level of the infection. The primary objective of the tool is to assist the pulmonologist not only in detecting the infection but also in planning the treatment process. Consequently, in mass screening, it will help reduce the diagnostic burden.
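The severity measure in step (iv) can be illustrated with a minimal sketch, assuming the lung region and the infected region are already available as binary masks of equal size. The function name `infection_ratio` and the toy masks are hypothetical, and the abstract does not say whether the ratio is taken as infection over lung or the reverse; the sketch assumes infection over lung.

```python
def infection_ratio(lung_mask, infection_mask):
    """Return infected pixels as a fraction of lung pixels.

    Both masks are lists of rows holding 0/1 values (assumed layout).
    Only infected pixels that lie inside the lung mask are counted.
    """
    lung_pixels = sum(1 for row in lung_mask for v in row if v)
    if lung_pixels == 0:
        raise ValueError("lung mask is empty")
    infected_pixels = sum(
        1
        for lung_row, inf_row in zip(lung_mask, infection_mask)
        for lung_v, inf_v in zip(lung_row, inf_row)
        if lung_v and inf_v
    )
    return infected_pixels / lung_pixels


# Toy 3x4 masks (hypothetical data): 8 lung pixels, 3 of them infected.
lung = [
    [0, 1, 1, 0],
    [1, 1, 1, 1],
    [0, 1, 1, 0],
]
infection = [
    [0, 0, 1, 0],
    [0, 1, 1, 0],
    [0, 0, 0, 0],
]
ratio = infection_ratio(lung, infection)  # 3 / 8 = 0.375
```

A ratio near 0 would suggest a mild infection and a ratio near 1 a severe one; any cut-off between severity levels would have to come from clinical data, which the abstract does not give.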
Since the outbreak of the novel coronavirus epidemic in December 2019, governments and authorities have been struggling to make critical decisions under high uncertainty. Composite Monte-Carlo (CMC) simulation is a forecasting method that extrapolates available data, broken down from multiple correlated/causal micro-data sources, into many possible future outcomes by drawing random samples from probability distributions. For instance, the overall trend and propagation of the infected cases in China are influenced by the temporal-spatial data of the cities near Wuhan (where the virus originated), in terms of population density, travel mobility, medical resources such as hospital beds, and the timeliness of quarantine control in each city. Hence a CMC is only as reliable as the closeness of its underlying statistical distribution, which is supposed to represent the behaviour of future events, and the correctness of the composite data relationships. In this paper, we present a case study in which CMC is enhanced by a deep learning network and fuzzy rule induction to gain better stochastic insights into the epidemic's development. Instead of applying simplistic and uniform assumptions to an MC, which is common practice, a deep learning-based CMC is used in conjunction with fuzzy rule induction techniques. As a result, decision makers benefit from better-fitted MC outputs complemented by min-max rules that foretell the extreme ranges of future possibilities with respect to the epidemic.
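The basic mechanics of a composite Monte-Carlo forecast can be sketched as follows. This is only an illustration under simplifying assumptions: each data source (for example, a city) gets its own independent normal distribution for the daily growth rate, and the figures in `sources` are invented. It does not reproduce the deep learning-based distribution fitting or the fuzzy rule induction described in the abstract.

```python
import random


def simulate_total(rng, sources, days):
    """One trial: grow every source with randomly drawn daily rates and sum."""
    total = 0.0
    for current_cases, mean_growth, sd_growth in sources:
        cases = current_cases
        for _ in range(days):
            # Draw a daily growth rate; clamp at zero so cumulative cases
            # never decrease (a modelling assumption of this sketch).
            growth = max(0.0, rng.gauss(mean_growth, sd_growth))
            cases *= 1.0 + growth
        total += cases
    return total


def composite_monte_carlo(sources, days, trials, seed=0):
    """Return the sorted simulated totals over all trials."""
    rng = random.Random(seed)
    return sorted(simulate_total(rng, sources, days) for _ in range(trials))


# Hypothetical sources: (current cases, mean daily growth, std of growth).
sources = [(100.0, 0.10, 0.03), (40.0, 0.06, 0.02), (10.0, 0.15, 0.05)]
outcomes = composite_monte_carlo(sources, days=7, trials=2000)

# Summarise the spread of possible futures with percentile bounds.
low = outcomes[int(0.05 * len(outcomes))]
median = outcomes[len(outcomes) // 2]
high = outcomes[int(0.95 * len(outcomes))]
```

The interval from `low` to `high` plays the role of a crude min-max range. How trustworthy it is depends entirely on how well the assumed distributions match reality, which is the point the abstract makes about the reliability of a CMC.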
Image segmentation is a technique for partitioning an image into distinct classes. Many solutions may be available for segmenting an image into a given number of classes, each with a different segmentation quality. In the proposed method, a multilevel thresholding technique is used for image segmentation. A new Cuckoo Search (CS) approach is used to select the optimal threshold values. In other words, the algorithm searches for the best solution starting from initial random threshold values (solutions), and a correlation function is used to evaluate the quality of each solution. Finally, MSE and PSNR are measured to assess the segmentation quality.
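The evaluation step can be sketched as below. The sketch assumes 8-bit grey levels (peak value 255), a flattened list of pixel intensities, and a segmented image in which every pixel is replaced by the mean intensity of its class; the abstract does not specify the reconstruction rule, so this is an assumption. The thresholds are fixed by hand here rather than found by Cuckoo Search.

```python
import math


def apply_thresholds(pixels, thresholds):
    """Replace each pixel by the mean intensity of its threshold class."""
    bounds = sorted(thresholds)

    def class_of(value):
        # Class index = number of thresholds at or below the value.
        return sum(1 for t in bounds if value >= t)

    groups = {}
    for value in pixels:
        groups.setdefault(class_of(value), []).append(value)
    means = {k: sum(vs) / len(vs) for k, vs in groups.items()}
    return [means[class_of(value)] for value in pixels]


def mse(original, segmented):
    """Mean squared error between two equally sized pixel lists."""
    return sum((a - b) ** 2 for a, b in zip(original, segmented)) / len(original)


def psnr(original, segmented, peak=255.0):
    """Peak signal-to-noise ratio in dB; infinite for identical inputs."""
    err = mse(original, segmented)
    if err == 0:
        return math.inf
    return 10.0 * math.log10(peak ** 2 / err)


# Hypothetical six-pixel image and two hand-picked thresholds (three classes).
pixels = [10, 20, 100, 110, 200, 210]
segmented = apply_thresholds(pixels, [60, 150])
error = mse(pixels, segmented)
quality_db = psnr(pixels, segmented)
```

A lower MSE and a higher PSNR indicate that the thresholded image stays closer to the original, so the two numbers can be used to compare candidate threshold sets.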
In the medical field, intravascular ultrasound (IVUS) is a tomographic imaging modality that can identify the boundaries of the different layers of blood vessels. IVUS can detect myocardial infarction (heart attack) that remains ignored and unattended when only angioplasty is done. During the past decade, it has become easier for individuals or groups to copy and transmit digital information without the owner's permission. Digital watermarking, an information hiding technique, was introduced to strengthen authentication and copyright protection. Achieving watermarking with little distortion in biomedical data is a challenging task. A watermark can be embedded in an image or in a video. Because video data is voluminous, it requires a large storage area, which is often not feasible. In this case, motion vector based video compression is used to reduce its size. In the present paper, an Electronic Patient Record (EPR) is embedded as a watermark within an IVUS video, and then the motion vector is calculated. The proposed method demonstrates robustness, as the extracted watermark has a good PSNR value and a low MSE.
In the past few years, the rapid expansion of digitization and globalization has influenced the medical field, as it has other fields. To improve diagnostic results, most reputed hospitals and diagnostic centres around the world have started exchanging medical information. In the proposed method, the calculated diagnostic parametric values of the original Electrooculography (EOG) signal are embedded as a watermark using a reversible watermarking technique based on the Difference Expansion (DE) algorithm. The extracted watermark provides the required parametric values at the recipient end without any post-computation on the recovered EOG signal. By computing the parametric values from the recovered signal, the integrity of the extracted watermark can be validated. The time domain features of the EOG signal are calculated to generate the watermark. In the current work, various features are studied, and two major features related to blink frequency are used to generate the watermark. The high Signal to Noise Ratio (SNR) and the Bit Error Rate (BER) support the robustness of the proposed method.
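The reversibility of Difference Expansion can be illustrated with a minimal sketch of the standard pairwise DE transform. It assumes the EOG samples are integer-quantized and processed in pairs, and it omits the overflow and underflow checks that a practical scheme needs. The function names and sample values are hypothetical, and the abstract does not state how the samples are paired.

```python
def de_embed(x, y, bit):
    """Embed one bit into the integer pair (x, y) by expanding its difference."""
    average = (x + y) // 2        # integer average, preserved by the transform
    difference = x - y
    expanded = 2 * difference + bit   # the bit becomes the new LSB
    return average + (expanded + 1) // 2, average - expanded // 2


def de_extract(x_marked, y_marked):
    """Recover the original pair and the embedded bit from a marked pair."""
    average = (x_marked + y_marked) // 2
    expanded = x_marked - y_marked
    bit = expanded & 1            # LSB of the expanded difference
    difference = expanded // 2    # floor division undoes the expansion
    x = average + (difference + 1) // 2
    y = average - difference // 2
    return x, y, bit


# Hypothetical sample pairs and watermark bits.
pairs = [(206, 201), (3, 10)]
bits = [1, 0]
marked = [de_embed(x, y, b) for (x, y), b in zip(pairs, bits)]
recovered = [de_extract(xm, ym) for xm, ym in marked]
```

Each recovered tuple holds the original two samples followed by the embedded bit, so the recipient obtains both the watermark and an exact copy of the signal.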
In this paper, we propose a corner detection method for obtaining the features required to track and recognize objects within a noisy image. Corner detection in noisy images is a challenging task in image processing. Natural images often get corrupted by noise during acquisition and transmission. Because corner detection on these noisy images does not provide the desired results, de-noising is required. An adaptive wavelet thresholding approach is applied for this purpose.
The Electrocardiogram (ECG) is a sensitive diagnostic tool used to detect various cardiovascular diseases by measuring and recording the electrical activity of the heart in exquisite detail. A wide range of heart conditions is determined by thorough examination of the features of the ECG report. Automatic extraction of time plane features is important for identifying vital cardiac diseases. This paper presents a multi-resolution wavelet transform based system for detecting the 'P', 'Q', 'R', 'S', 'T' peaks complex from the original ECG signal. The 'R-R' time lapse is an important detail of the ECG signal that corresponds to the heartbeat of the person concerned. An abrupt increase in the height of the 'R' wave or changes in the 'R-R' measurement denote various anomalies of the human heart. Similarly, 'P-P', 'Q-Q', 'S-S', and 'T-T' also correspond to different anomalies of the heart, and their peak amplitudes indicate other cardiac diseases. In the proposed method, the 'PQRST' peaks are marked and stored over the entire signal, and the time interval between two consecutive 'R' peaks and the other peak intervals are measured to detect anomalies in the behavior of the heart, if any. The peaks are obtained by composing the Daubechies wavelet sub-bands of the original ECG signal. The accuracy of the 'PQRST' complex detection and interval measurement reaches up to 100% by processing and thresholding the original ECG signal.
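The interval measurement can be sketched as follows. Note that this sketch substitutes a plain amplitude threshold with a local-maximum test for the wavelet-based peak detection described in the abstract; it only illustrates how 'R-R' intervals and a heart rate follow once the 'R' peaks are marked. The sampling rate, threshold, and synthetic signal are assumptions.

```python
def find_r_peaks(signal, threshold):
    """Return indices of local maxima that exceed the amplitude threshold."""
    peaks = []
    for i in range(1, len(signal) - 1):
        is_local_max = signal[i] > signal[i - 1] and signal[i] >= signal[i + 1]
        if signal[i] > threshold and is_local_max:
            peaks.append(i)
    return peaks


def rr_intervals(peaks, sampling_rate_hz):
    """Time in seconds between consecutive R peaks."""
    return [(b - a) / sampling_rate_hz for a, b in zip(peaks, peaks[1:])]


def heart_rate_bpm(intervals):
    """Average heart rate in beats per minute from the R-R intervals."""
    return 60.0 / (sum(intervals) / len(intervals))


# Synthetic signal (hypothetical): 4 s at 250 Hz, one unit spike every 200 samples.
sampling_rate_hz = 250
signal = [0.0] * 1000
for index in (100, 300, 500, 700, 900):
    signal[index] = 1.0

peaks = find_r_peaks(signal, threshold=0.5)
intervals = rr_intervals(peaks, sampling_rate_hz)   # 0.8 s between beats
bpm = heart_rate_bpm(intervals)                     # about 75 beats per minute
```

The same interval computation applies to the 'P-P', 'Q-Q', 'S-S', and 'T-T' series once those peaks have been located; irregular or abruptly changing intervals are what would flag an anomaly.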
In this paper, a comparative study of Moravec and Harris corner detection is carried out for obtaining the features required to track and recognize objects within a noisy image. Corner detection in noisy images is a challenging task in image processing. Natural images often get corrupted by noise during acquisition and transmission. Because corner detection on these noisy images does not provide the desired results, de-noising is required. An adaptive wavelet thresholding approach is applied for this purpose.
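For readers unfamiliar with the Moravec detector, its corner response can be sketched in a few lines. The response at a pixel is the minimum, over a set of small shifts, of the sum of squared differences (SSD) between a window and its shifted copy. The 3x3 window, the eight one-pixel shifts, and the toy image below are assumptions of this sketch; no de-noising is performed, and the caller must keep the pixel at least `half + 1` pixels away from the image border.

```python
def moravec_response(img, y, x, half=1):
    """Minimum SSD between the window at (y, x) and its eight shifted copies."""
    shifts = [(0, 1), (1, 0), (1, 1), (1, -1),
              (0, -1), (-1, 0), (-1, -1), (-1, 1)]
    best = None
    for dy, dx in shifts:
        ssd = 0
        for wy in range(-half, half + 1):
            for wx in range(-half, half + 1):
                a = img[y + wy][x + wx]
                b = img[y + wy + dy][x + wx + dx]
                ssd += (a - b) ** 2
        best = ssd if best is None else min(best, ssd)
    return best


# Toy 10x10 image (hypothetical): a bright square in the bottom-right quadrant.
img = [[1 if r >= 5 and c >= 5 else 0 for c in range(10)] for r in range(10)]

flat_score = moravec_response(img, 2, 2)     # uniform region
edge_score = moravec_response(img, 5, 7)     # on the top edge of the square
corner_score = moravec_response(img, 5, 5)   # at the corner of the square
```

On this image the flat and edge points score zero, because at least one shift leaves the window unchanged, while the corner scores a positive value. The Harris detector replaces the discrete shifts with a gradient-based structure matrix, which makes the response less sensitive to edge direction.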
The present work proposes computer-aided identification of normal and abnormal heart sounds based on the Discrete Wavelet Transform (DWT), which is useful for tele-diagnosis of heart diseases. Due to the presence of cumulative frequency components in the spectrogram, DWT is applied to the spectrogram up to n levels to extract features from the individual approximation components. A one-dimensional feature vector is obtained by evaluating the row mean of the approximation components of these spectrograms. In the present approach, the set of spectrograms, rather than raw sound samples, is used as the database. The minimum Euclidean distance between the feature vector of the test sample and the feature vectors of the stored samples is computed to identify the heart sound. With this algorithm, an accuracy of almost 82% was achieved.
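The pipeline above can be sketched as follows. The abstract does not name the wavelet, so this sketch assumes the Haar wavelet, whose two-dimensional approximation component equals the 2x2 block average up to a constant scale factor. The tiny 4x4 "spectrograms", the labels, and the function names are hypothetical.

```python
import math


def haar_approx_2d(matrix):
    """One level of 2-D Haar approximation as 2x2 block averages (unnormalised)."""
    return [
        [
            (matrix[r][c] + matrix[r][c + 1]
             + matrix[r + 1][c] + matrix[r + 1][c + 1]) / 4.0
            for c in range(0, len(matrix[0]) - 1, 2)
        ]
        for r in range(0, len(matrix) - 1, 2)
    ]


def row_mean_features(spectrogram, levels):
    """Apply the approximation `levels` times, then take the mean of each row."""
    approx = spectrogram
    for _ in range(levels):
        approx = haar_approx_2d(approx)
    return [sum(row) / len(row) for row in approx]


def classify(test_features, database):
    """Return the label of the stored vector at minimum Euclidean distance."""
    label, _ = min(database, key=lambda item: math.dist(test_features, item[1]))
    return label


# Hypothetical stored spectrograms (rows = frequency bins, columns = time frames).
normal = [[1, 1, 1, 1], [1, 1, 1, 1], [5, 5, 5, 5], [5, 5, 5, 5]]
abnormal = [[5, 5, 5, 5], [5, 5, 5, 5], [1, 1, 1, 1], [1, 1, 1, 1]]
database = [
    ("normal", row_mean_features(normal, levels=1)),
    ("abnormal", row_mean_features(abnormal, levels=1)),
]

test_sample = [[1, 2, 1, 2], [1, 2, 1, 2], [4, 5, 4, 5], [4, 5, 4, 5]]
test_features = row_mean_features(test_sample, levels=1)
label = classify(test_features, database)
```

Because the row mean collapses the time axis, the feature vector summarises how energy is distributed across frequency bands, which is what the nearest-neighbour comparison then relies on.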