Dysarthria is a neurological speech disorder that can significantly impact affected individuals' communication abilities and overall quality of life. The accurate and objective classification of dysarthria and the determination of its severity are crucial for effective therapeutic intervention. While traditional assessments by speech-language pathologists (SLPs) are common, they are often subjective, time-consuming, and can vary between practitioners. Emerging machine learning-based models have shown the potential to provide a more objective dysarthria assessment, enhancing diagnostic accuracy and reliability. This systematic review aims to comprehensively analyze current methodologies for classifying dysarthria based on severity levels. Specifically, this review will focus on determining the most effective set and type of features that can be used for automatic patient classification and evaluating the best AI techniques for this purpose. We will systematically review the literature on the automatic classification of dysarthria severity levels. Sources of information will include electronic databases and grey literature. Selection criteria will be established based on relevance to the research questions. Data extraction will include methodologies used, the type of features extracted for classification, and AI techniques employed. The findings of this systematic review will contribute to the current understanding of dysarthria classification, inform future research, and support the development of improved diagnostic tools. The implications of these findings could be significant in advancing patient care and improving therapeutic outcomes for individuals affected by dysarthria.
Three-dimensional (3D) point cloud analysis has become one of the attractive subjects in realistic imaging and machine visions due to its simplicity, flexibility and powerful capacity of visualization. Actually, the representation of scenes and buildings using 3D shapes and formats leveraged many applications among which automatic driving, scenes and objects reconstruction, etc. Nevertheless, working with this emerging type of data has been a challenging task for objects representation, scenes recognition, segmentation, and reconstruction. In this regard, a significant effort has recently been devoted to developing novel strategies, using different techniques such as deep learning models. To that end, we present in this paper a comprehensive review of existing tasks on 3D point cloud: a well-defined taxonomy of existing techniques is performed based on the nature of the adopted algorithms, application scenarios, and main objectives. Various tasks performed on 3D point could data are investigated, including objects and scenes detection, recognition, segmentation and reconstruction. In addition, we introduce a list of used datasets, we discuss respective evaluation metrics and we compare the performance of existing solutions to better inform the state-of-the-art and identify their limitations and strengths. Lastly, we elaborate on current challenges facing the subject of technology and future trends attracting considerable interest, which could be a starting point for upcoming research studies
Automatic speech recognition (ASR) has recently become an important challenge when using deep learning (DL). It requires large-scale training datasets and high computational and storage resources. Moreover, DL techniques and machine learning (ML) approaches in general, hypothesize that training and testing data come from the same domain, with the same input feature space and data distribution characteristics. This assumption, however, is not applicable in some real-world artificial intelligence (AI) applications. Moreover, there are situations where gathering real data is challenging, expensive, or rarely occurring, which can not meet the data requirements of DL models. deep transfer learning (DTL) has been introduced to overcome these issues, which helps develop high-performing models using real datasets that are small or slightly different but related to the training data. This paper presents a comprehensive survey of DTL-based ASR frameworks to shed light on the latest developments and helps academics and professionals understand current challenges. Specifically, after presenting the DTL background, a well-designed taxonomy is adopted to inform the state-of-the-art. A critical analysis is then conducted to identify the limitations and advantages of each framework. Moving on, a comparative study is introduced to highlight the current challenges before deriving opportunities for future research.
Learning disabilities, which primarily interfere with the basic learning skills such as reading, writing and math, are known to affect around 10% of children in the world. The poor motor skills and motor coordination as part of the neurodevelopmental disorder can become a causative factor for the difficulty in learning to write (dysgraphia), hindering the academic track of an individual. The signs and symptoms of dysgraphia include but are not limited to irregular handwriting, improper handling of writing medium, slow or labored writing, unusual hand position, etc. The widely accepted assessment criterion for all the types of learning disabilities is the examination performed by medical experts. The few available artificial intelligence-powered screening systems for dysgraphia relies on the distinctive features of handwriting from the corresponding images.This work presents a review of the existing automated dysgraphia diagnosis systems for children in the literature. The main focus of the work is to review artificial intelligence-based systems for dysgraphia diagnosis in children. This work discusses the data collection method, important handwriting features, machine learning algorithms employed in the literature for the diagnosis of dysgraphia. Apart from that, this article discusses some of the non-artificial intelligence-based automated systems also. Furthermore, this article discusses the drawbacks of existing systems and proposes a novel framework for dysgraphia diagnosis.
To understand the real world using various types of data, Artificial Intelligence (AI) is the most used technique nowadays. While finding the pattern within the analyzed data represents the main task. This is performed by extracting representative features step, which is proceeded using the statistical algorithms or using some specific filters. However, the selection of useful features from large-scale data represented a crucial challenge. Now, with the development of convolution neural networks (CNNs), the feature extraction operation has become more automatic and easier. CNNs allow to work on large-scale size of data, as well as cover different scenarios for a specific task. For computer vision tasks, convolutional networks are used to extract features also for the other parts of a deep learning model. The selection of a suitable network for feature extraction or the other parts of a DL model is not random work. So, the implementation of such a model can be related to the target task as well as the computational complexity of it. Many networks have been proposed and become the famous networks used for any DL models in any AI task. These networks are exploited for feature extraction or at the beginning of any DL model which is named backbones. A backbone is a known network trained in many other tasks before and demonstrates its effectiveness. In this paper, an overview of the existing backbones, e.g. VGGs, ResNets, DenseNet, etc, is given with a detailed description. Also, a couple of computer vision tasks are discussed by providing a review of each task regarding the backbones used. In addition, a comparison in terms of performance is also provided, based on the backbone used for each task.
Image segmentation for video analysis plays an essential role in different research fields such as smart city, healthcare, computer vision and geoscience, and remote sensing applications. In this regard, a significant effort has been devoted recently to developing novel segmentation strategies; one of the latest outstanding achievements is panoptic segmentation. The latter has resulted from the fusion of semantic and instance segmentation. Explicitly, panoptic segmentation is currently under study to help gain a more nuanced knowledge of the image scenes for video surveillance, crowd counting, self-autonomous driving, medical image analysis, and a deeper understanding of the scenes in general. To that end, we present in this paper the first comprehensive review of existing panoptic segmentation methods to the best of the authors' knowledge. Accordingly, a well-defined taxonomy of existing panoptic techniques is performed based on the nature of the adopted algorithms, application scenarios, and primary objectives. Moreover, the use of panoptic segmentation for annotating new datasets by pseudo-labeling is discussed. Moving on, ablation studies are carried out to understand the panoptic methods from different perspectives. Moreover, evaluation metrics suitable for panoptic segmentation are discussed, and a comparison of the performance of existing solutions is provided to inform the state-of-the-art and identify their limitations and strengths. Lastly, the current challenges the subject technology faces and the future trends attracting considerable interest in the near future are elaborated, which can be a starting point for the upcoming research studies. The papers provided with code are available at: https://github.com/elharroussomar/Awesome-Panoptic-Segmentation
Crowd counting on the drone platform is an interesting topic in computer vision, which brings new challenges such as small object inference, background clutter and wide viewpoint. However, there are few algorithms focusing on crowd counting on the drone-captured data due to the lack of comprehensive datasets. To this end, we collect a large-scale dataset and organize the Vision Meets Drone Crowd Counting Challenge (VisDrone-CC2020) in conjunction with the 16th European Conference on Computer Vision (ECCV 2020) to promote the developments in the related fields. The collected dataset is formed by $3,360$ images, including $2,460$ images for training, and $900$ images for testing. Specifically, we manually annotate persons with points in each video frame. There are $14$ algorithms from $15$ institutes submitted to the VisDrone-CC2020 Challenge. We provide a detailed analysis of the evaluation results and conclude the challenge. More information can be found at the website: \url{http://www.aiskyeye.com/}.
The novelty of the COVID-19 disease and the speed of spread has created a colossal chaos, impulse among researchers worldwide to exploit all the resources and capabilities to understand and analyze characteristics of the coronavirus in term of the ways it spreads and virus incubation time. For that, the existing medical features like CT and X-ray images are used. For example, CT-scan images can be used for the detection of lung infection. But the challenges of these features such as the quality of the image and infection characteristics limitate the effectiveness of these features. Using artificial intelligence (AI) tools and computer vision algorithms, the accuracy of detection can be more accurate and can help to overcome these issues. This paper proposes a multi-task deep-learning-based method for lung infection segmentation using CT-scan images. Our proposed method starts by segmenting the lung regions that can be infected. Then, segmenting the infections in these regions. Also, to perform a multi-class segmentation the proposed model is trained using the two-stream inputs. The multi-task learning used in this paper allows us to overcome shortage of labeled data. Also, the multi-input stream allows the model to do the learning on many features that can improve the results. To evaluate the proposed method, many features have been used. Also, from the experiments, the proposed method can segment lung infections with a high degree performance even with shortage of data and labeled images. In addition, comparing with the state-of-the-art method our method achieves good performance results.
Although image inpainting, or the art of repairing the old and deteriorated images, has been around for many years, it has gained even more popularity because of the recent development in image processing techniques. With the improvement of image processing tools and the flexibility of digital image editing, automatic image inpainting has found important applications in computer vision and has also become an important and challenging topic of research in image processing. This paper is a brief review of the existing image inpainting approaches we first present a global vision on the existing methods for image inpainting. We attempt to collect most of the existing approaches and classify them into three categories, namely, sequential-based, CNN-based and GAN-based methods. In addition, for each category, a list of methods for the different types of distortion on the images is presented. Furthermore, collect a list of the available datasets and discuss these in our paper. This is a contribution for digital image inpainting researchers trying to look for the available datasets because there is a lack of datasets available for image inpainting. As the final step in this overview, we present the results of real evaluations of the three categories of image inpainting methods performed on the datasets used, for the different types of image distortion. In the end, we also present the evaluations metrics and discuss the performance of these methods in terms of these metrics. This overview can be used as a reference for image inpainting researchers, and it can also facilitate the comparison of the methods as well as the datasets used. The main contribution of this paper is the presentation of the three categories of image inpainting methods along with a list of available datasets that the researchers can use to evaluate their proposed methodology against.
In recent years, various attempts have been proposed to explore the use of spatial and temporal information for human action recognition using convolutional neural networks (CNNs). However, only a small number of methods are available for the recognition of many human actions performed by more than one person in the same surveillance video. This paper proposes a novel technique for multiple human action recognition using a new architecture based on 3Dimdenisional deep learning with application to video surveillance systems. The first stage of the model uses a new representation of the data by extracting the sequence of each person acting in the scene. An analysis of each sequence to detect the corresponding actions is also proposed. KTH, Weizmann and UCF-ARG datasets were used for training, new datasets were also constructed which include a number of persons having multiple actions were used for testing the proposed algorithm. The results of this work revealed that the proposed method provides more accurate multi human action recognition achieving 98%. Other videos were used for the evaluation including datasets (UCF101, Hollywood2, HDMB51, and YouTube) without any preprocessing and the results obtained suggest that our proposed method clearly improves the performances when compared to state-of-the-art methods.