Topic: Sign Language Recognition
What is Sign Language Recognition? Sign language recognition is a computer vision and natural language processing task that involves automatically recognizing and translating sign language gestures into written or spoken language. The goal of sign language recognition is to develop algorithms that can understand and interpret sign language, enabling people who use sign language as their primary mode of communication to communicate more easily with non-signers.
Papers and Code
Jun 06, 2024
Abstract:This research combines MediaPipe and CNNs for efficient and accurate real-time detection of American Sign Language (ASL). The system presented here captures and processes hand gestures in real time; the intended purpose was to create an easy, accurate, and fast way of entering commands without the need to touch anything. MediaPipe provides a powerful framework for real-time hand tracking, and its ability to capture and preprocess hand movements increases the accuracy of the gesture recognition system. Integrating a CNN with MediaPipe yields higher efficiency for real-time processing. The model was tested on ASL datasets and achieved an accuracy of 99.12%; the results were compared to those of existing methods, using established evaluation techniques, and showed better recognition and translation accuracy than previous models. The system has applications in the communication, education, and accessibility domains, and further improving systems such as the one described in this paper will assist people with hearing impairments and make everyday interactions more accessible to them. The aim of the research is to identify the characters of American signs from hand images taken from a web camera, based on MediaPipe and CNNs.
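A minimal sketch of the kind of pipeline this abstract describes: MediaPipe extracts 21 hand landmarks per webcam frame, and a small classifier maps them to letters. The `LandmarkClassifier` architecture, the 26-letter class list, and the use of an MLP over landmarks (standing in for the paper's CNN) are illustrative assumptions, not the paper's exact model.

```python
# Sketch: MediaPipe hand landmarks -> small classifier, frame by frame.
import cv2
import mediapipe as mp
import torch
import torch.nn as nn

NUM_CLASSES = 26  # assumption: one class per ASL letter

class LandmarkClassifier(nn.Module):
    """Small MLP over the 21 (x, y, z) hand landmarks MediaPipe returns."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(63, 128), nn.ReLU(),
            nn.Linear(128, NUM_CLASSES),
        )
    def forward(self, x):
        return self.net(x)

model = LandmarkClassifier().eval()  # in practice, load trained weights here
hands = mp.solutions.hands.Hands(max_num_hands=1, min_detection_confidence=0.5)

cap = cv2.VideoCapture(0)
while cap.isOpened():
    ok, frame = cap.read()
    if not ok:
        break
    result = hands.process(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
    if result.multi_hand_landmarks:
        lm = result.multi_hand_landmarks[0].landmark
        feats = torch.tensor([[v for p in lm for v in (p.x, p.y, p.z)]],
                             dtype=torch.float32)
        pred = model(feats).argmax(dim=1).item()  # index of predicted letter
    if cv2.waitKey(1) & 0xFF == 27:  # Esc to quit
        break
cap.release()
```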

Apr 17, 2024
Abstract:Human action recognition and performance assessment have been hot research topics in recent years. Recognition problems have mature solutions in the field of sign language, but past research in performance analysis has focused on competitive sports and medical training, overlooking scoring assessment, which is an important part of digitalizing sign language teaching. In this paper, we analyze existing technologies for performance assessment and adopt methods that perform well in human pose reconstruction tasks, combined with motion-rotation embedded expressions, to propose a two-stage sign language performance evaluation pipeline. Our analysis shows that choosing reconstruction tasks in the first stage provides more expressive features, and that smoothing methods provide an effective reference for assessment. Experiments show that our method provides good score feedback mechanisms and high consistency with professional assessments compared to end-to-end evaluations.
* 9 pages, 7 figures
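As a rough illustration of such a two-stage scoring pipeline, the sketch below smooths per-frame pose/rotation features and scores a student's sequence against a reference via dynamic time warping. The smoothing window, the DTW comparison, and the distance-to-score mapping are all assumptions, not the paper's exact method.

```python
# Sketch: (1) smooth reconstructed pose features, (2) score against a reference.
import numpy as np
from scipy.signal import savgol_filter

def smooth(features: np.ndarray, window: int = 9, order: int = 2) -> np.ndarray:
    """Smooth per-frame features (T x D) along the time axis."""
    return savgol_filter(features, window, order, axis=0)

def dtw_distance(a: np.ndarray, b: np.ndarray) -> float:
    """Plain dynamic-time-warping distance between two feature sequences."""
    T, S = len(a), len(b)
    D = np.full((T + 1, S + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, T + 1):
        for j in range(1, S + 1):
            cost = np.linalg.norm(a[i - 1] - b[j - 1])
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[T, S] / (T + S)

def score(student: np.ndarray, reference: np.ndarray) -> float:
    """Map DTW distance to a 0-100 score; the exponential scale is arbitrary."""
    d = dtw_distance(smooth(student), smooth(reference))
    return 100.0 * float(np.exp(-d))
```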

Mar 05, 2024
Abstract:Growing research in sign language recognition, generation, and translation AI has been accompanied by calls for ethical development of such technologies. While these works are crucial to helping individual researchers do better, there is a notable lack of discussion of systemic biases or analysis of rhetoric that shape the research questions and methods in the field, especially as it remains dominated by hearing non-signing researchers. Therefore, we conduct a systematic review of 101 recent papers in sign language AI. Our analysis identifies significant biases in the current state of sign language AI research, including an overfocus on addressing perceived communication barriers, a lack of use of representative datasets, use of annotations lacking linguistic foundations, and development of methods that build on flawed models. We take the position that the field lacks meaningful input from Deaf stakeholders, and is instead driven by what decisions are the most convenient or perceived as important to hearing researchers. We end with a call to action: the field must make space for Deaf researchers to lead the conversation in sign language AI.

Jan 22, 2024
Abstract:Sign language recognition (SLR) plays a vital role in facilitating communication for the hearing-impaired community. SLR is a weakly supervised task where entire videos are annotated with glosses, making it challenging to identify the corresponding gloss within a video segment. Recent studies indicate that the main bottleneck in SLR is insufficient training caused by the limited availability of large-scale datasets. To address this challenge, we present SignVTCL, a multi-modal continuous sign language recognition framework enhanced by visual-textual contrastive learning, which leverages the full potential of multi-modal data and the generalization ability of language models. SignVTCL integrates multi-modal data (video, keypoints, and optical flow) simultaneously to train a unified visual backbone, thereby yielding more robust visual representations. Furthermore, SignVTCL contains a visual-textual alignment approach incorporating gloss-level and sentence-level alignment to ensure precise correspondence between visual features and glosses at both the individual-gloss and sentence levels. Experiments on three datasets, Phoenix-2014, Phoenix-2014T, and CSL-Daily, demonstrate that SignVTCL achieves state-of-the-art results compared with previous methods.
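A hedged sketch of gloss-level visual-textual alignment in the spirit of SignVTCL: a symmetric InfoNCE loss pulls matched visual and textual gloss embeddings together. The shared embedding space, temperature value, and symmetric cross-entropy form are common contrastive-learning choices, not necessarily the paper's exact loss.

```python
# Sketch: contrastive alignment between visual and textual gloss embeddings.
import torch
import torch.nn.functional as F

def contrastive_alignment_loss(visual: torch.Tensor,
                               textual: torch.Tensor,
                               temperature: float = 0.07) -> torch.Tensor:
    """visual, textual: (N, D) embeddings for N matched gloss segments."""
    v = F.normalize(visual, dim=-1)
    t = F.normalize(textual, dim=-1)
    logits = v @ t.T / temperature          # (N, N) similarity matrix
    targets = torch.arange(len(v), device=v.device)
    # Symmetric cross-entropy: match video->gloss and gloss->video.
    return (F.cross_entropy(logits, targets) +
            F.cross_entropy(logits.T, targets)) / 2
```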

Jan 10, 2024
Abstract:The objective of sign language recognition is to bridge the communication gap between the deaf and the hearing. Numerous previous works train their models using the well-established connectionist temporal classification (CTC) loss. During the inference stage, the CTC-based models typically take the entire sign video as input to make predictions. This type of inference scheme is referred to as offline recognition. In contrast, while mature speech recognition systems can efficiently recognize spoken words on the fly, sign language recognition still falls short due to the lack of practical online solutions. In this work, we take the first step towards filling this gap. Our approach comprises three phases: 1) developing a sign language dictionary encompassing all glosses present in a target sign language dataset; 2) training an isolated sign language recognition model on augmented signs using both conventional classification loss and our novel saliency loss; 3) employing a sliding window approach on the input sign sequence and feeding each sign clip to the well-optimized model for online recognition. Furthermore, our online recognition model can be extended to boost the performance of any offline model, and to support online translation by appending a gloss-to-text network onto the recognition model. By integrating our online framework with the previously best-performing offline model, TwoStream-SLR, we achieve new state-of-the-art performance on three benchmarks: Phoenix-2014, Phoenix-2014T, and CSL-Daily. Code and models will be available at https://github.com/FangyunWei/SLRT
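The sliding-window scheme in phase 3 can be sketched as follows; the window length, stride, and CTC-style rule for collapsing repeated predictions are illustrative assumptions rather than the paper's configuration.

```python
# Sketch: online recognition by sliding a window over the incoming stream.
import torch

def online_recognize(frames, model, window=16, stride=8, blank_id=0):
    """frames: (T, C, H, W) tensor of the video stream so far.
    Classifies each fixed-size clip with an isolated-sign model."""
    glosses = []
    for start in range(0, len(frames) - window + 1, stride):
        clip = frames[start:start + window].unsqueeze(0)  # (1, T, C, H, W)
        with torch.no_grad():
            pred = model(clip).argmax(dim=-1).item()
        # Collapse repeats and drop blanks, CTC-style.
        if pred != blank_id and (not glosses or glosses[-1] != pred):
            glosses.append(pred)
    return glosses
```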

Jan 22, 2024
Abstract:Recent advances in Deep Learning and Computer Vision have been successfully leveraged to serve marginalized communities in various contexts. One such area is Sign Language - a primary means of communication for the deaf community. However, so far, the bulk of research efforts and investments have gone into American Sign Language, and research activity into low-resource sign languages - especially Bangla Sign Language - has lagged significantly. In this research paper, we present a new word-level Bangla Sign Language dataset - BdSL40 - consisting of 611 videos over 40 words, along with two different approaches: one with a 3D Convolutional Neural Network model and another with a novel Graph Neural Network approach for the classification of the BdSL40 dataset. This is the first study on word-level BdSL recognition, and the dataset was transcribed from Indian Sign Language (ISL) using the Bangla Sign Language Dictionary (1997). The proposed GNN model achieved an F1 score of 89%. The study highlights the significant lexical and semantic similarity between BdSL, West Bengal Sign Language, and ISL, and the lack of word-level datasets for BdSL in the literature. We release the dataset and source code to stimulate further research.
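One building block a skeleton-based GNN classifier like this might use is a graph convolution over hand/body keypoints. The sketch below shows a single normalized graph-convolution layer; the skeleton graph, layer sizes, and the rest of the architecture are assumptions, not the BdSL40 paper's exact model.

```python
# Sketch: one graph-convolution layer over skeleton keypoints.
import torch
import torch.nn as nn

class SkeletonGCNLayer(nn.Module):
    def __init__(self, in_dim: int, out_dim: int, adjacency: torch.Tensor):
        super().__init__()
        # Symmetrically normalized adjacency with self-loops:
        # A_hat = D^{-1/2} (A + I) D^{-1/2}
        a = adjacency + torch.eye(adjacency.size(0))
        d = a.sum(dim=1).rsqrt()
        self.register_buffer("a_hat", d[:, None] * a * d[None, :])
        self.linear = nn.Linear(in_dim, out_dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, num_joints, in_dim) per-joint features, e.g. (x, y) coords
        return torch.relu(self.linear(self.a_hat @ x))
```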

Feb 14, 2024
Abstract:Activation functions enable neural networks to learn complex representations by introducing non-linearities. While feedforward models commonly use rectified linear units, sequential models like recurrent neural networks, long short-term memory (LSTM) networks, and gated recurrent units (GRUs) still rely on Sigmoid and TanH activation functions. However, these classical activation functions often struggle to model sparse patterns and to effectively capture temporal dependencies when trained on small sequential datasets. To address this limitation, we propose the squared Sigmoid TanH (SST) activation, specifically tailored to enhance the learning capability of sequential models under data constraints. SST applies mathematical squaring to amplify differences between strong and weak activations as signals propagate over time, facilitating improved gradient flow and information filtering. We evaluate SST-powered LSTMs and GRUs on diverse applications, such as sign language recognition, regression, and time-series classification tasks, where the dataset is limited. Our experiments demonstrate that SST models consistently outperform RNN-based models with baseline activations, exhibiting improved test accuracy.
* 10 pages, 9 figures, Submitted to IJCAI 2024 conference
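The abstract only says that SST "applies mathematical squaring" to the Sigmoid and TanH gates, so the following is one plausible reading rather than the paper's confirmed formulation: square the activations while preserving TanH's sign, so strong and weak activations are pushed further apart.

```python
# Sketch: squared gate activations (one possible reading of SST).
import torch

def sst_sigmoid(x: torch.Tensor) -> torch.Tensor:
    return torch.sigmoid(x) ** 2

def sst_tanh(x: torch.Tensor) -> torch.Tensor:
    # Squaring alone would discard the sign, so restore it explicitly.
    return torch.sign(x) * torch.tanh(x) ** 2
```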

Feb 13, 2024
Abstract:The activation function is a pivotal component of deep learning, facilitating the extraction of intricate data patterns. While classical activation functions like ReLU and its variants are extensively utilized, their static nature and simplicity, despite being advantageous, often limit their effectiveness in specialized tasks. Trainable activation functions also sometimes struggle to adapt to the unique characteristics of the data. Addressing these limitations, we introduce a novel trainable activation function, the adaptive piecewise approximated activation linear unit (APALU), to enhance the learning performance of deep learning across a broad range of tasks. It presents a unique set of features that enable it to maintain stability and efficiency in the learning process while adapting to complex data representations. Experiments reveal significant improvements over widely used activation functions on different tasks. In image classification, APALU increases MobileNet and GoogLeNet accuracy by 0.37% and 0.04%, respectively, on the CIFAR10 dataset. In anomaly detection, it improves the average area under the curve of One-Class Deep SVDD by 0.8% on the MNIST dataset, and yields 1.81% and 1.11% improvements with DifferNet and knowledge distillation, respectively, on the MVTec dataset. Notably, APALU achieves 100% accuracy on a sign language recognition task with a limited dataset. For regression tasks, APALU enhances the performance of deep neural networks and recurrent neural networks on different datasets. These improvements highlight the robustness and adaptability of APALU across diverse deep-learning applications.
* 9 pages, 4 figures, Submitted to IJCAI 2024 conference
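The abstract does not give APALU's functional form, so the sketch below is only a generic example of a trainable activation module with learnable parameters, in the spirit of, but not identical to, APALU.

```python
# Sketch: a generic trainable activation (illustrative, not APALU itself).
import torch
import torch.nn as nn

class TrainableActivation(nn.Module):
    """Learnable blend of a smooth nonlinearity and a linear term."""
    def __init__(self):
        super().__init__()
        self.a = nn.Parameter(torch.ones(1))   # scales the nonlinear branch
        self.b = nn.Parameter(torch.zeros(1))  # scales the linear branch

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.a * torch.nn.functional.softplus(x) + self.b * x
```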

Feb 20, 2024
Abstract:In the fast-paced field of human-computer interaction (HCI) and virtual reality (VR), automatic gesture recognition has become increasingly essential. This is particularly true for the recognition of hand signs, providing an intuitive way to effortlessly navigate and control VR and HCI applications. Considering increased privacy requirements, radar sensors emerge as a compelling alternative to cameras. They operate effectively in low-light conditions without capturing identifiable human details, thanks to their lower resolution and distinct wavelength compared to visible light. While previous works predominantly deploy radar sensors for dynamic hand gesture recognition based on Doppler information, our approach prioritizes classification using an imaging radar that operates on spatial information, i.e., image-like data. However, generating the large training datasets required for neural networks (NNs) is a time-consuming and challenging process, often falling short of covering all potential scenarios. Acknowledging these challenges, this study explores the efficacy of synthetic data generated by an advanced radar ray-tracing simulator. This simulator employs an intuitive material model that can be adjusted to introduce data diversity. Despite being trained exclusively on synthetic data, the NN demonstrates promising performance when put to the test with real measurement data. This emphasizes the practicality of our methodology in overcoming data scarcity challenges and advancing the field of automatic gesture recognition in VR and HCI applications.
* 5 pages, 6 figures. Accepted to IEEE Radarconf2024
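A compact sketch of the synthetic-to-real protocol this abstract describes: train a small CNN purely on simulator-generated radar images, then measure accuracy on real recordings. The architecture, data loaders, and class count are assumptions, not the paper's setup.

```python
# Sketch: train on synthetic radar images, evaluate on real measurements.
import torch
import torch.nn as nn

class RadarGestureCNN(nn.Module):
    def __init__(self, num_classes: int):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.AdaptiveAvgPool2d(1),
        )
        self.head = nn.Linear(32, num_classes)

    def forward(self, x):  # x: (B, 1, H, W) single-channel radar "images"
        return self.head(self.features(x).flatten(1))

def evaluate(model, real_loader):
    """Accuracy on real measurement data after synthetic-only training."""
    model.eval()
    correct = total = 0
    with torch.no_grad():
        for x, y in real_loader:
            correct += (model(x).argmax(1) == y).sum().item()
            total += y.numel()
    return correct / total
```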

Jan 15, 2024
Abstract:Structured hand gestures that incorporate visual motions and signs are used in sign language. Sign language is a valuable means of daily communication for individuals who are deaf or have speech impairments, but it is still rare among hearing people, and even fewer are capable of understanding it. Within the academic context, parents and teachers play a crucial role in supporting deaf students from childhood by facilitating their learning of sign language. In recent years, among the teaching tools useful for learning sign language, the use of Virtual Reality (VR) has increased, as it has been demonstrated to improve retention, memory, and attention during the learning process. The ISENSE project was created to assist students with deafness during their academic life by proposing different technological tools for teaching sign language to the hearing community in the academic context. As part of the ISENSE project, this work aims to develop an application for Spanish and Italian sign language recognition that exploits the VR environment to quickly and easily create a comprehensive database of signs, and an Artificial Intelligence (AI)-based software tool to accurately classify and recognize static and dynamic signs, from letters to sentences.
* 2023 IEEE International Conference on Metrology for Extended Reality, Artificial Intelligence and Neural Engineering (MetroXRAINE), Milano, Italy, 2023, pp. 104-109
* 6 pages, 4 figures, MetroXRAINE 2023 Conference, ISENSE European project
