Thai Finger Spelling (TFS) sign recognition could benefit a community of hearing-difficulty people in bridging to a major hearing population. With a relatively large number of alphabets, TFS employs multiple signing schemes. Two schemes of more common signing -- static and dynamic single-hand signing, widely used in other sign languages -- have been addressed in several previous works. To complete the TFS sign recognition, the remaining two of quite distinct signing schemes -- static and dynamic point-on-hand signing -- need to be sufficiently addressed. With the advent of many off-the-shelf hand skeleton prediction models and that training a model to recognize a sign language from scratch is expensive, we explore an approach building upon recently launched MediaPipe Hands (MPH). MPH is a high-precision well-trained model for hand-keypoint detection. We have investigated MPH on three TFS schemes: static-single-hand (S1), simplified dynamic-single-hand (S2) and static-point-on-hand (P1) schemes. Our results show that MPH can satisfactorily address single-hand schemes with accuracy of 84.57% on both S1 and S2. However, our finding reveals a shortcoming of MPH in addressing a point-on-hand scheme, whose accuracy is 23.66% on P1 conferring to 69.19% obtained from conventional classification trained from scratch. This shortcoming has been investigated and attributed to self occlusion and handedness.
Despite overwhelming achievements in recognition accuracy, extending an open-set capability -- ability to identify when the question is out of scope -- remains greatly challenging in a scalable machine learning inference. A recent research has discovered Latent Cognizance (LC) -- an insight on a recognition mechanism based on a new probabilistic interpretation, Bayesian theorem, and an analysis of an internal structure of a commonly-used recognition inference structure. The new interpretation emphasizes a latent assumption of an overlooked probabilistic condition on a learned inference model. Viability of LC has been shown on a task of sign language recognition, but its potential and implication can reach far beyond a specific domain and can move object recognition toward a scalable open-set recognition. However, LC new probabilistic interpretation has not been directly investigated. This article investigates the new interpretation under a traceable context. Our findings support the rationale on which LC is based and reveal a hidden mechanism underlying the learning classification inference. The ramification of these findings could lead to a simple yet effective solution to an open-set recognition.
Sign language is a main communication channel among hearing disability community. Automatic sign language transcription could facilitate better communication and understanding between hearing disability community and hearing majority. As a recent work in automatic sign language transcription has discussed, effectively handling or identifying a non-sign posture is one of the key issues. A non-sign posture is a posture unintended for sign reading and does not belong to any valid sign. A non-sign posture may arise during sign transition or simply from an unaware posture. Confidence ratio has been proposed to mitigate the issue. Confidence ratio is simple to compute and readily available without extra training. However, confidence ratio is reported to only partially address the problem. In addition, confidence ratio formulation is susceptible to computational instability. This article proposes alternative formulations to confidence ratio, investigates an issue of non-sign identification for Thai Finger Spelling recognition, explores potential solutions and has found a promising direction. Not only does this finding address the issue of non-sign identification, it also provide some insight behind a well-learned inference machine, revealing hidden meaning and new interpretation of the underlying mechanism. Our proposed methods are evaluated and shown to be effective for non-sign detection.
This study investigates an application of a new probabilistic interpretation of a softmax output to Open-Set Recognition (OSR). Softmax is a mechanism wildly used in classification and object recognition. However, a softmax mechanism forces a model to operate under a closed-set paradigm, i.e., to predict an object class out of a set of pre-defined labels. This characteristic contributes to efficacy in classification, but poses a risk of non-sense prediction in object recognition. Object recognition is often operated under a dynamic and diverse condition. A foreign object -- an object of any unprepared class -- can be encountered at any time. OSR is intended to address an issue of identifying a foreign object in object recognition. Based on Bayes theorem and the emphasis of conditioning on the context, softmax inference has been re-interpreted. This re-interpretation has led to a new approach to OSR, called Latent Cognizance (LC). Our investigation employs various scenarios, using Imagenet 2012 dataset as well as fooling and open-set images. The findings support LC hypothesis and show its effectiveness on OSR.