Abstract: Anomaly detection holds considerable industrial significance, especially in scenarios with limited anomalous data. Current research focuses primarily on reconstruction-based and unsupervised representation-based approaches. However, unsupervised representation-based methods struggle to extract robust features under domain shift, whereas reconstruction-based methods often suffer from low training efficiency and performance degradation due to insufficient constraints. To address these challenges, we propose a novel method named Compressed Global Feature Conditioned Anomaly Detection (CCAD). CCAD combines the strengths of both paradigms by adapting global features into a new conditioning modality for the reconstruction model. Furthermore, we design an adaptive compression mechanism to enhance both generalization and training efficiency. Extensive experiments demonstrate that CCAD consistently outperforms state-of-the-art methods in terms of AUC while achieving faster convergence. In addition, we contribute a reorganized and re-annotated version of the DAGM 2007 dataset to further validate our method's effectiveness. The code for reproducing the main results is available at https://github.com/chloeqxq/CCAD.
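
To make the conditioning idea concrete, here is a minimal PyTorch sketch of a reconstruction network conditioned on an adaptively compressed global feature; the module names, dimensions, and gating scheme are illustrative assumptions, not the actual CCAD implementation.

```python
# Hypothetical sketch: compress a global feature into a short code and use it
# to condition the reconstruction of local features. Anomalies score high
# because only normal patterns reconstruct well under the condition.
# All names and dimensions are illustrative, not from the CCAD paper.
import torch
import torch.nn as nn

class CompressedCondition(nn.Module):
    """Adaptively compress a global feature vector into a small conditioning code."""
    def __init__(self, feat_dim=768, code_dim=64):
        super().__init__()
        self.gate = nn.Sequential(nn.Linear(feat_dim, feat_dim), nn.Sigmoid())  # adaptive channel weighting
        self.proj = nn.Linear(feat_dim, code_dim)                                # compression to a short code

    def forward(self, global_feat):
        return self.proj(global_feat * self.gate(global_feat))

class ConditionedReconstructor(nn.Module):
    """Reconstruct local features conditioned on the compressed global code."""
    def __init__(self, local_dim=256, code_dim=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(local_dim + code_dim, 512), nn.ReLU(),
            nn.Linear(512, local_dim),
        )

    def forward(self, local_feat, code):
        # Broadcast the global code to every local position before reconstruction.
        b, n, _ = local_feat.shape
        cond = code.unsqueeze(1).expand(b, n, -1)
        return self.net(torch.cat([local_feat, cond], dim=-1))

def anomaly_map(local_feat, global_feat, compressor, reconstructor):
    """Per-location anomaly score as reconstruction error."""
    recon = reconstructor(local_feat, compressor(global_feat))
    return (recon - local_feat).pow(2).mean(dim=-1)
```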




Abstract: Modeling user engagement for manipulation actions in vision-based interfaces is one of the most important case studies in user mental state detection. In a Virtual Reality environment that employs camera sensors to recognize human activities, the system must know when the user intends to perform an action and when not. Without a proper algorithm for recognizing engagement status, any activity could be interpreted as a manipulation action, a failure known as the "Midas Touch" problem. The baseline approach to this problem is to activate the gesture recognition system with dedicated focus gestures, such as waving or raising a hand. However, a desirable natural user interface should understand the user's mental state automatically. In this paper, we present DAIA, a novel multi-modal model for engagement detection. Using DAIA, the spectrum of mental states involved in performing an action is quantized into a finite number of engagement states. For this purpose, a Finite State Transducer (FST) is designed. This engagement framework shows how to integrate multi-modal information from user biometric data streams such as 2D and 3D imaging. The FST is employed to make state transitions smooth by combining several Boolean expressions. The FST achieves a true detection rate of 92.3% overall across four different states. Results also show that the FST segments user hand gestures more robustly.
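
As a concrete illustration of the state-machine idea, below is a minimal Python sketch of a finite set of engagement states whose transitions are gated by Boolean expressions over per-frame observations; the state names, observation fields, and predicates are assumptions for illustration, not the actual DAIA design.

```python
# Minimal sketch of an engagement state machine: transitions fire only when a
# Boolean predicate over the current multi-modal observation holds, otherwise
# the state is kept, which smooths the transitions between frames.
# State names and predicates are hypothetical, not the DAIA definitions.
from dataclasses import dataclass
from enum import Enum, auto

class Engagement(Enum):
    DISENGAGED = auto()
    ATTENTIVE = auto()
    INTENDING = auto()
    ACTING = auto()

@dataclass
class Observation:
    face_toward_screen: bool   # e.g. from 2D imaging
    hand_raised: bool          # e.g. from 3D skeleton data
    hand_moving: bool          # e.g. from 3D skeleton data

# Each transition is (source state, Boolean predicate, target state).
TRANSITIONS = [
    (Engagement.DISENGAGED, lambda o: o.face_toward_screen, Engagement.ATTENTIVE),
    (Engagement.ATTENTIVE,  lambda o: not o.face_toward_screen, Engagement.DISENGAGED),
    (Engagement.ATTENTIVE,  lambda o: o.hand_raised, Engagement.INTENDING),
    (Engagement.INTENDING,  lambda o: o.hand_raised and o.hand_moving, Engagement.ACTING),
    (Engagement.INTENDING,  lambda o: not o.hand_raised, Engagement.ATTENTIVE),
    (Engagement.ACTING,     lambda o: not o.hand_moving, Engagement.INTENDING),
]

def step(state: Engagement, obs: Observation) -> Engagement:
    """Apply the first matching transition; otherwise keep the current state."""
    for src, pred, dst in TRANSITIONS:
        if src is state and pred(obs):
            return dst
    return state

# Only frames that reach ACTING would be forwarded to the gesture recognizer,
# which is how gating by engagement state suppresses the "Midas Touch" problem.
```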




Abstract: Group meetings are frequent business events aimed at developing and conducting project work, such as Big Room design and construction project meetings. To be effective in these meetings, participants need to be in an engaged mental state. The mental state of participants, however, is hidden from other participants and is therefore difficult to evaluate. Mental state is understood as an inner process of thinking and feeling that is formed of a conglomerate of mental representations and propositional attitudes. There is a need to make these hidden states transparent in order to understand, evaluate, and influence them. Facilitators need to evaluate the meeting situation and adjust it for higher engagement and productivity. This paper presents a framework that defines a spectrum of engagement states and an array of classifiers aimed at detecting the engagement state of participants in real time. The Engagement Framework integrates multi-modal information from 2D and 3D imaging and sound. Engagement is detected and evaluated at the participant level and aggregated at the group level. We use empirical data collected at the lab of Konica Minolta, Inc. to test initial applications of this framework. The paper presents examples of the tested engagement classifiers, which are based on research in psychology, communication, and human-computer interaction. Their accuracy is illustrated for engagement detection in dyadic interactions. In closing, we discuss potential extensions to complex group collaboration settings and future feedback implementations.
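
The detect-per-participant, aggregate-per-group structure can be sketched in Python as follows; the feature names, individual classifiers, and averaging rules are illustrative assumptions rather than the framework's actual components.

```python
# Illustrative sketch: an array of classifiers scores each participant's
# engagement from multi-modal features, and participant scores are aggregated
# to a group-level estimate. Features, classifiers, and the aggregation rule
# are hypothetical placeholders, not the Engagement Framework's components.
from statistics import mean
from typing import Callable, Dict, List

# A classifier maps one participant's features (e.g. gaze from 2D imaging,
# posture from 3D imaging, speech activity from sound) to a score in [0, 1].
Classifier = Callable[[Dict[str, float]], float]

def gaze_classifier(f: Dict[str, float]) -> float:
    return f.get("gaze_on_speaker", 0.0)

def posture_classifier(f: Dict[str, float]) -> float:
    return f.get("lean_forward", 0.0)

def speech_classifier(f: Dict[str, float]) -> float:
    return f.get("speaking_ratio", 0.0)

CLASSIFIERS: List[Classifier] = [gaze_classifier, posture_classifier, speech_classifier]

def participant_engagement(features: Dict[str, float]) -> float:
    """Fuse the array of classifiers into one engagement score for a participant."""
    return mean(c(features) for c in CLASSIFIERS)

def group_engagement(group_features: List[Dict[str, float]]) -> float:
    """Aggregate participant-level scores into a group-level engagement estimate."""
    return mean(participant_engagement(f) for f in group_features)

# Example: two participants in a dyadic interaction.
dyad = [
    {"gaze_on_speaker": 0.9, "lean_forward": 0.7, "speaking_ratio": 0.4},
    {"gaze_on_speaker": 0.3, "lean_forward": 0.2, "speaking_ratio": 0.1},
]
print(group_engagement(dyad))
```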