While speech interaction finds widespread utility within the Extended Reality (XR) domain, conventional vocal speech keyword spotting systems continue to grapple with formidable challenges, including suboptimal performance in noisy environments, impracticality in situations requiring silence, and susceptibility to inadvertent activations when others speak nearby. These challenges, however, can potentially be surmounted through the cost-effective fusion of voice and lip movement information. Consequently, we propose a novel vocal-echoic dual-modal keyword spotting system designed for XR headsets. We devise two different modal fusion approches and conduct experiments to test the system's performance across diverse scenarios. The results show that our dual-modal system not only consistently outperforms its single-modal counterparts, demonstrating higher precision in both typical and noisy environments, but also excels in accurately identifying silent utterances. Furthermore, we have successfully applied the system in real-time demonstrations, achieving promising results. The code is available at https://github.com/caizhuojiang/VE-KWS.
Active soft bodies can affect their shape through an internal actuation mechanism that induces a deformation. Similar to recent work, this paper utilizes a differentiable, quasi-static, and physics-based simulation layer to optimize for actuation signals parameterized by neural networks. Our key contribution is a general and implicit formulation to control active soft bodies by defining a function that enables a continuous mapping from a spatial point in the material space to the actuation value. This property allows us to capture the signal's dominant frequencies, making the method discretization agnostic and widely applicable. We extend our implicit model to mandible kinematics for the particular case of facial animation and show that we can reliably reproduce facial expressions captured with high-quality capture systems. We apply the method to volumetric soft bodies, human poses, and facial expressions, demonstrating artist-friendly properties, such as simple control over the latent space and resolution invariance at test time.
Ultrasound measurements have been examined as potential tools for predicting the likelihood of successful vaginal delivery. The angle of progression (AoP) is a measurable parameter that can be obtained during the initial stage of labor. The AoP is defined as the angle between a straight line along the longitudinal axis of the pubic symphysis (PS) and a line from the inferior edge of the PS to the leading edge of the fetal head (FH). However, the process of measuring AoP on ultrasound images is time consuming and prone to errors. To address this challenge, we propose the Mix Transformer U-Net (MiTU-Net) network, for automatic fetal head-pubic symphysis segmentation and AoP measurement. The MiTU-Net model is based on an encoder-decoder framework, utilizing a pre-trained efficient transformer to enhance feature representation. Within the efficient transformer encoder, the model significantly reduces the trainable parameters of the encoder-decoder model. The effectiveness of the proposed method is demonstrated through experiments conducted on a recent transperineal ultrasound dataset. Our model achieves competitive performance, ranking 5th compared to existing approaches. The MiTU-Net presents an efficient method for automatic segmentation and AoP measurement, reducing errors and assisting sonographers in clinical practice. Reproducibility: Framework implementation and models available on https://github.com/13204942/MiTU-Net.
Detecting anomalies in poultry houses is crucial for maintaining optimal chicken health conditions, minimizing economic losses and bolstering profitability. This paper presents a novel real-time framework for analyzing chicken behavior in cage-free poultry houses to detect abnormal behaviors. Specifically, two significant abnormalities, namely inactive broiler and huddling behavior, are investigated in this study. The proposed framework comprises three key steps: (1) chicken detection utilizing a state-of-the-art deep learning model, (2) tracking individual chickens across consecutive frames with a fast tracker module, and (3) detecting abnormal behaviors within the video stream. Experimental studies are conducted to evaluate the efficacy of the proposed algorithm in accurately assessing chicken behavior. The results illustrate that our framework provides a precise and efficient solution for real-time anomaly detection, facilitating timely interventions to maintain chicken health and enhance overall productivity on poultry farms. Github: https://github.com/TaherehZarratEhsan/Chicken-Behavior-Analysis
It has been demonstrated that leading cruise control (LCC) can improve the operation of mixed-autonomy platoons by allowing connected and automated vehicles (CAVs) to make longitudinal control decisions based on the information provided by surrounding vehicles. However, LCC generally requires surrounding human-driven vehicles (HDVs) to share their real-time states, which can be used by adversaries to infer drivers' car-following behavior, potentially leading to financial losses or safety concerns. This paper aims to address such privacy concerns and protect the behavioral characteristics of HDVs by devising a parameter privacy-preserving approach for mixed-autonomy platoon control. First, we integrate a parameter privacy filter into LCC to protect sensitive car-following parameters. The privacy filter allows each vehicle to generate seemingly realistic pseudo states by distorting the true parameters to pseudo parameters, which can protect drivers' privacy in behavioral parameters without significantly influencing the control performance. Second, to enhance the practicality and reliability of the privacy filter within LCC, we first extend the current approach to accommodate continuous parameter spaces through a neural network estimator. Subsequently, we introduce an individual-level parameter privacy preservation constraint, focusing on the privacy level of each individual parameter pair, further enhancing the approach's reliability. Third, analysis of head-to-tail string stability reveals the potential impact of privacy filters in degrading mixed traffic flow performance. Simulation shows that this approach can effectively trade off privacy and control performance in LCC. We further demonstrate the benefit of such an approach in networked systems, i.e., by applying the privacy filter to a proceeding vehicle, one can also achieve a certain level of privacy for the following vehicle.
This paper presents non-parametric baseline models for time series forecasting. Unlike classical forecasting models, the proposed approach does not assume any parametric form for the predictive distribution and instead generates predictions by sampling from the empirical distribution according to a tunable strategy. By virtue of this, the model is always able to produce reasonable forecasts (i.e., predictions within the observed data range) without fail unlike classical models that suffer from numerical stability on some data distributions. Moreover, we develop a global version of the proposed method that automatically learns the sampling strategy by exploiting the information across multiple related time series. The empirical evaluation shows that the proposed methods have reasonable and consistent performance across all datasets, proving them to be strong baselines to be considered in one's forecasting toolbox.
Exploring generative model training for synthetic tabular data, specifically in sequential contexts such as credit card transaction data, presents significant challenges. This paper addresses these challenges, focusing on attaining both high fidelity to actual data and optimal utility for machine learning tasks. We introduce five pre-processing schemas to enhance the training of the Conditional Probabilistic Auto-Regressive Model (CPAR), demonstrating incremental improvements in the synthetic data's fidelity and utility. Upon achieving satisfactory fidelity levels, our attention shifts to training fraud detection models tailored for time-series data, evaluating the utility of the synthetic data. Our findings offer valuable insights and practical guidelines for synthetic data practitioners in the finance sector, transitioning from real to synthetic datasets for training purposes, and illuminating broader methodologies for synthesizing credit card transaction time series.
Early Exit Neural Networks (EENNs) endow astandard Deep Neural Network (DNN) with Early Exit Classifiers (EECs), to provide predictions at intermediate points of the processing when enough confidence in classification is achieved. This leads to many benefits in terms of effectiveness and efficiency. Currently, the design of EENNs is carried out manually by experts, a complex and time-consuming task that requires accounting for many aspects, including the correct placement, the thresholding, and the computational overhead of the EECs. For this reason, the research is exploring the use of Neural Architecture Search (NAS) to automatize the design of EENNs. Currently, few comprehensive NAS solutions for EENNs have been proposed in the literature, and a fully automated, joint design strategy taking into consideration both the backbone and the EECs remains an open problem. To this end, this work presents Neural Architecture Search for Hardware Constrained Early Exit Neural Networks (NACHOS), the first NAS framework for the design of optimal EENNs satisfying constraints on the accuracy and the number of Multiply and Accumulate (MAC) operations performed by the EENNs at inference time. In particular, this provides the joint design of backbone and EECs to select a set of admissible (i.e., respecting the constraints) Pareto Optimal Solutions in terms of best tradeoff between the accuracy and number of MACs. The results show that the models designed by NACHOS are competitive with the state-of-the-art EENNs. Additionally, this work investigates the effectiveness of two novel regularization terms designed for the optimization of the auxiliary classifiers of the EENN
In this study, we explore the synergy of deep learning and financial market applications, focusing on pair trading. This market-neutral strategy is integral to quantitative finance and is apt for advanced deep-learning techniques. A pivotal challenge in pair trading is discerning temporal correlations among entities, necessitating the integration of diverse data modalities. Addressing this, we introduce a novel framework, Multi-modal Temporal Relation Graph Learning (MTRGL). MTRGL combines time series data and discrete features into a temporal graph and employs a memory-based temporal graph neural network. This approach reframes temporal correlation identification as a temporal graph link prediction task, which has shown empirical success. Our experiments on real-world datasets confirm the superior performance of MTRGL, emphasizing its promise in refining automated pair trading strategies.
Early diagnosis of Alzheimer Diagnostics (AD) is a challenging task due to its subtle and complex clinical symptoms. Deep learning-assisted medical diagnosis using image recognition techniques has become an important research topic in this field. The features have to accurately capture main variations of anatomical brain structures. However, time-consuming is expensive for feature extraction by deep learning training. This study proposes a novel Alzheimer's disease detection model based on Convolutional Neural Networks. The model utilizes a pre-trained ResNet network as the backbone, incorporating post-fusion algorithm for 3D medical images and attention mechanisms. The experimental results indicate that the employed 2D fusion algorithm effectively improves the model's training expense. And the introduced attention mechanism accurately weights important regions in images, further enhancing the model's diagnostic accuracy.