"Time": models, code, and papers

Robust Dual-Modal Speech Keyword Spotting for XR Headsets

Jan 26, 2024
Zhuojiang Cai, Yuhan Ma, Feng Lu

While speech interaction finds widespread utility within the Extended Reality (XR) domain, conventional vocal speech keyword spotting systems continue to grapple with formidable challenges, including suboptimal performance in noisy environments, impracticality in situations requiring silence, and susceptibility to inadvertent activations when others speak nearby. These challenges, however, can potentially be surmounted through the cost-effective fusion of voice and lip movement information. Consequently, we propose a novel vocal-echoic dual-modal keyword spotting system designed for XR headsets. We devise two different modal fusion approaches and conduct experiments to test the system's performance across diverse scenarios. The results show that our dual-modal system not only consistently outperforms its single-modal counterparts, demonstrating higher precision in both typical and noisy environments, but also excels in accurately identifying silent utterances. Furthermore, we have successfully applied the system in real-time demonstrations, achieving promising results. The code is available at https://github.com/caizhuojiang/VE-KWS.

* Accepted to IEEE VR 2024

Implicit Neural Representation for Physics-driven Actuated Soft Bodies

Jan 26, 2024
Lingchen Yang, Byungsoo Kim, Gaspard Zoss, Baran Gözcü, Markus Gross, Barbara Solenthaler

Active soft bodies can affect their shape through an internal actuation mechanism that induces a deformation. Similar to recent work, this paper utilizes a differentiable, quasi-static, and physics-based simulation layer to optimize for actuation signals parameterized by neural networks. Our key contribution is a general and implicit formulation to control active soft bodies by defining a function that enables a continuous mapping from a spatial point in the material space to the actuation value. This property allows us to capture the signal's dominant frequencies, making the method discretization agnostic and widely applicable. We extend our implicit model to mandible kinematics for the particular case of facial animation and show that we can reliably reproduce facial expressions captured with high-quality capture systems. We apply the method to volumetric soft bodies, human poses, and facial expressions, demonstrating artist-friendly properties, such as simple control over the latent space and resolution invariance at test time.

* Accepted to SIGGRAPH 2022. Project page: https://studios.disneyresearch.com/2022/07/24/implicit-neural-representation-for-physics-driven-actuated-soft-bodies/ Video: https://www.youtube.com/watch?v=9EERe_CTazk

MiTU-Net: A fine-tuned U-Net with SegFormer backbone for segmenting pubic symphysis-fetal head

Jan 27, 2024
Fangyijie Wang, Guenole Silvestre, Kathleen Curran

Ultrasound measurements have been examined as potential tools for predicting the likelihood of successful vaginal delivery. The angle of progression (AoP) is a measurable parameter that can be obtained during the initial stage of labor. The AoP is defined as the angle between a straight line along the longitudinal axis of the pubic symphysis (PS) and a line from the inferior edge of the PS to the leading edge of the fetal head (FH). However, the process of measuring AoP on ultrasound images is time-consuming and prone to errors. To address this challenge, we propose the Mix Transformer U-Net (MiTU-Net) network for automatic fetal head-pubic symphysis segmentation and AoP measurement. The MiTU-Net model is based on an encoder-decoder framework, utilizing a pre-trained efficient transformer to enhance feature representation. By using the efficient transformer as its encoder, the model significantly reduces the number of trainable parameters of the encoder-decoder model. The effectiveness of the proposed method is demonstrated through experiments conducted on a recent transperineal ultrasound dataset. Our model achieves competitive performance, ranking 5th compared to existing approaches. The MiTU-Net presents an efficient method for automatic segmentation and AoP measurement, reducing errors and assisting sonographers in clinical practice. Reproducibility: Framework implementation and models available at https://github.com/13204942/MiTU-Net.

* The 5th place in the Pubic Symphysis-Fetal Head Segmentation Challenge in MICCAI 2023
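
The AoP definition in the abstract above reduces to the angle between two rays, which can be sketched in a few lines. This is only an illustrative sketch, not the authors' measurement code: it assumes three 2D landmark points have already been extracted from a segmentation, and it assumes the angle is taken at the inferior edge of the PS, between the ray toward the superior end of the PS and the ray toward the leading edge of the FH. The function name and argument names are hypothetical.

```python
import math


def angle_of_progression(ps_superior, ps_inferior, fh_leading):
    """Return the angle in degrees between the PS long axis and the line
    from the inferior PS edge to the leading edge of the fetal head.

    All three arguments are (x, y) points. Taking the vertex at the
    inferior PS edge is an assumption of this sketch.
    """
    # Ray along the PS longitudinal axis, starting at the inferior edge.
    ax = ps_superior[0] - ps_inferior[0]
    ay = ps_superior[1] - ps_inferior[1]
    # Ray from the inferior edge to the leading edge of the fetal head.
    bx = fh_leading[0] - ps_inferior[0]
    by = fh_leading[1] - ps_inferior[1]

    dot = ax * bx + ay * by
    cross = ax * by - ay * bx
    # atan2 on (|cross|, dot) gives an unsigned angle in [0, 180] degrees
    # and stays numerically stable for nearly parallel rays.
    return math.degrees(math.atan2(abs(cross), dot))
```

For example, with the PS axis pointing straight up from the origin and the head landmark at `(1, -1)`, the function returns 135 degrees.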

Broiler-Net: A Deep Convolutional Framework for Broiler Behavior Analysis in Poultry Houses

Jan 22, 2024
Tahereh Zarrat Ehsan, Seyed Mehdi Mohtavipour

Detecting anomalies in poultry houses is crucial for maintaining optimal chicken health conditions, minimizing economic losses, and bolstering profitability. This paper presents a novel real-time framework for analyzing chicken behavior in cage-free poultry houses to detect abnormal behaviors. Specifically, two significant abnormalities, namely inactive broiler and huddling behavior, are investigated in this study. The proposed framework comprises three key steps: (1) chicken detection utilizing a state-of-the-art deep learning model, (2) tracking individual chickens across consecutive frames with a fast tracker module, and (3) detecting abnormal behaviors within the video stream. Experimental studies are conducted to evaluate the efficacy of the proposed algorithm in accurately assessing chicken behavior. The results illustrate that our framework provides a precise and efficient solution for real-time anomaly detection, facilitating timely interventions to maintain chicken health and enhance overall productivity on poultry farms. GitHub: https://github.com/TaherehZarratEhsan/Chicken-Behavior-Analysis

* 11 pages, 7 figures
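
The third step of the pipeline described above (flagging abnormal behavior from tracked chickens) can be illustrated with a toy inactivity check. The abstract does not say how inactivity is decided, so everything here is an assumption made for illustration: the input is a hypothetical dictionary mapping a track ID to its per-frame centroid positions, and a bird counts as inactive when the path length of its last few centroids stays below a threshold. The names `inactive_track_ids`, `min_frames`, and `max_displacement` are invented for this sketch and are not taken from the linked repository.

```python
import math


def inactive_track_ids(tracks, min_frames=5, max_displacement=2.0):
    """Return IDs of tracks that barely moved over their last `min_frames` frames.

    `tracks` maps a track ID to a list of (x, y) centroids, one per frame.
    The rule (path length <= max_displacement) is an assumed stand-in for
    whatever criterion the actual framework uses.
    """
    inactive = []
    for track_id, centroids in tracks.items():
        # Skip tracks that are too short to judge.
        if len(centroids) < min_frames:
            continue
        recent = centroids[-min_frames:]
        # Sum the frame-to-frame distances over the recent window.
        path_length = sum(math.dist(p, q) for p, q in zip(recent, recent[1:]))
        if path_length <= max_displacement:
            inactive.append(track_id)
    return inactive
```

A real system would also need to handle camera perspective and track loss, which this sketch ignores.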

A Parameter Privacy-Preserving Strategy for Mixed-Autonomy Platoon Control

Jan 28, 2024
Jingyuan Zhou, Kaidi Yang

It has been demonstrated that leading cruise control (LCC) can improve the operation of mixed-autonomy platoons by allowing connected and automated vehicles (CAVs) to make longitudinal control decisions based on the information provided by surrounding vehicles. However, LCC generally requires surrounding human-driven vehicles (HDVs) to share their real-time states, which can be used by adversaries to infer drivers' car-following behavior, potentially leading to financial losses or safety concerns. This paper aims to address such privacy concerns and protect the behavioral characteristics of HDVs by devising a parameter privacy-preserving approach for mixed-autonomy platoon control. First, we integrate a parameter privacy filter into LCC to protect sensitive car-following parameters. The privacy filter allows each vehicle to generate seemingly realistic pseudo states by distorting the true parameters to pseudo parameters, which can protect drivers' privacy in behavioral parameters without significantly influencing the control performance. Second, to enhance the practicality and reliability of the privacy filter within LCC, we first extend the current approach to accommodate continuous parameter spaces through a neural network estimator. Subsequently, we introduce an individual-level parameter privacy preservation constraint, focusing on the privacy level of each individual parameter pair, further enhancing the approach's reliability. Third, analysis of head-to-tail string stability reveals the potential impact of privacy filters in degrading mixed traffic flow performance. Simulation shows that this approach can effectively trade off privacy and control performance in LCC. We further demonstrate the benefit of such an approach in networked systems, i.e., by applying the privacy filter to a preceding vehicle, one can also achieve a certain level of privacy for the following vehicle.

Deep Non-Parametric Time Series Forecaster

Dec 22, 2023
Syama Sundar Rangapuram, Jan Gasthaus, Lorenzo Stella, Valentin Flunkert, David Salinas, Yuyang Wang, Tim Januschowski

This paper presents non-parametric baseline models for time series forecasting. Unlike classical forecasting models, the proposed approach does not assume any parametric form for the predictive distribution and instead generates predictions by sampling from the empirical distribution according to a tunable strategy. By virtue of this, the model is always able to produce reasonable forecasts (i.e., predictions within the observed data range) without fail, unlike classical models that suffer from numerical instability on some data distributions. Moreover, we develop a global version of the proposed method that automatically learns the sampling strategy by exploiting the information across multiple related time series. The empirical evaluation shows that the proposed methods have reasonable and consistent performance across all datasets, proving them to be strong baselines to be considered in one's forecasting toolbox.
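
The core idea in the abstract above, forecasting by sampling from the empirical distribution of past observations, can be sketched as follows. The abstract does not specify the sampling strategy, so this sketch assumes one simple choice: weights that decay exponentially with the age of an observation, controlled by a hypothetical `decay` parameter. The function name and all parameters are illustrative and are not the paper's API.

```python
import math
import random


def sample_forecast(history, horizon, decay=0.1, num_samples=100, seed=0):
    """Draw `num_samples` forecast paths of length `horizon`.

    Each step samples one value from the values seen so far (including
    earlier sampled steps), with weight exp(-decay * lag), where lag 1 is
    the most recent value. The exponential weighting is an assumption of
    this sketch.
    """
    rng = random.Random(seed)
    paths = []
    for _ in range(num_samples):
        window = list(history)
        path = []
        for _ in range(horizon):
            n = len(window)
            # Older observations (larger lag) get smaller weights.
            weights = [math.exp(-decay * (n - i)) for i in range(n)]
            value = rng.choices(window, weights=weights, k=1)[0]
            path.append(value)
            # Feed the sampled value back so later steps can draw it again.
            window.append(value)
        paths.append(path)
    return paths
```

Because every sampled value is copied from the history, each forecast stays within the observed data range, which is the property the abstract highlights. Setting `decay=0` makes every past value equally likely, while a large `decay` makes the sampler favor the most recent values.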

Improve Fidelity and Utility of Synthetic Credit Card Transaction Time Series from Data-centric Perspective

Jan 01, 2024
Din-Yin Hsieh, Chi-Hua Wang, Guang Cheng

Exploring generative model training for synthetic tabular data, specifically in sequential contexts such as credit card transaction data, presents significant challenges. This paper addresses these challenges, focusing on attaining both high fidelity to actual data and optimal utility for machine learning tasks. We introduce five pre-processing schemas to enhance the training of the Conditional Probabilistic Auto-Regressive Model (CPAR), demonstrating incremental improvements in the synthetic data's fidelity and utility. Upon achieving satisfactory fidelity levels, our attention shifts to training fraud detection models tailored for time-series data, evaluating the utility of the synthetic data. Our findings offer valuable insights and practical guidelines for synthetic data practitioners in the finance sector, transitioning from real to synthetic datasets for training purposes, and illuminating broader methodologies for synthesizing credit card transaction time series.

* This article has been accepted by the 2nd Workshop on Synthetic Data for AI in Finance; see https://sites.google.com/view/icaif-synthetic/home

NACHOS: Neural Architecture Search for Hardware Constrained Early Exit Neural Networks

Jan 24, 2024
Matteo Gambella, Jary Pomponi, Simone Scardapane, Manuel Roveri

Early Exit Neural Networks (EENNs) endow a standard Deep Neural Network (DNN) with Early Exit Classifiers (EECs), to provide predictions at intermediate points of the processing when enough confidence in classification is achieved. This leads to many benefits in terms of effectiveness and efficiency. Currently, the design of EENNs is carried out manually by experts, a complex and time-consuming task that requires accounting for many aspects, including the correct placement, the thresholding, and the computational overhead of the EECs. For this reason, the research is exploring the use of Neural Architecture Search (NAS) to automate the design of EENNs. Currently, few comprehensive NAS solutions for EENNs have been proposed in the literature, and a fully automated, joint design strategy taking into consideration both the backbone and the EECs remains an open problem. To this end, this work presents Neural Architecture Search for Hardware Constrained Early Exit Neural Networks (NACHOS), the first NAS framework for the design of optimal EENNs satisfying constraints on the accuracy and the number of Multiply and Accumulate (MAC) operations performed by the EENNs at inference time. In particular, this provides the joint design of backbone and EECs to select a set of admissible (i.e., respecting the constraints) Pareto Optimal Solutions in terms of best tradeoff between the accuracy and number of MACs. The results show that the models designed by NACHOS are competitive with the state-of-the-art EENNs. Additionally, this work investigates the effectiveness of two novel regularization terms designed for the optimization of the auxiliary classifiers of the EENN.
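
The early-exit mechanism described at the start of the abstract above can be illustrated with a small inference loop. This is a generic sketch of confidence-thresholded early exit, not the NACHOS search procedure: the backbone stages and exit heads are passed in as plain callables, and the confidence measure (maximum softmax probability) and the `threshold` value are assumptions of this example.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def early_exit_predict(x, stages, exit_heads, final_head, threshold=0.9):
    """Run backbone stages in order and stop at the first confident exit.

    `stages` and `exit_heads` are equal-length lists of callables; each
    exit head maps the current features to class logits. Returns a tuple
    (predicted_class, exit_index), where exit_index == len(stages) means
    the final classifier was used.
    """
    h = x
    for idx, (stage, head) in enumerate(zip(stages, exit_heads)):
        h = stage(h)
        probs = softmax(head(h))
        confidence = max(probs)
        # Exit early once the intermediate classifier is confident enough.
        if confidence >= threshold:
            return probs.index(confidence), idx
    # No early exit fired, so fall back to the final classifier.
    probs = softmax(final_head(h))
    return probs.index(max(probs)), len(stages)
```

Inputs that exit at an early index skip the remaining stages, which is where the savings in MAC operations come from; the threshold trades accuracy against that saving.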

MTRGL: Effective Temporal Correlation Discerning through Multi-modal Temporal Relational Graph Learning

Jan 25, 2024
Junwei Su, Shan Wu, Jinhui Li

In this study, we explore the synergy of deep learning and financial market applications, focusing on pair trading. This market-neutral strategy is integral to quantitative finance and is apt for advanced deep-learning techniques. A pivotal challenge in pair trading is discerning temporal correlations among entities, necessitating the integration of diverse data modalities. Addressing this, we introduce a novel framework, Multi-modal Temporal Relation Graph Learning (MTRGL). MTRGL combines time series data and discrete features into a temporal graph and employs a memory-based temporal graph neural network. This approach reframes temporal correlation identification as a temporal graph link prediction task, which has shown empirical success. Our experiments on real-world datasets confirm the superior performance of MTRGL, emphasizing its promise in refining automated pair trading strategies.

Attention-based Efficient Classification for 3D MRI Image of Alzheimer's Disease

Jan 25, 2024
Yihao Lin, Ximeng Li, Yan Zhang, Jinshan Tang

Early diagnosis of Alzheimer's disease (AD) is a challenging task due to its subtle and complex clinical symptoms. Deep learning-assisted medical diagnosis using image recognition techniques has become an important research topic in this field. The features have to accurately capture the main variations of anatomical brain structures. However, feature extraction through deep learning training is time-consuming and expensive. This study proposes a novel Alzheimer's disease detection model based on Convolutional Neural Networks. The model utilizes a pre-trained ResNet network as the backbone, incorporating a post-fusion algorithm for 3D medical images and attention mechanisms. The experimental results indicate that the employed 2D fusion algorithm effectively reduces the model's training cost. The introduced attention mechanism also accurately weights important regions in images, further enhancing the model's diagnostic accuracy.
