Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Mohammad Nur Hossain Khan

Mindfulness Meditation and Respiration: Accelerometer-Based Respiration Rate and Mindfulness Progress Estimation to Enhance App Engagement and Mindfulness Skills

Jul 23, 2025

Mohammad Nur Hossain Khan, David creswell, Jordan Albert, Patrick O'Connell, Shawn Fallon, Mathew Polowitz, Xuhai "orson" Xu, Bashima islam

Abstract:Mindfulness training is widely recognized for its benefits in reducing depression, anxiety, and loneliness. With the rise of smartphone-based mindfulness apps, digital meditation has become more accessible, but sustaining long-term user engagement remains a challenge. This paper explores whether respiration biosignal feedback and mindfulness skill estimation enhance system usability and skill development. We develop a smartphone's accelerometer-based respiration tracking algorithm, eliminating the need for additional wearables. Unlike existing methods, our approach accurately captures slow breathing patterns typical of mindfulness meditation. Additionally, we introduce the first quantitative framework to estimate mindfulness skills-concentration, sensory clarity, and equanimity-based on accelerometer-derived respiration data. We develop and test our algorithms on 261 mindfulness sessions in both controlled and real-world settings. A user study comparing an experimental group receiving biosignal feedback with a control group using a standard app shows that respiration feedback enhances system usability. Our respiration tracking model achieves a mean absolute error (MAE) of 1.6 breaths per minute, closely aligning with ground truth data, while our mindfulness skill estimation attains F1 scores of 80-84% in tracking skill progression. By integrating respiration tracking and mindfulness estimation into a commercial app, we demonstrate the potential of smartphone sensors to enhance digital mindfulness training.

* Accepted in Proc. ACM Interact. Mob. Wearable Ubiquitous Technology (IMWUT)

Via

Access Paper or Ask Questions

UnIT: Scalable Unstructured Inference-Time Pruning for MAC-efficient Neural Inference on MCUs

Jul 10, 2025

Ashe Neth, Sawinder kaur, Mohammad Nur Hossain Khan, Subrata Biswas, Asif Salekin, Bashima Islam

Abstract:Existing pruning methods are typically applied during training or compile time and often rely on structured sparsity. While compatible with low-power microcontrollers (MCUs), structured pruning underutilizes the opportunity for fine-grained efficiency on devices without SIMD support or parallel compute. To address these limitations, we introduce UnIT (Unstructured Inference-Time pruning), a lightweight method that dynamically identifies and skips unnecessary multiply-accumulate (MAC) operations during inference, guided by input-specific activation patterns. Unlike structured pruning, UnIT embraces irregular sparsity and does not require retraining or hardware specialization. It transforms pruning decisions into lightweight comparisons, replacing multiplications with threshold checks and approximated divisions. UnIT further optimizes compute by reusing threshold computations across multiple connections and applying layer- and group-specific pruning sensitivity. We present three fast, hardware-friendly division approximations tailored to the capabilities of common embedded platforms. Demonstrated on the MSP430 microcontroller, UnIT achieves 11.02% to 82.03% MAC reduction, 27.30% to 84.19% faster inference, and 27.33% to 84.38% lower energy consumption compared to training-time pruned models, while maintaining accuracy with 0.48-7%. Under domain shift, UnIT matches or exceeds the accuracy of retrained models while requiring significantly fewer MACs. These results establish unstructured inference-time pruning as a viable and practical solution for efficient, retraining-free deployment of deep neural networks on MCUs.

* Submitted to SenSys 2026 on July 1, 2025

Via

Access Paper or Ask Questions

RAVEN: Query-Guided Representation Alignment for Question Answering over Audio, Video, Embedded Sensors, and Natural Language

May 21, 2025

Subrata Biswas, Mohammad Nur Hossain Khan, Bashima Islam

Abstract:Multimodal question answering (QA) often requires identifying which video, audio, or sensor tokens are relevant to the question. Yet modality disagreements are common: off-camera speech, background noise, or motion outside the field of view often mislead fusion models that weight all streams equally. We present RAVEN, a unified QA architecture whose core is QuART, a query-conditioned cross-modal gating module that assigns scalar relevance scores to each token across modalities, enabling the model to amplify informative signals and suppress distractors before fusion. RAVEN is trained through a three-stage pipeline comprising unimodal pretraining, query-aligned fusion, and disagreement-oriented fine-tuning -- each stage targeting a distinct challenge in multi-modal reasoning: representation quality, cross-modal relevance, and robustness to modality mismatch. To support training and evaluation, we release AVS-QA, a dataset of 300K synchronized Audio--Video-Sensor streams paired with automatically generated question-answer pairs. Experimental results on seven multi-modal QA benchmarks -- including egocentric and exocentric tasks -- show that RAVEN achieves up to 14.5\% and 8.0\% gains in accuracy compared to state-of-the-art multi-modal large language models, respectively. Incorporating sensor data provides an additional 16.4\% boost, and the model remains robust under modality corruption, outperforming SOTA baselines by 50.23\%. Our code and dataset are available at https://github.com/BASHLab/RAVEN.

Via

Access Paper or Ask Questions

QUADS: QUAntized Distillation Framework for Efficient Speech Language Understanding

May 19, 2025

Subrata Biswas, Mohammad Nur Hossain Khan, Bashima Islam

Abstract:Spoken Language Understanding (SLU) systems must balance performance and efficiency, particularly in resource-constrained environments. Existing methods apply distillation and quantization separately, leading to suboptimal compression as distillation ignores quantization constraints. We propose QUADS, a unified framework that optimizes both through multi-stage training with a pre-tuned model, enhancing adaptability to low-bit regimes while maintaining accuracy. QUADS achieves 71.13\% accuracy on SLURP and 99.20\% on FSC, with only minor degradations of up to 5.56\% compared to state-of-the-art models. Additionally, it reduces computational complexity by 60--73$\times$ (GMACs) and model size by 83--700$\times$, demonstrating strong robustness under extreme quantization. These results establish QUADS as a highly efficient solution for real-world, resource-constrained SLU applications.

* INTERSPEECH, 2025

Via

Access Paper or Ask Questions

Sound Tagging in Infant-centric Home Soundscapes

Jun 25, 2024

Mohammad Nur Hossain Khan, Jialu Li, Nancy L. McElwain, Mark Hasegawa-Johnson, Bashima Islam

Figure 1 for Sound Tagging in Infant-centric Home Soundscapes

Figure 2 for Sound Tagging in Infant-centric Home Soundscapes

Figure 3 for Sound Tagging in Infant-centric Home Soundscapes

Figure 4 for Sound Tagging in Infant-centric Home Soundscapes

Abstract:Certain environmental noises have been associated with negative developmental outcomes for infants and young children. Though classifying or tagging sound events in a domestic environment is an active research area, previous studies focused on data collected from a non-stationary microphone placed in the environment or from the perspective of adults. Further, many of these works ignore infants or young children in the environment or have data collected from only a single family where noise from the fixed sound source can be moderate at the infant's position or vice versa. Thus, despite the recent success of large pre-trained models for noise event detection, the performance of these models on infant-centric noise soundscapes in the home is yet to be explored. To bridge this gap, we have collected and labeled noises in home soundscapes from 22 families in an unobtrusive manner, where the data are collected through an infant-worn recording device. In this paper, we explore the performance of a large pre-trained model (Audio Spectrogram Transformer [AST]) on our noise-conditioned infant-centric environmental data as well as publicly available home environmental datasets. Utilizing different training strategies such as resampling, utilizing public datasets, mixing public and infant-centric training sets, and data augmentation using noise and masking, we evaluate the performance of a large pre-trained model on sparse and imbalanced infant-centric data. Our results show that fine-tuning the large pre-trained model by combining our collected dataset with public datasets increases the F1-score from 0.11 (public datasets) and 0.76 (collected datasets) to 0.84 (combined datasets) and Cohen's Kappa from 0.013 (public datasets) and 0.77 (collected datasets) to 0.83 (combined datasets) compared to only training with public or collected datasets, respectively.

* Accepted in IEEE/ACM CHASE 2024

Via

Access Paper or Ask Questions

LLaSA: Large Multimodal Agent for Human Activity Analysis Through Wearable Sensors

Jun 20, 2024

Sheikh Asif Imran, Mohammad Nur Hossain Khan, Subrata Biswas, Bashima Islam

Figure 1 for LLaSA: Large Multimodal Agent for Human Activity Analysis Through Wearable Sensors

Figure 2 for LLaSA: Large Multimodal Agent for Human Activity Analysis Through Wearable Sensors

Figure 3 for LLaSA: Large Multimodal Agent for Human Activity Analysis Through Wearable Sensors

Figure 4 for LLaSA: Large Multimodal Agent for Human Activity Analysis Through Wearable Sensors

Abstract:Integrating inertial measurement units (IMUs) with large language models (LLMs) advances multimodal AI by enhancing human activity understanding. We introduce SensorCaps, a dataset of 26,288 IMU-derived activity narrations, and OpenSQA, an instruction-following dataset with 257,562 question-answer pairs. Combining LIMU-BERT and Llama, we develop LLaSA, a Large Multimodal Agent capable of interpreting and responding to activity and motion analysis queries. Our evaluation demonstrates LLaSA's effectiveness in activity classification and question answering, highlighting its potential in healthcare, sports science, and human-computer interaction. These contributions advance sensor-aware language models and open new research avenues. Our code repository and datasets can be found on https://github.com/BASHLab/LLaSA.

* Under review at ARR (for EMNLP 2024)

Via

Access Paper or Ask Questions