Ahmed Abul Hasanaath

USTM: Unified Spatial and Temporal Modeling for Continuous Sign Language Recognition

Dec 15, 2025

Ahmed Abul Hasanaath, Hamzah Luqman

Abstract: Continuous sign language recognition (CSLR) requires precise spatio-temporal modeling to accurately recognize sequences of gestures in videos. Existing frameworks often rely on CNN-based spatial backbones combined with temporal convolution or recurrent modules. These techniques fail to capture fine-grained hand and facial cues and to model long-range temporal dependencies. To address these limitations, we propose the Unified Spatio-Temporal Modeling (USTM) framework, a spatio-temporal encoder that models complex patterns by combining a Swin Transformer backbone with a lightweight temporal adapter with positional embeddings (TAPE). Our framework captures fine-grained spatial features alongside short- and long-term temporal context, enabling robust sign language recognition from RGB videos without relying on multi-stream inputs or auxiliary modalities. Extensive experiments on benchmark datasets including PHOENIX14, PHOENIX14T, and CSL-Daily demonstrate that USTM achieves state-of-the-art performance against RGB-based as well as multi-modal CSLR approaches, while maintaining competitive performance against multi-stream approaches. These results highlight the strength and efficacy of the USTM framework for CSLR. The code is available at https://github.com/gufranSabri/USTM
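
The abstract does not spell out how TAPE is built, so the following is only a generic illustration of what a lightweight temporal adapter with positional embeddings can look like, not the authors' implementation (see the linked repository for that). Everything here is an assumption made for illustration: the function names, the sinusoidal form of the embeddings, the bottleneck projections `w_down` and `w_up`, and the shared 1D temporal kernel. Per-frame features stand in for the output of a spatial backbone such as a Swin Transformer.

```python
import numpy as np

def sinusoidal_positions(num_frames, dim):
    """Sinusoidal positional embeddings of shape (num_frames, dim) (assumed form)."""
    pos = np.arange(num_frames)[:, None]
    i = np.arange(0, dim, 2)[None, :]
    angles = pos / np.power(10000.0, i / dim)
    pe = np.zeros((num_frames, dim))
    pe[:, 0::2] = np.sin(angles)
    pe[:, 1::2] = np.cos(angles)[:, : dim // 2]
    return pe

def temporal_adapter(x, w_down, w_up, kernel):
    """Hypothetical bottleneck adapter over time.

    x:      (T, D) per-frame features from a spatial backbone
    w_down: (D, r) down-projection, with r much smaller than D
    w_up:   (r, D) up-projection
    kernel: odd-length 1D kernel applied along the frame axis,
            using the same weights for every channel
    """
    num_frames, dim = x.shape
    # Inject frame order, then project into the small bottleneck.
    h = (x + sinusoidal_positions(num_frames, dim)) @ w_down
    h = np.maximum(h, 0.0)  # ReLU
    # Mix neighbouring frames with zero padding so the length stays T.
    pad = len(kernel) // 2
    padded = np.pad(h, ((pad, pad), (0, 0)))
    mixed = sum(kernel[j] * padded[j:j + num_frames] for j in range(len(kernel)))
    # Residual connection: the adapter only adds a correction to x.
    return x + mixed @ w_up

# Usage with random stand-in features: 6 frames, 8 channels, bottleneck of 2.
rng = np.random.default_rng(0)
frames = rng.normal(size=(6, 8))
adapted = temporal_adapter(
    frames,
    rng.normal(size=(8, 2)),
    rng.normal(size=(2, 8)),
    np.array([0.25, 0.5, 0.25]),
)
```

With `w_up` set to zeros the adapter returns its input unchanged, which is a common way to initialise adapters so that training starts from the behaviour of the pretrained backbone.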

FSBI: Deepfakes Detection with Frequency Enhanced Self-Blended Images

Jun 12, 2024

Ahmed Abul Hasanaath, Hamzah Luqman, Raed Katib, Saeed Anwar

Abstract: Advances in deepfake research have led to the creation of almost perfect manipulations that are undetectable by the human eye and by some deepfakes detection tools. Recently, several techniques have been proposed to differentiate deepfakes from real images and videos. This paper introduces a Frequency Enhanced Self-Blended Images (FSBI) approach for deepfakes detection. The proposed approach uses Discrete Wavelet Transforms (DWT) to extract discriminative features from self-blended images (SBIs), which are then used to train a convolutional network model. An SBI blends an image with itself by introducing several forgery artifacts into a copy of the image before blending. This prevents the classifier from overfitting to specific artifacts and encourages it to learn more generic representations. The blended images are then fed into the frequency feature extractor to detect artifacts that cannot be detected easily in the time domain. The proposed approach has been evaluated on the FF++ and Celeb-DF datasets, and the obtained results outperformed state-of-the-art techniques under the cross-dataset evaluation protocol.
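
As a rough illustration of the two ingredients the abstract names, the sketch below builds a toy self-blended image and then applies a single-level 2D Haar wavelet transform to it. It is not the authors' pipeline: the brightness shift used as the "forgery artifact", the rectangular blending mask, the choice of the Haar wavelet, and the function names `self_blend` and `haar_dwt2` are all assumptions made for illustration.

```python
import numpy as np

def self_blend(img, mask, brightness_shift=10.0):
    """Toy self-blend: perturb a copy of the image, then alpha-blend it
    back into the original with `mask` (values in [0, 1])."""
    img = np.asarray(img, dtype=np.float64)
    forged = np.clip(img + brightness_shift, 0.0, 255.0)  # assumed artifact
    return mask * forged + (1.0 - mask) * img

def haar_dwt2(img):
    """Single-level 2D Haar DWT of a grayscale image with even height and width.

    Returns four sub-bands, each half the input size: one approximation
    band (ll) and three detail bands (lh, hl, hh). Sub-band naming
    conventions vary between libraries.
    """
    img = np.asarray(img, dtype=np.float64)
    if img.ndim != 2 or img.shape[0] % 2 or img.shape[1] % 2:
        raise ValueError("expected a 2D array with even height and width")
    s = np.sqrt(2.0)
    # Average and difference adjacent columns.
    lo = (img[:, 0::2] + img[:, 1::2]) / s
    hi = (img[:, 0::2] - img[:, 1::2]) / s
    # Average and difference adjacent rows of each result.
    ll = (lo[0::2] + lo[1::2]) / s
    lh = (lo[0::2] - lo[1::2]) / s
    hl = (hi[0::2] + hi[1::2]) / s
    hh = (hi[0::2] - hi[1::2]) / s
    return ll, lh, hl, hh

# Usage: a flat image has no detail; blending introduces a boundary
# that appears as non-zero coefficients in a detail band.
flat = np.full((8, 8), 100.0)
mask = np.zeros((8, 8))
mask[:, 3:] = 1.0
ll, lh, hl, hh = haar_dwt2(self_blend(flat, mask))
```

In this toy case the blending boundary leaves the smooth regions untouched in the detail bands and shows up only where the mask changes, which is the kind of localized cue a frequency-based feature extractor can pass on to a classifier.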

* The paper is under consideration at Pattern Recognition Letters
