Event cameras, also known as neuromorphic sensors, are a relatively new technology with certain advantages over conventional RGB cameras. The most important is the way they capture changes in scene illumination: each pixel fires independently of the others when it detects a change in incident light. To give users greater freedom in controlling the output of these cameras, for example adjusting the sensor's sensitivity to light changes or controlling the number of generated events, camera manufacturers usually provide tools for making sensor-level changes to the camera settings. The contribution of this research is to examine and document the effects of changing these sensor settings on sharpness, used as an indicator of the quality of the generated event stream. To quantify this, the event stream is converted into frames, and the average image gradient magnitude, an index of the number and strength of edges and hence of sharpness, is calculated for these frames. Five different bias settings are explained, and the effect of changing each on the event output is surveyed and analyzed. In addition, the operation of the event camera's sensing array is explained with an analogue circuit model, and the functions of the bias settings are linked to this model.
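The sharpness index used above, average image gradient magnitude, can be sketched as follows. This is a minimal NumPy illustration under assumed conventions (simple finite-difference gradients on a grayscale event frame); the exact gradient operator and event-to-frame conversion used in the study are not reproduced here.

```python
import numpy as np

def sharpness_score(frame: np.ndarray) -> float:
    """Average gradient magnitude of a grayscale frame.

    Higher values indicate more / stronger edges, used here
    as a proxy for sharpness of an event-accumulation frame.
    """
    frame = frame.astype(np.float64)
    gy, gx = np.gradient(frame)            # per-pixel finite differences
    magnitude = np.sqrt(gx ** 2 + gy ** 2)
    return float(magnitude.mean())
```

A frame with a strong vertical edge, for example, scores higher than a uniform frame, which is the behavior the bias-setting comparisons rely on.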
The process of obtaining high-resolution images from single or multiple low-resolution images of the same scene is of great interest for real-world image and signal processing applications. This study explores the use of deep-learning-based image super-resolution algorithms on thermal data to produce high-quality thermal imaging results for in-cabin vehicular driver monitoring systems. We propose and develop a novel multi-image super-resolution recurrent neural network to enhance the resolution and improve the quality of low-resolution thermal imaging data captured from uncooled thermal cameras. The end-to-end fully convolutional neural network is trained from scratch on newly acquired thermal data of 30 different subjects in indoor environmental conditions. The effectiveness of the thermally tuned super-resolution network is validated quantitatively as well as qualitatively on test data from 6 distinct subjects. The network achieves a mean peak signal-to-noise ratio (PSNR) of 39.24 on the validation dataset for 4x super-resolution, outperforming bicubic interpolation both quantitatively and qualitatively.
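The PSNR metric reported above is a standard measure; a minimal sketch of its computation follows, assuming 8-bit image data (the study's exact evaluation settings, e.g. bit depth and any cropping, are not specified here).

```python
import numpy as np

def psnr(reference: np.ndarray, estimate: np.ndarray, max_val: float = 255.0) -> float:
    """Peak signal-to-noise ratio (in dB) between a reference image
    and a super-resolved estimate."""
    mse = np.mean((reference.astype(np.float64) - estimate.astype(np.float64)) ** 2)
    if mse == 0:
        return float("inf")          # identical images
    return 10.0 * np.log10(max_val ** 2 / mse)
```

Higher PSNR indicates a closer match to the ground-truth high-resolution frame, which is how the proposed network is compared against bicubic interpolation.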
Advancements in semiconductor technology have reduced the dimensions and cost of chipsets while improving their performance and capacity. In addition, advances in AI frameworks and libraries make it possible to accommodate more AI at the resource-constrained edge of consumer IoT devices. Sensors are nowadays an integral part of our environment, providing continuous data streams for building intelligent applications. An example is a smart home with multiple interconnected devices. In such smart environments, for convenience and quick access to web-based services and personal information such as calendars, notes, emails, reminders, and banking, users link third-party skills, or skills from the Amazon store, to their smart speakers. Moreover, in current smart home scenarios, several smart home products such as smart security cameras, video doorbells, smart plugs, smart carbon monoxide monitors, and smart door locks are interlinked with a modern smart speaker by means of custom skill addition. Since smart speakers are linked to these services and devices through the speaker owner's account, they can be used by anyone with physical access to the smart speaker via voice commands, compromising the user's data privacy, home security, and other interests. The recently launched Tensor Cam's AI Camera, Toshiba's Symbio, and Facebook's Portal are camera-enabled smart speakers with AI functionalities; although they are camera-enabled, they do not have an authentication scheme beyond calling out the wake-word. This paper provides an overview of the cybersecurity risks that smart speaker users face due to this lack of authentication and discusses the development of a state-of-the-art camera-enabled, microphone-array-based Alexa smart speaker prototype to address these risks.
Despite recent advancements in deep learning technologies, child speech recognition remains a challenging task. Current automatic speech recognition (ASR) models require substantial amounts of annotated data for training, which is scarce for child speech. In this work, we explore using the ASR model wav2vec2 with different pretraining and fine-tuning configurations for self-supervised learning (SSL), towards improving automatic child speech recognition. The pretrained wav2vec2 models were fine-tuned using different amounts of child speech training data to discover the optimum amount of data required to fine-tune the model for the task of child ASR. Our trained model achieves its best word error rate (WER) of 8.37 on the in-domain MyST dataset and a WER of 10.38 on the out-of-domain PFSTAR dataset. We do not use any language models (LM) in our experiments.
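The WER figures above follow the standard definition: the word-level edit distance between reference and hypothesis transcripts, divided by the reference length. A minimal sketch (the evaluation toolkit actually used by the study is not specified here):

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / reference word count,
    computed via word-level Levenshtein distance."""
    ref, hyp = reference.split(), hypothesis.split()
    # dp[i][j]: edit distance between ref[:i] and hyp[:j]
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i
    for j in range(len(hyp) + 1):
        dp[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,        # deletion
                           dp[i][j - 1] + 1,        # insertion
                           dp[i - 1][j - 1] + cost) # substitution / match
    return dp[-1][-1] / len(ref)
```

Note that WER is usually quoted as a percentage, so a reported 8.37 corresponds to a ratio of 0.0837 from this function.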
Speech synthesis has come a long way: current text-to-speech (TTS) models can now generate natural, human-sounding speech. However, most TTS research focuses on adult speech data, and very limited work has been done on child speech synthesis. This study developed and validated a training pipeline for fine-tuning state-of-the-art (SOTA) neural TTS models using child speech datasets. The approach adopts a multi-speaker TTS retuning workflow to provide a transfer-learning pipeline. A publicly available child speech dataset was cleaned to provide a smaller subset of approximately 19 hours, which formed the basis of our fine-tuning experiments. Both subjective and objective evaluations were performed, using a pretrained MOSNet for objective evaluation and a novel subjective framework for mean opinion score (MOS) evaluations. Subjective evaluations achieved a MOS of 3.95 for speech intelligibility, 3.89 for voice naturalness, and 3.96 for voice consistency. Objective evaluation using the pretrained MOSNet showed a strong correlation between real and synthetic child voices. Speaker similarity was also verified by calculating the cosine similarity between the embeddings of utterances. An automatic speech recognition (ASR) model was also used to provide a word error rate (WER) comparison between real and synthetic child voices. The final trained TTS model was able to synthesize child-like speech from reference audio samples as short as 5 seconds.
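The speaker-similarity check above reduces to the cosine similarity between embedding vectors of real and synthetic utterances. A minimal sketch, assuming the embeddings are already extracted by a speaker encoder (the specific encoder used in the study is not named here):

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two speaker-embedding vectors.

    Returns 1.0 for identical directions, 0.0 for orthogonal vectors;
    higher values indicate more similar voices.
    """
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))
```

Averaging this score over many real/synthetic utterance pairs gives a single number summarizing how well the fine-tuned TTS model preserves speaker identity.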
This study evaluates the real-time performance of thermal object detection for smart and safe vehicular systems by deploying the trained networks on GPU and single-board edge-GPU computing platforms for onboard automotive sensor-suite testing. A novel large-scale thermal dataset comprising more than 35,000 distinct frames is acquired, processed, and open-sourced, covering challenging weather and environmental scenarios. The dataset is recorded with a low-cost yet effective uncooled LWIR thermal camera, mounted both stand-alone and on an electric vehicle to minimize mechanical vibrations. State-of-the-art YOLO-v5 network variants are trained with the SGD optimizer, using four different public datasets as well as the newly acquired local dataset, for optimal generalization of the DNN. The effectiveness of the trained networks is validated on extensive test data using quantitative metrics that include precision, recall, mean average precision, and frames per second. The smaller YOLO variant is further optimized using the TensorRT inference accelerator to explicitly boost the frame rate. The optimized network engine increases the frame rate by 3.5 times when tested on low-power edge devices, achieving 11 fps on the Nvidia Jetson Nano and 60 fps on the Nvidia Xavier NX development board.
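The frames-per-second figures quoted above come from timing inference over a stream of frames. A minimal sketch of such a benchmark, where `infer` stands in as a hypothetical placeholder for the deployed detector (e.g. a TensorRT engine's execute call):

```python
import time

def measure_fps(infer, frames, warmup: int = 5) -> float:
    """Average frames-per-second of an inference callable.

    A few warm-up runs are discarded first so that lazy
    initialization and caching do not skew the timing.
    """
    for frame in frames[:warmup]:
        infer(frame)
    start = time.perf_counter()
    for frame in frames:
        infer(frame)
    elapsed = time.perf_counter() - start
    return len(frames) / elapsed
```

On GPU hardware, the timing loop would additionally need to synchronize the device before reading the clock; the sketch above shows only the general structure.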
Object detection in the thermal infrared spectrum provides a more reliable data source in low-light and varying weather conditions, and it is useful both in-cabin and outside for pedestrian, animal, and vehicular detection, as well as for detecting street signs and lighting poles. This paper explores and adapts a state-of-the-art object detection and classification framework to thermal vision with seven distinct classes for advanced driver-assistance systems (ADAS). The network variants trained on public datasets are validated on test data with three different test approaches: test-time with no augmentation, test-time augmentation, and test-time with model ensembling. Additionally, the efficacy of the trained networks is tested on locally gathered novel test data captured with an uncooled LWIR prototype thermal camera in challenging weather and environmental scenarios. The performance of the trained models is analyzed by computing precision, recall, and mean average precision (mAP) scores. Furthermore, the trained model architecture is optimized using the TensorRT inference accelerator and deployed on the resource-constrained edge hardware Nvidia Jetson Nano to explicitly reduce the inference time on GPU as well as edge devices for further real-time onboard installations.
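The idea behind test-time augmentation mentioned above is to run the model on augmented copies of the input and merge the predictions. The sketch below illustrates this with a horizontal flip and score averaging for a hypothetical per-class score model; real detection frameworks additionally remap bounding-box coordinates before merging, which is omitted here.

```python
import numpy as np

def tta_scores(model, image: np.ndarray) -> np.ndarray:
    """Test-time augmentation for per-class scores: evaluate the model
    on the original image and a horizontally flipped copy, then average.

    `model` is a placeholder callable returning a class-score vector.
    """
    original = model(image)
    flipped = model(image[:, ::-1])    # horizontal flip along width
    return (np.asarray(original) + np.asarray(flipped)) / 2.0
```

Model ensembling follows the same merging idea, but averages predictions across several independently trained networks rather than across augmented views of one input.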
The recent availability of low-power neural accelerator hardware, combined with improvements in end-to-end neural facial recognition algorithms, provides enabling technology for on-device facial authentication. The present research examines the effects of directional lighting on a state-of-the-art (SoA) neural face recognizer. Due to the lack of public datasets with sufficient directional lighting variations, a synthetic re-lighting technique is used to augment data samples. Top lighting and its variants (top-left, top-right) are found to have minimal effect on accuracy, while bottom-left and bottom-right directional lighting have the most pronounced effects. After fine-tuning the network weights, the face recognition model achieves close to the original receiver operating characteristic (ROC) performance across all lighting conditions and demonstrates an ability to generalize beyond the lighting augmentations used in the fine-tuning dataset. This work shows that an SoA neural face recognition model can be tuned to compensate for directional lighting effects, removing the need for a pre-processing step before applying facial recognition.
Methods for generating synthetic data have become increasingly important for building the large datasets required by convolutional neural network (CNN) based deep learning techniques across a wide range of computer vision applications. In this work, we extend existing methodologies to show how 2D thermal facial data can be mapped to 3D facial models. For the proposed research, we use the Tufts dataset to generate varying 3D face poses from a single frontal face pose. The system first refines image quality by performing fusion-based image preprocessing operations. The refined outputs have better contrast, lower noise levels, and better-exposed dark regions, making the facial landmarks and temperature patterns on the human face more discernible and visible than in the original raw data. Different image quality metrics are used to compare the refined images with the originals. In the next phase of the study, the refined images are used to create 3D facial geometry structures with a CNN. The generated outputs are then imported into Blender to extract the final 3D thermal facial outputs of both males and females. The same technique is also applied to our thermal face data, acquired with a prototype thermal camera (developed under the Heliaus EU project) in an indoor lab environment, to generate synthetic 3D face data with varying yaw angles, and lastly a facial depth map is generated.
StyleGAN is a state-of-the-art generative adversarial network architecture that generates random 2D high-quality synthetic facial data samples. In this paper, we recap the StyleGAN architecture and training methodology and present our experiences of retraining it on a number of alternative public datasets. Practical issues and challenges arising from the retraining process are discussed. Tests and validation results are presented, and a comparative analysis of several different re-trained StyleGAN weightings is provided. The role of this tool in building large, scalable datasets of synthetic facial data is also discussed.