Applications of deep learning to the radio frequency (RF) domain have largely concentrated on narrowband signal classification after the signals of interest have already been detected and extracted from a wideband capture. To encourage broader research into wideband operations, we introduce the WidebandSig53 (WBSig53) dataset, which consists of 550 thousand synthetically generated samples from 53 different signal classes containing approximately 2 million unique signals. We extend the TorchSig signal processing machine learning toolkit for open-source and customizable generation, augmentation, and processing of the WBSig53 dataset. We conduct experiments on the WBSig53 dataset using state-of-the-art (SoTA) convolutional neural networks and transformers. We investigate performance on signal detection tasks, i.e., detecting the presence, time, and frequency of all signals present in the input data, as well as on signal recognition tasks, where networks additionally identify the modulation family of each detected signal. Two main approaches to these tasks are evaluated: segmentation networks and object detection networks, both operating on complex input spectrograms. Finally, we conduct a comparative analysis of the various approaches in terms of the networks' mean average precision, mean average recall, and inference speed.
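Both evaluated approaches consume complex spectrograms rather than raw IQ streams. Below is a minimal, hypothetical sketch of that input pipeline, assuming an STFT-based spectrogram with real and imaginary parts stacked as two channels; the FFT size and packing are illustrative choices, not the TorchSig/WBSig53 defaults.

```python
# Minimal sketch: turning complex IQ samples into a two-channel
# spectrogram tensor suitable for a detection network.
import torch

def iq_to_spectrogram(iq: torch.Tensor, n_fft: int = 512) -> torch.Tensor:
    """iq: complex64 tensor of shape (num_samples,)."""
    spec = torch.stft(
        iq, n_fft=n_fft, hop_length=n_fft // 2,
        window=torch.hann_window(n_fft),
        center=False, return_complex=True,
    )
    # Stack real and imaginary parts as two input channels so the
    # network sees the complex spectrogram, not just its magnitude.
    return torch.stack([spec.real, spec.imag], dim=0)

iq = torch.randn(4096, dtype=torch.complex64)   # stand-in wideband capture
x = iq_to_spectrogram(iq)                       # (2, freq_bins, time_frames)
print(x.shape)                                  # torch.Size([2, 512, 15])
```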
Despite the recent success of image generation and style transfer with Generative Adversarial Networks (GANs), hair synthesis and style transfer remain challenging due to the shape and style variability of human hair under in-the-wild conditions. Current state-of-the-art hair synthesis approaches struggle to maintain the global composition of the target style and cannot be used in real-time applications due to their high running costs on high-resolution portrait images. We therefore propose a novel hairstyle transfer method, called EHGAN, which reduces computational cost to enable real-time processing while transferring hairstyles with better global structure than other state-of-the-art hair synthesis methods. To achieve this goal, we train an encoder and a low-resolution generator to transfer the hairstyle, and then increase the resolution of the results with a pre-trained super-resolution model. We utilize Adaptive Instance Normalization (AdaIN) and design a novel Hair Blending Block (HBB) to obtain the best performance from the generator. EHGAN runs roughly 2.7 times faster than the state-of-the-art MichiGAN method and over 10,000 times faster than LOHO, while obtaining better photorealism and structural similarity to the desired style than its competitors.
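AdaIN is the core style-injection mechanism the abstract names. A minimal sketch of the standard AdaIN operation is shown below; where EHGAN applies it and at what feature shapes are assumptions here.

```python
# Minimal AdaIN sketch: align the channel-wise mean/std of content
# features to those of style features.
import torch

def adain(content: torch.Tensor, style: torch.Tensor, eps: float = 1e-5) -> torch.Tensor:
    """content, style: feature maps of shape (N, C, H, W)."""
    c_mean = content.mean(dim=(2, 3), keepdim=True)
    c_std = content.std(dim=(2, 3), keepdim=True) + eps
    s_mean = style.mean(dim=(2, 3), keepdim=True)
    s_std = style.std(dim=(2, 3), keepdim=True)
    # Normalize away the content statistics, then re-scale and re-shift
    # with the style statistics.
    return s_std * (content - c_mean) / c_std + s_mean

content = torch.randn(1, 64, 32, 32)   # e.g. face features
style = torch.randn(1, 64, 32, 32)     # e.g. reference hairstyle features
out = adain(content, style)
```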
Unmanned aerial vehicles (UAVs) are gaining ever greater maneuverability and sensing capability, making them a promising teleoperation platform for intelligent interaction with the environment. This work presents a novel 5-degree-of-freedom (DoF) UAV cyber-physical system for aerial manipulation. The UAV's body can exert a powerful propulsion force in the longitudinal direction, decoupling the translational dynamics from the rotational dynamics on the longitudinal plane. A high-level impedance control law is proposed to drive the vehicle for trajectory tracking and interaction with the environment. In addition, a vision-based real-time target identification and tracking method, which integrates a YOLOv3 real-time object detector with feature tracking and morphological operations, is proposed for onboard implementation with the support of model compression techniques, eliminating the latency caused by wireless video transmission and the heavy computational burden of traditional teleoperation platforms.
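To illustrate the flavor of a high-level impedance control law, here is a minimal 1-DoF sketch that renders a virtual mass-spring-damper around a reference position; the gains, the axis, and the contact-force handling are illustrative assumptions rather than the paper's design.

```python
# Minimal 1-DoF impedance-control sketch along the longitudinal axis:
# the commanded acceleration makes the vehicle behave like a virtual
# mass-spring-damper around the reference trajectory.
import numpy as np

M, D, K = 1.5, 4.0, 25.0          # virtual inertia, damping, stiffness (assumed)
dt = 0.01
x, v = 0.0, 0.0                   # vehicle position and velocity
x_ref, v_ref, a_ref = 1.0, 0.0, 0.0
f_ext = 0.0                       # measured contact force (zero in free flight)

for _ in range(500):
    # Impedance law: track the reference while yielding to contact forces.
    acc = a_ref + (D * (v_ref - v) + K * (x_ref - x) + f_ext) / M
    v += acc * dt
    x += v * dt

print(f"position after 5 s: {x:.3f} m")   # converges toward x_ref = 1.0
```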
Remote photoplethysmography (rPPG) enables non-contact heart rate (HR) estimation from facial videos, offering significant convenience over traditional contact-based measurements. In real-world long-term health monitoring scenarios, the participants' distance from the camera and their head movements usually vary over time, resulting in inaccurate rPPG measurements due to varying face resolution and complex motion artifacts. Unlike previous rPPG models designed for a constant distance between camera and participants, in this paper we propose two plug-and-play blocks, a physiological signal feature extraction block (PFE) and a temporal face alignment block (TFA), to alleviate the degradation caused by changing distance and head motion. On one hand, guided by representative-area information, PFE adaptively encodes facial frames of arbitrary resolution into fixed-resolution facial structure features. On the other hand, leveraging the estimated optical flow, TFA counteracts the rPPG signal confusion caused by head movement, thus benefiting motion-robust rPPG signal recovery. In addition, we train the model with a cross-resolution constraint using a two-stream dual-resolution framework, which further helps PFE learn resolution-robust facial rPPG features. Extensive experiments on three benchmark datasets (UBFC-rPPG, COHFACE, and PURE) demonstrate the superior performance of the proposed method. One highlight is that, with PFE and TFA, off-the-shelf spatio-temporal rPPG models can predict more robust rPPG signals under both varying face resolution and severe head movement. The code is available at https://github.com/LJW-GIT/Arbitrary_Resolution_rPPG.
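The key idea behind PFE is mapping face crops of arbitrary resolution to features of a fixed spatial size. A minimal stand-in using adaptive pooling is sketched below; the learned PFE block and its representative-area guidance are richer than this assumption.

```python
# Minimal sketch of the PFE idea: map face crops of arbitrary
# resolution to a fixed-resolution feature map so a downstream rPPG
# backbone always sees the same spatial size.
import torch
import torch.nn.functional as F

def encode_fixed_resolution(frame: torch.Tensor, out_hw=(64, 64)) -> torch.Tensor:
    """frame: (N, C, H, W) face crop at any H, W."""
    return F.adaptive_avg_pool2d(frame, out_hw)

near_face = torch.randn(1, 3, 256, 256)   # participant close to the camera
far_face = torch.randn(1, 3, 48, 48)      # participant far from the camera
# Both crops now share one spatial size, whatever the capture distance.
assert encode_fixed_resolution(near_face).shape == encode_fixed_resolution(far_face).shape
```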
The emergence of latency-critical AI applications has been supported by the evolution of the edge computing paradigm. However, edge solutions are typically resource-constrained, posing reliability challenges due to heightened contention for compute and communication capacity and faulty application behavior under overload conditions. Although the large volume of generated log data can be mined for fault prediction, labeling this data for training is a manual process and thus a limiting factor for automation. Consequently, many companies resort to unsupervised fault-tolerance models. Yet, failure models of this kind can lose accuracy when they need to adapt to non-stationary workloads and diverse host characteristics. To cope with this, we propose a novel modeling approach, called DeepFT, that proactively avoids system overloads and their adverse effects by optimizing task scheduling and migration decisions. DeepFT uses a deep surrogate model to accurately predict and diagnose faults in the system, and co-simulation-based self-supervised learning to dynamically adapt the model in volatile settings. It offers a highly scalable solution, as the model size grows by only 3% and 1% per unit increase in the number of active tasks and hosts, respectively. Extensive experimentation on a Raspberry Pi-based edge cluster with DeFog benchmarks shows that DeepFT outperforms state-of-the-art baseline methods in fault-detection and QoS metrics. Specifically, DeepFT gives the highest F1 scores for fault detection, reducing service deadline violations by up to 37% while also improving response time by up to 9%.
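As a rough illustration of the surrogate idea, the sketch below scores per-host fault likelihood from utilization metrics with a small network; DeepFT's actual inputs, architecture, and co-simulation-driven self-supervision are not reproduced here and should be treated as assumptions.

```python
# Minimal sketch of a deep surrogate that scores fault likelihood per
# host from utilization metrics; scheduling would migrate tasks away
# from high-scoring hosts.
import torch
import torch.nn as nn

class FaultSurrogate(nn.Module):
    def __init__(self, n_metrics: int = 4, hidden: int = 32):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(n_metrics, hidden), nn.ReLU(),
            nn.Linear(hidden, 1), nn.Sigmoid(),   # fault probability per host
        )

    def forward(self, metrics: torch.Tensor) -> torch.Tensor:
        return self.net(metrics)

# Per-host metrics: e.g. CPU, RAM, disk, network utilization in [0, 1].
hosts = torch.tensor([[0.95, 0.90, 0.40, 0.70],    # likely overloaded
                      [0.20, 0.30, 0.10, 0.15]])   # healthy
scores = FaultSurrogate()(hosts)                   # one score per host
```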
NeRFs have revolutionized per-scene radiance field reconstruction thanks to their intrinsic compactness. One of the main limitations of NeRFs is their slow rendering speed, at both training and inference time. Recent research focuses on optimizing an explicit voxel grid (EVG) that represents the scene and can be paired with neural networks to learn radiance fields. This approach significantly improves speed at both training and inference time, but at the cost of a large memory footprint. In this work, we propose Re:NeRF, an approach that specifically targets the compressibility of EVG-NeRFs, aiming to reduce the memory storage of NeRF models while maintaining comparable performance. We benchmark our approach with three different EVG-NeRF architectures on four popular benchmarks, showing Re:NeRF's broad usability and effectiveness.
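One simple way to see why EVG-NeRFs compress well is that most voxels contribute little and can be stored sparsely. The sketch below prunes a stand-in voxel grid by magnitude; the threshold criterion and int32 index storage are illustrative assumptions, not Re:NeRF's actual pipeline.

```python
# Minimal sketch of the compression idea: prune low-magnitude entries
# of an explicit voxel grid and store the survivors sparsely.
import torch

grid = torch.randn(128, 128, 128) * 0.1      # stand-in density voxel grid
threshold = 0.2

mask = grid.abs() > threshold                # keep only significant voxels
indices = mask.nonzero().to(torch.int32)     # sparse (x, y, z) coordinates
values = grid[mask]                          # surviving parameters

dense_mb = grid.numel() * 4 / 2**20
sparse_mb = (values.numel() * 4 + indices.numel() * 4) / 2**20
print(f"dense: {dense_mb:.1f} MB -> sparse: {sparse_mb:.1f} MB")
```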
In this paper we offer a review and bibliography of work on Hankel low-rank approximation and completion, with particular emphasis on how this methodology can be used for time series analysis and forecasting. We begin by describing possible formulations of the problem and offer commentary on related topics and challenges in obtaining globally optimal solutions. Key theorems are provided, and the paper closes with some expository examples.
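As a concrete instance of the problem the review covers, the sketch below performs one Cadzow/SSA-style step of Hankel low-rank approximation for denoising a time series: embed the series into a Hankel matrix, truncate its SVD, and diagonal-average back. The window length and rank are illustrative choices.

```python
# Minimal Hankel low-rank approximation sketch for time series denoising.
import numpy as np

def hankel_lowrank(y: np.ndarray, window: int, rank: int) -> np.ndarray:
    n = len(y)
    # Hankel (trajectory) matrix: column j holds y[j : j + window].
    H = np.column_stack([y[j:j + window] for j in range(n - window + 1)])
    U, s, Vt = np.linalg.svd(H, full_matrices=False)
    H_r = (U[:, :rank] * s[:rank]) @ Vt[:rank]       # best rank-r approximation
    # Diagonal averaging projects H_r back onto the set of Hankel matrices.
    out = np.zeros(n)
    counts = np.zeros(n)
    for i in range(window):
        for j in range(n - window + 1):
            out[i + j] += H_r[i, j]
            counts[i + j] += 1
    return out / counts

t = np.arange(200)
y = np.sin(0.2 * t) + 0.3 * np.random.randn(200)   # noisy sinusoid (rank-2 signal)
y_hat = hankel_lowrank(y, window=50, rank=2)       # denoised estimate
```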
Photorealistic rendering of real-world scenes is a tremendous challenge with a wide range of applications, including mixed reality (MR) and virtual reality (VR). Neural networks, which have long been investigated in the context of solving differential equations, have previously been introduced as implicit representations for photorealistic rendering. However, realistic rendering using classical computing is challenging because it requires time-consuming optical ray marching and suffers from computational bottlenecks due to the curse of dimensionality. In this paper, we propose Quantum Radiance Fields (QRF), which integrate quantum circuits, quantum activation functions, and quantum volume rendering for implicit scene representation. The results indicate that QRF not only exploits the advantages of quantum computing, such as high speed, fast convergence, and high parallelism, but also ensures the high quality of volume rendering.
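For context, the classical volume-rendering quadrature that ray marching must evaluate per ray, and whose cost motivates QRF, looks like the following sketch; the densities and colors are random stand-ins for a learned radiance field, and nothing here reflects the quantum formulation itself.

```python
# Minimal classical volume-rendering sketch: alpha-composite sampled
# densities and colors along a single ray.
import numpy as np

def composite(sigmas: np.ndarray, colors: np.ndarray, deltas: np.ndarray) -> np.ndarray:
    """sigmas: (S,), colors: (S, 3), deltas: (S,) step sizes along one ray."""
    alphas = 1.0 - np.exp(-sigmas * deltas)          # per-sample opacity
    # Transmittance: probability the ray reaches each sample unoccluded.
    trans = np.cumprod(np.concatenate([[1.0], 1.0 - alphas[:-1]]))
    weights = trans * alphas
    return (weights[:, None] * colors).sum(axis=0)   # rendered RGB

S = 64                                               # samples per ray
rgb = composite(np.random.rand(S), np.random.rand(S, 3), np.full(S, 0.05))
print(rgb)                                           # one pixel's color
```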
Deep learning has been widely used in applications across fields such as computer vision and natural language processing. However, models are often developed manually through many costly experiments. This manual work usually requires substantial computing resources, time, and experience. To simplify the use of deep learning and alleviate human effort, automated deep learning has emerged as a potential tool that relieves the burden on both users and researchers. Generally, an automatic approach should support diversity in model selection, and the evaluation should allow users to decide according to their demands. To that end, we propose a multi-objective, end-to-end, self-optimized approach for constructing deep learning models automatically. Experimental results on well-known datasets such as MNIST, Fashion-MNIST, and CIFAR-10 show that our algorithm can discover various competitive models compared with the state-of-the-art approach. In addition, our approach introduces multi-objective trade-off solutions over both accuracy and uncertainty metrics, allowing users to make better decisions.
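The multi-objective selection step can be illustrated by filtering candidate models to the Pareto front over (accuracy, uncertainty). The sketch below is a generic Pareto filter with made-up candidate scores, not the paper's search algorithm.

```python
# Minimal Pareto-front filter over (accuracy, uncertainty): keep every
# model that no other model beats on both objectives.
def pareto_front(models):
    """models: list of (name, accuracy, uncertainty); higher accuracy
    and lower uncertainty are better."""
    front = []
    for name, acc, unc in models:
        dominated = any(a >= acc and u <= unc and (a > acc or u < unc)
                        for _, a, u in models)
        if not dominated:
            front.append((name, acc, unc))
    return front

candidates = [("A", 0.98, 0.12), ("B", 0.97, 0.05), ("C", 0.95, 0.20)]
print(pareto_front(candidates))   # C is dominated by both A and B
```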
Recently, deep learning methods have shown promising results in point cloud compression. For octree-based point cloud compression, previous works show that the information from ancestor nodes and sibling nodes is equally important for predicting the current node. However, those works either adopt insufficient context or incur intolerable decoding complexity (e.g., >600 s). To address this problem, we propose a sufficient yet efficient context model and design an efficient deep learning codec for point clouds. Specifically, we first propose a window-constrained multi-group coding strategy to exploit autoregressive context while maintaining decoding efficiency. Then, we propose a dual transformer architecture to exploit the dependency of the current node on its ancestors and siblings. We also propose a random-masking pre-training method to enhance our model. Experimental results show that our approach achieves state-of-the-art performance for both lossy and lossless point cloud compression. Moreover, our multi-group coding strategy saves 98% of the decoding time compared with the previous octree-based compression method.
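The window-constrained multi-group strategy trades strict node-by-node autoregression for group-parallel prediction. The sketch below illustrates that scheduling with a dummy predictor; the grouping rule, symbol alphabet, and context handling are assumptions standing in for the paper's dual transformer.

```python
# Minimal sketch of window-constrained multi-group coding: within one
# window, octree nodes are split into groups; each group is predicted
# in one parallel batch conditioned on previously decoded groups.
import torch

def decode_window(node_feats: torch.Tensor, n_groups: int, predictor) -> torch.Tensor:
    """node_feats: (num_nodes, d) features for one window of nodes."""
    num_nodes = node_feats.shape[0]
    group_of = torch.arange(num_nodes) % n_groups
    probs = torch.zeros(num_nodes, 256)          # 8-bit occupancy symbols
    for g in range(n_groups):
        ctx_mask = group_of < g                  # already-decoded groups
        sel = group_of == g
        # One forward pass predicts the whole group in parallel,
        # instead of one autoregressive step per node.
        probs[sel] = predictor(node_feats[sel], node_feats[ctx_mask])
    return probs

dummy = lambda cur, ctx: torch.softmax(torch.randn(cur.shape[0], 256), dim=-1)
p = decode_window(torch.randn(64, 32), n_groups=4, predictor=dummy)
```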