Speech super-resolution (SSR) aims to predict a high-resolution (HR) speech signal from its low-resolution (LR) counterpart. Most neural SSR models focus on producing the final result in a noise-free environment by recovering the spectrogram of the high-frequency part of the signal and concatenating it with the original low-frequency part. Although these methods achieve high accuracy, they become less effective in real-world scenarios, where noise is unavoidable. To address this problem, we propose Super Denoise Net (SDNet), a neural network for the joint task of super-resolution and noise reduction from a low-sampling-rate signal. To that end, we design gated convolution and lattice convolution blocks to enhance the repair capability and to capture information along the time and frequency axes, respectively. Experiments show that our method outperforms baseline speech denoising and SSR models on the DNS 2020 no-reverb test set, with higher objective and subjective scores.
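The abstract above does not define its gated convolution block. As a rough illustration only, the sketch below uses the common formulation in which a feature branch is multiplied element-wise by a sigmoid gate branch. The 1-D setting and the names `conv1d` and `gated_conv1d` are assumptions made for this example and are not taken from SDNet.

```python
import math


def conv1d(signal, kernel):
    """Valid-mode 1-D cross-correlation (the operation deep learning calls convolution)."""
    span = len(kernel)
    return [
        sum(signal[i + j] * kernel[j] for j in range(span))
        for i in range(len(signal) - span + 1)
    ]


def gated_conv1d(signal, feature_kernel, gate_kernel):
    """Assumed gated convolution: features scaled by a sigmoid gate in (0, 1)."""
    features = conv1d(signal, feature_kernel)
    gates = [1.0 / (1.0 + math.exp(-g)) for g in conv1d(signal, gate_kernel)]
    return [f * g for f, g in zip(features, gates)]
```

With an all-zero gate kernel every gate equals 0.5, so the output is simply half of the feature branch. A learned gate kernel would instead let the block pass or suppress individual positions.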
We present a new method to adapt an RGB-trained water segmentation network to target-domain aerial thermal imagery using online self-supervision by leveraging texture and motion cues as supervisory signals. This new thermal capability enables current autonomous aerial robots operating in near-shore environments to perform tasks such as visual navigation, bathymetry, and flow tracking at night. Our method overcomes the problem of scarce and difficult-to-obtain near-shore thermal data that prevents the application of conventional supervised and unsupervised methods. In this work, we curate the first aerial thermal near-shore dataset, show that our approach outperforms fully-supervised segmentation models trained on limited target-domain thermal data, and demonstrate real-time capabilities onboard an Nvidia Jetson embedded computing platform. Code and datasets used in this work will be available at: https://github.com/connorlee77/uav-thermal-water-segmentation.
Deep learning has been widely used in radio frequency (RF) fingerprinting. Despite their excellent performance, most existing methods assume a closed set and therefore cannot effectively handle signals emitted by unknown devices that were never seen during training. In this letter, we exploit prototype learning for open-set RF fingerprinting and propose two improvements, consistency-based regularization and online label smoothing, which aim to learn a more robust feature space. Experimental results on a real-world RF dataset demonstrate that the proposed measures significantly improve prototype learning and achieve promising open-set recognition performance for RF fingerprinting.
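One common way to use learned prototypes for open-set recognition is to assign a feature vector to its nearest class prototype and to reject it as unknown when that distance is too large. The sketch below illustrates this idea under stated assumptions: the Euclidean distance, the fixed `reject_distance` threshold, and the function name `classify_open_set` are all hypothetical choices for the example, not details given in the abstract above.

```python
import math


def classify_open_set(feature, prototypes, reject_distance):
    """Return (label, distance); label is "unknown" if no prototype is close enough."""
    best_label, best_distance = None, math.inf
    for label, prototype in prototypes.items():
        distance = math.dist(feature, prototype)  # Euclidean distance (assumed metric)
        if distance < best_distance:
            best_label, best_distance = label, distance
    if best_distance > reject_distance:
        return "unknown", best_distance
    return best_label, best_distance
```

For example, with prototypes `{"tx_a": (0.0, 0.0), "tx_b": (4.0, 0.0)}` and `reject_distance=2.0`, the feature `(0.5, 0.0)` is assigned to `tx_a`, while a feature far from both prototypes is rejected as unknown.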
As a revolutionary generative paradigm of deep learning, generative adversarial networks (GANs) have been widely applied in various fields to synthesize realistic data. However, it is challenging for conventional GANs to synthesize raw signal data, especially in complex cases. In this paper, we develop a novel GAN framework for radio generation called "Radio GAN". Compared to conventional methods, it benefits from three key improvements. The first is learning based on sampling points, which aims to model the underlying sampling distribution of radio signals. The second is an unrolled generator design, combined with an estimated pure signal distribution as a prior, which can greatly reduce learning difficulty and effectively improve learning precision. The third is an energy-constrained optimization algorithm that achieves better training stability and convergence. Experimental results from extensive simulations demonstrate that the proposed GAN framework can effectively learn transmitter characteristics and various channel effects, and thus accurately models the underlying sampling distribution to synthesize high-quality radio signals.
Speech super-resolution (SSR) aims to recover high-resolution (HR) speech from its low-resolution (LR) counterpart. Recent SSR methods focus on reconstructing the magnitude spectrogram and ignore the importance of phase reconstruction, which limits recovery quality. To address this issue, we propose mdctGAN, a novel SSR framework based on the modified discrete cosine transform (MDCT). Through adversarial learning in the MDCT domain, our method reconstructs HR speech in a phase-aware manner without vocoders or additional post-processing. Furthermore, by learning frequency-consistent features with a self-attentive mechanism, mdctGAN guarantees high-quality speech reconstruction. On the VCTK corpus, experimental results show that our model produces natural auditory quality with high MOS and PESQ scores. It also achieves state-of-the-art log-spectral-distance (LSD) performance at a 48 kHz target resolution from various input rates. Code is available at https://github.com/neoncloud/mdctGAN
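To make the role of the MDCT more concrete, the sketch below implements the textbook MDCT and its inverse in plain Python. The transform is real-valued, so there is no separate phase spectrum to estimate, and with a window satisfying the Princen-Bradley condition, overlap-adding consecutive frames cancels the time-domain aliasing. The sine window, the hop of `n_coeffs` samples, and all function names are assumptions for illustration; they are not taken from the mdctGAN code.

```python
import math


def sine_window(n_coeffs):
    """Sine window of length 2N; satisfies w[n]^2 + w[n+N]^2 = 1."""
    length = 2 * n_coeffs
    return [math.sin(math.pi * (n + 0.5) / length) for n in range(length)]


def mdct(frame):
    """Map 2N real samples to N real MDCT coefficients."""
    n_coeffs = len(frame) // 2
    return [
        sum(x * math.cos(math.pi / n_coeffs * (n + 0.5 + n_coeffs / 2) * (k + 0.5))
            for n, x in enumerate(frame))
        for k in range(n_coeffs)
    ]


def imdct(coeffs):
    """Map N coefficients back to 2N aliased samples (2/N scaling for windowed use)."""
    n_coeffs = len(coeffs)
    return [
        (2.0 / n_coeffs)
        * sum(c * math.cos(math.pi / n_coeffs * (n + 0.5 + n_coeffs / 2) * (k + 0.5))
              for k, c in enumerate(coeffs))
        for n in range(2 * n_coeffs)
    ]


def analysis_synthesis(signal, n_coeffs):
    """Window, transform, invert, window again, and overlap-add with hop N."""
    window = sine_window(n_coeffs)
    output = [0.0] * len(signal)
    for start in range(0, len(signal) - 2 * n_coeffs + 1, n_coeffs):
        frame = [signal[start + n] * window[n] for n in range(2 * n_coeffs)]
        recon = imdct(mdct(frame))
        for n in range(2 * n_coeffs):
            output[start + n] += recon[n] * window[n]
    return output
```

Samples covered by two overlapping frames are reconstructed up to floating-point error. The first and last `n_coeffs` samples are covered by only one frame and remain aliased unless the signal is padded.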
As a promising non-password authentication technology, radio frequency (RF) fingerprinting can greatly improve wireless security. Recent work has shown that RF fingerprinting based on deep learning can significantly outperform conventional approaches. This superiority, however, is mainly attributed to supervised learning with a large amount of labeled data, and it degrades significantly when only limited labeled data is available, which makes many existing algorithms impractical. Since unlabeled data is often easy to obtain in practice with minimal resources, we leverage deep semi-supervised learning for RF fingerprinting, which largely relies on a composite data augmentation scheme designed for radio signals, combined with two popular techniques: consistency-based regularization and pseudo-labeling. Experimental results on both simulated and real-world datasets demonstrate that the proposed method for semi-supervised RF fingerprinting is far superior to competing methods and achieves performance close to that of fully supervised learning with a very limited number of examples.
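Pseudo-labeling, one of the two techniques named above, is often implemented by keeping only those unlabeled examples whose predicted class probability exceeds a confidence threshold. The sketch below shows that selection step. The threshold value of 0.95 and the function name `pseudo_label` are assumptions for illustration; the abstract does not state how the pseudo-labels are selected.

```python
def pseudo_label(probabilities, threshold=0.95):
    """Return (example_index, predicted_class) pairs for confident predictions only."""
    selected = []
    for index, probs in enumerate(probabilities):
        confidence = max(probs)
        if confidence >= threshold:
            # The arg-max class becomes the pseudo-label for this unlabeled example.
            selected.append((index, probs.index(confidence)))
    return selected
```

The selected pairs can then be treated as labeled data in the next training step, while low-confidence examples are left out until the model becomes more certain about them.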
Reconfigurable intelligent surface (RIS) is a revolutionary technology that can customize the wireless channel and improve the energy efficiency of next-generation cellular networks. This letter proposes an environment-aware codebook design that employs statistical channel state information (CSI) for RIS-assisted multiple-input single-output (MISO) systems. Specifically, we first generate multiple virtual channels offline using location information and design an environment-aware reflection coefficient codebook. As a result, we only need to estimate the composite channel and optimize the active transmit beamforming for each reflection coefficient in the pre-designed codebook, which substantially simplifies the reflection optimization. Moreover, we analyze the theoretical performance of the proposed scheme. Finally, numerical results verify the performance benefits of the proposed scheme over cascaded channel estimation and passive beamforming, as well as over the existing codebook scheme, in the presence of channel estimation errors, despite its significantly reduced overhead and complexity.
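The codebook-based procedure described above can be sketched as follows, assuming a single-user MISO link with received signal y = sum_t h[t] * w[t] * s, where h is the composite channel estimated for one codeword. Under that assumption, maximum-ratio transmission (MRT) is the gain-maximizing beamformer and the resulting gain is the squared norm of h, so the best codeword is the one with the largest squared norm. The choice of MRT, the selection rule, and the names `select_codeword` and `mrt_beamformer` are assumptions for this example; the abstract does not specify them.

```python
import math


def select_codeword(composite_channels):
    """Pick the codeword whose estimated composite channel has the largest MRT gain.

    composite_channels[i] is the estimated channel vector (one complex entry per
    transmit antenna) observed while the RIS applies codeword i.
    """
    gains = [sum(abs(h) ** 2 for h in channel) for channel in composite_channels]
    best = max(range(len(gains)), key=gains.__getitem__)
    return best, gains[best]


def mrt_beamformer(channel):
    """Unit-norm MRT beamformer for the convention y = sum_t h[t] * w[t] * s."""
    norm = math.sqrt(sum(abs(h) ** 2 for h in channel))
    return [h.conjugate() / norm for h in channel]
```

Only one composite channel per codeword is needed here, which reflects the overhead reduction claimed above: the individual BS-RIS and RIS-user channels never have to be estimated separately.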
The user-centric cell-free network has emerged as an appealing technology for improving the capacity of next-generation wireless networks, thanks to its ability to eliminate inter-cell interference effectively. However, the cell-free network inevitably incurs higher hardware cost and backhaul overhead because a larger number of base stations (BSs) are deployed. Additionally, severe channel fading in high-frequency bands is another crucial issue that limits the practical application of the cell-free network. To address these challenges, we combine the cell-free system with another emerging technology, namely reconfigurable intelligent surface (RIS), which can provide high spectrum and energy efficiency at low hardware cost by intelligently reshaping the wireless propagation environment. To this end, we formulate a weighted sum-rate (WSR) maximization problem for RIS-assisted cell-free systems by jointly optimizing the BS precoding matrix and the RIS reflection coefficient vector. Subsequently, we transform the complicated WSR problem into a tractable optimization problem and propose a distributed cooperative alternating direction method of multipliers (ADMM) to fully utilize parallel computing resources. Inspired by the model-based algorithm unrolling concept, we unroll our solver into a learning-based deep distributed ADMM (D-ADMM) network framework. To improve the efficiency of D-ADMM across distributed BSs, we develop a monodirectional information exchange strategy with a small signaling overhead. In addition to benefiting from domain knowledge, D-ADMM adaptively learns hyper-parameters and non-convex solvers of the intractable RIS design problem through data-driven end-to-end training.
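For orientation, a WSR maximization problem of the kind described above is typically written in the following generic form. All symbols are assumptions introduced here, not notation from the paper: \mathbf{W} collects the BS precoders, \boldsymbol{\theta} is the RIS reflection coefficient vector, \alpha_k is the weight of user k, \gamma_k is that user's SINR, P_b is an assumed per-BS power budget, and the unit-modulus constraint models an assumed passive, lossless RIS.

```latex
\max_{\mathbf{W},\,\boldsymbol{\theta}} \;
\sum_{k=1}^{K} \alpha_k \log_2\!\bigl(1 + \gamma_k(\mathbf{W}, \boldsymbol{\theta})\bigr)
\quad \text{s.t.} \quad
\|\mathbf{W}_b\|_F^2 \le P_b \;\; \forall b,
\qquad
|\theta_m| = 1 \;\; \forall m .
```

The objective is non-convex because both \mathbf{W} and \boldsymbol{\theta} enter each \gamma_k, and the unit-modulus constraint is itself non-convex. This is consistent with the need, stated above, to transform the problem into a tractable one before applying ADMM.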
This work presents a new method for unsupervised thermal image classification and semantic segmentation by transferring knowledge from the RGB domain using a multi-domain attention network. Our method does not require any thermal annotations or co-registered RGB-thermal pairs, enabling robots to perform visual tasks at night and in adverse weather conditions without incurring additional costs of data labeling and registration. Current unsupervised domain adaptation methods look to align global images or features across domains. However, when the domain shift is significantly larger for cross-modal data, not all features can be transferred. We solve this problem by using a shared backbone network that promotes generalization, and domain-specific attention that reduces negative transfer by attending to domain-invariant and easily-transferable features. Our approach outperforms the state-of-the-art RGB-to-thermal adaptation method in classification benchmarks, and is successfully applied to thermal river scene segmentation using only synthetic RGB images. Our code is made publicly available at https://github.com/ganlumomo/thermal-uda-attention.
Reconfigurable intelligent surface (RIS) has recently emerged as a promising technology enabling next-generation wireless networks. In this paper, we develop an improved index modulation (IM) scheme by utilizing RIS to convey information. Specifically, we study an RIS-aided multiple-input single-output (MISO) system, in which the information bits are conveyed by reflection patterns of RIS rather than the conventional amplitude-phase constellation. Furthermore, the K-means algorithm is employed to optimize the reflection constellation to improve the error performance. Also, we propose a generalized Gray coding method for mapping information bits to an appropriate reflection constellation and analytically evaluate the error performance of the proposed scheme by deriving a closed-form expression of the average bit error rate (BER). Finally, numerical results verify the accuracy of our theoretical analysis as well as the substantially improved BER performance of the proposed RIS-based IM scheme.
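As background for the bit-mapping step, the sketch below generates the classical binary-reflected Gray code, in which consecutive labels differ in exactly one bit, so that confusing two neighboring constellation points costs a single bit error. This is the standard construction only; the generalized Gray coding method proposed above is not specified in the abstract and is not reproduced here. The function names are hypothetical.

```python
def gray_code(index):
    """Binary-reflected Gray code of a non-negative integer."""
    return index ^ (index >> 1)


def gray_labels(bits):
    """Bit-string labels for 2**bits constellation points in Gray order."""
    return [format(gray_code(i), f"0{bits}b") for i in range(2 ** bits)]
```

For two bits, `gray_labels(2)` yields `['00', '01', '11', '10']`.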