Abstract:Artificial intelligence (AI) techniques, particularly autoencoders (AEs), have gained significant attention in wireless communication systems. This paper investigates using an AE to generate featureless signals with a low probability of detection and interception (LPD/LPI). Firstly, we introduce a novel loss function that adds a KL divergence term to the categorical cross entropy, enhancing the noise like characteristics of AE-generated signals while preserving block error rate (BLER). Secondly, to support long source message blocks for the AE's inputs, we replace one-hot inputs of source blocks with binary inputs pre-encoded by conventional error correction coding schemes. The AE's outputs are then decoded back to the source blocks using the same scheme. This design enables the AE to learn the coding structure, yielding superior BLER performance on coded blocks and the BLER of the source blocks is further decreased by the error correction decoder. Moreover, we also validate the AE based communication system in the over-the-air communication. Experimental results demonstrate that our proposed methods improve the featureless properties of AE signals and significantly reduce the BLER of message blocks, underscoring the promise of our AE-based approach for secure and reliable wireless communication systems.
Abstract:Multi-label Recognition (MLR) involves assigning multiple labels to each data instance in an image, offering advantages over single-label classification in complex scenarios. However, it faces the challenge of annotating all relevant categories, often leading to uncertain annotations, such as unseen or incomplete labels. Recent Vision and Language Pre-training (VLP) based methods have made significant progress in tackling zero-shot MLR tasks by leveraging rich vision-language correlations. However, the correlation between multi-label semantics has not been fully explored, and the learned visual features often lack essential semantic information. To overcome these limitations, we introduce a Semantic-guided Representation Learning approach (SigRL) that enables the model to learn effective visual and textual representations, thereby improving the downstream alignment of visual images and categories. Specifically, we first introduce a graph-based multi-label correlation module (GMC) to facilitate information exchange between labels, enriching the semantic representation across the multi-label texts. Next, we propose a Semantic Visual Feature Reconstruction module (SVFR) to enhance the semantic information in the visual representation by integrating the learned textual representation during reconstruction. Finally, we optimize the image-text matching capability of the VLP model using both local and global features to achieve zero-shot MLR. Comprehensive experiments are conducted on several MLR benchmarks, encompassing both zero-shot MLR (with unseen labels) and single positive multi-label learning (with limited labels), demonstrating the superior performance of our approach compared to state-of-the-art methods. The code is available at https://github.com/MVL-Lab/SigRL.