The recently proposed orthogonal time frequency space (OTFS) modulation, a typical delay-Doppler (DD) communication scheme, has attracted significant attention thanks to its appealing performance over doubly-selective channels. In this paper, we present the fundamentals of general DD communications from the viewpoint of the Zak transform. We start by constructing DD domain basis functions that satisfy the time-frequency (TF)-consistency condition and are globally quasi-periodic and locally twisted-shifted. We show that these features translate into unique signal structures in both time and frequency, which are beneficial for communication purposes. We then focus on practical implementations of DD Nyquist communications, where we show that rectangular windows achieve perfect DD orthogonality, while truncated periodic signals achieve sufficient DD orthogonality. In particular, a smoothed rectangular window with excess bandwidth yields slightly worse orthogonality but better pulse localization in the DD domain. Furthermore, we present a practical pulse shaping framework for general DD communications and derive the corresponding input-output relation under various shaping pulses. Our numerical results agree with our derivations and also demonstrate the advantages of DD communications over conventional orthogonal frequency-division multiplexing (OFDM).
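The quasi-periodicity mentioned in the abstract above can be illustrated with a small discrete analogue. The sketch below is not taken from the paper: it assumes one common discrete Zak transform convention, Z[l, k] = (1/sqrt(N)) * sum_n x[l + nM] * exp(-j*2*pi*n*k/N), applied to a length-MN sequence that is extended periodically. The names `zak_sample` and `is_quasi_periodic` are hypothetical.

```python
import cmath


def zak_sample(x, M, N, l, k):
    """One sample Z[l, k] of an assumed discrete Zak transform.

    x is a sequence of M*N complex samples, treated as periodic with
    period M*N. l is the delay index and k is the Doppler index.
    """
    length = M * N
    if len(x) != length:
        raise ValueError("x must have exactly M*N samples")
    total = 0j
    for n in range(N):
        # Sample the signal every M positions, starting at delay l.
        total += x[(l + n * M) % length] * cmath.exp(-2j * cmath.pi * n * k / N)
    return total / N ** 0.5


def is_quasi_periodic(x, M, N, tol=1e-9):
    """Check Z[l + M, k] = exp(j*2*pi*k/N) * Z[l, k] and Z[l, k + N] = Z[l, k]."""
    for l in range(M):
        for k in range(N):
            base = zak_sample(x, M, N, l, k)
            twist = cmath.exp(2j * cmath.pi * k / N)
            # A shift by one delay period picks up a Doppler-dependent phase.
            if abs(zak_sample(x, M, N, l + M, k) - twist * base) > tol:
                return False
            # A shift by one Doppler period leaves the value unchanged.
            if abs(zak_sample(x, M, N, l, k + N) - base) > tol:
                return False
    return True
```

Under this convention, any length-MN input passes the check, which is the discrete counterpart of the "globally quasi-periodic" property: the values on one fundamental M-by-N rectangle determine the transform everywhere.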
Self-supervised contrastive learning has attracted remarkable attention due to its exceptional ability in representation learning. However, current contrastive learning tends to learn global coarse-grained representations of the image that benefit generic object recognition, whereas such coarse-grained features are insufficient for fine-grained visual recognition. In this paper, we propose to incorporate subtle local fine-grained feature learning into global self-supervised contrastive learning through a purely self-supervised global-local fine-grained contrastive learning framework. Specifically, a novel pretext task called Local Discrimination (LoDisc) is proposed to explicitly steer the self-supervised model's focus towards pivotal local regions, which are captured by a simple-but-effective location-wise mask sampling strategy. We show that the Local Discrimination pretext task effectively enhances fine-grained clues in important local regions, and that the global-local framework further refines the fine-grained feature representations of images. Extensive experimental results on different fine-grained object recognition tasks demonstrate that the proposed method leads to a decent improvement in different evaluation settings. The proposed method is also effective in general object recognition tasks.
In the last decade, convolutional neural networks (CNNs) with multi-layer architectures have advanced rapidly. However, training such complex networks is very memory-consuming, since a large amount of intermediate data is preserved across layers, especially when processing high-dimensional inputs with a large batch size. This poses great challenges to the limited memory capacity of current accelerators (e.g., GPUs). Existing efforts mitigate this bottleneck either through external auxiliary solutions, which incur additional hardware costs, or through internal modifications, which risk an accuracy penalty. In contrast, our analysis reveals that computations within and across layers exhibit weak spatial-temporal dependency and, in some cases, complete independence. This inspires us to break the traditional layer-by-layer (column) dataflow rule: operations are instead reorganized into rows throughout all convolution layers. This lightweight design allows a majority of intermediate data to be removed without any loss of accuracy. We particularly study the weak dependency between two consecutive rows. For the resulting skewed memory consumption, we give two solutions suited to different scenarios. Evaluations on two representative networks confirm the effectiveness of the approach. We also validate that our middle dataflow optimization can be readily combined with existing works for further memory reduction.
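As a toy illustration of the row-wise idea in the abstract above (not the authors' actual scheme, which the abstract does not spell out), the sketch below stacks two 1-D "valid" convolutions and compares a layer-by-layer schedule with a row-by-row schedule. Each list element stands in for one row of a feature map; the function names and the sliding-window buffer are assumptions made for illustration.

```python
from collections import deque


def conv1d_valid(x, w):
    """CNN-style 'valid' convolution (no kernel flip) over a list of rows."""
    k = len(w)
    return [sum(w[j] * x[i + j] for j in range(k)) for i in range(len(x) - k + 1)]


def layer_by_layer(x, w1, w2):
    """Column dataflow: finish layer 1 entirely, then run layer 2."""
    hidden = conv1d_valid(x, w1)  # the whole intermediate result is kept
    return conv1d_valid(hidden, w2), len(hidden)


def row_by_row(x, w1, w2):
    """Row dataflow: emit a layer-2 row as soon as its layer-1 inputs exist."""
    k1, k2 = len(w1), len(w2)
    window = deque(maxlen=k2)  # only the last k2 layer-1 rows are retained
    out, peak = [], 0
    for i in range(len(x) - k1 + 1):
        window.append(sum(w1[j] * x[i + j] for j in range(k1)))
        peak = max(peak, len(window))
        if len(window) == k2:
            # Output row depends only on k2 consecutive layer-1 rows.
            out.append(sum(w2[j] * window[j] for j in range(k2)))
    return out, peak
```

Both schedules return the same outputs, but the second returned value (the number of intermediate rows held at once) is the full layer-1 length in the first case and only the layer-2 kernel size in the second. This is the kind of weak dependency between consecutive rows that makes discarding intermediate data possible without changing the result.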
This paper investigates the minimum bit error rate (BER) precoder design for an orthogonal time frequency space (OTFS)-based integrated sensing and communications (ISAC) system, which is considered a promising technique for enabling future wireless networks. In particular, the BER minimization problem takes into account the maximum available transmission power and the required sensing performance. We devise the precoder from the perspective of the delay-Doppler (DD) domain by exploiting the equivalent DD channel. To address the non-convex design problem, we resort to minimizing the lower bound of the derived average BER. We then propose a low-cost iterative method to solve the dual problem. Simulation results verify the effectiveness of our proposed precoder and reveal the interplay between sensing and communication in dual-functional precoder design.
This paper addresses the problem of direction-of-arrival (DOA) estimation for constant modulus (CM) source signals using a uniform or sparse linear array. Existing methods typically exploit either the Vandermonde structure of the steering matrix or the CM structure of the source signals, but not both. In this paper, we propose a structured matrix recovery technique (SMART) for CM DOA estimation that fully exploits the two structures. In particular, we reformulate the highly nonconvex CM DOA estimation problems in the noiseless and noisy cases as equivalent rank-constrained Hankel-Toeplitz matrix recovery problems. The Vandermonde structure is captured by a series of Hankel-Toeplitz block matrices, whose number equals the number of snapshots, and the CM structure is guaranteed by letting the block matrices share the same Toeplitz submatrix. The alternating direction method of multipliers (ADMM) is applied to solve the resulting rank-constrained problems, and the DOAs are uniquely retrieved from the numerical solution. Extensive simulations are carried out to corroborate our analysis and confirm that the proposed SMART outperforms state-of-the-art algorithms in terms of the maximum number of locatable sources and statistical efficiency.
In this paper, we study pulse shaping for delay-Doppler (DD) communications. We start by constructing a basis function in the DD domain following the properties of the Zak transform. In particular, we show that the constructed basis functions are globally quasi-periodic while locally twisted-shifted, and we then reveal their significance in the time and frequency domains. We further analyze the ambiguity function of the basis function and show that a fully localized ambiguity function can be achieved by constructing the basis function using periodic signals. More importantly, we prove that truncating such basis functions in time and frequency naturally leads to delay and Doppler orthogonality, provided that the truncating windows are orthogonal or periodic. Motivated by this, we propose a DD Nyquist pulse shaping scheme based on signals with periodicity. Finally, our conclusions are verified using various orthogonal and periodic pulses.
This paper proposes an integrated sensing, navigation, and communication (ISNC) framework for safeguarding unmanned aerial vehicle (UAV)-enabled wireless networks against a mobile eavesdropping UAV (E-UAV). To cope with the mobility of the E-UAV, the proposed framework advocates the dual use of artificial noise transmitted by the information UAV (I-UAV) for simultaneous jamming and sensing to facilitate navigation and secure communication. In particular, the I-UAV communicates with legitimate downlink ground users while avoiding potential information leakage by emitting jamming signals, and estimates the state of the E-UAV with an extended Kalman filter based on the backscattered jamming signals. Exploiting the state of the E-UAV estimated in the previous time slot, the I-UAV determines its flight planning strategy, predicts the wiretap channel, and designs its communication resource allocation policy for the next time slot. To circumvent the severe coupling between these three tasks, a divide-and-conquer approach is adopted. The online navigation design aims to minimize the distance between the I-UAV and a pre-defined destination point subject to kinematic and geometric constraints. Subsequently, given the predicted wiretap channel, the robust resource allocation design is formulated as an optimization problem to achieve the optimal trade-off between sensing and communication in the next time slot, while taking into account the wiretap channel prediction error and the quality-of-service (QoS) requirements of secure communication. Simulation results demonstrate the superior performance of the proposed design compared with baseline schemes and validate the benefits of integrating sensing and navigation into secure UAV communication systems.
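The extended Kalman filter mentioned in the abstract above follows a generic predict-update cycle, which can be sketched for a scalar state. Everything specific here is an assumption for illustration and is not the paper's state or measurement model: the state is a horizontal distance that grows by `speed * dt` per slot, and the measurement is a slant range at a fixed altitude difference. The names `ekf_step` and `make_range_model` are hypothetical.

```python
import math


def make_range_model(dt, speed, alt_diff):
    """Hypothetical scalar model: constant-speed motion, slant-range sensing."""
    def f(d):            # state transition
        return d + speed * dt

    def f_jac(d):        # derivative of f with respect to the state
        return 1.0

    def h(d):            # nonlinear measurement: slant range
        return math.sqrt(d * d + alt_diff * alt_diff)

    def h_jac(d):        # derivative of h with respect to the state
        return d / math.sqrt(d * d + alt_diff * alt_diff)

    return f, f_jac, h, h_jac


def ekf_step(x_est, p_est, z, f, f_jac, h, h_jac, q, r):
    """One scalar EKF cycle; q and r are process and measurement variances."""
    # Predict: propagate the estimate and its variance through the model.
    x_pred = f(x_est)
    F = f_jac(x_est)
    p_pred = F * p_est * F + q
    # Update: linearize the measurement around the predicted state.
    H = h_jac(x_pred)
    innovation = z - h(x_pred)
    s = H * p_pred * H + r
    gain = p_pred * H / s
    x_new = x_pred + gain * innovation
    p_new = (1.0 - gain * H) * p_pred
    return x_new, p_new
```

In a slot-based design of the kind the abstract describes, the output of one such step would be the state estimate that feeds the planning and prediction for the following slot. A multi-dimensional state would replace the scalars with vectors and matrices.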
Image-text retrieval in remote sensing aims to provide flexible information for data analysis and application. In recent years, state-of-the-art methods have been dedicated to ``scale decoupling'' and ``semantic decoupling'' strategies to further enhance the capability of representation. However, these previous approaches focus on disentangling either scale or semantics and do not merge the two ideas in a unified model, which severely limits the performance of cross-modal retrieval models. To address this issue, we propose a novel Scale-Semantic Joint Decoupling Network (SSJDN) for remote sensing image-text retrieval. Specifically, we design the Bidirectional Scale Decoupling (BSD) module, which exploits Salience Feature Extraction (SFE) and Salience-Guided Suppression (SGS) units to adaptively extract potential features and suppress cumbersome features at other scales in a bidirectional pattern to yield different scale clues. In addition, we design the Label-supervised Semantic Decoupling (LSD) module, which leverages the category semantic labels as prior knowledge to supervise images and texts in probing significant semantic-related information. Finally, we design a Semantic-guided Triple Loss (STL), which adaptively generates a constant to adjust the loss function, improving the probability of matching images and texts with the same semantics and shortening the convergence time of the retrieval model. Our proposed SSJDN outperforms state-of-the-art approaches in numerical experiments conducted on four benchmark remote sensing datasets.
In the 6G era, space-air-ground integrated networks (SAGIN) are expected to provide global coverage and are thus required to support a wide range of emerging applications in hostile environments with high mobility. In such scenarios, conventional orthogonal frequency division multiplexing (OFDM) modulation, which has been widely deployed in cellular and Wi-Fi communication systems, suffers from performance degradation due to high Doppler shifts. To address this challenge, a new two-dimensional (2D) modulation scheme referred to as orthogonal time frequency space (OTFS) was proposed and has been recognized as an enabling technology for future high-mobility scenarios. In particular, OTFS modulates information in the delay-Doppler (DD) domain rather than in the time-frequency (TF) domain used by OFDM, providing the benefits of Doppler resilience and delay resilience, low signaling latency, low peak-to-average power ratio (PAPR), and low-complexity implementation. Recent research also shows that the direct interaction of information and the physical world in the DD domain makes OTFS a promising waveform for realizing integrated sensing and communications (ISAC). In this article, we present a comprehensive survey of OTFS technology in the 6G era, including the fundamentals, recent advances, and future work. Our aim is for this article to provide a valuable reference for all researchers working in the area of OTFS.
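To make the DD-versus-TF distinction in the abstract above concrete, the following sketch maps a grid of DD domain symbols to the TF domain with an inverse symplectic finite Fourier transform (ISFFT) and back with the SFFT. The sign and normalization conventions used here are one common choice and are an assumption, not something stated in the abstract; the direct double sum is written for readability rather than speed.

```python
import cmath


def isfft(x_dd):
    """Map DD symbols x_dd[k][l] (k: Doppler, l: delay) to TF samples X[n][m].

    Assumed convention:
        X[n][m] = 1/sqrt(N*M) * sum_{k,l} x[k][l] * exp(j*2*pi*(n*k/N - m*l/M))
    """
    N, M = len(x_dd), len(x_dd[0])
    scale = 1.0 / (N * M) ** 0.5
    return [[scale * sum(x_dd[k][l] * cmath.exp(2j * cmath.pi * (n * k / N - m * l / M))
                         for k in range(N) for l in range(M))
             for m in range(M)]
            for n in range(N)]


def sfft(x_tf):
    """Inverse of isfft under the same assumed convention."""
    N, M = len(x_tf), len(x_tf[0])
    scale = 1.0 / (N * M) ** 0.5
    return [[scale * sum(x_tf[n][m] * cmath.exp(-2j * cmath.pi * (n * k / N - m * l / M))
                         for n in range(N) for m in range(M))
             for l in range(M)]
            for k in range(N)]
```

With this convention, a single unit-energy DD symbol is spread over every TF grid point with equal magnitude 1/sqrt(N*M), which gives some intuition for why each DD symbol experiences the whole TF channel rather than a single TF resource element.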
Beamforming design for intelligent reflecting surface (IRS)-assisted multi-user communication (IRS-MUC) systems critically depends on the acquisition of accurate channel state information (CSI). However, channel estimation (CE) in IRS-MUC systems causes a large signaling overhead for training due to the large number of IRS elements. In this paper, taking into account user mobility, we adopt a deep learning (DL) approach to implicitly learn the historical line-of-sight (LoS) channel features and predict the IRS phase shifts to be adopted in the next time slot so as to maximize the weighted sum-rate (WSR) of the IRS-MUC system. With the proposed predictive approach, we can avoid full-scale CSI estimation and facilitate low-dimensional CE for transmit beamforming design, such that the signaling overhead is scaled by a factor of $\frac{1}{N}$, where $N$ is the number of IRS elements. To this end, we first develop a universal DL-based predictive beamforming (DLPB) framework featuring a two-stage predictive-instantaneous beamforming mechanism. As a realization of the developed framework, a location-aware convolutional long short-term memory (CLSTM) graph neural network (GNN) is developed to facilitate effective predictive beamforming at the IRS, where a CLSTM module is first adopted to exploit the spatial and temporal features of the considered channels and a GNN is then applied to endow the designed neural network with high scalability and generalizability. Furthermore, in the second stage, based on the predicted IRS phase shifts, an instantaneous CSI-aware fully-connected neural network is designed to optimize the transmit beamforming at the access point. Simulation results demonstrate that the proposed framework not only achieves a better WSR performance and requires a lower CE overhead compared with state-of-the-art benchmarks, but is also highly scalable in the number of users.