Alibaba Group




Abstract:Remote sensing images usually characterized by complex backgrounds, scale and orientation variations, and large intra-class variance. General semantic segmentation methods usually fail to fully investigate the above issues, and thus their performances on remote sensing image segmentation are limited. In this paper, we propose our LOGCAN++, a semantic segmentation model customized for remote sensing images, which is made up of a Global Class Awareness (GCA) module and several Local Class Awareness (LCA) modules. The GCA module captures global representations for class-level context modeling to reduce the interference of background noise. The LCA module generates local class representations as intermediate perceptual elements to indirectly associate pixels with the global class representations, targeting at dealing with the large intra-class variance problem. In particular, we introduce affine transformations in the LCA module for adaptive extraction of local class representations to effectively tolerate scale and orientation variations in remotely sensed images. Extensive experiments on three benchmark datasets show that our LOGCAN++ outperforms current mainstream general and remote sensing semantic segmentation methods and achieves a better trade-off between speed and accuracy. Code is available at https://github.com/xwmaxwma/rssegmentation.




Abstract:Remote sensing change detection aims to compare two or more images recorded for the same area but taken at different time stamps to quantitatively and qualitatively assess changes in geographical entities and environmental factors. Mainstream models usually built on pixel-by-pixel change detection paradigms, which cannot tolerate the diversity of changes due to complex scenes and variation in imaging conditions. To address this shortcoming, this paper rethinks the change detection with the mask view, and further proposes the corresponding: 1) meta-architecture CDMask and 2) instance network CDMaskFormer. Components of CDMask include Siamese backbone, change extractor, pixel decoder, transformer decoder and normalized detector, which ensures the proper functioning of the mask detection paradigm. Since the change query can be adaptively updated based on the bi-temporal feature content, the proposed CDMask can adapt to different latent data distributions, thus accurately identifying regions of interest changes in complex scenarios. Consequently, we further propose the instance network CDMaskFormer customized for the change detection task, which includes: (i) a Spatial-temporal convolutional attention-based instantiated change extractor to capture spatio-temporal context simultaneously with lightweight operations; and (ii) a scene-guided axial attention-instantiated transformer decoder to extract more spatial details. State-of-the-art performance of CDMaskFormer is achieved on five benchmark datasets with a satisfactory efficiency-accuracy trade-off. Code is available at https://github.com/xwmaxwma/rschange.




Abstract:Generating city-scale lane-level maps faces significant challenges due to the intricate urban environments, such as blurred or absent lane markings. Additionally, a standard lane-level map requires a comprehensive organization of lane groupings, encompassing lane direction, style, boundary, and topology, yet has not been thoroughly examined in prior research. These obstacles result in labor-intensive human annotation and high maintenance costs. This paper overcomes these limitations and presents an industrial-grade solution named DuMapNet that outputs standardized, vectorized map elements and their topology in an end-to-end paradigm. To this end, we propose a group-wise lane prediction (GLP) system that outputs vectorized results of lane groups by meticulously tailoring a transformer-based network. Meanwhile, to enhance generalization in challenging scenarios, such as road wear and occlusions, as well as to improve global consistency, a contextual prompts encoder (CPE) module is proposed, which leverages the predicted results of spatial neighborhoods as contextual information. Extensive experiments conducted on large-scale real-world datasets demonstrate the superiority and effectiveness of DuMapNet. Additionally, DuMap-Net has already been deployed in production at Baidu Maps since June 2023, supporting lane-level map generation tasks for over 360 cities while bringing a 95% reduction in costs. This demonstrates that DuMapNet serves as a practical and cost-effective industrial solution for city-scale lane-level map generation.




Abstract:Remote sensing change detection (RSCD) aims to identify the changes of interest in a region by analyzing multi-temporal remote sensing images, and has an outstanding value for local development monitoring. Existing RSCD methods are devoted to contextual modeling in the spatial domain to enhance the changes of interest. Despite the satisfactory performance achieved, the lack of knowledge in the frequency domain limits the further improvement of model performance. In this paper, we propose DDLNet, a RSCD network based on dual-domain learning (i.e., frequency and spatial domains). In particular, we design a Frequency-domain Enhancement Module (FEM) to capture frequency components from the input bi-temporal images using Discrete Cosine Transform (DCT) and thus enhance the changes of interest. Besides, we devise a Spatial-domain Recovery Module (SRM) to fuse spatiotemporal features for reconstructing spatial details of change representations. Extensive experiments on three benchmark RSCD datasets demonstrate that the proposed method achieves state-of-the-art performance and reaches a more satisfactory accuracy-efficiency trade-off. Our code is publicly available at https://github.com/xwmaxwma/rschange.




Abstract:Communication infrastructure is often severely disrupted in post-disaster areas, which interrupts communications and impedes rescue. Recently, the technology of reconfigurable intelligent surface (RIS)-equipped-UAV has been investigated as a feasible approach to assist communication under such conditions. However, the channel characteristics in the post-disaster area rapidly change due to the topographical changes caused by secondary disasters and the high mobility of UAVs. In this paper we develop a new fading-shadowing model to fit the path loss caused by the debris. Following this, we derive the exact distribution of the new channel statistics for a small number of RIS elements and the approximate distribution for a large number of RIS elements, respectively. Then, we derive the closed-form expressions for performance analysis, including average capacity (AC), energy efficiency (EE), and outage probability (OP). Based on the above analytical derivations, we maximize the energy efficiency by optimizing the number of RIS elements and the coverage area by optimizing the altitude of the RIS-equipped UAV, respectively. Finally, simulation results validate the accuracy of derived expressions and show insights related to the optimal number of RIS elements and the optimal UAV altitude for emergency wireless communication (EWC).




Abstract:As the demand for programming skills grows across industries and academia, students often turn to Programming Online Judge (POJ) platforms for coding practice and competition. The difficulty level of each programming problem serves as an essential reference for guiding students' adaptive learning. However, current methods of determining difficulty levels either require extensive expert annotations or take a long time to accumulate enough student solutions for each problem. To address this issue, we formulate the problem of automatic difficulty level estimation of each programming problem, given its textual description and a solution example of code. For tackling this problem, we propose to couple two pre-trained models, one for text modality and the other for code modality, into a unified model. We built two POJ datasets for the task and the results demonstrate the effectiveness of the proposed approach and the contributions of both modalities.
Abstract:Wireless devices can be easily attacked by jammers during transmission, which is a potential security threat for wireless communications. Active reconfigurable intelligent surface (RIS) attracts considerable attention and is expected to be employed in anti-jamming systems for secure transmission to significantly enhance the anti-jamming performance. However, active RIS introduces external power load, which increases the complexity of hardware and restricts the flexible deployment of active RIS. To overcome these drawbacks, we design a innovative self-sustainable structure in this paper, where the active RIS is energized by harvesting energy from base station (BS) signals through the time dividing based simultaneous wireless information and power transfer (TD-SWIPT) scheme. Based on the above structure, we develop the BS harvesting scheme based on joint transmit and reflecting beamforming with the aim of maximizing the achievable rate of active RIS-assisted system, where the alternating optimization (AO) algorithm based on stochastic successive convex approximation (SSCA) tackles the nonconvex optimization problem in the scheme. Simulation results verified the effectiveness of our developed BS harvesting scheme, which can attain higher anti-jamming performance than other schemes when given the same maximum transmit power.




Abstract:Although the emerging reconfigurable intelligent surface (RIS) paves a new way for next-generation wireless communications, it suffers from inherent flaws, i.e., double-fading attenuation effects and half-space coverage limitations. The state-of-the-art double-face active (DFA)-RIS architecture is proposed for significantly amplifying and transmitting incident signals in full-space. Despite the efficacy of DFA-RIS in mitigating the aforementioned flaws, its potential drawback is that the complex active hardware also incurs intolerable energy consumption. To overcome this drawback, in this paper we propose a novel dynamic energy-saving design for the DFA-RIS, called the sub-array based DFA-RIS architecture. This architecture divides the DFA-RIS into multiple sub-arrays, where the signal amplification function in each sub-array can be activated/deactivated dynamically and flexibly. Utilizing the above architecture, we develop the joint optimization scheme based on transmit beamforming, DFA-RIS configuration, and reflection amplifier (RA) operating pattern to maximize the energy efficiency (EE) of the DFA-RIS assisted multiuser MISO system considering the perfect/imperfect channel state information (CSI) case. Then, the penalty dual decomposition (PDD) based alternating optimization (AO) algorithm and the constrained stochastic majorization-minimization (CSMM) based AO algorithm address non-convex problems in the perfect/imperfect CSI case, respectively. Simulation results verified that our proposed sub-array based DFA-RIS architecture can benefit the EE of the system more than other RIS architectures.
Abstract:To satisfy the various demands of growing devices and services, emerging high-frequency-based technologies promote near-field wireless communications. Therefore, near-field physical layer security has attracted much attention to facilitate the wireless information security against illegitimate eavesdropping. However, highly correlated channels between legitimate transceivers and eavesdroppers of existing multiple-input multiple-output (MIMO) based near-field secure technologies along with the low degrees of freedom significantly limit the enhancement of security results in wireless communications. To significantly increase the secrecy rates of near-field wireless communications, in this paper we propose the double-reconfigurable-intelligent-surface (RIS) assisted orbital angular momentum (OAM) secure scheme, where RISs with few reflecting elements are easily deployed to reconstruct the direct links blocked by obstacles between the legitimate transceivers, mitigate the inter-mode interference caused by the misalignment of legitimate transceivers, and adjust the OAM beams direction to interfere with eavesdroppers. Meanwhile, due to the unique orthogonality among OAM modes, the OAM-based joint index modulation and artificial noise scheme is proposed to weaken the information acquisition by eavesdroppers while increasing the achievable rate with the low cost of legitimate communications. To maximize the secrecy rate of our proposed scheme, we develop the Riemannian manifold conjugate gradient (RMCG)-based alternative optimization (AO) algorithm to jointly optimize the transmit power allocation of OAM modes and phase shifts of double RISs. Numerical results show that our proposed double-RIS-assisted OAM near-field secure scheme outperforms the existing works in terms of the secrecy rate and the eavesdropper's bit error rate.




Abstract:The spectrum share and open nature of wireless channels enable integrated sensing and communication (ISAC) susceptible to hostile jamming attacks. Due to the intrinsic orthogonality and rich azimuth angle information of orbital angular momentum (OAM), vortex electromagnetic waves with helical phase fronts have shown great potential to achieve high-resolution imaging and strong anti-jamming capability of wireless communication. Focusing on significantly enhancing the anti-jamming results of ISAC systems with limited bandwidth under hostile jamming, in this paper we propose a novel ISAC for anti-jamming with OAM scheme, where the OAM legitimate transmitter can simultaneously sense the position of jammers with dynamic behavior and send data to multiple OAM legitimate users. Specifically, the OAM modes for sensing and communications are respectively hopped according to pre-set index modulation information to suppress jamming. To acquire the position of the jammer, we develop the enhanced multiple-signal-classification-based three-dimension position estimation scheme with continuous sensing in both frequency and angular domains, where the OAM transmitter is designed with the concentric uniform-circular-array mono-static method, to significantly increase the azimuthal resolution. Then, based on the acquired jamming channel state information, we develop the joint transmit-receive beamforming and power allocation scheme, where the transmit and receive beamforming matrices are dynamically adjusted to mitigate the mixed interference containing inter-mode interference, inter-user interference, and jamming, thus maximizing the achievable sum rates (ASRs) of all users. Numerical results demonstrate that our proposed scheme can significantly increase the ASR under broadband jamming attacks and achieve high detection accuracy of targets .