Alibaba Group




Abstract: Recent advances in visual prompting for natural images have allowed users to interact with artificial intelligence (AI) tools through visual marks such as boxes, points, and free-form shapes. However, due to the significant differences between natural and remote sensing (RS) images, existing visual prompting models face challenges in RS scenarios. Moreover, RS multimodal large language models (MLLMs) mainly focus on interpreting image-level RS data and only support interaction through language instructions, which restricts their flexibility in real-world applications. To address these limitations, a novel visual prompting model named EarthMarker is proposed, which excels in image-level, region-level, and point-level RS imagery interpretation. Specifically, the visual prompts are input into the large language model (LLM) alongside images and text instructions to adapt the model toward specific predictions and tasks. Subsequently, a sharing visual encoding method is introduced to refine multi-scale image features and visual prompt information uniformly. Furthermore, to endow EarthMarker with versatile multi-granularity visual perception abilities, a cross-domain phased learning strategy is developed, and the disjoint parameters are optimized in a lightweight manner by leveraging both natural and RS domain-specific knowledge. In addition, to tackle the lack of RS visual prompting data, a dataset named RSVP featuring multi-modal fine-grained visual prompting instructions is constructed. Extensive experiments demonstrate the competitive performance of EarthMarker, representing a significant advance in multi-granularity RS imagery interpretation under the visual prompting learning framework.
Abstract: Directional modulation and artificial noise (AN)-based methods have been widely employed to achieve physical-layer security (PLS). However, these approaches can only achieve angle-dependent secure transmission. This paper presents an AN-aided decomposed and distributed directional modulation (D3M) scheme for secure wireless communications, which exploits spatial signatures to achieve security in the range dimension in addition to the angle dimension. Leveraging the decomposed and distributed structure, each modulated signal is represented by mutually orthogonal in-phase and quadrature branches, which are transmitted by two distributed transmitters to enhance PLS. In particular, we first minimize the transmit message power through an integrated design of the transmit beamformers, subject to a prescribed received signal-to-noise ratio (SNR) for the legitimate user (LU) and no inter-branch interference. This guarantees reliable and accurate transmission for the LU with the minimum transmit message power. To account for the power leaked on the sidelobes, AN is superimposed on the messages to mask the confidential information transmission. Simulation results demonstrate the security enhancement of our proposed D3M system.
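The decomposition described in the D3M abstract can be illustrated with a minimal sketch. It assumes an idealized model that is not taken from the paper: each transmitter's branch arrives with a single residual phase offset, and both offsets are zero at the legitimate user because the beamformers compensate for that location. The function name `received_symbol` and its parameters are hypothetical.

```python
import cmath
import math

def received_symbol(symbol, phase_tx1, phase_tx2):
    """Combine the I and Q branches sent by two separate transmitters.

    Assumption (illustrative only): each branch arrives with one residual
    phase offset, which is zero at the location the beamformers target.
    """
    i_branch = symbol.real        # in-phase branch, sent by transmitter 1
    q_branch = 1j * symbol.imag   # quadrature branch, sent by transmitter 2
    return (i_branch * cmath.exp(1j * phase_tx1)
            + q_branch * cmath.exp(1j * phase_tx2))

qpsk = 1 + 1j
at_legitimate_user = received_symbol(qpsk, 0.0, 0.0)        # branches align
at_other_location = received_symbol(qpsk, 0.0, math.pi / 2)  # branches misalign
```

In this toy model the symbol is recovered exactly where both offsets vanish, while a relative offset of π/2 between the branches collapses it to zero, which gives some intuition for why splitting the branches across distributed transmitters can make reception location-dependent.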
Abstract: The orbital angular momentum (OAM) based mode hopping (MH) scheme is a potential anti-jamming technology for radio vortex wireless communications. However, it uses only one OAM-mode for hopping, which results in low spectrum efficiency (SE). Index modulation offers a trade-off between SE and reliability. In this paper, we propose an MH with OAM-based index modulation scheme, where several OAM-modes are activated for hopping, to achieve high SE at a given bit error rate in radio vortex wireless communications. Based on the proposed scheme, we derive upper and lower bounds on the achievable SE. Furthermore, to take advantage of the index information, we derive the optimal number of hopped OAM-modes that achieves the maximum SE. Numerical results show that our proposed MH with index modulation scheme can achieve high SE while satisfying a given reliability requirement of radio vortex wireless communications.




Abstract: Score Distillation Sampling (SDS) by well-trained 2D diffusion models has shown great promise in text-to-3D generation. However, this paradigm distills view-agnostic 2D image distributions into the rendering distribution of 3D representation for each view independently, overlooking the coherence across views and yielding 3D inconsistency in generations. In this work, we propose Joint Score Distillation (JSD), a new paradigm that ensures coherent 3D generations. Specifically, we model the joint image distribution, which introduces an energy function to capture the coherence among denoised images from the diffusion model. We then derive the joint score distillation on multiple rendered views of the 3D representation, as opposed to a single view in SDS. In addition, we instantiate three universal view-aware models as energy functions, demonstrating compatibility with JSD. Empirically, JSD significantly mitigates the 3D inconsistency problem in SDS, while maintaining text congruence. Moreover, we introduce the Geometry Fading scheme and Classifier-Free Guidance (CFG) Switching strategy to enhance generative details. Our framework, JointDreamer, establishes a new benchmark in text-to-3D generation, achieving outstanding results with an 88.5% CLIP R-Precision and 27.7% CLIP Score. These metrics demonstrate exceptional text congruence, as well as remarkable geometric consistency and texture fidelity.
Abstract: Due to the crowded spectrum, it is now very difficult for frequency hopping (FH) techniques to achieve efficient anti-jamming and increase spectrum efficiency (SE) in wireless communications. The emerging orbital angular momentum (OAM), which is a property describing the helical phase fronts of electromagnetic waves, offers the potential to improve reliability and increase SE in wireless communications. To achieve efficient anti-jamming and increase the SE of wireless communications at a slight computational complexity cost, in this paper we propose an index-modulation embedded mode-hopping (IM-MH) scheme, which simultaneously activates several OAM-modes for hopping and transmits additional index information along with the signal information. We analyze the average bit error rates (ABERs) of our proposed IM-MH scheme with perfect channel state information (CSI) and imperfect CSI, respectively. We also propose the index-modulation embedded double-serial MH (IM-DSMH) scheme, which randomly activates one OAM-mode as the serial second hop to transmit the hopping signals in the IM-MH scheme, to further decrease the ABER of wireless communications. Extensive numerical results demonstrate that our proposed schemes can achieve a low ABER and significantly increase the SE within a narrow band. Also, the ABERs of our proposed IM-MH and IM-DSMH schemes are around 25% and 10%, respectively, of that of the mode hopping scheme.
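To make the idea of "index information" concrete, the following sketch counts bits per hop using the generic textbook index-modulation accounting. This is an assumption for illustration, not the SE expression derived in the paper: activating K of N available OAM-modes conveys floor(log2(C(N, K))) index bits, and each active mode carries log2(M) bits of an M-ary constellation. The function name `bits_per_hop` and its parameters are hypothetical.

```python
from math import comb

def bits_per_hop(num_modes, num_active, constellation_size):
    """Bits conveyed in one hop under generic index modulation.

    Assumes constellation_size is a power of two; this is a textbook
    count, not the paper's derived spectrum-efficiency formula.
    """
    # Index bits: floor(log2(C(N, K))) via integer bit length.
    index_bits = comb(num_modes, num_active).bit_length() - 1
    # Signal bits: K active modes, each carrying log2(M) bits.
    signal_bits = num_active * (constellation_size.bit_length() - 1)
    return index_bits + signal_bits

one_active = bits_per_hop(8, 1, 4)   # 3 index bits + 2 signal bits
two_active = bits_per_hop(8, 2, 4)   # 4 index bits + 4 signal bits
```

Under these assumptions, activating two of eight modes with a 4-ary constellation carries 8 bits per hop versus 5 bits with one active mode, which matches the intuition that activating several modes raises the SE.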




Abstract: Full-duplex (FD) is an attractive technology that can significantly boost the throughput of wireless communications. However, it is limited by the severe self-interference (SI) from the transmitter to the local receiver. In this paper, we propose a new SI cancellation (SIC) scheme based on reconfigurable intelligent surface (RIS), where small RISs are deployed inside FD devices to enhance SIC capability and system capacity under frequency-selective fading channels. The novel scheme can not only address the challenges associated with SIC but also improve the overall performance. We first analyze the near-field behavior of the RIS and then formulate an optimization problem to maximize the SIC capability by controlling the reflection coefficients (RCs) of the RIS and allocating the transmit power of the device. The problem is solved with an alternate optimization (AO) algorithm in three cases: the ideal case, where both the amplitude and phase of each RIS unit cell can be controlled independently and continuously; the continuous-phase case, where the phase of each RIS unit cell can be controlled independently while the amplitude is fixed to one; and the discrete-phase case, where the RC of each RIS unit cell can only take discrete values that are equally spaced on the unit circle. For the ideal case, the closed-form solution for the RC is derived with the Karush-Kuhn-Tucker (KKT) conditions. Based on the Riemannian conjugate gradient (RCG) algorithm, we optimize the RC for the continuous-phase case and then extend the solution to the discrete-phase case by the nearest point projection (NPP) method. Simulation results are given to validate the performance of our proposed SIC scheme.
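The discrete-phase step mentioned in the abstract, nearest point projection onto equally spaced points on the unit circle, can be sketched as follows. This is a minimal illustration under stated assumptions: the function name `project_to_discrete_phases` and the `bits` parameter (giving 2**bits phase levels) are hypothetical, and the paper's actual procedure may differ in detail.

```python
import cmath
import math

def project_to_discrete_phases(coeffs, bits):
    """Map each reflection coefficient to the nearest of 2**bits points
    equally spaced on the unit circle (assumed quantization grid)."""
    levels = 2 ** bits
    step = 2 * math.pi / levels
    projected = []
    for c in coeffs:
        # Round the phase to the nearest grid index, wrapping around.
        k = round(cmath.phase(c) / step) % levels
        projected.append(cmath.exp(1j * k * step))
    return projected

# Continuous-phase coefficients (unit amplitude) quantized with 2-bit phases.
continuous = [cmath.exp(1j * 1.5), cmath.exp(-1j * 3.0)]
discrete = project_to_discrete_phases(continuous, bits=2)
```

Rounding the phase to the nearest grid angle selects the closest grid point in Euclidean distance as well, since the distance between two unit-modulus points grows with their angular separation.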




Abstract: Task-Oriented Semantic Communication (TOSC) has been considered a new communication paradigm to serve various smart devices that depend on Artificial Intelligence (AI) tasks in future wireless networks. Existing TOSC frameworks rely on a Neural Network (NN) model to extract the semantic feature from the source data. The semantic feature, constituted by analog vectors of lower dimensionality than the original source data, preserves the meaning of the source data. By conveying the semantic feature, TOSC can significantly reduce the amount of data transmission while ensuring the correct execution of the AI-driven downstream task. However, standardized wireless networks depend on digital signal processing for data transmission, yet they must convey semantic features that are inherently analog. Although existing TOSC frameworks have developed the Deep Learning (DL) based analog approach or the conventional digital approach to transmit the semantic feature, many challenging problems still need to be solved urgently for actual deployment. In this article, we first present several challenging issues associated with the development of the TOSC framework in the standardized wireless network. Then, we develop a Digital-Analog transmission framework based TOSC (DA-TOSC) to resolve these challenging issues. Future research directions are discussed to further improve the DA-TOSC.




Abstract: Due to the challenges of satisfying the demands for communication efficiency and intelligent connectivity, sixth-generation (6G) wireless networks require new communication frameworks to enable effective information exchange and the integration of Artificial Intelligence (AI) and communication. Deep Learning (DL) based semantic communication, which can integrate application requirements and data meanings into data processing and transmission, is expected to become a new paradigm in 6G wireless networks. However, existing semantic communication frameworks rely on sending the full semantic features, which maximizes semantic fidelity but fails to achieve efficient semantic communication. In this article, we introduce a novel Scalable Extraction based Semantic Communication (SE-SC) model to support potential applications in 6G wireless networks and then analyze its feasibility. Then, we propose a promising SE-SC framework to highlight the potential of the SE-SC model in 6G wireless networks. Numerical results show that our proposed SE-SC scheme can offer an identical Quality of Service (QoS) for the downstream task with far fewer transmission symbols than the full semantic feature transmission and the traditional codec scheme. Finally, we discuss several challenges for further investigation of scalable extraction based semantic communications.




Abstract: Task-Oriented Semantic Communication (TOSC) has been regarded as a promising communication framework, serving various Artificial Intelligence (AI) task driven applications. Existing TOSC frameworks focus on extracting the full semantic features of source data and learning low-dimensional channel inputs to transmit them within limited bandwidth resources. Although transmitting full semantic features can preserve the integrity of data meaning, this approach does not attain the performance threshold of the TOSC. In this paper, we propose a Task-oriented Adaptive Semantic Communication (TasCom) framework, which aims to effectively facilitate the execution of AI tasks by sending only task-related semantic features. In the TasCom framework, we first propose a Generative AI (GAI) architecture based Generative Joint Source-Channel Coding (G-JSCC) for efficient semantic transmission. Then, an Adaptive Coding Controller (ACC) is proposed to find the optimal coding scheme for the proposed G-JSCC, which allows the semantic features with significant contributions to the AI task to preferentially occupy limited bandwidth resources for wireless transmission. Furthermore, we propose a generative training algorithm to train the proposed TasCom for optimal performance. The simulation results show that the proposed TasCom outperforms the existing TOSC and traditional codec schemes on the object detection and instance segmentation tasks under all considered channel conditions.




Abstract: Reconfigurable intelligent surface (RIS) technology has emerged in recent years as a promising solution to the ever-increasing demand for wireless communication capacity. In practice, however, elements of RIS may suffer from phase deviations, which need to be properly estimated and calibrated. This paper models the problem of over-the-air (OTA) estimation of the RIS elements as a quasi-neural network (QNN) so that the phase estimates can be obtained using the classic backpropagation (BP) algorithm. We also derive the Cramér-Rao Bounds (CRBs) for the phases of the RIS elements as a benchmark of the proposed approach. The simulation results verify the effectiveness of the proposed algorithm by showing that the root mean square errors (RMSEs) of the phase estimates are close to the CRBs.
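As a small illustration of the evaluation metric named in the abstract, the sketch below computes the RMSE of phase estimates. It assumes, as a choice not stated in the source, that each error is wrapped into (-π, π] before squaring so that estimates differing by a full turn count as correct. The function name `phase_rmse` is hypothetical.

```python
import cmath
import math

def phase_rmse(estimates, truths):
    """Root mean square error between phase estimates and true phases.

    Assumption: each error is wrapped into (-pi, pi], so that an estimate
    off by a multiple of 2*pi is treated as exact.
    """
    squared = []
    for est, true in zip(estimates, truths):
        wrapped = cmath.phase(cmath.exp(1j * (est - true)))
        squared.append(wrapped ** 2)
    return math.sqrt(sum(squared) / len(squared))

rmse = phase_rmse([0.1, -0.1], [0.0, 0.0])
```

A value computed this way could then be compared against the square root of the corresponding CRB, which is how a "close to the CRB" claim is typically checked.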