In this paper, we investigate multi-user modular extremely large-scale multiple-input multiple-output (XL-MIMO) communication systems, where a modular extremely large-scale uniform linear array (XL-ULA) is deployed at the base station (BS) to serve multiple single-antenna users. By exploiting the unique modular array architecture and considering the potential near-field propagation, we develop sub-array based uniform spherical wave (USW) models for the cases of distinct and common angles of arrival/departure (AoAs/AoDs) with respect to different sub-arrays/modules, respectively. Under these USW models, we analyze the beam focusing patterns produced by near-field beamforming at a near-field observation location. The analysis reveals that, compared with conventional XL-MIMO with collocated antenna elements, modular XL-MIMO can provide better spatial resolution owing to its larger array aperture. However, it also incurs undesired grating lobes due to the large inter-module separation. Moreover, it is found that for multi-user modular XL-MIMO communications, the achievable signal-to-interference-plus-noise ratio (SINR) of users may be degraded by the grating lobes of the beam focusing pattern. To address this issue, an efficient user grouping method is proposed for multi-user transmission scheduling, so that users located within the grating lobes of each other are not allocated to the same time-frequency resource block (RB). Numerical results are presented to verify the effectiveness of the proposed user grouping method, as well as the superior performance of modular XL-MIMO over its collocated counterpart when users are densely distributed.
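The grating-lobe effect and the grouping idea described above can be illustrated with a minimal sketch. It is not the paper's method: it assumes a simplified far-field (planar-wave) pattern rather than the USW models, positions expressed in wavelengths, a hypothetical conflict threshold of 0.5 on the normalized beam gain, and a simple greedy grouping rule. The function names and all parameter values are illustrative assumptions.

```python
import cmath
import math


def modular_positions(num_modules, elems_per_module, module_period, spacing=0.5):
    """Element positions (in wavelengths) of a modular ULA (assumed geometry)."""
    return [n * module_period + m * spacing
            for n in range(num_modules)
            for m in range(elems_per_module)]


def beam_gain(positions, u_target, u_obs):
    """Normalized far-field gain at u_obs = sin(angle) for a beam matched to u_target."""
    total = sum(cmath.exp(2j * math.pi * x * (u_obs - u_target)) for x in positions)
    return abs(total) / len(positions)


def group_users(positions, user_dirs, threshold=0.5):
    """Greedy grouping: a user joins the first group where it conflicts with nobody."""
    groups = []
    for k, u_k in enumerate(user_dirs):
        for group in groups:
            if all(beam_gain(positions, user_dirs[j], u_k) < threshold for j in group):
                group.append(k)
                break
        else:
            # Every existing group contains a user whose beam leaks onto user k.
            groups.append([k])
    return groups


# 4 modules of 4 half-wavelength-spaced elements, module period 6.5 wavelengths.
modular = modular_positions(4, 4, module_period=6.5)
# Collocated array with the same number of elements, for comparison.
collocated = [0.5 * i for i in range(16)]

# User 1 sits exactly on the first grating lobe of user 0 (offset 1 / 6.5 in sin-angle).
users = [0.0, 2 / 13, 0.5]
groups = group_users(modular, users)
print(groups)
```

In this toy setting, the modular array shows a strong lobe at the sin-angle offset 1/6.5 while the collocated array does not, so users 0 and 1 end up in different groups and would be scheduled on different RBs.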
The goal of sequential recommendation (SR) is to predict the items a user is likely to be interested in based on the user's historical interaction sequence. Most existing sequential recommenders are developed based on ID features, which, despite their widespread use, often underperform with sparse IDs and struggle with the cold-start problem. Besides, inconsistent ID mappings hinder the model's transferability, isolating similar recommendation domains that could have been co-optimized. This paper aims to address these issues by exploring the potential of multi-modal information in learning robust and generalizable sequence representations. We propose MISSRec, a multi-modal pre-training and transfer learning framework for SR. On the user side, we design a Transformer-based encoder-decoder model, where the contextual encoder learns to capture the sequence-level multi-modal synergy while a novel interest-aware decoder is developed to grasp item-modality-interest relations for better sequence representation. On the candidate item side, we adopt a dynamic fusion module to produce user-adaptive item representations, providing more precise matching between users and items. We pre-train the model with contrastive learning objectives and fine-tune it in an efficient manner. Extensive experiments demonstrate the effectiveness and flexibility of MISSRec, promising a practical solution for real-world recommendation scenarios.
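The abstract does not specify the exact contrastive objectives, so the following is only a generic illustration of what a sequence-to-item contrastive loss can look like. It is a minimal InfoNCE-style sketch under assumed conventions: cosine similarity, a hypothetical temperature of 0.1, and one positive item among the candidates. None of the names below come from MISSRec.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def normalize(v):
    norm = math.sqrt(dot(v, v))
    return [x / norm for x in v]


def info_nce(sequence_vec, item_vecs, positive_index, temperature=0.1):
    """InfoNCE loss: negative log-softmax of the positive item's similarity."""
    q = normalize(sequence_vec)
    logits = [dot(q, normalize(v)) / temperature for v in item_vecs]
    # Log-sum-exp with max subtraction for numerical stability.
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(l - m) for l in logits))
    return log_sum - logits[positive_index]


sequence = [1.0, 0.0]                           # toy sequence representation
items = [[0.9, 0.1], [0.0, 1.0], [-1.0, 0.0]]   # toy candidate item representations
loss_aligned = info_nce(sequence, items, positive_index=0)
loss_misaligned = info_nce(sequence, items, positive_index=2)
print(loss_aligned, loss_misaligned)
```

The loss is small when the sequence representation is close to the positive item and far from the negatives, which is the property a contrastive pre-training objective relies on.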
The movable antenna (MA) is an emerging technology that enables local movement of the antenna within the transmitter/receiver region to improve the channel condition and communication performance. In this paper, we study the deployment of multiple MAs at the base station (BS) for enhancing the multiuser communication performance. First, we model the multiuser channel in the uplink to characterize the wireless channel variation due to the MAs' movement at the BS. Then, an optimization problem is formulated to maximize the minimum achievable rate among multiple users for MA-aided uplink multiuser communications by jointly optimizing the MAs' positions, their receive combining at the BS, and the transmit power of users, under the constraints of a finite moving region for the MAs, a minimum inter-MA distance, and a maximum transmit power for each user. To solve this challenging non-convex optimization problem, a two-loop iterative algorithm is proposed by leveraging the particle swarm optimization (PSO) method. Specifically, the outer loop updates the positions of a set of particles, where each particle's position represents one realization of the antenna position vector (APV) of all MAs. The inner loop implements the fitness evaluation for each particle in terms of the max-min achievable rate of multiple users with its corresponding APV, where the receive combining matrix of the BS and the transmit power of each user are optimized by applying the block coordinate descent (BCD) technique. Simulation results show that antenna position optimization for MA-aided BSs can significantly improve the rate performance as compared to conventional BSs with fixed-position antennas (FPAs).
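The outer PSO loop over APVs can be sketched as follows. This is a heavily simplified illustration, not the paper's algorithm: the inner BCD optimization of receive combining and transmit power is replaced by fixed matched-filter combining at full transmit power, the channel is an assumed single-path line-of-sight model on a 1D moving region, and all parameter values (swarm size, inertia 0.7, acceleration 1.5, SNR) are hypothetical.

```python
import cmath
import math
import random


def min_rate(positions, user_dirs, snr=10.0):
    """Minimum rate over users under matched-filter combining (stand-in for the BCD inner loop)."""
    chans = [[cmath.exp(2j * math.pi * x * u) for x in positions] for u in user_dirs]

    def corr(h_a, h_b):
        return abs(sum(a.conjugate() * b for a, b in zip(h_a, h_b))) ** 2

    rates = []
    for k, h_k in enumerate(chans):
        signal = snr * corr(h_k, h_k)
        interference = snr * sum(corr(h_k, h_j) for j, h_j in enumerate(chans) if j != k)
        noise = len(positions)  # unit noise power scaled by ||h_k||^2
        rates.append(math.log2(1 + signal / (interference + noise)))
    return min(rates)


def fitness(positions, user_dirs, region, min_dist):
    """Max-min rate if the APV is feasible, minus infinity otherwise."""
    xs = sorted(positions)
    feasible = (xs[0] >= 0.0 and xs[-1] <= region
                and all(b - a >= min_dist for a, b in zip(xs, xs[1:])))
    return min_rate(xs, user_dirs) if feasible else float("-inf")


def pso(user_dirs, num_ant=4, region=4.0, min_dist=0.5, particles=30, iters=100, seed=0):
    rng = random.Random(seed)
    fpa = [n * min_dist for n in range(num_ant)]  # FPA layout seeds the swarm
    swarm = [fpa[:]] + [[rng.uniform(0, region) for _ in range(num_ant)]
                        for _ in range(particles - 1)]
    vel = [[0.0] * num_ant for _ in swarm]
    pbest = [p[:] for p in swarm]
    pbest_fit = [fitness(p, user_dirs, region, min_dist) for p in swarm]
    g = max(range(len(swarm)), key=pbest_fit.__getitem__)
    gbest, gbest_fit = pbest[g][:], pbest_fit[g]
    for _ in range(iters):
        for i, p in enumerate(swarm):
            for d in range(num_ant):
                vel[i][d] = (0.7 * vel[i][d]
                             + 1.5 * rng.random() * (pbest[i][d] - p[d])
                             + 1.5 * rng.random() * (gbest[d] - p[d]))
                p[d] = min(max(p[d] + vel[i][d], 0.0), region)  # stay in moving region
            f = fitness(p, user_dirs, region, min_dist)
            if f > pbest_fit[i]:
                pbest[i], pbest_fit[i] = p[:], f
                if f > gbest_fit:
                    gbest, gbest_fit = p[:], f
    return sorted(gbest), gbest_fit


users = [0.0, 0.1, -0.15]  # sin(AoA) of each user, toy values
best_positions, best_rate = pso(users)
fpa_rate = fitness([0.0, 0.5, 1.0, 1.5], users, 4.0, 0.5)
print(best_positions, best_rate, fpa_rate)
```

Because the FPA layout is included in the initial swarm, the returned max-min rate in this sketch is never worse than the FPA baseline; infeasible particles are simply discarded through the minus-infinity fitness, which is one of several possible ways to handle the distance constraint.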
Conventional beamforming with fixed-position antenna (FPA) arrays faces a fundamental trade-off between maximizing the signal power (array gain) over a desired direction and simultaneously minimizing the interference power over undesired directions. To overcome this limitation, this letter investigates movable antenna (MA) array enhanced beamforming, which exploits the new degree of freedom (DoF) offered by antenna position optimization in addition to the design of antenna weights. We show that by jointly optimizing the antenna position vector (APV) and antenna weight vector (AWV) of a linear MA array, the full array gain can be achieved over the desired direction while null steering can be realized over all undesired directions, under certain numbers of MAs and null-steering directions. The optimal solutions for the AWV and APV are derived in closed form, which reveal that the optimal AWV for MA arrays requires only signal phase adjustment with a fixed amplitude. Numerical results validate our analytical solutions for MA array beamforming and show their superior performance over conventional beamforming techniques with FPA arrays.
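A two-antenna special case makes the claim concrete. The sketch below is worked from first principles and is not the letter's closed-form solution: it assumes far-field steering vectors, positions in wavelengths, constant-amplitude phase-only weights matched to the desired direction, and one interference direction. With two antennas separated by d, the response toward the interferer vanishes when 2πd(u_int − u_des) = π, i.e. d = 1 / (2|u_int − u_des|), where u = sin(angle). The angles used are hypothetical.

```python
import cmath
import math


def array_gain(positions, u_steer, u_obs):
    """Power gain at u_obs for unit-norm, phase-only weights matched to u_steer."""
    n = len(positions)
    response = sum(cmath.exp(2j * math.pi * x * (u_obs - u_steer)) for x in positions)
    return abs(response) ** 2 / n


u_des = math.sin(math.radians(10.0))   # desired direction (assumed)
u_int = math.sin(math.radians(30.0))   # interference direction (assumed)

fpa = [0.0, 0.5]                                   # fixed half-wavelength spacing
ma = [0.0, 1.0 / (2.0 * abs(u_int - u_des))]       # spacing chosen to null u_int

for name, pos in (("FPA", fpa), ("MA", ma)):
    print(name, array_gain(pos, u_des, u_des), array_gain(pos, u_des, u_int))
```

Both arrays reach the full array gain of 2 toward the desired direction, but only the repositioned array places a null on the interferer while keeping constant-amplitude weights.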
Multi-domain recommendation (MDR) aims to provide recommendations for different domains (e.g., types of products) with overlapping users/items and is common for platforms such as Amazon, Facebook, and LinkedIn that host multiple services. Existing MDR models face two challenges. First, it is difficult to disentangle knowledge that generalizes across domains (e.g., a user likes cheap items) from knowledge specific to a single domain (e.g., a user likes blue clothing but not blue cars). Second, they have limited ability to transfer knowledge across domains with small overlaps. We propose a new MDR method named EDDA with two key components, i.e., an embedding disentangling recommender and domain alignment, to tackle the two challenges respectively. In particular, the embedding disentangling recommender separates both the model and the embedding into an inter-domain part and an intra-domain part, while most existing MDR methods only focus on model-level disentangling. The domain alignment leverages random walks from graph processing to identify similar user/item pairs from different domains and encourages similar user/item pairs to have similar embeddings, enhancing knowledge transfer. We compare EDDA with 12 state-of-the-art baselines on 3 real datasets. The results show that EDDA consistently outperforms the baselines on all datasets and domains. All datasets and code are available at https://github.com/Stevenn9981/EDDA.
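The random-walk step of the domain alignment can be illustrated with a small sketch. It is an assumption-laden toy, not EDDA's implementation: the graph is a hypothetical user-item interaction graph stored as adjacency lists, walks have a fixed length, and cross-domain similarity is scored simply by how often a walk from one user visits users of another domain. All node names and parameters are made up for illustration.

```python
import random
from collections import Counter


def cross_domain_neighbors(graph, domain, start, num_walks=200, walk_len=4, seed=0):
    """Rank users from other domains by how often random walks from `start` visit them."""
    rng = random.Random(seed)
    visits = Counter()
    for _ in range(num_walks):
        node = start
        for _ in range(walk_len):
            node = rng.choice(graph[node])
            # Count only user nodes that belong to a different domain than `start`.
            if node in domain and domain[node] != domain[start]:
                visits[node] += 1
    return visits.most_common()


# Toy interaction graph: users link to items they interacted with, and vice versa.
graph = {
    "alice@books": ["item:scifi_novel", "item:audiobook"],
    "bob@music": ["item:audiobook"],
    "carol@music": ["item:jazz_cd"],
    "item:scifi_novel": ["alice@books"],
    "item:audiobook": ["alice@books", "bob@music"],
    "item:jazz_cd": ["carol@music"],
}
domain = {"alice@books": "books", "bob@music": "music", "carol@music": "music"}

ranked = cross_domain_neighbors(graph, domain, "alice@books")
print(ranked)
```

Pairs that rank highly in such a walk (here the two users who share an item) are the kind of cross-domain pairs whose embeddings an alignment loss would then pull together.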
This paper studies the deployment of multiple movable antennas (MAs) at the base station (BS) for enhancing the multiuser communication performance. First, we model the multiuser channel in the uplink to characterize the wireless channel variation caused by the MAs' movement at the BS. Then, an optimization problem is formulated to maximize the minimum achievable rate among multiple users for MA-aided uplink multiuser communications by jointly optimizing the MAs' positions, their receive combining at the BS, and the transmit power of users, under the constraints of a finite moving region for the MAs, a minimum inter-MA distance, and a maximum transmit power for each user. To solve this challenging non-convex optimization problem, a two-loop iterative algorithm is proposed by leveraging the particle swarm optimization (PSO) method. Specifically, the outer loop updates the positions of a set of particles, where each particle's position represents one realization of the antenna positioning vector (APV) of all MAs. The inner loop implements the fitness evaluation for each particle in terms of the max-min achievable rate of multiple users with its corresponding APV, where the receive combining matrix of the BS and the transmit power of each user are optimized by applying the block coordinate descent (BCD) technique. Simulation results show that antenna position optimization for MA-aided BSs can significantly improve the rate performance as compared to conventional BSs with fixed-position antennas (FPAs).
In this paper, we consider a challenging secure wireless sensing scenario where a legitimate radar station (LRS) intends to detect a target at an unknown location in the presence of an unauthorized radar station (URS). We aim to enhance the sensing performance of the LRS and, at the same time, prevent the detection of the same target by the URS. Under this setup, conventional stealth-based approaches such as wrapping the target with electromagnetic wave absorbing materials are not applicable, since they will disable target detection by not only the URS, but the LRS as well. To tackle this challenge, we propose a new target-mounted IRS approach, where an intelligent reflecting surface (IRS) is mounted on the outer/echo surface of the target. By tuning the IRS reflection, the strength of its reflected radar signal at any angle of departure (AoD) can be adjusted based on the signal's angle of arrival (AoA), thereby enhancing/suppressing the signal power towards the LRS/URS, respectively. To this end, we propose a practical protocol for the target-mounted IRS to estimate the LRS/URS channel and waveform parameters based on its sensed signals and to control the IRS reflection for/against the LRS/URS accordingly. Specifically, we formulate new optimization problems to design the reflecting phase shifts at the IRS for maximizing the received signal power at the LRS while keeping that at the URS below a certain level, for both the cases of short-term and long-term IRS operations with different dynamic reflection capabilities. To solve these non-convex problems, we apply the penalty dual decomposition method to efficiently obtain high-quality suboptimal solutions. Finally, simulation results are presented that verify the effectiveness of the proposed protocol and algorithms for the target-mounted IRS to achieve secure wireless sensing, as compared with various benchmark schemes.
Context-aware methods have achieved great success in supervised scene text recognition by incorporating semantic priors from words. We argue that such prior contextual information can be interpreted as the relations of textual primitives due to the heterogeneous text and background, which can provide effective self-supervised labels for representation learning. However, textual relations are restricted to the finite size of the dataset due to lexical dependencies, which causes over-fitting and compromises representation robustness. To this end, we propose to enrich the textual relations via rearrangement, hierarchy and interaction, and design a unified framework called RCLSTR: Relational Contrastive Learning for Scene Text Recognition. Based on causality, we theoretically explain that the three modules suppress the bias caused by the contextual prior and thus guarantee representation robustness. Experiments on representation quality show that our method outperforms state-of-the-art self-supervised STR methods. Code is available at https://github.com/ThunderVVV/RCLSTR.
The Entity Set Expansion (ESE) task aims to expand a handful of seed entities with new entities belonging to the same semantic class. Conventional ESE methods are based on a single modality (i.e., the literal modality) and struggle to deal with complex entities in the real world, such as: (1) Negative entities with fine-grained semantic differences. (2) Synonymous entities. (3) Polysemous entities. (4) Long-tailed entities. These challenges prompt us to propose Multi-modal Entity Set Expansion (MESE), where models integrate information from multiple modalities to represent entities. Intuitively, the benefits of multi-modal information for ESE are threefold: (1) Different modalities can provide complementary information. (2) Multi-modal information provides a unified signal via common visual properties for the same semantic class or entity. (3) Multi-modal information offers a robust alignment signal for synonymous entities. To assess the performance of models on MESE and facilitate further research, we construct the MESED dataset, which is the first large-scale multi-modal dataset for ESE with elaborate manual calibration. We also propose a powerful multi-modal model, MultiExpan, which is pre-trained on four multi-modal pre-training tasks. Extensive experiments and analyses on MESED demonstrate the high quality of the dataset and the effectiveness of MultiExpan, and point out directions for future research.