Alibaba Group




Abstract:The vortex electromagnetic wave carried by multiple orthogonal orbital angular momentum (OAM) modes in the same frequency band can be applied to the field of wireless communications, which greatly increases the spectrum efficiency. The uniform circular array (UCA) is widely used to generate and receive vortex electromagnetic waves with multiple OAM-modes. However, the maximum number of orthogonal OAM-modes based on UCA is usually limited to the number of array-elements of the UCA antenna, leaving how to utilize more OAM-modes to achieve higher channel capacity with a fixed number of arrayelements as an intriguing question. In this paper, we propose an N-dimensional quasi-fractal UCA (ND QF-UCA) antenna structure in different fractal geometry layouts to break through the limits of array-elements number on OAM-modes number. We develop the N-dimensional OAM modulation (NOM) and demodulation (NOD) schemes for OAM multiplexing transmission with the OAM-modes number exceeding the array-elements number, which is beyond the traditional concept of multiple antenna based wireless communications. Then, we investigate different dimensional multiplexing transmission schemes based on the corresponding QF-UCA antenna structure with various array-element layouts and evaluate the optimal layout type and dimension to obtain the highest channel capacity with a fixed number of array-elements. Simulation results show that our proposed schemes can obtain a higher spectrum efficiency, surpassing those of alternative array-element layouts of QF-UCA and the traditional multiple antenna systems.
Abstract:Edge detection is a fundamental task in computer vision and it has made great progress under the development of deep convolutional neural networks (DCNNs), some of them have achieved a beyond human-level performance. However, recent top-performing edge detection methods tend to generate thick and blurred edge lines. In this work, we propose an effective method to solve this problem. Our approach consists of a lightweight pre-trained backbone, multi-scale contextual enhancement module aggregating gradient information (MCGI), boundary correction module (BCM), and boundary refinement module (BRM). In addition to this, we construct a novel hybrid loss function based on the Tversky index for solving the issue of imbalanced pixel distribution. We test our method on three standard benchmarks and the experiment results illustrate that our method improves the visual effect of edge maps and achieves a top performance among several state-of-the-art methods on the BSDS500 dataset (ODS F-score in standard evaluation is 0.829, in crispness evaluation is 0.720), NYUD-V2 dataset (ODS F-score in standard evaluation is 0.768, in crispness evaluation is \textbf{0.546}), and BIPED dataset (ODS F-score in standard evaluation is 0.903).
Abstract:Orbital angular momentum (OAM) carried electromagnetic waves have the potential to improve spectrum efficiency in optical and radio-frequency communications due to the orthogonal wavefronts of different OAM modes. However, OAM beams are vortically hollow and divergent, which significantly decreases the capacity of OAM transmissions. In addition, unaligned transceivers in OAM transmissions can result in a high bit error rate (BER). The Talbot effect is a self-imaging phenomenon that can be used to generate optical or radio-frequency OAM beams with periodic repeating structures at multiples of a certain distance along the propagation direction. These periodic structures make it unnecessary for the transceiver antennas to be perfectly aligned and can also alleviate the hollow divergence of OAM beams. In this paper, we propose Talbot-effect-based fractal OAM generation and detection schemes using a uniform circular array (UCA) to significantly improve capacity and BER performance in unaligned OAM transmissions. We first provide a brief overview of fractal OAM. Then, we propose the fractal OAM beam generation and detection schemes. Numerical analysis and simulations verify the effectiveness of our proposed fractal OAM generation scheme and also demonstrate improved capacity and BER performance compared to normal OAM transmissions. We also analyze how the receive UCA radius and the distance between the UCAs impact the capacity and BER performances.




Abstract:Due to its low energy consumption and simplicity, near field communication (NFC) has been extensively used in various short-range transmission scenarios, for example, proximity payment and NFC entrance guard. However, the low data rate of NFC limits its application in high rate demanded scenarios, such as high-resolution fingerprint identification and streaming media transmission as well as the future promising high rate indoor communications among pads, phones, and laptops. In this paper, we model and analyze the performance of the orbital angular momentum based NFC (OAM-NFC) system, which can significantly increase the capacity of NFC. We first give the OAM system model. With coils circularly equipped at the transmitter and receiver, OAM-NFC signals can be transmitted, received, and detected. Then, we develop the OAM-NFC generation and detection schemes for NFC multiplexing transmission. We also analyze the OAM-NFC channel capacity and compare it with those of single-input-single-output (SISO) as well as multi-input-multi-output (MIMO) NFC. Simulation results validate the feasibility and capacity enhancement of our proposed OAM-NFC system. How different variables, such as the transceiver misalignment, the numbers of transceiver coils, and transceiver distance, impact the OAM-NFC capacity are also analyzed.




Abstract:Given the imperative of 6G networks' ubiquitous connectivity, along with the inherent mobility and cost-effectiveness of unmanned aerial vehicles (UAVs), UAVs play a critical role within 6G wireless networks. Despite advancements in enhancing the UAV-enabled communication systems' throughput in existing studies, there remains a notable gap in addressing issues concerning user fairness and quality-of-service (QoS) provisioning and lacks an effective scheme to depict the trade-off between system throughput and user fairness. To solve the above challenges, in this paper we introduce a novel fairness control scheme for UAV-enabled wireless communication systems based on a new weighted function. First, we propose a throughput combining model based on a new weighted function with fairness considering. Second, we formulate the optimization problem to maximize the weighted sum of all users' throughput. Third, we decompose the optimization problem and propose an efficient iterative algorithm to solve it. Finally, simulation results are provided to demonstrate the considerable potential of our proposed scheme in fairness and QoS provisioning.




Abstract:Knowledge tracing (KT) is a crucial task in intelligent education, focusing on predicting students' performance on given questions to trace their evolving knowledge. The advancement of deep learning in this field has led to deep-learning knowledge tracing (DLKT) models that prioritize high predictive accuracy. However, many existing DLKT methods overlook the fundamental goal of tracking students' dynamical knowledge mastery. These models do not explicitly model knowledge mastery tracing processes or yield unreasonable results that educators find difficulty to comprehend and apply in real teaching scenarios. In response, our research conducts a preliminary analysis of mainstream KT approaches to highlight and explain such unreasonableness. We introduce GRKT, a graph-based reasonable knowledge tracing method to address these issues. By leveraging graph neural networks, our approach delves into the mutual influences of knowledge concepts, offering a more accurate representation of how the knowledge mastery evolves throughout the learning process. Additionally, we propose a fine-grained and psychological three-stage modeling process as knowledge retrieval, memory strengthening, and knowledge learning/forgetting, to conduct a more reasonable knowledge tracing process. Comprehensive experiments demonstrate that GRKT outperforms eleven baselines across three datasets, not only enhancing predictive accuracy but also generating more reasonable knowledge tracing results. This makes our model a promising advancement for practical implementation in educational settings. The source code is available at https://github.com/JJCui96/GRKT.




Abstract:In virtual reality (VR) applications, 360-degree images play a pivotal role in crafting immersive experiences and offering panoramic views, thus improving user Quality of Experience (QoE). However, the voluminous data generated by 360-degree images poses challenges in network storage and bandwidth. To address these challenges, we propose a novel Activation Map-based Vector Quantization (AM-VQ) framework, which is designed to reduce communication overhead for wireless transmission. The proposed AM-VQ scheme uses the Deep Neural Networks (DNNs) with vector quantization (VQ) to extract and compress semantic features. Particularly, the AM-VQ framework utilizes activation map to adaptively quantize semantic features, thus reducing data distortion caused by quantization operation. To further enhance the reconstruction quality of the 360-degree image, adversarial training with a Generative Adversarial Networks (GANs) discriminator is incorporated. Numerical results show that our proposed AM-VQ scheme achieves better performance than the existing Deep Learning (DL) based coding and the traditional coding schemes under the same transmission symbols.




Abstract:Nutrition estimation is crucial for effective dietary management and overall health and well-being. Existing methods often struggle with sub-optimal accuracy and can be time-consuming. In this paper, we propose NuNet, a transformer-based network designed for nutrition estimation that utilizes both RGB and depth information from food images. We have designed and implemented a multi-scale encoder and decoder, along with two types of feature fusion modules, specialized for estimating five nutritional factors. These modules effectively balance the efficiency and effectiveness of feature extraction with flexible usage of our customized attention mechanisms and fusion strategies. Our experimental study shows that NuNet outperforms its variants and existing solutions significantly for nutrition estimation. It achieves an error rate of 15.65%, the lowest known to us, largely due to our multi-scale architecture and fusion modules. This research holds practical values for dietary management with huge potential for transnational research and deployment and could inspire other applications involving multiple data types with varying degrees of importance.




Abstract:Large-scale multi-session LiDAR mapping plays a crucial role in various applications but faces significant challenges in data redundancy and pose graph scalability. This paper present MS-Mapping, a novel multi-session LiDAR mapping system that combines an incremental mapping scheme with support for various LiDAR-based odometry, enabling high-precision and consistent map assembly in large-scale environments. Our approach introduces a real-time keyframe selection method based on the Wasserstein distance, which effectively reduces data redundancy and pose graph complexity. We formulate the LiDAR point cloud keyframe selection problem using a similarity method based on Gaussian mixture models (GMM) and tackle the real-time challenge by employing an incremental voxel update method. Extensive experiments on large-scale campus scenes and over \SI{12.8}{km} of public and self-collected datasets demonstrate the efficiency, accuracy, and consistency of our map assembly approach. To facilitate further research and development in the community, we make our code https://github.com/JokerJohn/MS-Mapping and datasets publicly available.
Abstract:This paper studies the performance trade-off in a multi-user backscatter communication (BackCom) system for integrated sensing and communications (ISAC), where the multi-antenna ISAC transmitter sends excitation signals to power multiple single-antenna passive backscatter devices (BD), and the multi-antenna ISAC receiver performs joint sensing (localization) and communication tasks based on the backscattered signals from all BDs. Specifically, the localization performance is measured by the Cram\'{e}r-Rao bound (CRB) on the transmission delay and direction of arrival (DoA) of the backscattered signals, whose closed-form expression is obtained by deriving the corresponding Fisher information matrix (FIM), and the communication performance is characterized by the sum transmission rate of all BDs. Then, to characterize the trade-off between the localization and communication performances, the CRB minimization problem with the communication rate constraint is formulated, and is shown to be non-convex in general. By exploiting the hidden convexity, we propose an approach that combines fractional programming (FP) and Schur complement techniques to transform the original problem into an equivalent convex form. Finally, numerical results reveal the trade-off between the CRB and sum transmission rate achieved by our proposed method.