In recent years, the results of view-based 3D shape recognition methods have saturated, and models with excellent performance cannot be deployed on memory-limited devices due to their huge size of parameters. To address this problem, we introduce a compression method based on knowledge distillation for this field, which largely reduces the number of parameters while preserving model performance as much as possible. Specifically, to enhance the capabilities of smaller models, we design a high-performing large model called Group Multi-view Vision Transformer (GMViT). In GMViT, the view-level ViT first establishes relationships between view-level features. Additionally, to capture deeper features, we employ the grouping module to enhance view-level features into group-level features. Finally, the group-level ViT aggregates group-level features into complete, well-formed 3D shape descriptors. Notably, in both ViTs, we introduce spatial encoding of camera coordinates as innovative position embeddings. Furthermore, we propose two compressed versions based on GMViT, namely GMViT-simple and GMViT-mini. To enhance the training effectiveness of the small models, we introduce a knowledge distillation method throughout the GMViT process, where the key outputs of each GMViT component serve as distillation targets. Extensive experiments demonstrate the efficacy of the proposed method. The large model GMViT achieves excellent 3D classification and retrieval results on the benchmark datasets ModelNet, ShapeNetCore55, and MCB. The smaller models, GMViT-simple and GMViT-mini, reduce the parameter size by 8 and 17.6 times, respectively, and improve shape recognition speed by 1.5 times on average, while preserving at least 90% of the classification and retrieval performance.
Large Language Models (LLMs) encapsulate vast amounts of knowledge but still remain vulnerable to external misinformation. Existing research mainly studied this susceptibility behavior in a single-turn setting. However, belief can change during a multi-turn conversation, especially a persuasive one. Therefore, in this study, we delve into LLMs' susceptibility to persuasive conversations, particularly on factual questions that they can answer correctly. We first curate the Farm (i.e., Fact to Misinform) dataset, which contains factual questions paired with systematically generated persuasive misinformation. Then, we develop a testing framework to track LLMs' belief changes in a persuasive dialogue. Through extensive experiments, we find that LLMs' correct beliefs on factual knowledge can be easily manipulated by various persuasive strategies.
Reconfigurable intelligent surface (RIS) technology is emerging as a promising technique for performance enhancement for next-generation wireless networks. This paper investigates the physical layer security of an RIS-assisted multiple-antenna communication system in the presence of random spatially distributed eavesdroppers. The RIS-to-ground channels are assumed to experience Rician fading. Using stochastic geometry, exact distributions of the received signal-to-noise-ratios (SNRs) at the legitimate user and the eavesdroppers located according to a Poisson point process (PPP) are derived, and closed-form expressions for the secrecy outage probability (SOP) and the ergodic secrecy capacity (ESC) are obtained to provide insightful guidelines for system design. First, the secrecy diversity order is obtained as $\frac{2}{\alpha_2}$, where $\alpha_2$ denotes the path loss exponent of the RIS-to-ground links. Then, it is revealed that the secrecy performance is mainly affected by the number of RIS reflecting elements, $N$, and the impact of the number of transmit antennas and transmit power at the base station is marginal. In addition, when the locations of the randomly located eavesdroppers are unknown, deploying the RIS closer to the legitimate user rather than to the base station is shown to be more efficient. Moreover, it is also found that the density of randomly located eavesdroppers, $\lambda_e$, has an additive effect on the asymptotic ESC performance given by $\log_2{\left({1}/{\lambda_e}\right)}$. Finally, numerical simulations are conducted to verify the accuracy of these theoretical observations.
Massive grant-free transmission and cell-free wireless communication systems have emerged as pivotal enablers for massive machine-type communication. This paper proposes a deep-unfolding-based joint activity and data detection (DU-JAD) algorithm for massive grant-free transmission in cell-free systems. We first formulate a joint activity and data detection optimization problem, which we solve approximately using forward-backward splitting (FBS). We then apply deep unfolding to FBS to optimize algorithm parameters using machine learning. In order to improve data detection (DD) performance, reduce algorithm complexity, and enhance active user detection (AUD), we employ a momentum strategy, an approximate posterior mean estimator, and a novel soft-output AUD module, respectively. Simulation results confirm the efficacy of DU-JAD for AUD and DD.
The extremely large-scale massive multiple-input multiple-output (XL-MIMO) has the potential to achieve boosted spectral efficiency and refined spatial resolution for future wireless networks. However, channel estimation for XL-MIMO is challenging since the large number of antennas results in high computational complexity with the near-field effect. In this letter, we propose a low-complexity sequential angle-distance channel estimation (SADCE) method for near-field XL-MIMO systems equipped with uniformly planar arrays (UPA). Specifically, we first successfully decouple the angle and distance parameters, which allows us to devise a two-dimensional discrete Fourier transform (2D-DFT) method for angle parameters estimation. Then, a low-complexity distance estimation method is proposed with a closed-form solution. Compared with existing methods, the proposed method achieves significant performance gain with noticeably reduced computational complexity.Numerical results verify the superiority of the proposed near-field channel estimation algorithm.
Human-centered AI (HCAI), as a design philosophy, advocates prioritizing humans in designing, developing, and deploying intelligent systems, aiming to maximize the benefits of AI technology to humans and avoid its potential adverse effects. While HCAI has gained momentum, the lack of guidance on methodology in its implementation makes its adoption challenging. After assessing the needs for a methodological framework for HCAI, this paper first proposes a comprehensive and interdisciplinary HCAI methodological framework integrated with seven components, including design goals, design principles, implementation approaches, design paradigms, interdisciplinary teams, methods, and processes. THe implications of the framework are also discussed. This paper also presents a "three-layer" approach to facilitate the implementation of the framework. We believe the proposed framework is systematic and executable, which can overcome the weaknesses in current frameworks and the challenges currently faced in implementing HCAI. Thus, the framework can help put it into action to develop, transfer, and implement HCAI in practice, eventually enabling the design, development, and deployment of HCAI-based intelligent systems.
Digital twin, which enables emulation, evaluation, and optimization of physical entities through synchronized digital replicas, has gained increasingly attention as a promising technology for intricate wireless networks. For 6G, numerous innovative wireless technologies and network architectures have posed new challenges in establishing wireless network digital twins. To tackle these challenges, artificial intelligence (AI), particularly the flourishing generative AI, emerges as a potential solution. In this article, we discuss emerging prerequisites for wireless network digital twins considering the complicated network architecture, tremendous network scale, extensive coverage, and diversified application scenarios in the 6G era. We further explore the applications of generative AI, such as transformer and diffusion model, to empower the 6G digital twin from multiple perspectives including implementation, physical-digital synchronization, and slicing capability. Subsequently, we propose a hierarchical generative AI-enabled wireless network digital twin at both the message-level and policy-level, and provide a typical use case with numerical results to validate the effectiveness and efficiency. Finally, open research issues for wireless network digital twins in the 6G era are discussed.
The significant progress of large language models (LLMs) provides a promising opportunity to build human-like systems for various practical applications. However, when applied to specific task domains, an LLM pre-trained on a general-purpose corpus may exhibit a deficit or inadequacy in two types of domain-specific knowledge. One is a comprehensive set of domain data that is typically large-scale and continuously evolving. The other is specific working patterns of this domain reflected in the data. The absence or inadequacy of such knowledge impacts the performance of the LLM. In this paper, we propose a general paradigm that augments LLMs with DOmain-specific KnowledgE to enhance their performance on practical applications, namely DOKE. This paradigm relies on a domain knowledge extractor, working in three steps: 1) preparing effective knowledge for the task; 2) selecting the knowledge for each specific sample; and 3) expressing the knowledge in an LLM-understandable way. Then, the extracted knowledge is incorporated through prompts, without any computational cost of model fine-tuning. We instantiate the general paradigm on a widespread application, i.e. recommender systems, where critical item attributes and collaborative filtering signals are incorporated. Experimental results demonstrate that DOKE can substantially improve the performance of LLMs in specific domains.