Semantic communication, an intelligent communication paradigm that aims to transmit useful information in the semantic domain, is facilitated by deep learning techniques. Although robust semantic features can be learned and transmitted in an analog fashion, it poses new challenges to hardware, protocol, and encryption. In this paper, we propose a digital semantic communication system, which consists of an encoding network deployed on a resource-limited device and a decoding network deployed at the edge. To acquire better semantic representation for digital transmission, a novel non-linear quantization module is proposed with the trainable quantization levels that efficiently quantifies semantic features. Additionally, structured pruning by a sparse scaling vector is incorporated to reduce the dimension of the transmitted features. We also introduce a semantic learning loss (SLL) function to reduce semantic error. To adapt to various channel conditions and inputs under constraints of communication and computing resources, a policy network is designed to adaptively choose the split point and the dimension of the transmitted semantic features. Experiments using the CIFAR-10 dataset for image classification are employed to evaluate the proposed digital semantic communication network, and ablation studies are conducted to assess the proposed modules including the quantization module, structured pruning and SLL.
We investigate the online bandit learning of the monotone multi-linear DR-submodular functions, designing the algorithm $\mathtt{BanditMLSM}$ that attains $O(T^{2/3}\log T)$ of $(1-1/e)$-regret. Then we reduce submodular bandit with partition matroid constraint and bandit sequential monotone maximization to the online bandit learning of the monotone multi-linear DR-submodular functions, attaining $O(T^{2/3}\log T)$ of $(1-1/e)$-regret in both problems, which improve the existing results. To the best of our knowledge, we are the first to give a sublinear regret algorithm for the submodular bandit with partition matroid constraint. A special case of this problem is studied by Streeter et al.(2009). They prove a $O(T^{4/5})$ $(1-1/e)$-regret upper bound. For the bandit sequential submodular maximization, the existing work proves an $O(T^{2/3})$ regret with a suboptimal $1/2$ approximation ratio (Niazadeh et al. 2021).
Shear wall structures are widely used in high-rise residential buildings, and the layout of shear walls requires many years of design experience and iterative trial and error. Currently, there are methods based on heuristic algorithms, but they generate results too slowly. Those based on Generative Adversarial Networks (GANs) or Graph Neural Networks (GNNs) can only generate single arrangements and require large amounts of training data. At present, Stable Diffusion is being widely used, and by using the Low-Rank Adaptation (LoRA) method to fine-tune large models with small amounts of data, good generative results can be achieved. Therefore, this paper proposes a personalized AI assistant for shear wall layout based on Stable Diffusion, which has been proven to produce good generative results through testing.
Grant-free non-orthogonal multiple access has been regarded as a viable approach to accommodate access for a massive number of machine-type devices with small data packets. The sporadic activation of the devices creates a multiuser setup where it is suitable to use compressed sensing in order to detect the active devices and decode their data. We consider asynchronous access of machine-type devices that send data packets of different frame sizes, leading to data length diversity. We address the composite problem of activity detection, channel estimation, and data recovery by posing it as a structured sparse recovery, having three-level sparsity caused by sporadic activity, symbol delay, and data length diversity. We approach the problem through approximate message passing with a backward propagation algorithm (AMP-BP), tailored to exploit the sparsity, and in particular the data length diversity. Moreover, we unfold the proposed AMP-BP into a model-driven network, termed learned AMP-BP (LAMP-BP), which enhances detection performance. The results show that the proposed LAMP-BP outperforms existing methods in activity detection and data recovery accuracy.
A surge of interest has emerged in weakly supervised semantic segmentation due to its remarkable efficiency in recent years. Existing approaches based on transformers mainly focus on exploring the affinity matrix to boost CAMs with global relationships. While in this work, we first perform a scrupulous examination towards the impact of successive affinity matrices and discover that they possess an inclination toward sparsification as the network approaches convergence, hence disclosing a manifestation of over-smoothing. Besides, it has been observed that enhanced attention maps tend to evince a substantial amount of extraneous background noise in deeper layers. Drawing upon this, we posit a daring conjecture that the undisciplined over-smoothing phenomenon introduces a noteworthy quantity of semantically irrelevant background noise, causing performance degradation. To alleviate this issue, we propose a novel perspective that highlights the objects of interest by investigating the regions of the trait, thereby fostering an extensive comprehension of the successive affinity matrix. Consequently, we suggest an adaptive re-activation mechanism (AReAM) that alleviates the issue of incomplete attention within the object and the unbounded background noise. AReAM accomplishes this by supervising high-level attention with shallow affinity matrices, yielding promising results. Exhaustive experiments conducted on the commonly used dataset manifest that segmentation results can be greatly improved through our proposed AReAM, which imposes restrictions on each affinity matrix in deep layers to make it attentive to semantic regions.
Neural ranking models (NRMs) have attracted considerable attention in information retrieval. Unfortunately, NRMs may inherit the adversarial vulnerabilities of general neural networks, which might be leveraged by black-hat search engine optimization practitioners. Recently, adversarial attacks against NRMs have been explored in the paired attack setting, generating an adversarial perturbation to a target document for a specific query. In this paper, we focus on a more general type of perturbation and introduce the topic-oriented adversarial ranking attack task against NRMs, which aims to find an imperceptible perturbation that can promote a target document in ranking for a group of queries with the same topic. We define both static and dynamic settings for the task and focus on decision-based black-box attacks. We propose a novel framework to improve topic-oriented attack performance based on a surrogate ranking model. The attack problem is formalized as a Markov decision process (MDP) and addressed using reinforcement learning. Specifically, a topic-oriented reward function guides the policy to find a successful adversarial example that can be promoted in rankings to as many queries as possible in a group. Experimental results demonstrate that the proposed framework can significantly outperform existing attack strategies, and we conclude by re-iterating that there exist potential risks for applying NRMs in the real world.
Without access to the source data, source-free domain adaptation (SFDA) transfers knowledge from a source-domain trained model to target domains. Recently, SFDA has gained popularity due to the need to protect the data privacy of the source domain, but it suffers from catastrophic forgetting on the source domain due to the lack of data. To systematically investigate the mechanism of catastrophic forgetting, we first reimplement previous SFDA approaches within a unified framework and evaluate them on four benchmarks. We observe that there is a trade-off between adaptation gain and forgetting loss, which motivates us to design a consistency regularization to mitigate forgetting. In particular, we propose a continual source-free domain adaptation approach named CoSDA, which employs a dual-speed optimized teacher-student model pair and is equipped with consistency learning capability. Our experiments demonstrate that CoSDA outperforms state-of-the-art approaches in continuous adaptation. Notably, our CoSDA can also be integrated with other SFDA methods to alleviate forgetting.
Temporal knowledge graphs (TKGs) model the temporal evolution of events and have recently attracted increasing attention. Since TKGs are intrinsically incomplete, it is necessary to reason out missing elements. Although existing TKG reasoning methods have the ability to predict missing future events, they fail to generate explicit reasoning paths and lack explainability. As reinforcement learning (RL) for multi-hop reasoning on traditional knowledge graphs starts showing superior explainability and performance in recent advances, it has opened up opportunities for exploring RL techniques on TKG reasoning. However, the performance of RL-based TKG reasoning methods is limited due to: (1) lack of ability to capture temporal evolution and semantic dependence jointly; (2) excessive reliance on manually designed rewards. To overcome these challenges, we propose an adaptive reinforcement learning model based on attention mechanism (DREAM) to predict missing elements in the future. Specifically, the model contains two components: (1) a multi-faceted attention representation learning method that captures semantic dependence and temporal evolution jointly; (2) an adaptive RL framework that conducts multi-hop reasoning by adaptively learning the reward functions. Experimental results demonstrate DREAM outperforms state-of-the-art models on public dataset
We present a high-precision real-time facial animation pipeline suitable for animators to use on their desktops. This pipeline is about to be launched in FACEGOOD's Avatary\footnote{https://www.avatary.com/} software, which will accelerate animators' productivity. The pipeline differs from professional head-mounted facial capture solutions in that it only requires the use of a consumer-grade 3D camera on the desk to achieve high-precision real-time facial capture. The system enables animators to create high-quality facial animations with ease and speed, while reducing the cost and complexity of traditional facial capture solutions. Our approach has the potential to revolutionize the way facial animation is done in the entertainment industry.
The past decade has witnessed rapid progress in AI research since the breakthrough in deep learning. AI technology has been applied in almost every field; therefore, technical and non-technical end-users must understand these technologies to exploit them. However existing materials are designed for experts, but non-technical users need appealing materials that deliver complex ideas in easy-to-follow steps. One notable tool that fits such a profile is scrollytelling, an approach to storytelling that provides readers with a natural and rich experience at the reader's pace, along with in-depth interactive explanations of complex concepts. Hence, this work proposes a novel visualization design for creating a scrollytelling that can effectively explain an AI concept to non-technical users. As a demonstration of our design, we created a scrollytelling to explain the Siamese Neural Network for the visual similarity matching problem. Our approach helps create a visualization valuable for a short-timeline situation like a sales pitch. The results show that the visualization based on our novel design helps improve non-technical users' perception and machine learning concept knowledge acquisition compared to traditional materials like online articles.