With the development of streaming media technology, more and more communication relies on sound and visual information, which places a massive burden on online media. Data compression has therefore become increasingly important for reducing the volume of data transmitted and stored. To further improve the efficiency of image compression, researchers use various image processing methods to compensate for the limitations of both conventional codecs and advanced learning-based compression methods. Instead of modifying the compression-oriented approaches themselves, we propose a unified image compression preprocessing framework, called Kuchen, which aims to further improve the performance of existing codecs. The framework consists of a hybrid data labeling system along with a learning-based backbone that simulates personalized preprocessing. To the best of our knowledge, this is the first exploration of setting a unified preprocessing benchmark for image compression tasks. Results demonstrate that modern codecs optimized by our unified preprocessing framework consistently improve the efficiency of state-of-the-art compression.
In this paper, an intelligent reflecting surface (IRS) is leveraged to enhance the physical layer security of an integrated sensing and communication (ISAC) system, in which the IRS is deployed not only to assist the downlink communication for multiple users but also to create a virtual line-of-sight (LoS) link for target sensing. In particular, we consider a challenging scenario where the target may be a suspicious eavesdropper that potentially intercepts the communication-user information transmitted by the base station (BS). We investigate the joint design of the phase shifts at the IRS and the communication and radar beamformers at the BS to maximize the sensing beampattern gain towards the target, subject to constraints on the maximum information leakage to the eavesdropping target and the minimum signal-to-interference-plus-noise ratio (SINR) required by the users. Depending on whether perfect channel state information (CSI) of all involved user links and the accurate target location are available at the BS, two scenarios are considered and two different optimization algorithms are proposed. For the ideal scenario where the CSI of the user links and the target location are perfectly known at the BS, a penalty-based algorithm is proposed to obtain a high-quality solution. In particular, the beamformers are obtained with a semi-closed-form solution using Lagrange duality, and the IRS phase shifts are solved for in closed form by applying the majorization-minimization (MM) method. On the other hand, for the more practical scenario where the CSI is imperfect and the target location is uncertain, a robust algorithm based on the $\cal S$-procedure and sign-definiteness approaches is proposed. Simulation results demonstrate the effectiveness of the proposed scheme in achieving a trade-off between communication quality and sensing quality.
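The closed-form IRS phase update can be illustrated with a generic MM step. The sketch below assumes that, for fixed BS beamformers, the phase-shift subproblem reduces to maximizing a quadratic form $v^H R v$ with a Hermitian positive semidefinite matrix $R$ and unit-modulus entries $|v_n| = 1$. The paper's penalty terms, leakage constraint and SINR constraints are not modeled, and `quad_form` and `mm_unit_modulus` are hypothetical helper names. Because $(v - v_t)^H R (v - v_t) \ge 0$, the surrogate $2\,\mathrm{Re}\{v^H R v_t\} - v_t^H R v_t$ lower-bounds the objective, is tight at $v_t$, and is maximized by aligning each phase with $[R v_t]_n$.

```python
import cmath
import math
import random


def quad_form(R, v):
    """Real value of v^H R v for a Hermitian matrix R given as a list of lists."""
    n = len(v)
    return sum(v[i].conjugate() * R[i][k] * v[k] for i in range(n) for k in range(n)).real


def mm_unit_modulus(R, iters=20, seed=0):
    """Maximize v^H R v subject to |v_n| = 1 by majorization-minimization.

    For Hermitian PSD R, g(v | v_t) = 2 Re{v^H R v_t} - v_t^H R v_t is a lower
    bound on v^H R v that is tight at v_t. Its maximizer over unit-modulus
    vectors is v_n = exp(j * arg([R v_t]_n)), so the objective never decreases.
    """
    rng = random.Random(seed)
    n = len(R)
    v = [cmath.exp(1j * rng.uniform(0.0, 2.0 * math.pi)) for _ in range(n)]
    history = [quad_form(R, v)]
    for _ in range(iters):
        Rv = [sum(R[i][k] * v[k] for k in range(n)) for i in range(n)]
        # Closed-form update: align each phase with the matching entry of R v_t.
        v = [cmath.exp(1j * cmath.phase(z)) if abs(z) > 0 else v[i] for i, z in enumerate(Rv)]
        history.append(quad_form(R, v))
    return v, history


# Rank-one example R = a a^H (hypothetical channel vector a).
a = [1 + 1j, 2.0, -0.5j]
R = [[ai * ak.conjugate() for ak in a] for ai in a]
v, history = mm_unit_modulus(R)
```

For this rank-one case the optimum is $(\sum_n |a_n|)^2$, which the iteration reaches after the first update. The recorded objective values never decrease, and this monotonicity is what makes MM convenient for unit-modulus phase constraints.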
Few-shot learning models learn representations with limited human annotations, and this learning paradigm has demonstrated practicability in various tasks, e.g., image classification and object detection. However, few-shot object detection methods suffer from an intrinsic defect: the limited training data prevents the model from sufficiently exploring semantic information. To tackle this, we introduce knowledge distillation into the few-shot object detection learning paradigm. We further run a motivating experiment, which demonstrates that during knowledge distillation the empirical error of the teacher model degrades the prediction performance of the few-shot object detection model acting as the student. To understand the reasons behind this phenomenon, we revisit the learning paradigm of knowledge distillation on the few-shot object detection task from a causal-theoretic standpoint and, accordingly, develop a Structural Causal Model. Following the theoretical guidance, we propose a backdoor adjustment-based knowledge distillation method for the few-shot object detection task, namely Disentangle and Remerge (D&R), to perform conditional causal intervention on the corresponding Structural Causal Model. Theoretically, we provide an extended definition of the backdoor criterion, i.e., the general backdoor path, which can expand the theoretical application boundary of the backdoor criterion in specific cases. Empirically, experiments on multiple benchmark datasets demonstrate that D&R yields significant performance boosts in few-shot object detection.
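For reference, the standard backdoor adjustment that D&R builds on computes an interventional distribution by stratifying over a variable set $Z$ that blocks every backdoor path from $X$ to $Y$: $P(Y \mid do(X=x)) = \sum_z P(Y \mid X=x, Z=z)\,P(Z=z)$. The toy example below uses made-up binary probabilities to contrast this with the ordinary conditional $P(Y \mid X=x)$. The names and numbers are hypothetical and unrelated to the paper's Structural Causal Model, and the disentangle/remerge steps are not reproduced.

```python
def backdoor_adjust(p_y_given_xz, p_z, x):
    """P(Y=1 | do(X=x)) = sum_z P(Y=1 | X=x, Z=z) * P(Z=z)."""
    return sum(p_y_given_xz[(x, z)] * pz for z, pz in p_z.items())


def observational(p_y_given_xz, p_z_given_x, x):
    """P(Y=1 | X=x) = sum_z P(Y=1 | X=x, Z=z) * P(Z=z | X=x)."""
    return sum(p_y_given_xz[(x, z)] * pz for z, pz in p_z_given_x[x].items())


# Hypothetical toy numbers: Z confounds X and Y.
p_z = {0: 0.7, 1: 0.3}                       # marginal P(Z)
p_z_given_x = {1: {0: 0.2, 1: 0.8}}          # P(Z | X=1)
p_y_given_xz = {(1, 0): 0.8, (1, 1): 0.4}    # P(Y=1 | X=1, Z)

p_do = backdoor_adjust(p_y_given_xz, p_z, x=1)
p_obs = observational(p_y_given_xz, p_z_given_x, x=1)
```

With these numbers the adjusted probability is 0.68, while the naive conditional is 0.48. The difference arises because conditioning on $X = 1$ shifts the distribution of the confounder $Z$.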
Recently, some span-based methods have achieved encouraging performance in joint aspect-sentiment analysis. These methods first extract aspects (aspect extraction) by detecting aspect boundaries and then classify the span-level sentiments (sentiment classification). However, most existing approaches either extract task-specific features sequentially, which leads to insufficient feature interaction, or encode aspect features and sentiment features in parallel, which means the feature representations of the two tasks are largely independent of each other apart from the shared input. Both approaches ignore the internal correlations between aspect extraction and sentiment classification. To solve this problem, we propose a novel hierarchical interactive network (HI-ASA) that appropriately models two-way interactions between the two tasks. The hierarchical interactions involve two steps: shallow-level interaction and deep-level interaction. First, we use a cross-stitch mechanism to selectively combine the different task-specific features as input, ensuring proper two-way interaction. Second, a mutual information technique is applied in the output layer to mutually constrain learning between the two tasks, so that the aspect input and the sentiment input can encode features of the other task via backpropagation. Extensive experiments on three real-world datasets demonstrate HI-ASA's superiority over the baselines.
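The shallow-level interaction relies on a cross-stitch unit, which forms each task's new input as a linear combination of both tasks' features. The following is a minimal sketch that assumes one scalar mixing weight per task pair, as in the original cross-stitch formulation. In HI-ASA these weights would be trained by backpropagation. The fixed values and the function name below are illustrative only.

```python
def cross_stitch(x_aspect, x_sentiment, alpha):
    """Cross-stitch combination of two task-specific feature vectors.

    alpha = [[a_aa, a_as], [a_sa, a_ss]]:
      new_aspect    = a_aa * x_aspect + a_as * x_sentiment
      new_sentiment = a_sa * x_aspect + a_ss * x_sentiment
    In a trained model the four mixing weights are learned parameters.
    """
    (a_aa, a_as), (a_sa, a_ss) = alpha
    new_aspect = [a_aa * xa + a_as * xs for xa, xs in zip(x_aspect, x_sentiment)]
    new_sentiment = [a_sa * xa + a_ss * xs for xa, xs in zip(x_aspect, x_sentiment)]
    return new_aspect, new_sentiment


aspect_feat = [1.0, 0.0]      # toy aspect-task features
sentiment_feat = [0.0, 1.0]   # toy sentiment-task features
aspect, sentiment = cross_stitch(aspect_feat, sentiment_feat, [[0.9, 0.1], [0.2, 0.8]])
same_a, same_s = cross_stitch(aspect_feat, sentiment_feat, [[1.0, 0.0], [0.0, 1.0]])
```

With an identity matrix the unit leaves both feature streams untouched, which recovers fully separate encoders. The off-diagonal weights control how much each task borrows from the other.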
As one of the main solutions to the information overload problem, recommender systems are widely used in daily life. In the emerging micro-video recommendation scenario, micro-videos contain rich multimedia information involving text, images, video, and other multimodal data, and this rich multimodal information implicitly reflects users' deep interests in items. Most current recommendation algorithms based on multimodal data use multimodal information to enrich the item side. However, they ignore users' differing preferences for different modalities and lack fine-grained mining of the internal connections within multimodal information. To address these problems in micro-video recommender systems, we design a hybrid recommendation model based on multimodal information. The model introduces multimodal information and user-side auxiliary information into the network structure to fully explore users' deep interests. It also measures the importance of each dimension of the user and item feature representations in the score prediction task. In addition, it improves the application of graph neural networks in recommender systems by using an attention mechanism to fuse the state outputs of multiple layers, allowing the shallow structural features provided by the intermediate layers to better participate in the prediction task. Compared with traditional recommendation algorithms, the model improves recommendation accuracy on different datasets, which verifies its feasibility and effectiveness.
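Attention-based fusion of per-layer graph neural network outputs can be sketched as a softmax over layer scores followed by a weighted sum. The scoring rule used here, a dot product with a query vector, is an assumption for illustration, as is the function name. The model's actual attention parameterization is not specified in this abstract.

```python
import math


def attention_fuse(layer_outputs, query):
    """Fuse per-layer embeddings with softmax attention over layers.

    score_l = <query, h_l>, weights = softmax(scores),
    fused = sum_l weight_l * h_l. Dot-product scoring is an assumption.
    """
    scores = [sum(q * h for q, h in zip(query, out)) for out in layer_outputs]
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    dim = len(layer_outputs[0])
    fused = [sum(w * out[d] for w, out in zip(weights, layer_outputs)) for d in range(dim)]
    return fused, weights


layers = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # e.g. outputs of GNN layers 1..3
fused_uniform, w_uniform = attention_fuse(layers, query=[0.0, 0.0])
fused_focus, w_focus = attention_fuse(layers, query=[10.0, 0.0])
```

With a zero query every layer receives the same weight, so the fused vector is the plain mean of the layer outputs. A query aligned with particular layers shifts weight towards them, which is one way shallow intermediate layers can keep contributing to the prediction.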
We propose joint user association, channel assignment, and power allocation for mobile robot Ultra-Reliable and Low Latency Communications (URLLC) based on multi-connectivity and reinforcement learning. The mobile robots require control messages from the central guidance system at regular intervals. We use a two-phase communication scheme in which robots can form multiple clusters. The robots in a cluster are close to each other and can maintain reliable Device-to-Device (D2D) communications. In Phase I, the access points (APs) transmit the combined payload of a cluster to the cluster leader within a latency constraint. In Phase II, the cluster leader broadcasts this message to its members. We develop a distributed Multi-Agent Reinforcement Learning (MARL) algorithm for joint user association and resource allocation (RA) in Phase I. The cluster leaders use their local Channel State Information (CSI) to select the APs to connect to, along with the sub-band and power level. The cluster leaders use multi-connectivity to connect to multiple APs, which increases their reliability. The objective is to maximize the successful payload delivery probability for all robots. Illustrative simulation results indicate that the proposed scheme can approach the performance of the centralized algorithm and offers a substantial gain in reliability compared with single-connectivity (where each cluster leader can connect to only one AP).
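A simple model can quantify the reliability benefit of multi-connectivity. Suppose a cluster leader connects to several APs and each link independently delivers the payload within the latency budget with probability $p_i$. Phase I then succeeds with probability $1 - \prod_i (1 - p_i)$. The sketch below assumes independent links and, for a cluster member, multiplies in a Phase II D2D success probability. Both assumptions and the function names are illustrative and do not represent the paper's system model.

```python
import math


def multi_connectivity_success(link_probs):
    """P(at least one AP link delivers the payload), assuming independent links."""
    return 1.0 - math.prod(1.0 - p for p in link_probs)


def member_success(link_probs, d2d_prob):
    """Phase I (AP links to the leader) followed by Phase II (D2D broadcast).

    Assumes the two phases succeed independently of each other.
    """
    return multi_connectivity_success(link_probs) * d2d_prob


single = multi_connectivity_success([0.9])       # single-connectivity
dual = multi_connectivity_success([0.9, 0.9])    # leader connected to two APs
member = member_success([0.9, 0.9], d2d_prob=0.999)
```

For instance, under these assumptions two links with $p_i = 0.9$ raise the Phase I success probability from 0.9 to 0.99.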
Denoising diffusion probabilistic models (DDPMs) have been shown to achieve superior performance in MRI reconstruction. From the perspective of continuous stochastic differential equations (SDEs), the reverse process of DDPM can be seen as maximizing the energy of the reconstructed MR image, which leads to divergence of the SDE sequence. For this reason, a modified high-frequency DDPM model is proposed for MRI reconstruction. From its continuous SDE viewpoint, termed high-frequency space SDE (HFS-SDE), the energy-concentrated low-frequency part of the MR image is no longer amplified, and the diffusion process focuses more on acquiring high-frequency prior information. This not only improves the stability of the diffusion model but also offers the possibility of better recovering high-frequency details. Experiments on the publicly available fastMRI dataset show that the proposed HFS-SDE outperforms the DDPM-driven VP-SDE, supervised deep learning methods, and traditional parallel imaging methods in terms of stability and reconstruction accuracy.
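The low/high-frequency split that HFS-SDE relies on can be illustrated in one dimension. A mask in the Fourier domain (the analogue of k-space) separates a signal into a low-frequency part that carries most of the energy and a high-frequency residual. This sketch shows only the decomposition, using a naive DFT and a hypothetical mask half-width. The paper's 2D k-space masks and the SDE itself are not reproduced.

```python
import cmath


def dft(x):
    """Naive discrete Fourier transform (O(n^2)), adequate for a toy example."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n)) for k in range(n)]


def idft(X):
    """Inverse of dft()."""
    n = len(X)
    return [sum(X[k] * cmath.exp(2j * cmath.pi * k * t / n) for k in range(n)) / n for t in range(n)]


def split_frequencies(x, half_width):
    """Split x into low- and high-frequency parts with a binary Fourier-domain mask.

    Index k counts as low frequency when its circular distance to 0,
    min(k, n - k), is at most half_width (a hypothetical mask parameter).
    """
    n = len(x)
    X = dft(x)
    low = [1.0 if min(k, n - k) <= half_width else 0.0 for k in range(n)]
    x_low = idft([m * c for m, c in zip(low, X)])
    x_high = idft([(1.0 - m) * c for m, c in zip(low, X)])
    return x_low, x_high


def energy(x):
    """Sum of squared magnitudes."""
    return sum(abs(v) ** 2 for v in x)


signal = [5.0 + 0.1 * (-1) ** t for t in range(8)]  # large constant + small alternating detail
x_low, x_high = split_frequencies(signal, half_width=1)
```

For a constant signal with a small alternating component, the low-frequency part recovers the constant and holds almost all the energy. The alternating detail lands in the high-frequency part.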
Optimal experimental design is an essential subfield of statistics that aims to maximize the chances of experimental success. Finding D- and A-optimal designs, i.e., minimizing the determinant and the trace of the inverse Fisher information matrix, respectively, is a very challenging problem in the field of optimal design. Owing to their flexibility and ease of implementation, traditional evolutionary algorithms (EAs) have been applied to a small subset of optimal experimental design problems without requiring mathematical derivations or assumptions. However, current EAs still face several issues: determining the number of support points, handling infeasible weight solutions, and insufficient experimental evaluation. To address these issues, this paper investigates differential evolution (DE) variants for finding D- and A-optimal designs for several different statistical models. A repair operation is proposed to automatically determine the support points by combining similar support points, together with their corresponding weights, based on Euclidean distance and deleting support points with small weights. Furthermore, the repair operation turns infeasible weight solutions into feasible ones. To enrich our optimal design experiments, we apply the proposed DE variants to D- and A-optimal design problems on 12 statistical models. Compared with competing algorithms, simulation experiments show that LSHADE achieves better performance on the D- and A-optimal design problems.
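The repair operation and the two design criteria can be sketched for designs represented as support points with weights. The following elements are assumptions made for illustration:

- the thresholds `merge_dist` and `min_weight`;
- the rule that the heavier of two nearby points survives and absorbs the lighter one's weight;
- the function names.

The criteria are computed for the simple linear model $y = \beta_0 + \beta_1 x$ with $f(x) = (1, x)^T$ and information matrix $M(\xi) = \sum_i w_i f(x_i) f(x_i)^T$.

```python
import math


def repair_design(points, weights, merge_dist=0.05, min_weight=1e-3):
    """Repair a candidate design (support points + weights); thresholds are hypothetical.

    1. Make the weights feasible: clip negatives to 0 and renormalize to sum 1.
    2. Merge support points closer than merge_dist (Euclidean distance):
       the point with the larger weight survives and absorbs the smaller weight.
    3. Drop points whose weight is below min_weight and renormalize.
    """
    w = [max(0.0, wi) for wi in weights]
    total = sum(w)
    w = [wi / total for wi in w] if total > 0 else [1.0 / len(w)] * len(w)

    kept_pts, kept_w = [], []
    for i in sorted(range(len(points)), key=lambda i: -w[i]):  # heaviest first
        for j, p in enumerate(kept_pts):
            if math.dist(points[i], p) < merge_dist:
                kept_w[j] += w[i]
                break
        else:
            kept_pts.append(points[i])
            kept_w.append(w[i])

    pairs = [(p, wi) for p, wi in zip(kept_pts, kept_w) if wi >= min_weight]
    if not pairs:  # keep at least the heaviest point
        pairs = [max(zip(kept_pts, kept_w), key=lambda t: t[1])]
    s = sum(wi for _, wi in pairs)
    return [p for p, _ in pairs], [wi / s for _, wi in pairs]


def linear_model_criteria(points, weights):
    """D- and A-criteria for y = b0 + b1*x with f(x) = (1, x).

    M = [[a, b], [b, c]]; returns (det(M^{-1}), trace(M^{-1})),
    the quantities minimized by D- and A-optimal designs respectively.
    """
    a = sum(weights)
    b = sum(wi * p[0] for p, wi in zip(points, weights))
    c = sum(wi * p[0] ** 2 for p, wi in zip(points, weights))
    det_m = a * c - b * b
    return 1.0 / det_m, (a + c) / det_m


pts, wts = repair_design([(-1.0,), (-0.99,), (1.0,)], [0.3, 0.2, 0.5])
d_crit, a_crit = linear_model_criteria(pts, wts)
```

In this example the nearly coincident points at -1 and -0.99 are merged, leaving equal weights at ±1. This is the known D-optimal design for simple linear regression on [-1, 1].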
In recent years, deep dictionary learning (DDL) has attracted a great deal of attention due to its effectiveness for representation learning and visual recognition. However, most existing methods focus on unsupervised deep dictionary learning and fail to further exploit category information. To make full use of the category information of different samples, we propose a novel deep dictionary learning model with an intra-class constraint (DDLIC) for visual classification. Specifically, we design an intra-class compactness constraint on the intermediate representations at different levels to encourage intra-class representations to be closer to each other, so that the learned representations eventually become more discriminative. Unlike traditional DDL methods, during the classification stage our DDLIC performs a layer-wise greedy optimization in a similar way to the training stage. Experimental results on four image datasets show that our method is superior to state-of-the-art methods.
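One common way to express an intra-class compactness constraint is to sum, over all samples, the squared distance between each representation and the mean of its class. Such a term can be added as a penalty at each level of the dictionary hierarchy. The exact form used by DDLIC may differ; this sketch and its function name are illustrative assumptions.

```python
def intra_class_compactness(features, labels):
    """Sum over samples of the squared Euclidean distance to their class mean."""
    groups = {}
    for f, y in zip(features, labels):
        groups.setdefault(y, []).append(f)
    loss = 0.0
    for members in groups.values():
        dim = len(members[0])
        mean = [sum(m[d] for m in members) / len(members) for d in range(dim)]
        loss += sum(sum((m[d] - mean[d]) ** 2 for d in range(dim)) for m in members)
    return loss


# Toy intermediate representations for two classes.
loose = intra_class_compactness([[0, 0], [2, 0], [5, 5]], [0, 0, 1])
tight = intra_class_compactness([[0.9, 0], [1.1, 0], [5, 5]], [0, 0, 1])
```

A smaller value means samples of the same class are more tightly clustered. During training, such a term would be weighted and added to each layer's dictionary learning objective.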
Molecular property prediction is a fundamental task in the drug and materials industries. Physically, the properties of a molecule are determined by its own electronic structure, which can be exactly described by the Schrödinger equation. However, solving the Schrödinger equation for most molecules is extremely challenging due to long-range interactions in the behavior of a quantum many-body system. Deep learning methods have proven effective in molecular property prediction. In this work, we design a novel method, namely GEM-2, which comprehensively considers both the long-range and many-body interactions in molecules. GEM-2 consists of two interacting tracks: an atom-level track that models both the local and global correlations between any two atoms, and a pair-level track that models the correlations between all atom pairs, which embeds information among any 3 or 4 atoms. Extensive experiments demonstrate the superiority of GEM-2 over multiple baseline methods in quantum chemistry and drug discovery tasks.
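The claim that pair-level modeling embeds 3- or 4-atom information follows from counting. A correlation between pair (i, j) and pair (k, l) involves the atom set {i, j, k, l}, which contains 3 atoms when the pairs share an atom and 4 when they are disjoint. The enumeration below checks this combinatorial fact. It is not GEM-2's pair-level track, and the function name is hypothetical.

```python
from collections import Counter
from itertools import combinations


def pair_pair_atom_counts(n_atoms):
    """Count pair-pair combinations by the number of distinct atoms they involve."""
    pairs = list(combinations(range(n_atoms), 2))
    counts = Counter()
    for p, q in combinations(pairs, 2):
        counts[len(set(p) | set(q))] += 1
    return counts


counts = pair_pair_atom_counts(4)
```

For 4 atoms, the 15 pair-pair combinations split into 12 that involve 3 atoms and 3 that involve all 4.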