Gaussian mixture block models are distributions over graphs that strive to model modern networks: to generate a graph from such a model, we associate each vertex $i$ with a latent feature vector $u_i \in \mathbb{R}^d$ sampled from a mixture of Gaussians, and we add edge $(i,j)$ if and only if the feature vectors are sufficiently similar, in that $\langle u_i,u_j \rangle \ge \tau$ for a pre-specified threshold $\tau$. The different components of the Gaussian mixture represent the fact that there may be different types of nodes with different distributions over features -- for example, in a social network each component represents the different attributes of a distinct community. Natural algorithmic tasks associated with these networks are embedding (recovering the latent feature vectors) and clustering (grouping nodes by their mixture component). In this paper we initiate the study of clustering and embedding graphs sampled from high-dimensional Gaussian mixture block models, where the dimension of the latent feature vectors $d\to \infty$ as the size of the network $n \to \infty$. This high-dimensional setting is most appropriate in the context of modern networks, in which we think of the latent feature space as being high-dimensional. We analyze the performance of canonical spectral clustering and embedding algorithms for such graphs in the case of 2-component spherical Gaussian mixtures, and begin to sketch out the information-computation landscape for clustering and embedding in these models.
Kinship recognition aims to determine whether the subjects in two facial images are kin or non-kin, which is an emerging and challenging problem. However, most previous methods focus on heuristic designs without considering the spatial correlation between face images. In this paper, we aim to learn discriminative kinship representations embedded with the relation information between face components (e.g., eyes, nose, etc.). To achieve this goal, we propose the Face Componential Relation Network, which learns the relationship between face components among images with a cross-attention mechanism, which automatically learns the important facial regions for kinship recognition. Moreover, we propose Face Componential Relation Network (FaCoRNet), which adapts the loss function by the guidance from cross-attention to learn more discriminative feature representations. The proposed FaCoRNet outperforms previous state-of-the-art methods by large margins for the largest public kinship recognition FIW benchmark. The code will be publicly released upon acceptance.
The drastic variation of motion in spatial and temporal dimensions makes the video prediction task extremely challenging. Existing RNN models obtain higher performance by deepening or widening the model. They obtain the multi-scale features of the video only by stacking layers, which is inefficient and brings unbearable training costs (such as memory, FLOPs, and training time). Different from them, this paper proposes a spatiotemporal multi-scale model called MS-LSTM wholly from a multi-scale perspective. On the basis of stacked layers, MS-LSTM incorporates two additional efficient multi-scale designs to fully capture spatiotemporal context information. Concretely, we employ LSTMs with mirrored pyramid structures to construct spatial multi-scale representations and LSTMs with different convolution kernels to construct temporal multi-scale representations. Detailed comparison experiments with eight baseline models on four video datasets show that MS-LSTM has better performance but lower training costs.
Prediction-based active perception has shown the potential to improve the navigation efficiency and safety of the robot by anticipating the uncertainty in the unknown environment. The existing works for 3D shape prediction make an implicit assumption about the partial observations and therefore cannot be used for real-world planning and do not consider the control effort for next-best-view planning. We present Pred-NBV, a realistic object shape reconstruction method consisting of PoinTr-C, an enhanced 3D prediction model trained on the ShapeNet dataset, and an information and control effort-based next-best-view method to address these issues. Pred-NBV shows an improvement of 25.46% in object coverage over the traditional method in the AirSim simulator, and performs better shape completion than PoinTr, the state-of-the-art shape completion model, even on real data obtained from a Velodyne 3D LiDAR mounted on DJI M600 Pro.
Deep unfolding networks (DUNs) are the foremost methods in the realm of compressed sensing MRI, as they can employ learnable networks to facilitate interpretable forward-inference operators. However, several daunting issues still exist, including the heavy dependency on the first-order optimization algorithms, the insufficient information fusion mechanisms, and the limitation of capturing long-range relationships. To address the issues, we propose a Generically Accelerated Half-Quadratic Splitting (GA-HQS) algorithm that incorporates second-order gradient information and pyramid attention modules for the delicate fusion of inputs at the pixel level. Moreover, a multi-scale split transformer is also designed to enhance the global feature representation. Comprehensive experiments demonstrate that our method surpasses previous ones on single-coil MRI acceleration tasks.
In this paper, we propose a scheme for the problem of cache-aided multi-user private information retrieval with small caches, in which $K$ users are connected to $S$ non-colluding databases via shared links. Each database contains a set of $N$ files, and each user has a dedicated cache of size equivalent to the size of $M$ files. All the users want to retrieve a file without revealing their demands to the databases. During off-peak hours, all the users will fill their caches, and when required, users will demand their desired files by cooperatively generating query sets for each database. After receiving the transmissions from databases, all the users should get their desired files using transmitted data and their cache contents. This problem has been studied in [X. Zhang, K. Wan, H. Sun, M. Ji and G. Caire, \tqt{Fundamental limits of cache-aided multiuser private information retrieval}, IEEE Trans. Commun., 2021], in which authors proposed a product design scheme. In this paper, we propose a scheme that gives a better rate for a particular value of $M$ than the product design scheme. We consider a slightly different approach for the placement phase. Instead of a database filling the caches of all users directly, a database will broadcast cache content for all users on a shared link, and then the users will decide unitedly which part of the broadcasted content will be stored in the cache of each user. This variation facilitates maintaining the privacy constraint at a reduced rate.
Graph Markov Neural Networks (GMNN) have recently been proposed to improve regular graph neural networks (GNN) by including label dependencies into the semi-supervised node classification task. GMNNs do this in a theoretically principled way and use three kinds of information to predict labels. Just like ordinary GNNs, they use the node features and the graph structure but they moreover leverage information from the labels of neighboring nodes to improve the accuracy of their predictions. In this paper, we introduce a new dataset named WikiVitals which contains a graph of 48k mutually referred Wikipedia articles classified into 32 categories and connected by 2.3M edges. Our aim is to rigorously evaluate the contributions of three distinct sources of information to the prediction accuracy of GMNN for this dataset: the content of the articles, their connections with each other and the correlations among their labels. For this purpose we adapt a method which was recently proposed for performing fair comparisons of GNN performance using an appropriate randomization over partitions and a clear separation of model selection and model assessment.
We present NeRFVS, a novel neural radiance fields (NeRF) based method to enable free navigation in a room. NeRF achieves impressive performance in rendering images for novel views similar to the input views while suffering for novel views that are significantly different from the training views. To address this issue, we utilize the holistic priors, including pseudo depth maps and view coverage information, from neural reconstruction to guide the learning of implicit neural representations of 3D indoor scenes. Concretely, an off-the-shelf neural reconstruction method is leveraged to generate a geometry scaffold. Then, two loss functions based on the holistic priors are proposed to improve the learning of NeRF: 1) A robust depth loss that can tolerate the error of the pseudo depth map to guide the geometry learning of NeRF; 2) A variance loss to regularize the variance of implicit neural representations to reduce the geometry and color ambiguity in the learning procedure. These two loss functions are modulated during NeRF optimization according to the view coverage information to reduce the negative influence brought by the view coverage imbalance. Extensive results demonstrate that our NeRFVS outperforms state-of-the-art view synthesis methods quantitatively and qualitatively on indoor scenes, achieving high-fidelity free navigation results.
Spiking neural networks have attracted extensive attention from researchers in many fields due to their brain-like information processing mechanism. The proposal of surrogate gradient enables the spiking neural networks to migrate to more complex tasks, and gradually close the gap with the conventional artificial neural networks. Current spiking neural networks utilize the output of all moments to produce the final prediction, which compromises their temporal characteristics and causes a reduction in performance and efficiency. We propose a temporal knowledge sharing approach (TKS) that enables the interaction of information between different moments, by selecting the output of specific moments to compose teacher signals to guide the training of the network along with the real labels. We have validated TKS on both static datasets CIFAR10, CIFAR100, ImageNet-1k and neuromorphic datasets DVS-CIFAR10, NCALTECH101. Our experimental results indicate that we have achieved the current optimal performance in comparison with other algorithms. Experiments on Fine-grained classification datasets further demonstrate our algorithm's superiority with CUB-200-2011, StanfordDogs, and StanfordCars. TKS algorithm helps the model to have stronger temporal generalization capability, allowing the network to guarantee performance with large time steps in the training phase and with small time steps in the testing phase. This greatly facilitates the deployment of SNNs on edge devices.
In this paper, we propose a robust secure transmission scheme for an active reconfigurable intelligent surface (RIS) enabled symbiotic radio (SR) system in the presence of multiple eavesdroppers (Eves). In the considered system, the active RIS is adopted to enable the secure transmission of primary signals from the primary transmitter to multiple primary users in a multicasting manner, and simultaneously achieve its own information delivery to the secondary user by riding over the primary signals. Taking into account the imperfect channel state information (CSI) related with Eves, we formulate the system power consumption minimization problem by optimizing the transmit beamforming and reflection beamforming for the bounded and statistical CSI error models, taking the worst-case SNR constraints and the SNR outage probability constraints at the Eves into considerations, respectively. Specifically, the S-Procedure and the Bernstein-Type Inequality are implemented to approximately transform the worst-case SNR and the SNR outage probability constraints into tractable forms, respectively. After that, the formulated problems can be solved by the proposed alternating optimization (AO) algorithm with the semi-definite relaxation and sequential rank-one constraint relaxation techniques. Numerical results show that the proposed active RIS scheme can reduce up to 27.0% system power consumption compared to the passive RIS.