Community detection in multi-layer networks has emerged as a crucial area of modern network analysis. However, conventional approaches often assume that nodes belong exclusively to a single community, which fails to capture the complex structure of real-world networks where nodes may belong to multiple communities simultaneously. To address this limitation, we propose novel spectral methods to estimate the common mixed memberships in the multi-layer mixed membership stochastic block model. The proposed methods leverage the eigen-decomposition of three aggregate matrices: the sum of adjacency matrices, the debiased sum of squared adjacency matrices, and the sum of squared adjacency matrices. We establish rigorous theoretical guarantees for the consistency of our methods. Specifically, we derive per-node error rates under mild conditions on network sparsity, demonstrating their consistency as the number of nodes and/or layers increases under the multi-layer mixed membership stochastic block model. Our theoretical results reveal that the method leveraging the sum of adjacency matrices generally performs poorer than the other two methods for mixed membership estimation in multi-layer networks. We conduct extensive numerical experiments to empirically validate our theoretical findings. For real-world multi-layer networks with unknown community information, we introduce two novel modularity metrics to quantify the quality of mixed membership community detection. Finally, we demonstrate the practical applications of our algorithms and modularity metrics by applying them to real-world multi-layer networks, demonstrating their effectiveness in extracting meaningful community structures.
Community detection in multi-layer networks is a crucial problem in network analysis. In this paper, we analyze the performance of two spectral clustering algorithms for community detection within the multi-layer degree-corrected stochastic block model (MLDCSBM) framework. One algorithm is based on the sum of adjacency matrices, while the other utilizes the debiased sum of squared adjacency matrices. We establish consistency results for community detection using these methods under MLDCSBM as the size of the network and/or the number of layers increases. Our theorems demonstrate the advantages of utilizing multiple layers for community detection. Moreover, our analysis indicates that spectral clustering with the debiased sum of squared adjacency matrices is generally superior to spectral clustering with the sum of adjacency matrices. Numerical simulations confirm that our algorithm, employing the debiased sum of squared adjacency matrices, surpasses existing methods for community detection in multi-layer networks. Finally, the analysis of several real-world multi-layer networks yields meaningful insights.
The latent class model is a powerful tool for identifying latent classes within populations that share common characteristics for categorical data in social, psychological, and behavioral sciences. In this article, we propose two new algorithms to estimate a latent class model for categorical data. Our algorithms are developed by using a newly defined regularized Laplacian matrix calculated from the response matrix. We provide theoretical convergence rates of our algorithms by considering a sparsity parameter and show that our algorithms stably yield consistent latent class analysis under mild conditions. Additionally, we propose a metric to capture the strength of latent class analysis and several procedures designed based on this metric to infer how many latent classes one should use for real-world categorical data. The efficiency and accuracy of our algorithms are verified by extensive simulated experiments, and we further apply our algorithms to real-world categorical data with promising results.
The Graded of Membership (GoM) model is a powerful tool for inferring latent classes in categorical data, which enables subjects to belong to multiple latent classes. However, its application is limited to categorical data with nonnegative integer responses, making it inappropriate for datasets with continuous or negative responses. To address this limitation, this paper proposes a novel model named the Weighted Grade of Membership (WGoM) model. Compared with GoM, our WGoM relaxes GoM's distribution constraint on the generation of a response matrix and it is more general than GoM. We then propose an algorithm to estimate the latent mixed memberships and the other WGoM parameters. We derive the error bounds of the estimated parameters and show that the algorithm is statistically consistent. The algorithmic performance is validated in both synthetic and real-world datasets. The results demonstrate that our algorithm is accurate and efficient, indicating its high potential for practical applications. This paper makes a valuable contribution to the literature by introducing a novel model that extends the applicability of the GoM model and provides a more flexible framework for analyzing categorical data with weighted responses.
The latent class model has been proposed as a powerful tool for cluster analysis of categorical data in various fields such as social, psychological, behavioral, and biological sciences. However, one important limitation of the latent class model is that it is only suitable for data with binary responses, making it fail to model real-world data with continuous or negative responses. In many applications, ignoring the weights throws out a lot of potentially valuable information contained in the weights. To address this limitation, we propose a novel generative model, the weighted latent class model (WLCM). Our model allows data's response matrix to be generated from an arbitrary distribution with a latent class structure. In comparison to the latent class model, our WLCM is more realistic and more general. To our knowledge, our WLCM is the first model for latent class analysis with weighted responses. We investigate the identifiability of the model and propose an efficient algorithm for estimating the latent classes and other model parameters. We show that the proposed algorithm enjoys consistent estimation. The performance of the proposed algorithm is investigated using both computer-generated and real-world weighted response data.
Modeling and estimating mixed memberships for un-directed un-weighted networks in which nodes can belong to multiple communities has been well studied in recent years. However, for a more general case, the bipartite weighted networks in which nodes can belong to multiple communities, row nodes can be different from column nodes, and all elements of adjacency matrices can be any finite real values, to our knowledge, there is no model for such bipartite weighted networks. To close this gap, this paper introduces a novel model, the Bipartite Mixed Membership Distribution-Free (BiMMDF) model. As a special case, bipartite signed networks with mixed memberships can also be generated from BiMMDF. Our model enjoys its advantage by allowing all elements of an adjacency matrix to be generated from any distribution as long as the expectation adjacency matrix has a block structure related to node memberships under BiMMDF. The proposed model can be viewed as an extension of many previous models, including the popular mixed membership stochastic blcokmodels. An efficient algorithm with a theoretical guarantee of consistent estimation is applied to fit BiMMDF. In particular, for a standard bipartite weighted network with two row (and column) communities, to make the algorithm's error rates small with high probability, separation conditions are obtained when adjacency matrices are generated from different distributions under BiMMDF. The behavior differences of different distributions on separation conditions are verified by extensive synthetic bipartite weighted networks generated under BiMMDF. Experiments on real-world directed weighted networks illustrate the advantage of the algorithm in studying highly mixed nodes and asymmetry between row and column communities.
We consider the problem of detecting latent community information of mixed membership weighted network in which nodes have mixed memberships and edges connecting between nodes can be finite real numbers. We propose a general mixed membership distribution-free model for this problem. The model has no distribution constraints of edges but only the expected values, and can be viewed as generalizations of some previous models. We use an efficient spectral algorithm to estimate community memberships under the model. We also derive the convergence rate of the proposed algorithm under the model using delicate spectral analysis. We demonstrate the advantages of mixed membership distribution-free model with applications to a small scale of simulated networks when edges follow different distributions.
Community detection for un-weighted networks has been widely studied in network analysis, but the case of weighted networks remains a challenge. In this paper, a Distribution-Free Models (DFM) is proposed for networks in which nodes are partitioned into different communities. DFM is a general, interpretable and identifiable model for both un-weighted networks and weighted networks. The proposed model does not require prior knowledge on a specific distribution for elements of adjacency matrix but only the expected value. The distribution-free property of DFM even allows adjacency matrix to have negative elements. We develop an efficient spectral algorithm to fit DFM. By introducing a noise matrix, we build a theoretic framework on perturbation analysis to show that the proposed algorithm stably yields consistent community detection under DFM. Numerical experiments on both synthetic networks and two social networks from literature are used to illustrate the algorithm.
Consider a directed network with $K_{r}$ row communities and $K_{c}$ column communities. Previous works found that modeling directed networks in which all nodes have overlapping property requires $K_{r}=K_{c}$ for identifiability. In this paper, we propose an overlapping and nonoverlapping model to study directed networks in which row nodes have overlapping property while column nodes do not. The proposed model is identifiable when $K_{r}\leq K_{c}$. Meanwhile, we provide one identifiable model as extension of ONM to model directed networks with variation in node degree. Two spectral algorithms with theoretical guarantee on consistent estimations are designed to fit the models. A small scale of numerical studies are used to illustrate the algorithms.
This paper considers the problem of modeling and estimating community memberships of nodes in a directed network where every row (column) node is associated with a vector determining its membership in each row (column) community. To model such directed network, we propose directed degree corrected mixed membership (DiDCMM) model by considering degree heterogeneity. DiDCMM is identifiable under popular conditions for mixed membership network when considering degree heterogeneity. Based on the cone structure inherent in the normalized version of the left singular vectors and the simplex structure inherent in the right singular vectors of the population adjacency matrix, we build an efficient algorithm called DiMSC to infer the community membership vectors for both row nodes and column nodes. By taking the advantage of DiMSC's equivalence algorithm which returns same estimations as DiMSC and the recent development on row-wise singular vector deviation, we show that the proposed algorithm is asymptotically consistent under mild conditions by providing error bounds for the inferred membership vectors of each row node and each column node under DiDCMM. The theory is supplemented by a simulation study.