Score-based methods for learning Bayesain networks(BN) aim to maximizing the global score functions. However, if local variables have direct and indirect dependence simultaneously, the global optimization on score functions misses edges between variables with indirect dependent relationship, of which scores are smaller than those with direct dependent relationship. In this paper, we present an identifiability condition based on a determined subset of parents to identify the underlying DAG. By the identifiability condition, we develop a two-phase algorithm namely optimal-tuning (OT) algorithm to locally amend the global optimization. In the optimal phase, an optimization problem based on first-order Hilbert-Schmidt independence criterion (HSIC) gives an estimated skeleton as the initial determined parents subset. In the tuning phase, the skeleton is locally tuned by deletion, addition and DAG-formalization strategies using the theoretically proved incremental properties of high-order HSIC. Numerical experiments for different synthetic datasets and real-world datasets show that the OT algorithm outperforms existing methods. Especially in Sigmoid Mix model with the size of the graph being ${\rm\bf d=40}$, the structure intervention distance (SID) of the OT algorithm is 329.7 smaller than the one obtained by CAM, which indicates that the graph estimated by the OT algorithm misses fewer edges compared with CAM.
Social media and online navigation bring us enjoyable experience in accessing information, and simultaneously create information cocoons (ICs) in which we are unconsciously trapped with limited and biased information. We provide a formal definition of IC in the scenario of online navigation. Subsequently, by analyzing real recommendation networks extracted from Science, PNAS and Amazon websites, and testing mainstream algorithms in disparate recommender systems, we demonstrate that similarity-based recommendation techniques result in ICs, which suppress the system navigability by hundreds of times. We further propose a flexible recommendation strategy that solves the IC-induced problem and improves retrieval accuracy in navigation, demonstrated by simulations on real data and online experiments on the largest video website in China.
Superpixel segmentation algorithms are to partition an image into perceptually coherence atomic regions by assigning every pixel a superpixel label. Those algorithms have been wildly used as a preprocessing step in computer vision works, as they can enormously reduce the number of entries of subsequent algorithms. In this work, we propose an alternative superpixel segmentation method based on Gaussian mixture model (GMM) by assuming that each superpixel corresponds to a Gaussian distribution, and assuming that each pixel is generated by first randomly choosing one distribution from several Gaussian distributions which are defined to be related to that pixel, and then the pixel is drawn from the selected distribution. Based on this assumption, each pixel is supposed to be drawn from a mixture of Gaussian distributions with unknown parameters (GMM). An algorithm based on expectation-maximization method is applied to estimate the unknown parameters. Once the unknown parameters are obtained, the superpixel label of a pixel is determined by a posterior probability. The success of applying GMM to superpixel segmentation depends on the two major differences between the traditional GMM-based clustering and the proposed one: data points in our model may be non-identically distributed, and we present an approach to control the shape of the estimated Gaussian functions by adjusting their covariance matrices. Our method is of linear complexity with respect to the number of pixels. The proposed algorithm is inherently parallel and can get faster speed by adding simple OpenMP directives to our implementation. According to our experiments, our algorithm outperforms the state-of-the-art superpixel algorithms in accuracy and presents a competitive performance in computational efficiency.