Function plays an important role in mathematics and many science branches. As the fast development of computer technology, more and more study on computational function analysis, e.g., Fast Fourier Transform, Wavelet Transform, Curve Function, are presented in these years. However, there are two main problems in these approaches: 1) hard to handle the complex functions of stationary and non-stationary, periodic and non-periodic, high order and low order; 2) hard to generalize the fitting functions from training data to test data. In this paper, a multiple regression based function fitting network that solves the two main problems is introduced as a predicable function fitting technique. This technique constructs the network includes three main parts: 1) the stationary transform layer, 2) the feature encoding layers, and 3) the fine tuning regression layer. The stationary transform layer recognizes the order of input function data, and transforms non-stationary function to stationary function. The feature encoding layers encode the raw input sequential data to a novel linear regression feature that can capture both the structural and the temporal characters of the sequential data. The fine tuning regression layer then fits the features to the target ahead values. The fitting network with the linear regression feature layers and a non-linear regression layer come up with high quality fitting results and generalizable predictions. The experiments of both mathematic function examples and the real word function examples verifies the efficiency of the proposed technique.
In this paper, we propose a Spatial Robust Mixture Regression model to investigate the relationship between a response variable and a set of explanatory variables over the spatial domain, assuming that the relationships may exhibit complex spatially dynamic patterns that cannot be captured by constant regression coefficients. Our method integrates the robust finite mixture Gaussian regression model with spatial constraints, to simultaneously handle the spatial nonstationarity, local homogeneity, and outlier contaminations. Compared with existing spatial regression models, our proposed model assumes the existence a few distinct regression models that are estimated based on observations that exhibit similar response-predictor relationships. As such, the proposed model not only accounts for nonstationarity in the spatial trend, but also clusters observations into a few distinct and homogenous groups. This provides an advantage on interpretation with a few stationary sub-processes identified that capture the predominant relationships between response and predictor variables. Moreover, the proposed method incorporates robust procedures to handle contaminations from both regression outliers and spatial outliers. By doing so, we robustly segment the spatial domain into distinct local regions with similar regression coefficients, and sporadic locations that are purely outliers. Rigorous statistical hypothesis testing procedure has been designed to test the significance of such segmentation. Experimental results on many synthetic and real-world datasets demonstrate the robustness, accuracy, and effectiveness of our proposed method, compared with other robust finite mixture regression, spatial regression and spatial segmentation methods.
Hypergraph offers a framework to depict the multilateral relationships in real-world complex data. Predicting higher-order relationships, i.e hyperedge, becomes a fundamental problem for the full understanding of complicated interactions. The development of graph neural network (GNN) has greatly advanced the analysis of ordinary graphs with pair-wise relations. However, these methods could not be easily extended to the case of hypergraph. In this paper, we generalize the challenges of GNN in representing higher-order data in principle, which are edge- and node-level ambiguities. To overcome the challenges, we present SNALS that utilizes bipartite graph neural network with structural features to collectively tackle the two ambiguity issues. SNALS captures the joint interactions of a hyperedge by its local environment, which is retrieved by collecting the spectrum information of their connections. As a result, SNALS achieves nearly 30% performance increase compared with most recent GNN-based models. In addition, we applied SNALS to predict genetic higher-order interactions on 3D genome organization data. SNALS showed consistently high prediction accuracy across different chromosomes, and generated novel findings on 4-way gene interaction, which is further validated by existing literature.
Low rank representation of binary matrix is powerful in disentangling sparse individual-attribute associations, and has received wide applications. Existing binary matrix factorization (BMF) or co-clustering (CC) methods often assume i.i.d background noise. However, this assumption could be easily violated in real data, where heterogeneous row- or column-wise probability of binary entries results in disparate element-wise background distribution, and paralyzes the rationality of existing methods. We propose a binary data denoising framework, namely BIND, which optimizes the detection of true patterns by estimating the row- or column-wise mixture distribution of patterns and disparate background, and eliminating the binary attributes that are more likely from the background. BIND is supported by thoroughly derived mathematical property of the row- and column-wise mixture distributions. Our experiment on synthetic and real-world data demonstrated BIND effectively removes background noise and drastically increases the fairness and accuracy of state-of-the arts BMF and CC methods.
Boolean tensor has been broadly utilized in representing high dimensional logical data collected on spatial, temporal and/or other relational domains. Boolean Tensor Decomposition (BTD) factorizes a binary tensor into the Boolean sum of multiple rank-1 tensors, which is an NP-hard problem. Existing BTD methods have been limited by their high computational cost, in applications to large scale or higher order tensors. In this work, we presented a computationally efficient BTD algorithm, namely \textit{Geometric Expansion for all-order Tensor Factorization} (GETF), that sequentially identifies the rank-1 basis components for a tensor from a geometric perspective. We conducted rigorous theoretical analysis on the validity as well as algorithemic efficiency of GETF in decomposing all-order tensor. Experiments on both synthetic and real-world data demonstrated that GETF has significantly improved performance in reconstruction accuracy, extraction of latent structures and it is an order of magnitude faster than other state-of-the-art methods.