Recent years have witnessed great success in handling graph-related tasks with Graph Neural Networks (GNNs). Despite their academic success, Multi-Layer Perceptrons (MLPs) remain the primary workhorse for practical industrial applications. One reason for this academic-industrial gap is the neighborhood-fetching latency incurred by data dependency in GNNs, which makes them hard to deploy in latency-sensitive applications that require fast inference. Conversely, without involving any feature aggregation, MLPs have no data dependency and infer much faster than GNNs, but their performance is less competitive. Motivated by these complementary strengths and weaknesses, we propose a Graph Self-Distillation on Neighborhood (GSDN) framework to reduce the gap between GNNs and MLPs. Specifically, the GSDN framework is based purely on MLPs, where structural information is only implicitly used as a prior to guide knowledge self-distillation between the neighborhood and the target, substituting for the explicit neighborhood information propagation of GNNs. As a result, GSDN enjoys the benefits of graph topology-awareness during training but has no data dependency at inference. Extensive experiments show that the performance of vanilla MLPs can be greatly improved with self-distillation; e.g., GSDN improves over stand-alone MLPs by 15.54\% on average and outperforms state-of-the-art GNNs on six datasets. Regarding inference speed, GSDN infers 75X-89X faster than existing GNNs and 16X-25X faster than other inference acceleration methods.
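To make the distillation mechanism concrete, here is a minimal PyTorch-style sketch of a neighborhood self-distillation loss, assuming a plain \texttt{mlp} classifier, a \texttt{[2, E]} \texttt{edge\_index} tensor, and illustrative temperature and weighting hyperparameters (\texttt{tau}, \texttt{lam}); this is one plausible reading of the idea, not the authors' exact formulation:

\begin{verbatim}
import torch
import torch.nn.functional as F

def gsdn_style_loss(mlp, x, edge_index, y, labeled_mask, tau=1.0, lam=0.5):
    """Cross-entropy on labeled nodes plus a distillation term that pulls
    each node's prediction toward its neighbors' predictions. The graph
    (edge_index) is used only here, at training time."""
    logits = mlp(x)                      # [N, C]; a plain MLP, no aggregation
    ce = F.cross_entropy(logits[labeled_mask], y[labeled_mask])

    src, dst = edge_index                # neighbor pairs (src -> dst)
    log_p_target = F.log_softmax(logits[dst] / tau, dim=-1)
    p_neighbor = F.softmax(logits[src].detach() / tau, dim=-1)
    kd = F.kl_div(log_p_target, p_neighbor, reduction="batchmean")

    return ce + lam * kd
\end{verbatim}

Since structure enters only through this training loss, inference reduces to a single call of \texttt{mlp(x)}, with no neighbor fetching.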
How can we design protein sequences that fold into desired structures effectively and efficiently? Structure-based protein design has attracted increasing attention in recent years; however, few methods can simultaneously improve accuracy and efficiency, due to the lack of expressive features and the inefficiency of autoregressive sequence decoders. To address these issues, we propose PiFold, which combines a novel residue featurizer with PiGNN layers to generate protein sequences in a one-shot manner with improved recovery. Experiments show that PiFold achieves 51.66\% recovery on CATH 4.2 while inferring 70 times faster than autoregressive competitors. In addition, PiFold achieves 58.72\% and 60.42\% recovery scores on TS50 and TS500, respectively. We conduct comprehensive ablation studies to reveal the roles of different types of protein features and model designs, inspiring further simplification and improvement.
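The claimed speedup comes from decoding all residues in parallel rather than one at a time. Below is a minimal sketch of such a one-shot readout, assuming PyTorch; the residue featurizer and PiGNN layers are abstracted behind a generic embedding tensor, and the module name and dimensions are hypothetical:

\begin{verbatim}
import torch
import torch.nn as nn

class OneShotReadout(nn.Module):
    """Non-autoregressive decoding: every residue is classified in
    parallel from its structure-conditioned embedding, so a sequence
    needs one forward pass instead of one pass per residue."""
    def __init__(self, hidden_dim=128, num_amino_acids=20):
        super().__init__()
        self.readout = nn.Linear(hidden_dim, num_amino_acids)

    def forward(self, residue_embeddings):         # [L, hidden_dim]
        logits = self.readout(residue_embeddings)  # [L, 20]
        return logits.argmax(dim=-1)               # whole sequence at once

# Hypothetical usage: `h` would come from the featurizer + GNN stack.
h = torch.randn(200, 128)                          # a 200-residue protein
predicted_sequence = OneShotReadout()(h)
\end{verbatim}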
Deep learning shows great potential in generation tasks thanks to deep latent representations. Generative models are classes of models that can generate observations randomly with respect to certain implied parameters. Recently, the diffusion model has become a rising class of generative models by virtue of its powerful generation ability, and great achievements have already been reached. Beyond computer vision, speech generation, bioinformatics, and natural language processing, more applications remain to be explored in this field. However, the diffusion model has a natural drawback: a slow generation process, which has motivated many enhancement works. This survey summarizes the field of the diffusion model. We first state the main problem with two landmark works - DDPM and DSM. Then, we present a diverse range of advanced techniques to speed up diffusion models - training schedules, training-free sampling, mixed modeling, and score \& diffusion unification. Regarding existing models, we also provide a benchmark of FID score, IS, and NLL at specific numbers of function evaluations (NFE). Moreover, applications of diffusion models are introduced, including computer vision, sequence modeling, audio, and AI for science. Finally, we summarize this field together with its limitations \& future directions.
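For readers unfamiliar with DDPM, the core training objective being surveyed is noise prediction under a closed-form forward process. A minimal sketch, assuming a noise-prediction network \texttt{model(x\_t, t)} and a precomputed cumulative-product schedule \texttt{alphas\_cumprod}:

\begin{verbatim}
import torch
import torch.nn.functional as F

def ddpm_loss(model, x0, alphas_cumprod):
    """Simplified DDPM objective: corrupt x0 with the closed-form
    forward process x_t = sqrt(a_bar_t)*x0 + sqrt(1 - a_bar_t)*eps,
    then train the network to predict the injected noise eps."""
    batch = x0.shape[0]
    t = torch.randint(0, len(alphas_cumprod), (batch,))
    a_bar = alphas_cumprod[t].view(-1, *([1] * (x0.dim() - 1)))
    eps = torch.randn_like(x0)
    x_t = a_bar.sqrt() * x0 + (1 - a_bar).sqrt() * eps
    return F.mse_loss(model(x_t, t), eps)
\end{verbatim}

The slow generation discussed above comes from reversing this process step by step at sampling time, which is exactly what the surveyed acceleration techniques target.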
With the remarkable progress of deep neural networks in computer vision, data mixing augmentation techniques have been widely studied to alleviate degraded generalization when the amount of training data is limited. However, mixup strategies have not been well assembled in current vision toolboxes. In this paper, we propose \texttt{OpenMixup}, an open-source all-in-one toolbox for supervised, semi-, and self-supervised visual representation learning with mixup. It offers an integrated model design and training platform comprising a rich set of prevailing network architectures and modules, a collection of data mixing augmentation methods, and practical model analysis tools. In addition, we provide standard mixup image classification benchmarks on various datasets, which enables practitioners to make fair comparisons among state-of-the-art methods under the same settings. The source code and user documents are available at \url{https://github.com/Westlake-AI/openmixup}.
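At the heart of the collected augmentation methods is the original mixup operation; a minimal sketch (not \texttt{OpenMixup}'s actual API) looks like this:

\begin{verbatim}
import numpy as np
import torch

def mixup(x, y_onehot, alpha=1.0):
    """Classic input mixup: a convex combination of two samples and
    their one-hot labels, with the ratio drawn from Beta(alpha, alpha)."""
    lam = float(np.random.beta(alpha, alpha))
    perm = torch.randperm(x.size(0))        # random pairing within the batch
    x_mix = lam * x + (1 - lam) * x[perm]
    y_mix = lam * y_onehot + (1 - lam) * y_onehot[perm]
    return x_mix, y_mix
\end{verbatim}

The toolbox's value lies in providing many such strategies behind one training platform so that they can be compared under identical settings.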
Recent studies have proven that graph neural networks are vulnerable to adversarial attacks. An attacker can rely solely on the training labels to disrupt the performance of an agnostic victim model through edge perturbations. Researchers have observed that saliency-based attackers tend to add edges rather than delete them, which was previously explained by the fact that adding edges pollutes nodes' features through aggregation while removing edges only causes some loss of information. In this paper, we further prove that attackers perturb graphs by adding inter-class edges, which also manifests as a reduction in the homophily of the perturbed graph. From this point of view, saliency-based attackers still have room for improvement in both capability and imperceptibility. The message passing of the GNN-based surrogate model leads to oversmoothing of nodes connected by inter-class edges, preventing attackers from exploiting the distinctiveness of node features. To solve this issue, we introduce multi-hop aggregated message passing to preserve attribute differences between nodes. In addition, we propose a regularization term that restricts the homophily variance to enhance attack imperceptibility. Experiments verify that our proposed surrogate model improves the attacker's versatility and that the regularization term helps to limit the homophily of the perturbed graph.
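The homophily reduction that the attack manifests can be quantified with the standard edge homophily ratio; a minimal sketch, assuming a \texttt{[2, E]} \texttt{edge\_index} tensor and integer node labels:

\begin{verbatim}
import torch

def edge_homophily(edge_index, labels):
    """Fraction of edges connecting same-class nodes. Attacks that add
    inter-class edges drive this ratio down on the perturbed graph,
    which is what the proposed regularization term restricts."""
    src, dst = edge_index
    return (labels[src] == labels[dst]).float().mean().item()
\end{verbatim}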
Graph edge perturbations aim to damage the predictions of graph neural networks by modifying the graph structure. Previous gray-box attackers employ gradients from a surrogate model to locate vulnerable edges and perturb the graph structure. However, gradients on graph structures can be unreliable, an issue rarely studied in previous works. In this paper, we discuss and analyze the errors caused by the unreliability of structural gradients. These errors arise from the rough usage of gradients over a discrete graph structure and from the unreliability of the meta-gradient on the graph structure. To address these problems, we propose a novel attack model with methods that reduce the errors inside structural gradients. We propose edge discrete sampling to select edge perturbations, combined with hierarchical candidate selection to ensure computational efficiency. In addition, semantic invariance and momentum gradient ensemble are proposed to address gradient fluctuation on semantic-augmented graphs and the instability of the surrogate model. Experiments conducted in untargeted gray-box poisoning scenarios demonstrate the performance improvement of our approach.
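As background, a single greedy step of the gradient-based structure attack that this work refines can be sketched as follows; a continuously relaxed dense adjacency matrix and a scalar surrogate loss are assumed, and candidate selection, sampling, and ensembling are omitted:

\begin{verbatim}
import torch

def greedy_edge_flip(adj, surrogate_loss):
    """Score every potential flip by the gradient of the surrogate loss
    w.r.t. a relaxed adjacency matrix, then flip the single most
    promising edge. The discreteness of real graphs is what makes this
    gradient usage 'rough'."""
    adj = adj.clone().requires_grad_(True)
    grad = torch.autograd.grad(surrogate_loss(adj), adj)[0]
    # Adding an edge (0 -> 1) raises the loss when grad > 0;
    # removing one (1 -> 0) raises it when grad < 0.
    score = grad * (1 - 2 * adj)
    i, j = divmod(score.argmax().item(), adj.shape[1])
    flipped = adj.detach().clone()
    flipped[i, j] = flipped[j, i] = 1 - flipped[i, j]
    return flipped
\end{verbatim}

The proposed edge discrete sampling and momentum gradient ensemble replace the naive \texttt{argmax} and the single noisy gradient in this sketch.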
Temporal point processes (TPPs) are commonly used to model asynchronous event sequences featuring occurrence timestamps, typically via probabilistic models conditioned on historical impacts. While many previous works have focused on the `goodness-of-fit' of TPP models by maximizing the likelihood, their predictive performance is unsatisfactory: the timestamps generated by these models are far from the true observations. Recently, deep generative models such as denoising diffusion and score matching models have achieved great progress in image generation tasks by demonstrating their capability to generate samples of high quality. However, no complete and unified work has explored and studied the potential of generative models in the context of event occurrence modeling for TPPs. In this work, we try to fill this gap by designing a unified \textbf{g}enerative framework for the \textbf{n}eural \textbf{t}emporal \textbf{p}oint \textbf{p}rocess (\textsc{GNTPP}) model to explore their feasibility and effectiveness, and to further improve the models' predictive performance. In addition, to better measure historical impacts, we revise the attentive models that summarize influence from historical events with an adaptive reweighting term considering events' type relations and time intervals. Extensive experiments illustrate the improved predictive capability of \textsc{GNTPP} with a line of generative probabilistic decoders, as well as the performance gain from the revised attention. To the best of our knowledge, this is the first work that adapts generative models in a complete unified framework and studies their effectiveness in the context of TPPs. Our codebase, including all the methods given in Section 5.1.1, is available at \url{https://github.com/BIRD-TAO/GNTPP}. We hope the code framework can facilitate future research in neural TPPs.
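One plausible form of the revised attention is sketched below, assuming a learnable type-affinity table and a simple linear interval penalty; the exact parameterization of the adaptive reweighting term may differ from this illustration:

\begin{verbatim}
import torch
import torch.nn.functional as F

def reweighted_attention(q, k, v, event_types, intervals,
                         type_affinity, decay):
    """Attention over historical events whose scores are adjusted by a
    pairwise type-relation term and a time-interval penalty.
    type_affinity: [num_types, num_types] learnable table;
    intervals[i, j] = |t_i - t_j|; decay: scalar weight."""
    scores = q @ k.transpose(-2, -1) / q.shape[-1] ** 0.5   # [T, T]
    relation = type_affinity[event_types][:, event_types]   # [T, T]
    scores = scores + relation - decay * intervals
    return F.softmax(scores, dim=-1) @ v
\end{verbatim}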
Dimensionality reduction~(DR) maps high-dimensional data into a lower-dimensional latent space by minimizing a defined optimization objective. DR methods usually fall into feature selection~(FS) and feature projection~(FP). FS focuses on selecting a critical subset of dimensions but risks destroying the data distribution (structure). FP, on the other hand, combines all the input features into a lower-dimensional space, aiming to maintain the data structure, but lacks interpretability and sparsity. FS and FP are traditionally incompatible categories and thus have not been unified into an amicable framework. We propose that the ideal DR approach combines both FS and FP into a unified end-to-end manifold learning framework, simultaneously performing fundamental feature discovery while maintaining the intrinsic relationships between data samples in the latent space. In this work, we develop a unified framework, the Unified Dimensional Reduction Neural-network~(UDRN), that integrates FS and FP in a compatible, end-to-end way. We improve the neural network structure by implementing the FS and FP tasks separately with two stacked sub-networks. In addition, we design data augmentation for the DR process to improve the generalization ability of the method on datasets with extensive features, together with loss functions that can cooperate with the data augmentation. Extensive experimental results on four image and four biological datasets, including very high-dimensional data, demonstrate the advantages of UDRN over existing methods~(FS, FP, and FS\&FP pipelines), especially in downstream tasks such as classification and visualization.
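The two stacked sub-networks can be pictured as a learnable feature-selection gate followed by a projection MLP; the sketch below, with hypothetical module names and dimensions, illustrates the division of labor rather than UDRN's actual architecture:

\begin{verbatim}
import torch
import torch.nn as nn

class FSThenFP(nn.Module):
    """FS sub-network: a sparsity-encouraged gate over input features.
    FP sub-network: a nonlinear projection of the gated features into
    the latent space. Together they offer sparse, interpretable feature
    discovery plus structure-preserving embedding."""
    def __init__(self, in_dim, latent_dim=2):
        super().__init__()
        self.gate = nn.Parameter(torch.zeros(in_dim))   # FS importance logits
        self.project = nn.Sequential(                   # FP projection
            nn.Linear(in_dim, 256), nn.ReLU(),
            nn.Linear(256, latent_dim))

    def forward(self, x):
        return self.project(x * torch.sigmoid(self.gate))

    def sparsity_penalty(self):
        # An L1-style term on the gate pushes unused features toward zero.
        return torch.sigmoid(self.gate).sum()
\end{verbatim}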
Manifold learning~(ML) aims to find low-dimensional embeddings of high-dimensional data. Previous works focus on handcrafted or simple datasets with ideal scenarios; however, we find they perform poorly on real-world datasets with under-sampled data. Generally, ML methods first model the data structure and then produce a low-dimensional embedding: poor local connectivity of under-sampled data in the former step and inappropriate optimization objectives in the latter step lead to \emph{structural distortion} and \emph{underconstrained embedding}. To solve these problems, we propose Deep Local-flatness Manifold Embedding (DLME), a novel ML framework that obtains reliable manifold embeddings by reducing distortion. DLME constructs semantic manifolds through data augmentation and overcomes the \emph{structural distortion} problem with the help of its smooth framework. To overcome \emph{underconstrained embedding}, we design a specific loss for DLME and mathematically demonstrate that it leads to a more suitable embedding based on our proposed Local Flatness Assumption. Experiments on downstream classification, clustering, and visualization tasks with three types of datasets (toy, biological, and image) show that DLME outperforms SOTA ML \& contrastive learning (CL) methods.
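The role of data augmentation in compensating for poor local connectivity can be sketched with a generic view-matching loss; this is a simplified contrastive-style stand-in, not DLME's actual loss:

\begin{verbatim}
import torch
import torch.nn.functional as F

def view_matching_loss(encoder, x, augment, temperature=0.1):
    """Two augmented views of each sample should map to nearby points in
    the embedding; the synthesized neighbors densify the local structure
    that under-sampled data lacks."""
    z1 = F.normalize(encoder(augment(x)), dim=-1)
    z2 = F.normalize(encoder(augment(x)), dim=-1)
    sim = z1 @ z2.t() / temperature         # [B, B] pairwise similarities
    target = torch.arange(x.shape[0])       # matched views on the diagonal
    return F.cross_entropy(sim, target)
\end{verbatim}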