Increasing the depth of Graph Convolutional Networks (GCN), which in principal can permit more expressivity, is shown to incur detriment to the performance especially on node classification. The main cause of this issue lies in \emph{over-smoothing}. As its name implies, over-smoothing drives the output of GCN with the increase in network depth towards a space that contains limited distinguished information among nodes, leading to poor trainability and expressivity. Several works on refining the architecture of deep GCN have been proposed, but the improvement in performance is still marginal and it is still unknown in theory whether or not these refinements are able to relieve over-smoothing. In this paper, we first theoretically analyze the over-smoothing issue for a general family of prevailing GCNs, including generic GCN, GCN with bias, ResGCN, and APPNP. We prove that the over-smoothing of all these models is characterized by an universal process, i.e. all nodes converging to a cuboid of specific structure. Upon this universal theorem, we further propose DropEdge, a novel and flexible technique to alleviate over-smoothing. At its core, DropEdge randomly removes a certain number of edges from the input graph at each training epoch, acting like a data augmenter and also a message passing reducer. Furthermore, we theoretically demonstrate that DropEdge either reduces the convergence speed of over-smoothing for general GCNs or relieves the information loss caused by it. One group of experimental evaluations on simulated dataset has visualized the difference of over-smoothing between different GCNs as well as verifying the validity of our proposed theorems. Moreover, extensive experiments on several real benchmarks support that DropEdge consistently improves the performance on a variety of both shallow and deep GCNs.
3D face reconstruction is a fundamental task that can facilitate numerous applications such as robust facial analysis and augmented reality. It is also a challenging task due to the lack of high-quality datasets that can fuel current deep learning-based methods. However, existing datasets are limited in quantity, realisticity and diversity. To circumvent these hurdles, we introduce Pixel-Face, a large-scale, high-resolution and diverse 3D face dataset with massive annotations. Specifically, Pixel-Face contains 855 subjects aging from 18 to 80. Each subject has more than 20 samples with various expressions. Each sample is composed of high-resolution multi-view RGB images and 3D meshes with various expressions. Moreover, we collect precise landmarks annotation and 3D registration result for each data. To demonstrate the advantages of Pixel-Face, we re-parameterize the 3D Morphable Model (3DMM) into Pixel-3DM using the collected data. We show that the obtained Pixel-3DM is better in modeling a wide range of face shapes and expressions. We also carefully benchmark existing 3D face reconstruction methods on our dataset. Moreover, Pixel-Face serves as an effective training source. We observe that the performance of current face reconstruction models significantly improves both on existing benchmarks and Pixel-Face after being fine-tuned using our newly collected data. Extensive experiments demonstrate the effectiveness of Pixel-3DM and the usefulness of Pixel-Face. The code and data is available at https://github.com/pixel-face/Pixel-Face.
\emph{Over-fitting} and \emph{over-smoothing} are two main obstacles of developing deep Graph Convolutional Networks (GCNs) for node classification. In particular, over-fitting weakens the generalization ability on small dataset, while over-smoothing impedes model training by isolating output representations from the input features with the increase in network depth. This paper proposes DropEdge, a novel and flexible technique to alleviate both issues. At its core, DropEdge randomly removes a certain number of edges from the input graph at each training epoch, acting like a data augmenter and also a message passing reducer. Furthermore, we theoretically demonstrate that DropEdge either reduces the convergence speed of over-smoothing or relieves the information loss caused by it. More importantly, our DropEdge is a general skill that can be equipped with many other backbone models (\emph{e.g.} GCN, ResGCN, GraphSAGE, and JKNet) for enhanced performance. Extensive experiments on several benchmarks verify that DropEdge consistently improves the performance on a variety of both shallow and deep GCNs. The effect of DropEdge on preventing over-smoothing is empirically visualized and validated as well. Codes are released on~\url{https://github.com/DropEdge/DropEdge}.
Although the essential nuance of human motion is often conveyed as a combination of body movements and hand gestures, the existing monocular motion capture approaches mostly focus on either body motion capture only ignoring hand parts or hand motion capture only without considering body motion. In this paper, we present FrankMocap, a motion capture system that can estimate both 3D hand and body motion from in-the-wild monocular inputs with faster speed (9.5 fps) and better accuracy than previous work. Our method works in near real-time (9.5 fps) and produces 3D body and hand motion capture outputs as a unified parametric model structure. Our method aims to capture 3D body and hand motion simultaneously from challenging in-the-wild monocular videos. To construct FrankMocap, we build the state-of-the-art monocular 3D "hand" motion capture method by taking the hand part of the whole body parametric model (SMPL-X). Our 3D hand motion capture output can be efficiently integrated to monocular body motion capture output, producing whole body motion results in a unified parrametric model structure. We demonstrate the state-of-the-art performance of our hand motion capture system in public benchmarks, and show the high quality of our whole body motion capture result in various challenging real-world scenes, including a live demo scenario.
Graph Identification (GI) has long been researched in graph learning and is essential in certain applications (e.g. social community detection). Specifically, GI requires to predict the label/score of a target graph given its collection of node features and edge connections. While this task is common, more complex cases arise in practice---we are supposed to do the inverse thing by, for example, grouping similar users in a social network given the labels of different communities. This triggers an interesting thought: can we identify nodes given the labels of the graphs they belong to? Therefore, this paper defines a novel problem dubbed Inverse Graph Identification (IGI), as opposed to GI. Upon a formal discussion of the variants of IGI, we choose a particular case study of node clustering by making use of the graph labels and node features, with an assistance of a hierarchical graph that further characterizes the connections between different graphs. To address this task, we propose Gaussian Mixture Graph Convolutional Network (GMGCN), a simple yet effective method that makes the node-level message passing process using Graph Attention Network (GAT) under the protocol of GI and then infers the category of each node via a Gaussian Mixture Layer (GML). The training of GMGCN is further boosted by a proposed consensus loss to take advantage of the structure of the hierarchical graph. Extensive experiments are conducted to test the rationality of the formulation of IGI. We verify the superiority of the proposed method compared to other baselines on several benchmarks we have built up. We will release our codes along with the benchmark data to facilitate more research attention to the IGI problem.
How to obtain informative representations of molecules is a crucial prerequisite in AI-driven drug design and discovery. Recent researches abstract molecules as graphs and employ Graph Neural Networks (GNNs) for task-specific and data-driven molecular representation learning. Nevertheless, two "dark clouds" impede the usage of GNNs in real scenarios: (1) insufficient labeled molecules for supervised training; (2) poor generalization capabilities to new-synthesized molecules. To address them both, we propose a novel molecular representation framework, GROVER, which stands for Graph Representation frOm self-superVised mEssage passing tRansformer. With carefully designed self-supervised tasks in node, edge and graph-level, GROVER can learn rich structural and semantic information of molecules from enormous unlabelled molecular data. Rather, to encode such complex information, GROVER integrates Message Passing Networks with the Transformer-style architecture to deliver a class of more expressive encoders of molecules. The flexibility of GROVER allows it to be trained efficiently on large-scale molecular dataset without requiring any supervision, thus being immunized to the two issues mentioned above. We pre-train GROVER with 100 million parameters on 10 million unlabelled molecules---the biggest GNN and the largest training dataset that we have ever met. We then leverage the pre-trained GROVER to downstream molecular property prediction tasks followed by task-specific fine-tuning, where we observe a huge improvement (more than 6% on average) over current state-of-the-art methods on 11 challenging benchmarks. The insights we gained are that well-designed self-supervision losses and largely-expressive pre-trained models enjoy the significant potential on performance boosting.
The crux of molecular property prediction is to generate meaningful representations of the molecules. One promising route is to exploit the molecular graph structure through Graph Neural Networks (GNNs). It is well known that both atoms and bonds significantly affect the chemical properties of a molecule, so an expressive model shall be able to exploit both node (atom) and edge (bond) information simultaneously. Guided by this observation, we present Multi-View Graph Neural Network (MV-GNN), a multi-view message passing architecture to enable more accurate predictions of molecular properties. In MV-GNN, we introduce a shared self-attentive readout component and disagreement loss to stabilize the training process. This readout component also renders the whole architecture interpretable. We further boost the expressive power of MV-GNN by proposing a cross-dependent message passing scheme that enhances information communication of the two views, which results in the MV-GNN^cross variant. Lastly, we theoretically justify the expressiveness of the two proposed models in terms of distinguishing non-isomorphism graphs. Extensive experiments demonstrate that MV-GNN models achieve remarkably superior performance over the state-of-the-art models on a variety of challenging benchmarks. Meanwhile, visualization results of the node importance are consistent with prior knowledge, which confirms the interpretability power of MV-GNN models.
The Travelling Salesman Problem (TSP) is a classical NP-hard problem and has broad applications in many disciplines and industries. In a large scale location-based services system, users issue TSP queries concurrently, where a TSP query is a TSP instance with $n$ points. In the literature, many advanced TSP solvers are developed to find high-quality solutions. Such solvers can solve some TSP instances efficiently but may take an extremely long time for some other instances. Due to the diversity of TSP instances, it is well-known that there exists no universal best solver dominating all other solvers on all possible TSP instances. To solve TSP efficiently, in addition to developing new TSP solvers, it needs to find a per-instance solver for each TSP instance, which is known as the TSP solver selection problem. In this paper, for the first time, we propose a deep learning framework, \CTAS, for TSP solver selection in an end-to-end manner. Specifically, \CTAS exploits deep convolutional neural networks to extract informative features from TSP instances and involves data argumentation strategies to handle the scarcity of labeled TSP instances. Moreover, to support large scale TSP solver selection, we construct a challenging TSP benchmark dataset with 6,000 instances, which is known as the largest TSP benchmark. Our \CTAS achieves over 2$\times$ speedup of the average running time, comparing the single best solver, and outperforms the state-of-the-art statistical models.
The crux of molecular property prediction is to conduct meaningful and informative representations of molecules. One promising way of doing this is exploiting the molecular graph structure, which leads to several graph-based deep learning models, such as Graph Neural Networks (GNN), Message Passing Networks (MPN), etc. However, current deep graph learning models generally focus on either node embedding or edge embedding. Since both atoms and chemical bonds play important roles in chemical reactions and reflect the property of molecules, existing methods fail to exploit both node (atom) and edge (bond) features simultaneously to make property prediction. In this paper, we propose Dual Message Passing Neural Network (DualMPNN), a multi-view message passing architecture with a two-tier constraint, to make more accurate molecular property prediction. DualMPNN consists of two sub-models: one model as NodeMPN for atom-oriented modeling and the other as EdgeMPN for bond-oriented modeling. We design a shared self-attentive readout and disagreement loss to stabilize the training process and enhance the interactions between two sub-models. On the other hand, the designed self-attentive readout also provides the interpretability for molecular property prediction, which is crucial for real applications like molecular design and drug discovery. Our extensive experimental evaluations demonstrate that DualMPNN achieves remarkably superior improvement over the state-of-the-art approaches on a variety of challenging benchmarks. Meanwhile, the visualization of the self-attention demonstrates how different atoms affect the prediction performance, which brings in the interpretability of deep learning models out of the black-box.