Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Renjie Liao

Generative 3D Part Assembly via Part-Whole-Hierarchy Message Passing

Feb 27, 2024

Bi'an Du, Xiang Gao, Wei Hu, Renjie Liao

Figure 1 for Generative 3D Part Assembly via Part-Whole-Hierarchy Message Passing

Figure 2 for Generative 3D Part Assembly via Part-Whole-Hierarchy Message Passing

Figure 3 for Generative 3D Part Assembly via Part-Whole-Hierarchy Message Passing

Figure 4 for Generative 3D Part Assembly via Part-Whole-Hierarchy Message Passing

Abstract:Generative 3D part assembly involves understanding part relationships and predicting their 6-DoF poses for assembling a realistic 3D shape. Prior work often focus on the geometry of individual parts, neglecting part-whole hierarchies of objects. Leveraging two key observations: 1) super-part poses provide strong hints about part poses, and 2) predicting super-part poses is easier due to fewer superparts, we propose a part-whole-hierarchy message passing network for efficient 3D part assembly. We first introduce super-parts by grouping geometrically similar parts without any semantic labels. Then we employ a part-whole hierarchical encoder, wherein a super-part encoder predicts latent super-part poses based on input parts. Subsequently, we transform the point cloud using the latent poses, feeding it to the part encoder for aggregating super-part information and reasoning about part relationships to predict all part poses. In training, only ground-truth part poses are required. During inference, the predicted latent poses of super-parts enhance interpretability. Experimental results on the PartNet dataset show that our method achieves state-of-the-art performance in part and connectivity accuracy and enables an interpretable hierarchical part assembly.

Via

Access Paper or Ask Questions

Joint Generative Modeling of Scene Graphs and Images via Diffusion Models

Jan 02, 2024

Bicheng Xu, Qi Yan, Renjie Liao, Lele Wang, Leonid Sigal

Figure 1 for Joint Generative Modeling of Scene Graphs and Images via Diffusion Models

Figure 2 for Joint Generative Modeling of Scene Graphs and Images via Diffusion Models

Figure 3 for Joint Generative Modeling of Scene Graphs and Images via Diffusion Models

Figure 4 for Joint Generative Modeling of Scene Graphs and Images via Diffusion Models

Abstract:In this paper, we present a novel generative task: joint scene graph - image generation. While previous works have explored image generation conditioned on scene graphs or layouts, our task is distinctive and important as it involves generating scene graphs themselves unconditionally from noise, enabling efficient and interpretable control for image generation. Our task is challenging, requiring the generation of plausible scene graphs with heterogeneous attributes for nodes (objects) and edges (relations among objects), including continuous object bounding boxes and discrete object and relation categories. We introduce a novel diffusion model, DiffuseSG, that jointly models the adjacency matrix along with heterogeneous node and edge attributes. We explore various types of encodings for the categorical data, relaxing it into a continuous space. With a graph transformer being the denoiser, DiffuseSG successively denoises the scene graph representation in a continuous space and discretizes the final representation to generate the clean scene graph. Additionally, we introduce an IoU regularization to enhance the empirical performance. Our model significantly outperforms existing methods in scene graph generation on the Visual Genome and COCO-Stuff datasets, both on standard and newly introduced metrics that better capture the problem complexity. Moreover, we demonstrate the additional benefits of our model in two downstream applications: 1) excelling in a series of scene graph completion tasks, and 2) improving scene graph detection models by using extra training samples generated from DiffuseSG.

Via

Access Paper or Ask Questions

GEMTrans: A General, Echocardiography-based, Multi-Level Transformer Framework for Cardiovascular Diagnosis

Aug 25, 2023

Masoud Mokhtari, Neda Ahmadi, Teresa S. M. Tsang, Purang Abolmaesumi, Renjie Liao

Figure 1 for GEMTrans: A General, Echocardiography-based, Multi-Level Transformer Framework for Cardiovascular Diagnosis

Figure 2 for GEMTrans: A General, Echocardiography-based, Multi-Level Transformer Framework for Cardiovascular Diagnosis

Figure 3 for GEMTrans: A General, Echocardiography-based, Multi-Level Transformer Framework for Cardiovascular Diagnosis

Figure 4 for GEMTrans: A General, Echocardiography-based, Multi-Level Transformer Framework for Cardiovascular Diagnosis

Abstract:Echocardiography (echo) is an ultrasound imaging modality that is widely used for various cardiovascular diagnosis tasks. Due to inter-observer variability in echo-based diagnosis, which arises from the variability in echo image acquisition and the interpretation of echo images based on clinical experience, vision-based machine learning (ML) methods have gained popularity to act as secondary layers of verification. For such safety-critical applications, it is essential for any proposed ML method to present a level of explainability along with good accuracy. In addition, such methods must be able to process several echo videos obtained from various heart views and the interactions among them to properly produce predictions for a variety of cardiovascular measurements or interpretation tasks. Prior work lacks explainability or is limited in scope by focusing on a single cardiovascular task. To remedy this, we propose a General, Echo-based, Multi-Level Transformer (GEMTrans) framework that provides explainability, while simultaneously enabling multi-video training where the inter-play among echo image patches in the same frame, all frames in the same video, and inter-video relationships are captured based on a downstream task. We show the flexibility of our framework by considering two critical tasks including ejection fraction (EF) and aortic stenosis (AS) severity detection. Our model achieves mean absolute errors of 4.15 and 4.84 for single and dual-video EF estimation and an accuracy of 96.5 % for AS detection, while providing informative task-specific attention maps and prototypical explainability.

* To be published in MLMI 2023

Via

Access Paper or Ask Questions

EchoGLAD: Hierarchical Graph Neural Networks for Left Ventricle Landmark Detection on Echocardiograms

Jul 23, 2023

Masoud Mokhtari, Mobina Mahdavi, Hooman Vaseli, Christina Luong, Purang Abolmaesumi, Teresa S. M. Tsang, Renjie Liao

Figure 1 for EchoGLAD: Hierarchical Graph Neural Networks for Left Ventricle Landmark Detection on Echocardiograms

Figure 2 for EchoGLAD: Hierarchical Graph Neural Networks for Left Ventricle Landmark Detection on Echocardiograms

Figure 3 for EchoGLAD: Hierarchical Graph Neural Networks for Left Ventricle Landmark Detection on Echocardiograms

Figure 4 for EchoGLAD: Hierarchical Graph Neural Networks for Left Ventricle Landmark Detection on Echocardiograms

Abstract:The functional assessment of the left ventricle chamber of the heart requires detecting four landmark locations and measuring the internal dimension of the left ventricle and the approximate mass of the surrounding muscle. The key challenge of automating this task with machine learning is the sparsity of clinical labels, i.e., only a few landmark pixels in a high-dimensional image are annotated, leading many prior works to heavily rely on isotropic label smoothing. However, such a label smoothing strategy ignores the anatomical information of the image and induces some bias. To address this challenge, we introduce an echocardiogram-based, hierarchical graph neural network (GNN) for left ventricle landmark detection (EchoGLAD). Our main contributions are: 1) a hierarchical graph representation learning framework for multi-resolution landmark detection via GNNs; 2) induced hierarchical supervision at different levels of granularity using a multi-level loss. We evaluate our model on a public and a private dataset under the in-distribution (ID) and out-of-distribution (OOD) settings. For the ID setting, we achieve the state-of-the-art mean absolute errors (MAEs) of 1.46 mm and 1.86 mm on the two datasets. Our model also shows better OOD generalization than prior works with a testing MAE of 4.3 mm.

* To be published in MICCAI 2023

Via

Access Paper or Ask Questions

SwinGNN: Rethinking Permutation Invariance in Diffusion Models for Graph Generation

Jul 19, 2023

Qi Yan, Zhengyang Liang, Yang Song, Renjie Liao, Lele Wang

Figure 1 for SwinGNN: Rethinking Permutation Invariance in Diffusion Models for Graph Generation

Figure 2 for SwinGNN: Rethinking Permutation Invariance in Diffusion Models for Graph Generation

Figure 3 for SwinGNN: Rethinking Permutation Invariance in Diffusion Models for Graph Generation

Figure 4 for SwinGNN: Rethinking Permutation Invariance in Diffusion Models for Graph Generation

Abstract:Diffusion models based on permutation-equivariant networks can learn permutation-invariant distributions for graph data. However, in comparison to their non-invariant counterparts, we have found that these invariant models encounter greater learning challenges since 1) their effective target distributions exhibit more modes; 2) their optimal one-step denoising scores are the score functions of Gaussian mixtures with more components. Motivated by this analysis, we propose a non-invariant diffusion model, called $\textit{SwinGNN}$, which employs an efficient edge-to-edge 2-WL message passing network and utilizes shifted window based self-attention inspired by SwinTransformers. Further, through systematic ablations, we identify several critical training and sampling techniques that significantly improve the sample quality of graph generation. At last, we introduce a simple post-processing trick, $\textit{i.e.}$, randomly permuting the generated graphs, which provably converts any graph generative model to a permutation-invariant one. Extensive experiments on synthetic and real-world protein and molecule datasets show that our SwinGNN achieves state-of-the-art performances. Our code is released at https://github.com/qiyan98/SwinGNN.

Via

Access Paper or Ask Questions

Memorization Capacity of Multi-Head Attention in Transformers

Jun 03, 2023

Sadegh Mahdavi, Renjie Liao, Christos Thrampoulidis

Abstract:In this paper, we investigate the memorization capabilities of multi-head attention in Transformers, motivated by the central role attention plays in these models. Under a mild linear independence assumption on the input data, we present a theoretical analysis demonstrating that an $H$-head attention layer with a context size $n$, dimension $d$, and $O(Hd^2)$ parameters can memorize $O(Hn)$ examples. We conduct experiments that verify our assumptions on the image classification task using Vision Transformer. To validate our theoretical findings, we perform synthetic experiments and show a linear relationship between memorization capacity and the number of attention heads.

Via

Access Paper or Ask Questions

Specformer: Spectral Graph Neural Networks Meet Transformers

Mar 02, 2023

Deyu Bo, Chuan Shi, Lele Wang, Renjie Liao

Figure 1 for Specformer: Spectral Graph Neural Networks Meet Transformers

Figure 2 for Specformer: Spectral Graph Neural Networks Meet Transformers

Figure 3 for Specformer: Spectral Graph Neural Networks Meet Transformers

Figure 4 for Specformer: Spectral Graph Neural Networks Meet Transformers

Abstract:Spectral graph neural networks (GNNs) learn graph representations via spectral-domain graph convolutions. However, most existing spectral graph filters are scalar-to-scalar functions, i.e., mapping a single eigenvalue to a single filtered value, thus ignoring the global pattern of the spectrum. Furthermore, these filters are often constructed based on some fixed-order polynomials, which have limited expressiveness and flexibility. To tackle these issues, we introduce Specformer, which effectively encodes the set of all eigenvalues and performs self-attention in the spectral domain, leading to a learnable set-to-set spectral filter. We also design a decoder with learnable bases to enable non-local graph convolution. Importantly, Specformer is equivariant to permutation. By stacking multiple Specformer layers, one can build a powerful spectral GNN. On synthetic datasets, we show that our Specformer can better recover ground-truth spectral filters than other spectral GNNs. Extensive experiments of both node-level and graph-level tasks on real-world graph datasets show that our Specformer outperforms state-of-the-art GNNs and learns meaningful spectrum patterns. Code and data are available at https://github.com/bdy9527/Specformer.

* ICLR 2023

Via

Access Paper or Ask Questions

Self-Supervised Relation Alignment for Scene Graph Generation

Feb 02, 2023

Bicheng Xu, Renjie Liao, Leonid Sigal

Figure 1 for Self-Supervised Relation Alignment for Scene Graph Generation

Figure 2 for Self-Supervised Relation Alignment for Scene Graph Generation

Figure 3 for Self-Supervised Relation Alignment for Scene Graph Generation

Figure 4 for Self-Supervised Relation Alignment for Scene Graph Generation

Abstract:The goal of scene graph generation is to predict a graph from an input image, where nodes correspond to identified and localized objects and edges to their corresponding interaction predicates. Existing methods are trained in a fully supervised manner and focus on message passing mechanisms, loss functions, and/or bias mitigation. In this work we introduce a simple-yet-effective self-supervised relational alignment regularization designed to improve the scene graph generation performance. The proposed alignment is general and can be combined with any existing scene graph generation framework, where it is trained alongside the original model's objective. The alignment is achieved through distillation, where an auxiliary relation prediction branch, that mirrors and shares parameters with the supervised counterpart, is designed. In the auxiliary branch, relational input features are partially masked prior to message passing and predicate prediction. The predictions for masked relations are then aligned with the supervised counterparts after the message passing. We illustrate the effectiveness of this self-supervised relational alignment in conjunction with two scene graph generation architectures, SGTR and Neural Motifs, and show that in both cases we achieve significantly improved performance.

Via

Access Paper or Ask Questions

GraphPNAS: Learning Distribution of Good Neural Architectures via Deep Graph Generative Models

Nov 28, 2022

Muchen Li, Jeffrey Yunfan Liu, Leonid Sigal, Renjie Liao

Figure 1 for GraphPNAS: Learning Distribution of Good Neural Architectures via Deep Graph Generative Models

Figure 2 for GraphPNAS: Learning Distribution of Good Neural Architectures via Deep Graph Generative Models

Figure 3 for GraphPNAS: Learning Distribution of Good Neural Architectures via Deep Graph Generative Models

Figure 4 for GraphPNAS: Learning Distribution of Good Neural Architectures via Deep Graph Generative Models

Abstract:Neural architectures can be naturally viewed as computational graphs. Motivated by this perspective, we, in this paper, study neural architecture search (NAS) through the lens of learning random graph models. In contrast to existing NAS methods which largely focus on searching for a single best architecture, i.e, point estimation, we propose GraphPNAS a deep graph generative model that learns a distribution of well-performing architectures. Relying on graph neural networks (GNNs), our GraphPNAS can better capture topologies of good neural architectures and relations between operators therein. Moreover, our graph generator leads to a learnable probabilistic search method that is more flexible and efficient than the commonly used RNN generator and random search methods. Finally, we learn our generator via an efficient reinforcement learning formulation for NAS. To assess the effectiveness of our GraphPNAS, we conduct extensive experiments on three search spaces, including the challenging RandWire on TinyImageNet, ENAS on CIFAR10, and NAS-Bench-101/201. The complexity of RandWire is significantly larger than other search spaces in the literature. We show that our proposed graph generator consistently outperforms RNN-based one and achieves better or comparable performances than state-of-the-art NAS methods.

Via

Access Paper or Ask Questions

Learning Latent Part-Whole Hierarchies for Point Clouds

Nov 14, 2022

Xiang Gao, Wei Hu, Renjie Liao

Abstract:Strong evidence suggests that humans perceive the 3D world by parsing visual scenes and objects into part-whole hierarchies. Although deep neural networks have the capability of learning powerful multi-level representations, they can not explicitly model part-whole hierarchies, which limits their expressiveness and interpretability in processing 3D vision data such as point clouds. To this end, we propose an encoder-decoder style latent variable model that explicitly learns the part-whole hierarchies for the multi-level point cloud segmentation. Specifically, the encoder takes a point cloud as input and predicts the per-point latent subpart distribution at the middle level. The decoder takes the latent variable and the feature from the encoder as an input and predicts the per-point part distribution at the top level. During training, only annotated part labels at the top level are provided, thus making the whole framework weakly supervised. We explore two kinds of approximated inference algorithms, i.e., most-probable-latent and Monte Carlo methods, and three stochastic gradient estimations for learning discrete latent variables, i.e., straight-through, REINFORCE, and pathwise estimators. Experimental results on the PartNet dataset show that the proposed method achieves state-of-the-art performance in not only top-level part segmentation but also middle-level latent subpart segmentation.

Via

Access Paper or Ask Questions