Xin Dai

PDT: Pretrained Dual Transformers for Time-aware Bipartite Graphs

Jun 21, 2023
Xin Dai, Yujie Fan, Zhongfang Zhuang, Shubham Jain, Chin-Chia Michael Yeh, Junpeng Wang, Liang Wang, Yan Zheng, Wei Zhang

Pre-training large models on ever-growing user-generated content has become prevalent across many machine learning applications. It is well recognized that learning contextual knowledge from datasets of user-content interactions plays a vital role in downstream tasks. Despite several studies attempting to learn contextual knowledge via pre-training, finding an optimal training objective and strategy for this type of task remains challenging. In this work, we contend that contextual knowledge has two distinct aspects, the user side and the content side, for datasets where user-content interactions can be represented as a bipartite graph. To learn this contextual knowledge, we propose a pre-training method that learns a bi-directional mapping between the user-side and content-side spaces. We formulate the training goal as a contrastive learning task and propose a dual-Transformer architecture to encode the contextual knowledge. We evaluate the proposed method on the recommendation task, and empirical studies show that it outperforms all baselines by significant margins.
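Below is a minimal sketch of the kind of dual-encoder contrastive pre-training the abstract describes, written in PyTorch. This is not the authors' code: the class names, the time-bucket embedding, the mean-pooled sequence representation, and the symmetric InfoNCE loss are illustrative assumptions about one way to realize a bi-directional mapping between the user-side and content-side spaces.

```python
# Illustrative sketch only (not the PDT implementation), assuming PyTorch.
import torch
import torch.nn as nn
import torch.nn.functional as F

class SideEncoder(nn.Module):
    """Encodes one side of the bipartite graph: a sequence of interaction
    events (entity id + time bucket) is mapped to a single embedding."""
    def __init__(self, vocab_size, num_time_buckets, d_model=128, n_layers=2, n_heads=4):
        super().__init__()
        self.entity_emb = nn.Embedding(vocab_size, d_model)
        self.time_emb = nn.Embedding(num_time_buckets, d_model)
        layer = nn.TransformerEncoderLayer(d_model, n_heads,
                                           dim_feedforward=4 * d_model,
                                           batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, n_layers)
        self.proj = nn.Linear(d_model, d_model)

    def forward(self, entity_ids, time_buckets):
        x = self.entity_emb(entity_ids) + self.time_emb(time_buckets)
        h = self.encoder(x)                       # (batch, seq_len, d_model)
        return F.normalize(self.proj(h.mean(dim=1)), dim=-1)

def dual_contrastive_loss(user_vecs, item_vecs, temperature=0.07):
    """Symmetric InfoNCE: matched (user, content) pairs are positives,
    all other pairs in the batch serve as negatives."""
    logits = user_vecs @ item_vecs.t() / temperature
    targets = torch.arange(logits.size(0), device=logits.device)
    return 0.5 * (F.cross_entropy(logits, targets) +
                  F.cross_entropy(logits.t(), targets))

# Toy usage: each user is a sequence of 20 interacted item ids, each item a
# sequence of 20 interacting user ids, both with coarse time buckets.
NUM_USERS, NUM_ITEMS, T = 1000, 5000, 52
user_enc = SideEncoder(vocab_size=NUM_ITEMS, num_time_buckets=T)  # encodes item sequences
item_enc = SideEncoder(vocab_size=NUM_USERS, num_time_buckets=T)  # encodes user sequences
u = user_enc(torch.randint(0, NUM_ITEMS, (8, 20)), torch.randint(0, T, (8, 20)))
v = item_enc(torch.randint(0, NUM_USERS, (8, 20)), torch.randint(0, T, (8, 20)))
loss = dual_contrastive_loss(u, v)
loss.backward()
```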


How Does Attention Work in Vision Transformers? A Visual Analytics Attempt

Mar 24, 2023
Yiran Li, Junpeng Wang, Xin Dai, Liang Wang, Chin-Chia Michael Yeh, Yan Zheng, Wei Zhang, Kwan-Liu Ma

Vision transformer (ViT) expands the success of transformer models from sequential data to images. The model decomposes an image into many smaller patches and arranges them into a sequence. Multi-head self-attention is then applied to the sequence to learn the attention between patches. Despite many successful interpretations of transformers on sequential data, little effort has been devoted to interpreting ViTs, and many questions remain unanswered. For example, among the numerous attention heads, which ones are more important? How strongly do individual patches attend to their spatial neighbors in different heads? What attention patterns have individual heads learned? In this work, we answer these questions through a visual analytics approach. Specifically, we first identify which heads are more important in ViTs by introducing multiple pruning-based metrics. Then, we profile the spatial distribution of attention strengths between patches inside individual heads, as well as the trend of attention strengths across attention layers. Third, using an autoencoder-based learning solution, we summarize all possible attention patterns that individual heads can learn. By examining the attention strengths and patterns of the important heads, we explain why they are important. Through concrete case studies with experienced deep learning experts on multiple ViTs, we validate the effectiveness of our solution, which deepens the understanding of ViTs in terms of head importance, head attention strength, and head attention pattern.

* Accepted by PacificVis 2023 and selected to be published in TVCG 
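As a rough, hypothetical illustration of one quantity discussed above, the snippet below measures how strongly each patch attends to its 8-connected spatial neighbors within a head, given a per-head patch-to-patch attention matrix. The metric, the 14x14 grid size, and the random attention input are assumptions for demonstration only; the paper's own head-importance metrics are pruning-based, and its attention maps would be captured from real ViTs (e.g., via forward hooks).

```python
# Hypothetical illustration (not the paper's tool or metrics).
import torch

def neighbor_attention_strength(attn, grid=14):
    """attn: (heads, N, N) row-normalized attention with the CLS token removed,
    N = grid * grid. Returns one neighbor-attention score per head."""
    heads, n, _ = attn.shape
    assert n == grid * grid
    mask = torch.zeros(n, n)
    for i in range(n):
        r, c = divmod(i, grid)
        for dr in (-1, 0, 1):
            for dc in (-1, 0, 1):
                rr, cc = r + dr, c + dc
                if (dr, dc) != (0, 0) and 0 <= rr < grid and 0 <= cc < grid:
                    mask[i, rr * grid + cc] = 1.0
    # Attention mass on spatial neighbors per query patch, averaged over patches.
    return (attn * mask).sum(dim=-1).mean(dim=-1)          # shape: (heads,)

# Toy usage: 6 heads on a 14x14 patch grid; in practice `attn` would come from
# a real ViT layer rather than being random.
attn = torch.rand(6, 196, 196)
attn = attn / attn.sum(dim=-1, keepdim=True)
print(neighbor_attention_strength(attn))
```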

ABN: Anti-Blur Neural Networks for Multi-Stage Deformable Image Registration

Dec 06, 2022
Yao Su, Xin Dai, Lifang He, Xiangnan Kong

Deformable image registration, i.e., the task of aligning multiple images into one coordinate system via non-linear transformations, serves as an essential preprocessing step for neuroimaging data. Recent research on deformable image registration mainly focuses on improving registration accuracy with multi-stage alignment methods, where the source image is repeatedly deformed in stages by the same neural network until it is well aligned with the target image. Conventional multi-stage registration methods can blur the source image because pixel/voxel values are repeatedly interpolated from the image generated by the previous stage. However, maintaining image quality, such as sharpness, during registration is crucial for medical data analysis. In this paper, we study the problem of anti-blur deformable image registration and propose a novel solution, the Anti-Blur Network (ABN), for multi-stage image registration. Specifically, we use a pair of short-term registration and long-term memory networks to learn the nonlinear deformations at each stage: the short-term registration network learns how to improve registration accuracy incrementally, and the long-term memory network combines all previous deformations so that interpolation can be performed directly on the raw image, preserving its sharpness. Extensive experiments on both natural and medical image datasets demonstrate that ABN accurately registers images while preserving their sharpness. Our code and data can be found at https://github.com/anonymous3214/ABN.

* Published as a full paper at ICDM 2022. Code: https://github.com/anonymous3214/ABN 
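A conceptual sketch of the anti-blur idea described in the abstract, not the authors' implementation: per-stage displacement fields are composed so that the raw source image is interpolated only once, instead of being resampled (and blurred) at every stage. The warp and composition helpers, the random stage flows, and the 2-D toy setting are assumptions for illustration.

```python
# Conceptual sketch (not the ABN code), assuming PyTorch and a 2-D toy setting.
import torch
import torch.nn.functional as F

def identity_grid(h, w):
    """Normalized sampling grid in [-1, 1], shape (1, H, W, 2), (x, y) order."""
    ys, xs = torch.meshgrid(torch.linspace(-1, 1, h),
                            torch.linspace(-1, 1, w), indexing="ij")
    return torch.stack((xs, ys), dim=-1).unsqueeze(0)

def warp(image, flow):
    """Resample image (1, C, H, W) at positions displaced by flow (1, H, W, 2)."""
    h, w = image.shape[-2:]
    return F.grid_sample(image, identity_grid(h, w) + flow, align_corners=True)

def resample_flow(flow_to_move, displacement):
    """Evaluate flow_to_move at the positions reached by displacement."""
    h, w = displacement.shape[1:3]
    grid = identity_grid(h, w) + displacement
    moved = F.grid_sample(flow_to_move.permute(0, 3, 1, 2), grid, align_corners=True)
    return moved.permute(0, 2, 3, 1)

source = torch.rand(1, 1, 64, 64)
stage_flows = [0.05 * torch.randn(1, 64, 64, 2) for _ in range(3)]  # stand-ins for network outputs

# Naive multi-stage registration: warp the previous stage's output each time,
# so the source is interpolated three times and gradually blurred.
blurry = source
for flow in stage_flows:
    blurry = warp(blurry, flow)

# Anti-blur alternative: compose the per-stage displacements analytically
# (new_total(x) = flow(x) + total(x + flow(x))) and interpolate the raw
# source exactly once with the accumulated field.
total_flow = torch.zeros(1, 64, 64, 2)
for flow in stage_flows:
    total_flow = flow + resample_flow(total_flow, flow)
sharp = warp(source, total_flow)
```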