Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Yu Sun

Sherman

Spectral Heterogeneous Graph Convolutions via Positive Noncommutative Polynomials

May 31, 2023

Mingguo He, Zhewei Wei, Shikun Feng, Zhengjie Huang, Weibin Li, Yu Sun, Dianhai Yu

Abstract:Heterogeneous Graph Neural Networks (HGNNs) have gained significant popularity in various heterogeneous graph learning tasks. However, most HGNNs rely on spatial domain-based message passing and attention modules for information propagation and aggregation. These spatial-based HGNNs neglect the utilization of spectral graph convolutions, which are the foundation of Graph Convolutional Networks (GCN) on homogeneous graphs. Inspired by the effectiveness and scalability of spectral-based GNNs on homogeneous graphs, this paper explores the extension of spectral-based GNNs to heterogeneous graphs. We propose PSHGCN, a novel heterogeneous convolutional network based on positive noncommutative polynomials. PSHGCN provides a simple yet effective approach for learning spectral graph convolutions on heterogeneous graphs. Moreover, we demonstrate the rationale of PSHGCN in graph optimization. We conducted an extensive experimental study to show that PSHGCN can learn diverse spectral heterogeneous graph convolutions and achieve superior performance in node classification tasks. Our code is available at https://github.com/ivam-he/PSHGCN.

* 10 pages

Via

Access Paper or Ask Questions

Learning Task-Specific Strategies for Accelerated MRI

Apr 25, 2023

Zihui Wu, Tianwei Yin, Yu Sun, Robert Frost, Andre van der Kouwe, Adrian V. Dalca, Katherine L. Bouman

Figure 1 for Learning Task-Specific Strategies for Accelerated MRI

Figure 2 for Learning Task-Specific Strategies for Accelerated MRI

Figure 3 for Learning Task-Specific Strategies for Accelerated MRI

Figure 4 for Learning Task-Specific Strategies for Accelerated MRI

Abstract:Compressed sensing magnetic resonance imaging (CS-MRI) seeks to recover visual information from subsampled measurements for diagnostic tasks. Traditional CS-MRI methods often separately address measurement subsampling, image reconstruction, and task prediction, resulting in suboptimal end-to-end performance. In this work, we propose TACKLE as a unified framework for designing CS-MRI systems tailored to specific tasks. Leveraging recent co-design techniques, TACKLE jointly optimizes subsampling, reconstruction, and prediction strategies to enhance the performance on the downstream task. Our results on multiple public MRI datasets show that the proposed framework achieves improved performance on various tasks over traditional CS-MRI methods. We also evaluate the generalization ability of TACKLE by experimentally collecting a new dataset using different acquisition setups from the training data. Without additional fine-tuning, TACKLE functions robustly and leads to both numerical and visual improvements.

Via

Access Paper or Ask Questions

Label Information Enhanced Fraud Detection against Low Homophily in Graphs

Feb 21, 2023

Yuchen Wang, Jinghui Zhang, Zhengjie Huang, Weibin Li, Shikun Feng, Ziheng Ma, Yu Sun, Dianhai Yu, Fang Dong, Jiahui Jin(+2 more)

Figure 1 for Label Information Enhanced Fraud Detection against Low Homophily in Graphs

Figure 2 for Label Information Enhanced Fraud Detection against Low Homophily in Graphs

Figure 3 for Label Information Enhanced Fraud Detection against Low Homophily in Graphs

Figure 4 for Label Information Enhanced Fraud Detection against Low Homophily in Graphs

Abstract:Node classification is a substantial problem in graph-based fraud detection. Many existing works adopt Graph Neural Networks (GNNs) to enhance fraud detectors. While promising, currently most GNN-based fraud detectors fail to generalize to the low homophily setting. Besides, label utilization has been proved to be significant factor for node classification problem. But we find they are less effective in fraud detection tasks due to the low homophily in graphs. In this work, we propose GAGA, a novel Group AGgregation enhanced TrAnsformer, to tackle the above challenges. Specifically, the group aggregation provides a portable method to cope with the low homophily issue. Such an aggregation explicitly integrates the label information to generate distinguishable neighborhood information. Along with group aggregation, an attempt towards end-to-end trainable group encoding is proposed which augments the original feature space with the class labels. Meanwhile, we devise two additional learnable encodings to recognize the structural and relational context. Then, we combine the group aggregation and the learnable encodings into a Transformer encoder to capture the semantic information. Experimental results clearly show that GAGA outperforms other competitive graph-based fraud detectors by up to 24.39% on two trending public datasets and a real-world industrial dataset from Anonymous. Even more, the group aggregation is demonstrated to outperform other label utilization methods (e.g., C&S, BoT/UniMP) in the low homophily setting.

* Accepted to The ACM Webconf 2023

Via

Access Paper or Ask Questions

Pose-Oriented Transformer with Uncertainty-Guided Refinement for 2D-to-3D Human Pose Estimation

Feb 15, 2023

Han Li, Bowen Shi, Wenrui Dai, Hongwei Zheng, Botao Wang, Yu Sun, Min Guo, Chenlin Li, Junni Zou, Hongkai Xiong

Figure 1 for Pose-Oriented Transformer with Uncertainty-Guided Refinement for 2D-to-3D Human Pose Estimation

Figure 2 for Pose-Oriented Transformer with Uncertainty-Guided Refinement for 2D-to-3D Human Pose Estimation

Figure 3 for Pose-Oriented Transformer with Uncertainty-Guided Refinement for 2D-to-3D Human Pose Estimation

Figure 4 for Pose-Oriented Transformer with Uncertainty-Guided Refinement for 2D-to-3D Human Pose Estimation

Abstract:There has been a recent surge of interest in introducing transformers to 3D human pose estimation (HPE) due to their powerful capabilities in modeling long-term dependencies. However, existing transformer-based methods treat body joints as equally important inputs and ignore the prior knowledge of human skeleton topology in the self-attention mechanism. To tackle this issue, in this paper, we propose a Pose-Oriented Transformer (POT) with uncertainty guided refinement for 3D HPE. Specifically, we first develop novel pose-oriented self-attention mechanism and distance-related position embedding for POT to explicitly exploit the human skeleton topology. The pose-oriented self-attention mechanism explicitly models the topological interactions between body joints, whereas the distance-related position embedding encodes the distance of joints to the root joint to distinguish groups of joints with different difficulties in regression. Furthermore, we present an Uncertainty-Guided Refinement Network (UGRN) to refine pose predictions from POT, especially for the difficult joints, by considering the estimated uncertainty of each joint with uncertainty-guided sampling strategy and self-attention mechanism. Extensive experiments demonstrate that our method significantly outperforms the state-of-the-art methods with reduced model parameters on 3D HPE benchmarks such as Human3.6M and MPI-INF-3DHP

* accepted by AAAI2023

Via

Access Paper or Ask Questions

ERNIE-Music: Text-to-Waveform Music Generation with Diffusion Models

Feb 09, 2023

Pengfei Zhu, Chao Pang, Shuohuan Wang, Yekun Chai, Yu Sun, Hao Tian, Hua Wu

Abstract:In recent years, there has been an increased popularity in image and speech generation using diffusion models. However, directly generating music waveforms from free-form text prompts is still under-explored. In this paper, we propose the first text-to-waveform music generation model that can receive arbitrary texts using diffusion models. We incorporate the free-form textual prompt as the condition to guide the waveform generation process of diffusion models. To solve the problem of lacking such text-music parallel data, we collect a dataset of text-music pairs from the Internet with weak supervision. Besides, we compare the effect of two prompt formats of conditioning texts (music tags and free-form texts) and prove the superior performance of our method in terms of text-music relevance. We further demonstrate that our generated music in the waveform domain outperforms previous works by a large margin in terms of diversity, quality, and text-music relevance.

Via

Access Paper or Ask Questions

ERNIE 3.0 Tiny: Frustratingly Simple Method to Improve Task-Agnostic Distillation Generalization

Jan 09, 2023

Weixin Liu, Xuyi Chen, Jiaxiang Liu, Shikun Feng, Yu Sun, Hao Tian, Hua Wu

Figure 1 for ERNIE 3.0 Tiny: Frustratingly Simple Method to Improve Task-Agnostic Distillation Generalization

Figure 2 for ERNIE 3.0 Tiny: Frustratingly Simple Method to Improve Task-Agnostic Distillation Generalization

Figure 3 for ERNIE 3.0 Tiny: Frustratingly Simple Method to Improve Task-Agnostic Distillation Generalization

Figure 4 for ERNIE 3.0 Tiny: Frustratingly Simple Method to Improve Task-Agnostic Distillation Generalization

Abstract:Task-agnostic knowledge distillation attempts to address the problem of deploying large pretrained language model in resource-constrained scenarios by compressing a large pretrained model called teacher into a smaller one called student such that the student can be directly finetuned on downstream tasks and retains comparable performance. However, we empirically find that there is a generalization gap between the student and the teacher in existing methods. In this work, we show that we can leverage multi-task learning in task-agnostic distillation to advance the generalization of the resulted student. In particular, we propose Multi-task Infused Task-agnostic Knowledge Distillation (MITKD). We first enhance the teacher by multi-task training it on multiple downstream tasks and then perform distillation to produce the student. Experimental results demonstrate that our method yields a student with much better generalization, significantly outperforms existing baselines, and establishes a new state-of-the-art result on in-domain, out-domain, and low-resource datasets in the setting of task-agnostic distillation. Moreover, our method even exceeds an 8x larger BERT$_{\text{Base}}$ on SQuAD and four GLUE tasks. In addition, by combining ERNIE 3.0, our method achieves state-of-the-art results on 10 Chinese datasets.

Via

Access Paper or Ask Questions

ERNIE-Code: Beyond English-Centric Cross-lingual Pretraining for Programming Languages

Dec 13, 2022

Yekun Chai, Shuohuan Wang, Chao Pang, Yu Sun, Hao Tian, Hua Wu

Figure 1 for ERNIE-Code: Beyond English-Centric Cross-lingual Pretraining for Programming Languages

Figure 2 for ERNIE-Code: Beyond English-Centric Cross-lingual Pretraining for Programming Languages

Figure 3 for ERNIE-Code: Beyond English-Centric Cross-lingual Pretraining for Programming Languages

Figure 4 for ERNIE-Code: Beyond English-Centric Cross-lingual Pretraining for Programming Languages

Abstract:Software engineers working with the same programming language (PL) may speak different natural languages (NLs) and vice versa, erecting huge barriers to communication and working efficiency. Recent studies have demonstrated the effectiveness of generative pre-training in computer programs, yet they are always English-centric. In this work, we step towards bridging the gap between multilingual NLs and multilingual PLs for large language models (LLMs). We release ERNIE-Code, a unified pre-trained language model for 116 NLs and 6 PLs. We employ two methods for universal cross-lingual pre-training: span-corruption language modeling that learns patterns from monolingual NL or PL; and pivot-based translation language modeling that relies on parallel data of many NLs and PLs. Extensive results show that ERNIE-Code outperforms previous multilingual LLMs for PL or NL across a wide range of end tasks of code intelligence, including multilingual code-to-text, text-to-code, code-to-code, and text-to-text generation. We further show its advantage of zero-shot prompting on multilingual code summarization and text-to-text translation. We will make our code and pre-trained models publicly available.

Via

Access Paper or Ask Questions

X-PuDu at SemEval-2022 Task 6: Multilingual Learning for English and Arabic Sarcasm Detection

Nov 30, 2022

Yaqian Han, Yekun Chai, Shuohuan Wang, Yu Sun, Hongyi Huang, Guanghao Chen, Yitong Xu, Yang Yang

Abstract:Detecting sarcasm and verbal irony from people's subjective statements is crucial to understanding their intended meanings and real sentiments and positions in social scenarios. This paper describes the X-PuDu system that participated in SemEval-2022 Task 6, iSarcasmEval - Intended Sarcasm Detection in English and Arabic, which aims at detecting intended sarcasm in various settings of natural language understanding. Our solution finetunes pre-trained language models, such as ERNIE-M and DeBERTa, under the multilingual settings to recognize the irony from Arabic and English texts. Our system ranked second out of 43, and ninth out of 32 in Task A: one-sentence detection in English and Arabic; fifth out of 22 in Task B: binary multi-label classification in English; first out of 16, and fifth out of 13 in Task C: sentence-pair detection in English and Arabic.

* SemEval-2022 Task 6

Via

Access Paper or Ask Questions

X-PuDu at SemEval-2022 Task 7: A Replaced Token Detection Task Pre-trained Model with Pattern-aware Ensembling for Identifying Plausible Clarifications

Nov 27, 2022

Junyuan Shang, Shuohuan Wang, Yu Sun, Yanjun Yu, Yue Zhou, Li Xiang, Guixiu Yang

Abstract:This paper describes our winning system on SemEval 2022 Task 7: Identifying Plausible Clarifications of Implicit and Underspecified Phrases in Instructional Texts. A replaced token detection pre-trained model is utilized with minorly different task-specific heads for SubTask-A: Multi-class Classification and SubTask-B: Ranking. Incorporating a pattern-aware ensemble method, our system achieves a 68.90% accuracy score and 0.8070 spearman's rank correlation score surpassing the 2nd place with a large margin by 2.7 and 2.2 percent points for SubTask-A and SubTask-B, respectively. Our approach is simple and easy to implement, and we conducted ablation studies and qualitative and quantitative analyses for the working strategies used in our system.

* Accepted at the 16th International Workshop on Semantic Evaluation (SemEval-2022), NAACL

Via

Access Paper or Ask Questions

Coordinate-Based Seismic Interpolation in Irregular Land Survey: A Deep Internal Learning Approach

Nov 21, 2022

Paul Goyes, Edwin Vargas, Claudia Correa, Yu Sun, Ulugbek Kamilov, Brendt Wohlberg, Henry Arguello

Figure 1 for Coordinate-Based Seismic Interpolation in Irregular Land Survey: A Deep Internal Learning Approach

Figure 2 for Coordinate-Based Seismic Interpolation in Irregular Land Survey: A Deep Internal Learning Approach

Figure 3 for Coordinate-Based Seismic Interpolation in Irregular Land Survey: A Deep Internal Learning Approach

Figure 4 for Coordinate-Based Seismic Interpolation in Irregular Land Survey: A Deep Internal Learning Approach

Abstract:Physical and budget constraints often result in irregular sampling, which complicates accurate subsurface imaging. Pre-processing approaches, such as missing trace or shot interpolation, are typically employed to enhance seismic data in such cases. Recently, deep learning has been used to address the trace interpolation problem at the expense of large amounts of training data to adequately represent typical seismic events. Nonetheless, state-of-the-art works have mainly focused on trace reconstruction, with little attention having been devoted to shot interpolation. Furthermore, existing methods assume regularly spaced receivers/sources failing in approximating seismic data from real (irregular) surveys. This work presents a novel shot gather interpolation approach which uses a continuous coordinate-based representation of the acquired seismic wavefield parameterized by a neural network. The proposed unsupervised approach, which we call coordinate-based seismic interpolation (CoBSI), enables the prediction of specific seismic characteristics in irregular land surveys without using external data during neural network training. Experimental results on real and synthetic 3D data validate the ability of the proposed method to estimate continuous smooth seismic events in the time-space and frequency-wavenumber domains, improving sparsity or low rank-based interpolation methods.

Via

Access Paper or Ask Questions