Dianhai Yu

Spectral Heterogeneous Graph Convolutions via Positive Noncommutative Polynomials

May 31, 2023
Mingguo He, Zhewei Wei, Shikun Feng, Zhengjie Huang, Weibin Li, Yu Sun, Dianhai Yu

Heterogeneous Graph Neural Networks (HGNNs) have gained significant popularity in various heterogeneous graph learning tasks. However, most HGNNs rely on spatial-domain message passing and attention modules for information propagation and aggregation, and neglect spectral graph convolutions, which are the foundation of Graph Convolutional Networks (GCNs) on homogeneous graphs. Inspired by the effectiveness and scalability of spectral-based GNNs on homogeneous graphs, this paper extends spectral-based GNNs to heterogeneous graphs. We propose PSHGCN, a novel heterogeneous convolutional network based on positive noncommutative polynomials. PSHGCN offers a simple yet effective approach for learning spectral graph convolutions on heterogeneous graphs, and we demonstrate its rationale from a graph optimization perspective. An extensive experimental study shows that PSHGCN can learn diverse spectral heterogeneous graph convolutions and achieves superior performance on node classification tasks. Our code is available at https://github.com/ivam-he/PSHGCN.
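
To make the construction concrete, here is a minimal dense sketch (not the authors' implementation; the class name `PSDPolyFilter` and the dense-matrix setup are assumptions for clarity): a noncommutative polynomial g in the relation adjacency matrices is learned, and the applied filter g(A)^T g(A) is positive semidefinite for any learned coefficients.

```python
# Minimal dense sketch of a positive noncommutative polynomial filter on a
# heterogeneous graph (assumed names, illustrative only). `adjs` is a list of
# normalized relation adjacency matrices; the filter g(A)^T g(A) is positive
# semidefinite by construction, whatever coefficients are learned.
import itertools
import torch

def noncommutative_monomials(num_relations: int, max_order: int):
    """Enumerate all ordered products of relation matrices up to max_order."""
    for k in range(max_order + 1):
        yield from itertools.product(range(num_relations), repeat=k)

class PSDPolyFilter(torch.nn.Module):
    def __init__(self, num_relations: int, max_order: int, n: int):
        super().__init__()
        self.monomials = list(noncommutative_monomials(num_relations, max_order))
        self.coeffs = torch.nn.Parameter(torch.randn(len(self.monomials)) * 0.1)
        self.n = n

    def forward(self, adjs, x):
        # g(A) = sum_m w_m * A_{m_1} @ A_{m_2} @ ... (identity for the empty monomial)
        g = torch.zeros(self.n, self.n)
        for w, mono in zip(self.coeffs, self.monomials):
            term = torch.eye(self.n)
            for r in mono:
                term = adjs[r] @ term
            g = g + w * term
        # g^T g is positive semidefinite regardless of the learned coefficients.
        return g.T @ g @ x
```

In practice one would apply g to the feature matrix directly (twice) rather than materializing an n-by-n filter, keeping everything to sparse matrix-vector products.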

* 10 pages 

Label Information Enhanced Fraud Detection against Low Homophily in Graphs

Feb 21, 2023
Yuchen Wang, Jinghui Zhang, Zhengjie Huang, Weibin Li, Shikun Feng, Ziheng Ma, Yu Sun, Dianhai Yu, Fang Dong, Jiahui Jin, Beilun Wang, Junzhou Luo

Node classification is a substantial problem in graph-based fraud detection, and many existing works adopt Graph Neural Networks (GNNs) to enhance fraud detectors. While promising, most GNN-based fraud detectors fail to generalize to the low-homophily setting. Moreover, although label utilization has been shown to be a significant factor in node classification, we find that existing label utilization methods are less effective in fraud detection tasks because of the low homophily of the graphs. In this work, we propose GAGA, a novel Group AGgregation enhanced TrAnsformer, to tackle these challenges. Specifically, group aggregation provides a portable method for coping with the low-homophily issue: it explicitly integrates label information to generate distinguishable neighborhood information. Along with group aggregation, we propose an end-to-end trainable group encoding that augments the original feature space with the class labels, and we devise two additional learnable encodings to capture structural and relational context. We then combine the group aggregation and the learnable encodings in a Transformer encoder to capture the semantic information. Experimental results show that GAGA outperforms other competitive graph-based fraud detectors by up to 24.39% on two popular public datasets and a real-world industrial dataset from Anonymous. Furthermore, group aggregation outperforms other label utilization methods (e.g., C&S, BoT/UniMP) in the low-homophily setting.
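
As an illustration of the group aggregation idea, here is a minimal single-relation sketch (function and variable names are assumptions; the paper's version also distinguishes relations and multiple hops): neighbors are partitioned by their observed training label and averaged per group, so fraud and benign neighborhoods remain distinguishable even when homophily is low.

```python
# Toy label-aware group aggregation (assumed names, single relation).
import torch

def group_aggregate(x, neighbors, labels, num_classes):
    """x: [N, d] node features; neighbors: list of LongTensors of neighbor ids;
    labels: [N] LongTensor with -1 for nodes whose label is hidden or unknown."""
    groups = []
    for c in list(range(num_classes)) + [-1]:          # one group per class, plus unlabeled
        rows = []
        for v in range(x.size(0)):
            nbrs = neighbors[v]
            mask = labels[nbrs] == c
            if mask.any():
                rows.append(x[nbrs[mask]].mean(dim=0)) # average features in this group
            else:
                rows.append(torch.zeros(x.size(1)))    # empty group -> zero vector
        groups.append(torch.stack(rows))
    return torch.cat(groups, dim=1)                    # [N, d * (num_classes + 1)]
```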

* Accepted to The ACM Webconf 2023 

TA-MoE: Topology-Aware Large Scale Mixture-of-Expert Training

Feb 20, 2023
Chang Chen, Min Li, Zhihua Wu, Dianhai Yu, Chao Yang

Sparsely gated Mixture-of-Experts (MoE) has demonstrated its effectiveness in scaling deep neural networks to extreme sizes. Although numerous efforts have been made to improve MoE performance from the model-design or system-optimization perspective, existing MoE dispatch patterns still fail to fully exploit the underlying heterogeneous network environments. In this paper, we propose TA-MoE, a topology-aware routing strategy for large-scale MoE training designed from a model-system co-design perspective, which dynamically adjusts the MoE dispatch pattern according to the network topology. Based on communication modeling, we abstract the dispatch problem into an optimization objective and obtain approximate dispatch patterns under different topologies. On top of that, we design a topology-aware auxiliary loss that adaptively routes data to fit the underlying topology without sacrificing model accuracy. Experiments show that TA-MoE substantially outperforms its counterparts on various hardware and model configurations, with roughly 1.01x-1.61x, 1.01x-4.77x, and 1.25x-1.54x improvements over the popular DeepSpeed-MoE, FastMoE, and FasterMoE, respectively.
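
The auxiliary-loss idea can be sketched as follows (a hedged illustration, not the paper's exact objective; the bandwidth-proportional target is an assumption): rather than pushing the router toward a uniform load, the loss pulls the expected per-expert load toward a target distribution derived from the network topology.

```python
# Sketch of a topology-aware auxiliary loss for MoE routing (assumed form).
import torch

def topology_aware_aux_loss(gate_probs: torch.Tensor, bandwidth: torch.Tensor):
    """gate_probs: [tokens, experts] softmax outputs of the router.
    bandwidth: [experts] relative link bandwidth from this rank to each expert."""
    target = bandwidth / bandwidth.sum()   # dispatch proportional to bandwidth
    load = gate_probs.mean(dim=0)          # expected fraction of tokens per expert
    return ((load - target) ** 2).sum()    # penalize deviation from the target

# Usage (illustrative): total_loss = task_loss + alpha * topology_aware_aux_loss(probs, bw)
```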

PP-YOLOE-R: An Efficient Anchor-Free Rotated Object Detector

Nov 04, 2022
Xinxin Wang, Guanzhong Wang, Qingqing Dang, Yi Liu, Xiaoguang Hu, Dianhai Yu

Arbitrary-oriented object detection is a fundamental task in visual scenes involving aerial images and scene text. In this report, we present PP-YOLOE-R, an efficient anchor-free rotated object detector based on PP-YOLOE. We introduce a bag of useful tricks in PP-YOLOE-R to improve detection precision at marginal extra parameter and computational cost. As a result, PP-YOLOE-R-l and PP-YOLOE-R-x achieve 78.14 and 78.28 mAP, respectively, on the DOTA 1.0 dataset with single-scale training and testing, outperforming almost all other rotated object detectors. With multi-scale training and testing, PP-YOLOE-R-l and PP-YOLOE-R-x further improve detection precision to 80.02 and 80.73 mAP. In this setting, PP-YOLOE-R-x surpasses all anchor-free methods and is competitive with state-of-the-art anchor-based two-stage models. Furthermore, PP-YOLOE-R is deployment-friendly: PP-YOLOE-R-s/m/l/x reach 69.8/55.1/48.3/37.1 FPS, respectively, on an RTX 2080 Ti with TensorRT and FP16 precision. Source code and pre-trained models are available at https://github.com/PaddlePaddle/PaddleDetection, which is powered by https://github.com/PaddlePaddle/Paddle.
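
For readers unfamiliar with the output format, the following generic sketch (an illustration under assumed conventions, not PP-YOLOE-R's actual head) shows how an anchor-free rotated detector decodes per-cell predictions into (cx, cy, w, h, theta) boxes.

```python
# Generic anchor-free rotated-box decoding (illustrative conventions only).
import math
import torch

def decode_rotated(preds: torch.Tensor, stride: int):
    """preds: [H, W, 5] raw head outputs (dx, dy, log-w, log-h, angle logit)."""
    H, W, _ = preds.shape
    ys, xs = torch.meshgrid(torch.arange(H), torch.arange(W), indexing="ij")
    cx = (xs + preds[..., 0].sigmoid()) * stride       # center offset inside the cell
    cy = (ys + preds[..., 1].sigmoid()) * stride
    w = preds[..., 2].exp() * stride                   # positive box size
    h = preds[..., 3].exp() * stride
    theta = preds[..., 4].sigmoid() * math.pi / 2      # angle in [0, pi/2)
    return torch.stack([cx, cy, w, h, theta], dim=-1)  # [H, W, 5] rotated boxes
```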

* 6 pages, 2 figures, 3 tables 

PP-StructureV2: A Stronger Document Analysis System

Oct 11, 2022
Chenxia Li, Ruoyu Guo, Jun Zhou, Mengtao An, Yuning Du, Lingfeng Zhu, Yi Liu, Xiaoguang Hu, Dianhai Yu

A large amount of document data exists in unstructured forms such as raw images without any text information. Designing a practical document image analysis system is a meaningful but challenging task. In previous work, we proposed an intelligent document analysis system, PP-Structure. To further upgrade its functionality and performance, we propose PP-StructureV2, which contains two subsystems: Layout Information Extraction and Key Information Extraction. First, we integrate an Image Direction Correction module and a Layout Restoration module to enhance the functionality of the system. Second, eight practical strategies are employed in PP-StructureV2 for better performance. For the Layout Analysis model, we introduce the ultra-lightweight detector PP-PicoDet and the knowledge distillation algorithm FGD to slim the model, increasing inference speed 11-fold at comparable mAP. For the Table Recognition model, we utilize PP-LCNet, CSP-PAN, and SLAHead to optimize the backbone, feature fusion, and decoding modules, respectively, improving table structure accuracy by 6% at comparable inference speed. For the Key Information Extraction model, we introduce VI-LayoutXLM, a visual-feature-independent LayoutXLM architecture, along with the TB-YX sorting algorithm and the U-DML knowledge distillation algorithm, bringing 2.8% and 9.1% improvements on the Hmean of the Semantic Entity Recognition and Relation Extraction tasks, respectively. All of the above models and code are open-sourced in the GitHub repository PaddleOCR.
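
A short usage sketch via the PPStructure interface documented in the PaddleOCR repository (treat the exact names and arguments as assumptions; they may differ across PaddleOCR versions):

```python
# Run the PP-Structure document analysis pipeline on one page image.
# API names follow the PaddleOCR repository docs and may vary by version.
import cv2
from paddleocr import PPStructure

engine = PPStructure(show_log=False)     # loads layout, table, and OCR models
img = cv2.imread("document_page.png")    # BGR image of a document page
for region in engine(img):
    # Each region carries its layout type (e.g., text/table/figure),
    # a bounding box, and the recognition result for that block.
    print(region["type"], region["bbox"])
```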

ERNIE-mmLayout: Multi-grained MultiModal Transformer for Document Understanding

Sep 18, 2022
Wenjin Wang, Zhengjie Huang, Bin Luo, Qianglong Chen, Qiming Peng, Yinxu Pan, Weichong Yin, Shikun Feng, Yu Sun, Dianhai Yu, Yin Zhang

Recent multimodal Transformers have improved Visually Rich Document Understanding (VrDU) tasks by incorporating visual and textual information. However, existing approaches mainly focus on fine-grained elements such as words and document image patches, making it hard for them to learn from coarse-grained elements such as natural lexical units (e.g., phrases) and salient visual regions (e.g., prominent image regions). In this paper, we attach more importance to coarse-grained elements, which carry high-density information and consistent semantics that are valuable for document understanding. First, a document graph is proposed to model the complex relationships among multi-grained multimodal elements, in which salient visual regions are detected by a cluster-based method. Then, a multi-grained multimodal Transformer called mmLayout is proposed to incorporate coarse-grained information into existing pre-trained fine-grained multimodal Transformers based on this graph. In mmLayout, coarse-grained information is aggregated from fine-grained elements and, after further processing, fused back into the fine-grained representations for final prediction. Furthermore, common-sense enhancement is introduced to exploit the semantic information of natural lexical units. Experimental results on four tasks, including information extraction and document question answering, show that our method improves the performance of multimodal Transformers based on fine-grained elements and achieves better performance with fewer parameters. Qualitative analyses show that our method captures consistent semantics in coarse-grained elements.
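
The coarse-fine interaction pattern can be sketched as follows (assumed names, not the paper's implementation): fine-grained token embeddings are pooled into coarse-grained nodes such as phrases or salient regions, transformed, and fused back into the token sequence.

```python
# Toy coarse<->fine fusion: pool tokens into coarse nodes, process, fuse back.
import torch

def coarse_fine_fusion(fine: torch.Tensor, assign: torch.Tensor, mix: torch.nn.Module):
    """fine: [T, d] token embeddings; assign: [T] coarse-node id per token;
    mix: any module mapping [C, d] -> [C, d] (e.g., an MLP or Transformer layer)."""
    num_coarse = int(assign.max()) + 1
    coarse = torch.zeros(num_coarse, fine.size(1))
    counts = torch.zeros(num_coarse, 1)
    coarse.index_add_(0, assign, fine)                        # aggregate fine -> coarse
    counts.index_add_(0, assign, torch.ones(fine.size(0), 1))
    coarse = mix(coarse / counts.clamp(min=1))                # process coarse nodes
    return fine + coarse[assign]                              # fuse coarse info back into tokens
```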

* Accepted by ACM Multimedia 2022 

Large-scale Knowledge Distillation with Elastic Heterogeneous Computing Resources

Jul 14, 2022
Ji Liu, Daxiang Dong, Xi Wang, An Qin, Xingjian Li, Patrick Valduriez, Dejing Dou, Dianhai Yu

Although more layers and more parameters generally improve model accuracy, such big models have high computational complexity and large memory requirements, which exceed the capacity of small devices for inference and incur long training times even on high-performance servers. Knowledge distillation, which compresses a large deep model (a teacher model) into a compact model (a student model), has emerged as a promising way to deal with big models; however, existing knowledge distillation methods cannot exploit elastically available computing resources and thus suffer from low efficiency. In this paper, we propose an Elastic Deep Learning framework for knowledge Distillation, EDL-Dist. Its advantages are three-fold. First, inference and training are separated. Second, elastically available computing resources can be utilized to improve efficiency. Third, fault tolerance of the training and inference processes is supported. Extensive experiments show that the throughput of EDL-Dist is up to 3.125 times that of the baseline method (online knowledge distillation) while accuracy is similar or higher.
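
A toy sketch of the decoupling idea (not EDL-Dist itself; queue-based threading here stands in for the framework's distributed transport): teacher inference runs as a producer that could live on separate, elastically scaled hardware, while the student trainer consumes (input, soft-label) pairs from a queue, so the two sides scale and fail independently.

```python
# Decoupled teacher inference and student training via a producer/consumer queue.
import queue
import threading
import torch
import torch.nn.functional as F

soft_labels = queue.Queue(maxsize=64)

def teacher_service(teacher, data):
    """Producer: in a real system this could run on remote, elastic inference nodes."""
    for x in data:
        with torch.no_grad():
            soft_labels.put((x, teacher(x).softmax(dim=-1)))

def student_trainer(student, opt, steps):
    """Consumer: trains the student against whatever soft labels have been produced."""
    for _ in range(steps):
        x, t = soft_labels.get()                  # blocks until the teacher produces
        loss = F.kl_div(student(x).log_softmax(dim=-1), t, reduction="batchmean")
        opt.zero_grad(); loss.backward(); opt.step()

def run(teacher, student, opt, data, steps):
    threading.Thread(target=teacher_service, args=(teacher, data), daemon=True).start()
    student_trainer(student, opt, steps)
```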

* To appear in Concurrency and Computation: Practice and Experience, 16 pages, 7 figures, 5 tables 

HelixFold: An Efficient Implementation of AlphaFold2 using PaddlePaddle

Jul 13, 2022
Guoxia Wang, Xiaomin Fang, Zhihua Wu, Yiqun Liu, Yang Xue, Yingfei Xiang, Dianhai Yu, Fan Wang, Yanjun Ma

Accurate protein structure prediction can significantly accelerate the development of the life sciences. The accuracy of AlphaFold2, a frontier end-to-end structure prediction system, is already close to that of experimental determination techniques. However, due to its complex model architecture and large memory consumption, training and running AlphaFold2 from scratch requires substantial computational resources and time, and the cost of running the original AlphaFold2 is prohibitive for most individuals and institutions; reducing this cost could therefore accelerate the development of the life sciences. We implement AlphaFold2 using PaddlePaddle as HelixFold, improving training and inference speed and reducing memory consumption. Performance is improved through operator fusion, tensor fusion, and hybrid parallelism, while memory is optimized through Recompute, BFloat16, and in-place memory read/write. Compared with the original AlphaFold2 (implemented in JAX) and OpenFold (implemented in PyTorch), HelixFold needs only 7.5 days to complete full end-to-end training, and only 5.3 days when using hybrid parallelism, while both AlphaFold2 and OpenFold take about 11 days; HelixFold thus roughly halves the training time. We verified that HelixFold's accuracy is on par with AlphaFold2 on the CASP14 and CAMEO datasets. HelixFold's code is freely available at https://github.com/PaddlePaddle/PaddleHelix/tree/dev/apps/protein_folding/helixfold, and we also provide stable web services at https://paddlehelix.baidu.com/app/drug/protein/forecast.
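
The memory optimizations can be illustrated with PyTorch's analogous utilities (HelixFold itself builds on PaddlePaddle's equivalents, so this is a sketch of the mechanisms, not HelixFold code): recomputation trades compute for activation memory, and BFloat16 autocasting shrinks most intermediate tensors.

```python
# Recompute (activation checkpointing) + BFloat16 autocast, sketched in PyTorch.
import torch
from torch.utils.checkpoint import checkpoint

block = torch.nn.Sequential(torch.nn.Linear(256, 256), torch.nn.ReLU())

def forward(x):
    with torch.autocast(device_type="cpu", dtype=torch.bfloat16):
        # Activations inside `block` are freed after forward and recomputed
        # during backward, trading extra compute for lower peak memory.
        return checkpoint(block, x, use_reentrant=False)

loss = forward(torch.randn(8, 256, requires_grad=True)).sum()
loss.backward()
```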

PP-OCRv3: More Attempts for the Improvement of Ultra Lightweight OCR System

Jun 07, 2022
Chenxia Li, Weiwei Liu, Ruoyu Guo, Xiaoting Yin, Kaitao Jiang, Yongkun Du, Yuning Du, Lingfeng Zhu, Baohua Lai, Xiaoguang Hu, Dianhai Yu, Yanjun Ma

Optical character recognition (OCR) technology is widely used in a variety of scenarios, as shown in Figure 1. Designing a practical OCR system remains a meaningful but challenging task. In previous work, we proposed a practical ultra-lightweight OCR system (PP-OCR) that balances efficiency and accuracy, and an optimized version, PP-OCRv2. To further improve the performance of PP-OCRv2, this paper proposes a more robust OCR system, PP-OCRv3, which upgrades the text detection and text recognition models in nine aspects. For the text detector, we introduce a PAN module with a large receptive field named LK-PAN, an FPN module with a residual attention mechanism named RSE-FPN, and the DML distillation strategy. For the text recognizer, the base model is changed from CRNN to SVTR, and we introduce the lightweight text recognition network SVTR-LCNet, attention-guided CTC training, the data augmentation strategy TextConAug, a better pre-trained model via self-supervised TextRotNet, UDML, and UIM to accelerate the model and improve accuracy. Experiments on real data show that the hmean of PP-OCRv3 is 5% higher than that of PP-OCRv2 at comparable inference speed. All of the above models are open-sourced, and the code is available in the GitHub repository PaddleOCR, which is powered by PaddlePaddle.
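
A short usage sketch of the detection-plus-recognition pipeline via the PaddleOCR package's documented interface (argument names and the result layout vary across versions; treat them as assumptions):

```python
# Run text detection + recognition on an image with the PaddleOCR package.
from paddleocr import PaddleOCR

ocr = PaddleOCR(use_angle_cls=True, lang="en")   # loads detector + recognizer
result = ocr.ocr("receipt.png", cls=True)        # one result list per input image
for line in result[0]:
    box, (text, score) = line                    # quadrilateral box, text, confidence
    print(f"{score:.2f}\t{text}")
```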

* arXiv admin note: text overlap with arXiv:2109.03144 

PaddleSpeech: An Easy-to-Use All-in-One Speech Toolkit

May 20, 2022
Hui Zhang, Tian Yuan, Junkun Chen, Xintong Li, Renjie Zheng, Yuxin Huang, Xiaojie Chen, Enlei Gong, Zeyu Chen, Xiaoguang Hu, Dianhai Yu, Yanjun Ma, Liang Huang

PaddleSpeech is an open-source all-in-one speech toolkit. It aims to facilitate the development and research of speech processing technologies by providing an easy-to-use command-line interface and a simple code structure. This paper describes the design philosophy and core architecture of PaddleSpeech in support of several essential speech-to-text and text-to-speech tasks. PaddleSpeech achieves competitive or state-of-the-art performance on various speech datasets and implements the most popular methods. It also provides recipes and pretrained models to quickly reproduce the experimental results in this paper. PaddleSpeech is publicly available at https://github.com/PaddlePaddle/PaddleSpeech.
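
A usage sketch based on the CLI executors documented in the PaddleSpeech repository (module paths and argument names are assumptions that may differ between releases):

```python
# Speech-to-text and text-to-speech via PaddleSpeech's documented CLI executors.
from paddlespeech.cli.asr.infer import ASRExecutor
from paddlespeech.cli.tts.infer import TTSExecutor

asr = ASRExecutor()
print(asr(audio_file="input_16k.wav"))                      # transcribe a 16 kHz wav

tts = TTSExecutor()
tts(text="Hello from PaddleSpeech.", output="output.wav")   # synthesize speech
```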
