Lei Li

Carnegie Mellon University

StrokeNet: Stroke Assisted and Hierarchical Graph Reasoning Networks

Nov 23, 2021

Lei Li, Kai Fan, Chun Yuan

Abstract: Scene text detection is still a challenging task, because text may contain extremely small or low-resolution strokes, and text instances may be close together or arbitrarily shaped. In this paper, we propose StrokeNet, which detects text effectively by capturing fine-grained strokes and inferring structural relations between hierarchical representations in a graph. Unlike existing approaches that represent the text area by a series of points or rectangular boxes, we directly localize the strokes of each text instance through a Stroke Assisted Prediction Network (SAPN). In addition, a Hierarchical Relation Graph Network (HRGN) performs relational reasoning and predicts the likelihood of linkages, effectively splitting close text instances and grouping node classification results into arbitrary-shaped text regions. We introduce a novel dataset with stroke-level annotations, namely SynthStroke, for offline pre-training of our model. Experiments on wide-ranging benchmarks verify the state-of-the-art performance of our method. Our dataset and code will be made available.
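The HRGN description ends with a grouping step: nodes classified as text are merged into instances wherever a linkage is predicted. Below is a minimal sketch of only that last step, assuming the node labels and pairwise link probabilities are already available. Thresholding the links and taking connected components with union-find is our assumption, not necessarily the paper's procedure, and `group_text_nodes` and `link_threshold` are hypothetical names.

```python
from typing import Dict, List, Tuple


def group_text_nodes(
    num_nodes: int,
    is_text: List[bool],
    link_probs: Dict[Tuple[int, int], float],
    link_threshold: float = 0.5,
) -> List[List[int]]:
    """Merge text nodes joined by confident links into instances (union-find)."""
    parent = list(range(num_nodes))

    def find(x: int) -> int:
        # Path halving keeps the union-find trees shallow.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    def union(a: int, b: int) -> None:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    for (a, b), prob in link_probs.items():
        # Link only nodes that are both classified as text and confidently connected.
        if is_text[a] and is_text[b] and prob >= link_threshold:
            union(a, b)

    groups: Dict[int, List[int]] = {}
    for node in range(num_nodes):
        if is_text[node]:
            groups.setdefault(find(node), []).append(node)
    return sorted(groups.values())


# Two adjacent words: a weak 2-3 link keeps them apart; node 5 is background.
is_text = [True, True, True, True, True, False]
links = {(0, 1): 0.9, (1, 2): 0.8, (2, 3): 0.2, (3, 4): 0.95, (4, 5): 0.9}
print(group_text_nodes(6, is_text, links))  # [[0, 1, 2], [3, 4]]
```

In this sketch, it is a low link probability between two neighboring nodes that splits close text instances, while node classification filters out background.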

A Survey on Green Deep Learning

Nov 10, 2021

Jingjing Xu, Wangchunshu Zhou, Zhiyi Fu, Hao Zhou, Lei Li

Abstract: In recent years, larger and deeper models have been springing up and continuously pushing state-of-the-art (SOTA) results across various fields such as natural language processing (NLP) and computer vision (CV). However, despite these promising results, it should be noted that the computation required by SOTA models has been increasing at an exponential rate. Massive computation not only has a surprisingly large carbon footprint but also harms research inclusiveness and deployment in real-world applications. Green deep learning is an increasingly active research field that calls on researchers to pay attention to energy usage and carbon emissions during model training and inference. The goal is to yield novel results with lightweight and efficient technologies. Many technologies can be used to achieve this goal, such as model compression and knowledge distillation. This paper presents a systematic review of the development of green deep learning technologies. We classify these approaches into four categories: (1) compact networks, (2) energy-efficient training strategies, (3) energy-efficient inference approaches, and (4) efficient data usage. For each category, we discuss the progress that has been achieved and the unresolved challenges.

Learning Logic Rules for Document-level Relation Extraction

Nov 09, 2021

Dongyu Ru, Changzhi Sun, Jiangtao Feng, Lin Qiu, Hao Zhou, Weinan Zhang, Yong Yu, Lei Li

Abstract: Document-level relation extraction aims to identify relations between entities in a whole document. Prior efforts to capture long-range dependencies have relied heavily on implicit yet powerful representations learned through (graph) neural networks, which make the model less transparent. To tackle this challenge, we propose LogiRE, a novel probabilistic model for document-level relation extraction that learns logic rules. LogiRE treats logic rules as latent variables and consists of two modules: a rule generator and a relation extractor. The rule generator generates logic rules that potentially contribute to the final predictions, and the relation extractor outputs the final predictions based on the generated rules. The two modules can be optimized efficiently with the expectation-maximization (EM) algorithm. By introducing logic rules into neural networks, LogiRE can explicitly capture long-range dependencies and offers better interpretability. Empirical results show that LogiRE significantly outperforms several strong baselines in relation extraction performance (by 1.8 F1) and logical consistency (by over 3.3 logic score). Our code is available at https://github.com/rudongyu/LogiRE.

* Appears at the EMNLP 2021 main conference
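To make the notion of a logic rule concrete, here is a minimal sketch that applies chain-shaped rules of the form r1(h, m) ∧ r2(m, t) → r(h, t) to relation triples gathered from a document. It only shows what a fixed rule derives. LogiRE itself generates rules as latent variables and trains with EM, which this sketch does not attempt. The rule shape, the entity and relation names, and `apply_chain_rules` are illustrative assumptions.

```python
from typing import List, Set, Tuple

Triple = Tuple[str, str, str]  # (head entity, relation, tail entity)
Rule = Tuple[str, str, str]    # (body relation r1, body relation r2, head relation r)


def apply_chain_rules(facts: Set[Triple], rules: List[Rule]) -> Set[Triple]:
    """Derive r(h, t) whenever r1(h, m) and r2(m, t) hold; return only new triples."""
    derived: Set[Triple] = set()
    for r1, r2, head_rel in rules:
        for h, rel_a, m in facts:
            if rel_a != r1:
                continue
            for m2, rel_b, t in facts:
                # The shared middle entity m links the two body atoms.
                if rel_b == r2 and m2 == m and t != h:
                    derived.add((h, head_rel, t))
    return derived - facts


# Hypothetical facts mentioned in different sentences of one document.
facts = {
    ("Alice", "born_in", "Lyon"),
    ("Lyon", "located_in", "France"),
}
rules = [("born_in", "located_in", "born_in_country")]
print(apply_chain_rules(facts, rules))  # {('Alice', 'born_in_country', 'France')}
```

The derived triple connects two entities that never co-occur in a single fact, which is the kind of long-range dependency the abstract says rules make explicit.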

Multi-Modality Cardiac Image Analysis with Deep Learning

Nov 08, 2021

Lei Li, Fuping Wu, Sihang Wang, Xiahai Zhuang

Abstract: Accurate cardiac computing, analysis and modeling from multi-modality images are important for the diagnosis and treatment of cardiac disease. Late gadolinium enhancement magnetic resonance imaging (LGE MRI) is a promising technique to visualize and quantify myocardial infarction (MI) and atrial scars. Automating the quantification of MI and atrial scars can be challenging due to the low image quality and complex enhancement patterns of LGE MRI. Moreover, compared with other sequences, LGE MRIs with gold-standard labels are particularly limited, which is another obstacle to developing novel algorithms for automatic segmentation and quantification of LGE MRIs. This chapter summarizes the state of the art and our recent contributions on deep learning-based multi-modality cardiac image analysis. First, we introduce two benchmark works for multi-sequence cardiac MRI-based myocardial and pathology segmentation. Second, we present two novel frameworks for left atrial scar segmentation and quantification from LGE MRI. Third, we present three unsupervised domain adaptation techniques for cross-modality cardiac image segmentation.

* Under review as a chapter of book 'Deep Learning for Medical Image Analysis, 2E'

LightSeq2: Accelerated Training for Transformer-based Models on GPUs

Oct 27, 2021

Xiaohui Wang, Ying Xiong, Xian Qian, Yang Wei, Lei Li, Mingxuan Wang

Abstract: Transformer-based models have proven to be powerful in many natural language, computer vision, and speech recognition applications. Training these models is expensive because of variable input lengths, complex computation, and large numbers of parameters. Existing systems either focus only on efficient inference or optimize only BERT-like encoder models. In this paper, we present LightSeq2, a system for efficient training of Transformer-based models on GPUs. We propose a series of GPU optimization techniques tailored to the computation flow and memory access patterns of neural layers in Transformers. LightSeq2 supports a variety of network architectures, including BERT (encoder-only), GPT (decoder-only), and Transformer (encoder-decoder). Our experiments on GPUs with various models and datasets show that LightSeq2 is 1.4-3.5x faster than previous systems. In particular, it achieves a 308% training speedup compared with existing systems on a large public machine translation benchmark (WMT14 English-German).

* 12 pages, 17 figures

CNewSum: A Large-scale Chinese News Summarization Dataset with Human-annotated Adequacy and Deducibility Level

Oct 21, 2021

Danqing Wang, Jiaze Chen, Xianze Wu, Hao Zhou, Lei Li

Abstract: Automatic text summarization aims to produce a brief summary that captures the crucial content of the input documents. Both extractive and abstractive methods have achieved great success on English datasets in recent years. However, text summarization in Chinese has seen minimal exploration, limited by the lack of large-scale datasets. In this paper, we present CNewSum, a large-scale Chinese news summarization dataset consisting of 304,307 documents with human-written summaries for the news feed. It features long documents with highly abstractive summaries, which can encourage document-level understanding and generation for current summarization models. Another distinguishing feature of CNewSum is that its test set contains adequacy and deducibility annotations for the summaries. The adequacy level measures the degree to which the summary information is covered by the document, and the deducibility level indicates the reasoning ability the model needs to generate the summary. These annotations can help researchers analyze and target their models' performance bottlenecks. We examine recent methods on CNewSum and release our dataset to provide a solid testbed for automatic Chinese summarization research.
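One common way to quantify how abstractive a summary is involves computing the share of its n-grams that never appear in the source document. A minimal sketch follows. The abstract does not say this is the measure CNewSum uses, so treat it as an illustrative proxy. Using character n-grams (since Chinese text has no whitespace word boundaries), the bigram default, and the sample strings are all assumptions.

```python
from typing import List


def char_ngrams(text: str, n: int) -> List[str]:
    """Character n-grams; Chinese text has no whitespace word boundaries."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]


def novel_ngram_ratio(document: str, summary: str, n: int = 2) -> float:
    """Fraction of summary n-grams that never appear in the document."""
    summary_ngrams = char_ngrams(summary, n)
    if not summary_ngrams:
        return 0.0
    doc_ngrams = set(char_ngrams(document, n))
    novel = sum(1 for gram in summary_ngrams if gram not in doc_ngrams)
    return novel / len(summary_ngrams)


document = "今天北京天气晴朗"
summary = "北京晴天"
print(novel_ngram_ratio(document, summary))  # 2 of 3 bigrams are novel
```

A purely extractive summary scores 0.0 under this proxy, and higher values indicate more rewording relative to the source.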

Well-classified Examples are Underestimated in Classification with Deep Neural Networks

Oct 15, 2021

Guangxiang Zhao, Wenkai Yang, Xuancheng Ren, Lei Li, Xu Sun

Abstract: The conventional wisdom behind learning deep classification models is to focus on poorly classified examples and ignore well-classified examples that are far from the decision boundary. For instance, when training with the cross-entropy loss, examples with higher likelihoods (i.e., well-classified examples) contribute smaller gradients in back-propagation. However, we theoretically show that this common practice hinders representation learning, energy optimization, and the growth of the margin. To counteract this deficiency, we propose rewarding well-classified examples with additive bonuses to revive their contribution to learning. This counterexample theoretically addresses these three issues. We support this claim empirically, either by directly verifying the theoretical results or through significant performance improvements with our counterexample on diverse tasks, including image classification, graph classification, and machine translation. Furthermore, this paper shows that because our idea solves these three issues, it can handle complex scenarios such as imbalanced classification, OOD detection, and applications under adversarial attacks. Code is available at: https://github.com/lancopku/well-classified-examples-are-underestimated.

* 16 pages, 11 figures, 13 tables
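The premise about cross-entropy can be checked numerically. For softmax cross-entropy, the gradient of the loss with respect to logit k is p_k minus 1 for the target class and p_k otherwise, so the target-logit gradient has magnitude 1 - p and fades as an example becomes well classified. The sketch below illustrates only this premise. The additive bonus proposed in the abstract is not given in closed form there and is not reproduced. The logits are arbitrary illustrative values.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def ce_grad_wrt_logits(logits: Sequence[float], target: int) -> List[float]:
    """Gradient of -log softmax(logits)[target] w.r.t. the logits: p_k - 1[k == target]."""
    probs = softmax(logits)
    return [p - (1.0 if k == target else 0.0) for k, p in enumerate(probs)]


# A barely-correct example versus a well-classified one (target class 0 in both).
for logits in ([0.5, 0.0, 0.0], [4.0, 0.0, 0.0]):
    p_target = softmax(logits)[0]
    grad_target = ce_grad_wrt_logits(logits, 0)[0]
    print(f"p_target={p_target:.3f}  dL/dz_target={grad_target:.3f}")
```

The second example's target-logit gradient is more than ten times smaller than the first's, which is the under-weighting of well-classified examples that the paper argues against.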

Non-Autoregressive Translation with Layer-Wise Prediction and Deep Supervision

Oct 14, 2021

Chenyang Huang, Hao Zhou, Osmar R. Zaïane, Lili Mou, Lei Li

Abstract: How can we perform efficient inference while retaining high translation quality? Existing neural machine translation models, such as the Transformer, achieve high performance, but they decode words one by one, which is inefficient. Recent non-autoregressive translation models speed up inference, but their quality is still inferior. In this work, we propose DSLP, a highly efficient and high-performance model for machine translation. The key insight is to train a non-autoregressive Transformer with Deep Supervision and to feed additional Layer-wise Predictions. We conducted extensive experiments on four translation tasks (both directions of WMT'14 EN-DE and WMT'16 EN-RO). Results show that our approach consistently improves BLEU scores over the respective base models. Specifically, our best variant outperforms the autoregressive model on three translation tasks while being 14.8 times more efficient in inference.
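The two training signals named in the abstract can be sketched in a simplified form. Deep supervision applies the translation loss to every decoder layer's output rather than only the last, and layer-wise prediction gives each layer's current token guesses to the next layer. The following minimal pure-Python sketch rests on several assumptions: equal-weight averaging over layers and argmax tokens as the passed-up predictions are our choices, and how DSLP actually embeds and feeds those predictions is not shown. All function names are hypothetical.

```python
import math
from typing import List, Sequence

# Shapes (assumed): layer_logits[layer][position][vocab]; targets[position].


def log_softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable log-softmax."""
    m = max(logits)
    log_total = m + math.log(sum(math.exp(z - m) for z in logits))
    return [z - log_total for z in logits]


def deep_supervision_loss(layer_logits: List[List[List[float]]], targets: List[int]) -> float:
    """Token-level cross-entropy computed at every decoder layer, averaged over layers."""
    per_layer = []
    for position_logits in layer_logits:
        token_losses = [-log_softmax(logits)[t] for logits, t in zip(position_logits, targets)]
        per_layer.append(sum(token_losses) / len(token_losses))
    return sum(per_layer) / len(per_layer)


def layer_predictions(layer_logits: List[List[List[float]]]) -> List[List[int]]:
    """Argmax token id at each position for each layer, i.e. what a layer could pass upward."""
    return [
        [max(range(len(scores)), key=scores.__getitem__) for scores in position_logits]
        for position_logits in layer_logits
    ]


# Two decoder layers, three target positions, a vocabulary of four tokens.
layer_logits = [
    [[0.0, 0.0, 0.0, 0.0]] * 3,  # layer 1: uniform, uninformed
    [[3.0, 0.0, 0.0, 0.0], [0.0, 3.0, 0.0, 0.0], [0.0, 0.0, 3.0, 0.0]],  # layer 2: confident
]
targets = [0, 1, 2]
print(layer_predictions(layer_logits))
print(deep_supervision_loss(layer_logits, targets))
```

Because every layer is supervised, the uninformed first layer still contributes a loss term, whereas a last-layer-only loss would ignore it.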

The Volctrans GLAT System: Non-autoregressive Translation Meets WMT21

Sep 24, 2021

Lihua Qian, Yi Zhou, Zaixiang Zheng, Yaoming Zhu, Zehui Lin, Jiangtao Feng, Shanbo Cheng, Lei Li, Mingxuan Wang, Hao Zhou

Abstract: This paper describes the Volctrans submission to the WMT21 news translation shared task for German->English translation. We build a parallel (i.e., non-autoregressive) translation system using the Glancing Transformer, which enables fast and accurate parallel decoding, in contrast to the currently prevailing autoregressive models. To the best of our knowledge, this is the first parallel translation system that can be scaled to a practical scenario such as the WMT competition. More importantly, our parallel translation system achieves the best BLEU score (35.0) on the German->English translation task, outperforming all strong autoregressive counterparts.

* 10 pages, 5 figures, WMT2021
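The Glancing Transformer is trained with glancing sampling. After a first parallel decoding pass, some reference tokens are revealed to the decoder, and more are revealed when that pass was further from the reference. A minimal sketch of only the sampling step follows, under several assumptions: the ratio value, floor rounding, and uniform choice of positions are our choices, and `glancing_sample` is a hypothetical name, not part of the Volctrans system.

```python
import random
from typing import List, Optional


def glancing_sample(
    prediction: List[str],
    reference: List[str],
    glance_ratio: float = 0.5,
    rng: Optional[random.Random] = None,
) -> List[Optional[str]]:
    """Pick reference tokens to reveal to the decoder; None marks positions left hidden."""
    if len(prediction) != len(reference):
        raise ValueError("this sketch assumes the prediction and reference have equal length")
    if rng is None:
        rng = random.Random()
    # Hamming distance: how many positions the first parallel pass got wrong.
    distance = sum(p != r for p, r in zip(prediction, reference))
    # Reveal more reference tokens when the pass is further off (floor rounding assumed).
    num_glance = int(glance_ratio * distance)
    revealed = set(rng.sample(range(len(reference)), num_glance))
    return [reference[i] if i in revealed else None for i in range(len(reference))]


reference = ["das", "ist", "ein", "kleines", "Haus"]
prediction = ["das", "ist", "ein", "ein", "ein"]  # two positions wrong
print(glancing_sample(prediction, reference, glance_ratio=0.5, rng=random.Random(0)))
```

When the first pass already matches the reference, nothing is revealed, so the model is pushed to generate all tokens in parallel on its own.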

Dynamic Knowledge Distillation for Pre-trained Language Models

Sep 23, 2021

Lei Li, Yankai Lin, Shuhuai Ren, Peng Li, Jie Zhou, Xu Sun

Abstract: Knowledge distillation (KD) has proven effective for compressing large-scale pre-trained language models. However, existing methods conduct KD statically; e.g., the student model aligns its output distribution to that of a selected teacher model on a pre-defined training dataset. In this paper, we explore whether dynamic knowledge distillation, which empowers the student to adjust the learning procedure according to its competency, is beneficial in terms of student performance and learning efficiency. We explore dynamic adjustments in three aspects: teacher model adoption, data selection, and KD objective adaptation. Experimental results show that (1) proper selection of the teacher model can boost the performance of the student model; (2) conducting KD with 10% informative instances achieves comparable performance while greatly accelerating training; and (3) student performance can be boosted by adjusting the supervision contribution of different alignment objectives. We find that dynamic knowledge distillation is promising and discuss potential future directions toward more efficient KD methods. Our code is available at https://github.com/lancopku/DynamicKD.

* Main Conference EMNLP 2021, Camera Ready
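For reference, the static baseline the abstract contrasts against is typically the temperature-softened distillation loss, and one of the three dynamic adjustments is data selection. Below is a minimal sketch of both. It assumes the entropy of the student's prediction serves as the informativeness score, although the abstract does not specify the criterion. `kd_loss`, `select_informative`, the temperature of 2.0, and all logits are illustrative assumptions, not the DynamicKD code.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float], temperature: float = 1.0) -> List[float]:
    """Temperature-scaled, numerically stable softmax."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def kd_loss(student_logits: Sequence[float], teacher_logits: Sequence[float], temperature: float = 2.0) -> float:
    """KL(teacher || student) on softened distributions, scaled by T^2 as in standard KD."""
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    kl = sum(
        pt * (math.log(pt) - math.log(max(ps, 1e-12)))
        for pt, ps in zip(p_teacher, p_student)
        if pt > 0
    )
    return temperature ** 2 * kl


def entropy(probs: Sequence[float]) -> float:
    """Shannon entropy in nats."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def select_informative(student_logits_batch: List[Sequence[float]], fraction: float = 0.1) -> List[int]:
    """Indices of the examples the student is least certain about (hypothetical criterion)."""
    scores = [entropy(softmax(logits)) for logits in student_logits_batch]
    k = max(1, int(len(scores) * fraction))
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:k])


# Ten examples: the student is confident on all but index 7.
batch = [[5.0, 0.0, 0.0] for _ in range(10)]
batch[7] = [0.1, 0.0, 0.05]
chosen = select_informative(batch, fraction=0.1)
teacher_logits = [2.0, 1.0, 0.0]  # hypothetical teacher output for the chosen example
print(chosen, kd_loss(batch[chosen[0]], teacher_logits))
```

In this sketch, only the selected 10% of examples would incur the distillation loss, which is one simple way to trade a small amount of supervision for faster training.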
