Lei Li

Carnegie Mellon University

EasyNLP: A Comprehensive and Easy-to-use Toolkit for Natural Language Processing

Apr 30, 2022

Chengyu Wang, Minghui Qiu, Taolin Zhang, Tingting Liu, Lei Li, Jianing Wang, Ming Wang, Jun Huang, Wei Lin

Abstract: The success of Pre-Trained Models (PTMs) has reshaped the development of Natural Language Processing (NLP). Yet, it is not easy for industrial practitioners to obtain high-performing models and deploy them online. To bridge this gap, EasyNLP is designed to make it easy to build NLP applications and supports a comprehensive suite of NLP algorithms. It further features knowledge-enhanced pre-training, knowledge distillation and few-shot learning functionalities for large-scale PTMs, and provides a unified framework of model training, inference and deployment for real-world applications. Currently, EasyNLP has powered over ten business units within Alibaba Group and is seamlessly integrated into the Platform of AI (PAI) products on Alibaba Cloud. The source code of our EasyNLP toolkit is released on GitHub (https://github.com/alibaba/EasyNLP).

* 8 pages

Learning Design and Construction with Varying-Sized Materials via Prioritized Memory Resets

Apr 12, 2022

Yunfei Li, Tao Kong, Lei Li, Yi Wu

Abstract: Can a robot autonomously learn to design and construct a bridge from varying-sized blocks without a blueprint? It is a challenging task with a long horizon and sparse rewards: the robot has to figure out physically stable design schemes and feasible actions to manipulate and transport blocks. Due to diverse block sizes, the state space and action trajectories are vast to explore. In this paper, we propose a hierarchical approach for this problem. It consists of a reinforcement-learning designer to propose high-level building instructions and a motion-planning-based action generator to manipulate blocks at the low level. For high-level learning, we develop a novel technique, prioritized memory resetting (PMR), to improve exploration. PMR adaptively resets the state to the most critical configurations from a replay buffer so that the robot can resume training on partial architectures instead of from scratch. Furthermore, we augment PMR with auxiliary training objectives and fine-tune the designer with the locomotion generator. Our experiments in simulation and on a real deployed robotic system demonstrate that our approach can effectively construct bridges with blocks of varying sizes at a high success rate. Demos can be found at https://sites.google.com/view/bridge-pmr.

* To be published in ICRA 2022
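
The resetting idea in the abstract above can be illustrated with a minimal sketch. It is not the authors' implementation: the class name `PrioritizedResetBuffer`, the `reset_prob` parameter, the eviction rule, and the way priorities are assigned are all assumptions made for illustration, since the abstract does not specify how "most critical" configurations are scored.

```python
import random


class PrioritizedResetBuffer:
    """Hypothetical buffer of partial states, sampled in proportion to priority."""

    def __init__(self, capacity, seed=None):
        self.capacity = capacity
        self.items = []  # list of (priority, state) pairs
        self.rng = random.Random(seed)

    def add(self, state, priority):
        """Store a state; evict the lowest-priority entry when over capacity."""
        self.items.append((float(priority), state))
        if len(self.items) > self.capacity:
            self.items.remove(min(self.items, key=lambda item: item[0]))

    def sample_reset(self, default_state, reset_prob=0.5):
        """Return a stored state (priority-weighted) or the default start state."""
        if not self.items or self.rng.random() >= reset_prob:
            return default_state
        # Clamp at zero and add a small epsilon so the weights never sum to zero.
        weights = [max(priority, 0.0) + 1e-6 for priority, _ in self.items]
        states = [state for _, state in self.items]
        return self.rng.choices(states, weights=weights, k=1)[0]
```

In a training loop, each episode would start from `sample_reset(initial_state)` rather than always from the empty scene, so that exploration can resume from partially built structures.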

Confidence Estimation Transformer for Long-term Renewable Energy Forecasting in Reinforcement Learning-based Power Grid Dispatching

Apr 10, 2022

Xinhang Li, Zihao Li, Nan Yang, Zheng Yuan, Qinwen Wang, Yiying Yang, Yupeng Huang, Xuri Song, Lei Li, Lin Zhang

Abstract: The expansion of renewable energy could help realize the goals of peaking carbon dioxide emissions and carbon neutrality. Some existing grid dispatching methods that integrate short-term renewable energy prediction and reinforcement learning (RL) have been shown to alleviate the adverse impact of energy fluctuation risk. However, these methods omit long-term output prediction, which leads to stability and security problems in the optimal power flow. This paper proposes a confidence estimation Transformer for long-term renewable energy forecasting in reinforcement learning-based power grid dispatching (Conformer-RLpatching). Conformer-RLpatching predicts the long-term active output of each renewable energy generator with an enhanced Transformer to boost the performance of hybrid energy grid dispatching. Furthermore, a confidence estimation method is proposed to reduce the prediction error of renewable energy. Meanwhile, a dispatching necessity evaluation mechanism is put forward to decide whether the active output of a generator needs to be adjusted. Experiments carried out on the SG-126 power grid simulator show that Conformer-RLpatching improves the security score by 25.8% over the second-best algorithm, DDPG, and achieves a better total reward than the gold medal team in the power grid dispatching competition sponsored by State Grid Corporation of China under the same simulation environment. Code is open-sourced at https://github.com/buptlxh/Conformer-RLpatching.

Contextual Representation Learning beyond Masked Language Modeling

Apr 08, 2022

Zhiyi Fu, Wangchunshu Zhou, Jingjing Xu, Hao Zhou, Lei Li

Abstract: How do masked language models (MLMs) such as BERT learn contextual representations? In this work, we analyze the learning dynamics of MLMs. We find that MLMs adopt sampled embeddings as anchors to estimate and inject contextual semantics into representations, which limits the efficiency and effectiveness of MLMs. To address these issues, we propose TACO, a simple yet effective representation learning approach to directly model global semantics. TACO extracts and aligns contextual semantics hidden in contextualized representations to encourage models to attend to global semantics when generating contextualized representations. Experiments on the GLUE benchmark show that TACO achieves up to 5x speedup and up to 1.2 points average improvement over existing MLMs. The code is available at https://github.com/FUZHIYI/TACO.

* ACL 2022

$\textit{latent}$-GLAT: Glancing at Latent Variables for Parallel Text Generation

Apr 05, 2022

Yu Bao, Hao Zhou, Shujian Huang, Dongqi Wang, Lihua Qian, Xinyu Dai, Jiajun Chen, Lei Li

Abstract: Recently, parallel text generation has received widespread attention due to its success in generation efficiency. Although many advanced techniques have been proposed to improve its generation quality, they still need the help of an autoregressive model during training to overcome the one-to-many multi-modal phenomenon in the dataset, which limits their applications. In this paper, we propose $\textit{latent}$-GLAT, which employs discrete latent variables to capture word categorical information and invokes an advanced curriculum learning technique, alleviating the multi-modality problem. Experimental results show that our method outperforms strong baselines without the help of an autoregressive model, which further broadens the application scenarios of the parallel decoding paradigm.

* 12 pages, 5 figures, 6 tables. Accepted as a long paper in the main conference of ACL-2022

STEMM: Self-learning with Speech-text Manifold Mixup for Speech Translation

Mar 20, 2022

Qingkai Fang, Rong Ye, Lei Li, Yang Feng, Mingxuan Wang

Abstract: How can we learn a better speech representation for end-to-end speech-to-text translation (ST) with limited labeled data? Existing techniques often attempt to transfer powerful machine translation (MT) capabilities to ST, but neglect the representation discrepancy across modalities. In this paper, we propose the Speech-TExt Manifold Mixup (STEMM) method to calibrate such discrepancy. Specifically, we mix up the representation sequences of different modalities, take both unimodal speech sequences and multimodal mixed sequences as input to the translation model in parallel, and regularize their output predictions with a self-learning framework. Experiments on the MuST-C speech translation benchmark and further analysis show that our method effectively alleviates the cross-modal representation discrepancy and achieves significant improvements over a strong baseline on eight translation directions.

* ACL 2022 main conference
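
As a rough illustration of mixing representation sequences from two modalities, the sketch below builds a mixed sequence from per-word speech segments and per-word text embeddings. The word-level alignment, the Bernoulli choice per word, and the names `mix_sequences` and `p_text` are assumptions for this example; the abstract above does not spell out the mixing rule.

```python
import random


def mix_sequences(speech_segments, text_embeddings, p_text, rng=None):
    """Build a mixed sequence from word-aligned speech and text representations.

    speech_segments: one list of frame vectors per word (assumed pre-aligned).
    text_embeddings: one embedding vector per word.
    p_text: probability of using the text embedding for a given word.
    """
    if len(speech_segments) != len(text_embeddings):
        raise ValueError("speech and text must be aligned word by word")
    rng = rng or random.Random()
    mixed = []
    for frames, token_vec in zip(speech_segments, text_embeddings):
        if rng.random() < p_text:
            mixed.append(token_vec)   # take the text-side representation
        else:
            mixed.extend(frames)      # keep all speech frames for this word
    return mixed
```

With `p_text=0.0` the output is the plain speech sequence, and with `p_text=1.0` it is the plain text sequence; intermediate values give sequences that interleave both modalities.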

E-KAR: A Benchmark for Rationalizing Natural Language Analogical Reasoning

Mar 16, 2022

Jiangjie Chen, Rui Xu, Ziquan Fu, Wei Shi, Zhongqiao Li, Xinbo Zhang, Changzhi Sun, Lei Li, Yanghua Xiao, Hao Zhou

Abstract: The ability to recognize analogies is fundamental to human cognition. Existing benchmarks to test word analogy do not reveal the underlying process of analogical reasoning in neural models. Holding the belief that models capable of reasoning should be right for the right reasons, we propose a first-of-its-kind Explainable Knowledge-intensive Analogical Reasoning benchmark (E-KAR). Our benchmark consists of 1,655 (in Chinese) and 1,251 (in English) problems sourced from the Civil Service Exams, which require intensive background knowledge to solve. More importantly, we design a free-text explanation scheme to explain whether an analogy should be drawn, and manually annotate explanations for each and every question and candidate answer. Empirical results suggest that this benchmark is very challenging for some state-of-the-art models on both explanation generation and analogical question answering tasks, which invites further research in this area.

* Accepted to ACL 2022 (Findings)

Compressing Sentence Representation for Semantic Retrieval via Homomorphic Projective Distillation

Mar 15, 2022

Xuandong Zhao, Zhiguo Yu, Ming Wu, Lei Li

Abstract: How can we learn highly compact yet effective sentence representations? Pre-trained language models have been effective in many NLP tasks. However, these models are often huge and produce large sentence embeddings. Moreover, there is a big performance gap between large and small models. In this paper, we propose Homomorphic Projective Distillation (HPD) to learn compressed sentence embeddings. Our method augments a small Transformer encoder model with learnable projection layers to produce compact representations while mimicking a large pre-trained language model to retain the sentence representation quality. We evaluate our method with different model sizes on both semantic textual similarity (STS) and semantic retrieval (SR) tasks. Experiments show that our method achieves a 2.7-4.5 point performance gain on STS tasks compared with the previous best representations of the same size. In SR tasks, our method improves retrieval speed (8.2$\times$) and memory usage (8.0$\times$) compared with state-of-the-art large models.

* Findings of ACL 2022
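
The "projection layer plus mimicking a large model" idea can be sketched as a simple objective. This is an illustrative assumption, not the paper's method: the mean-squared-error loss, the single linear projection, the helper names, and the premise that the teacher embeddings have already been reduced to the compact dimension are all choices made for this example.

```python
def project(vec, weight):
    """Apply a linear projection; weight has shape (out_dim, in_dim)."""
    return [sum(w * x for w, x in zip(row, vec)) for row in weight]


def mse(a, b):
    """Mean squared error between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def distillation_loss(student_vecs, teacher_vecs, weight):
    """Average MSE between projected student embeddings and teacher targets."""
    losses = [
        mse(project(s, weight), t)
        for s, t in zip(student_vecs, teacher_vecs)
    ]
    return sum(losses) / len(losses)
```

In practice the projection weights and the small encoder would be trained jointly to minimize this loss; the sketch only shows how the quantity is computed for fixed weights.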

Deepfake Network Architecture Attribution

Mar 14, 2022

Tianyun Yang, Ziyao Huang, Juan Cao, Lei Li, Xirong Li

Abstract: With the rapid progress of generation technology, it has become necessary to attribute the origin of fake images. Existing works on fake image attribution perform multi-class classification on several Generative Adversarial Network (GAN) models and obtain high accuracies. While encouraging, these works are restricted to model-level attribution: they can only handle images generated by seen models with a specific seed, loss and dataset, which is limiting in real-world scenarios where fake images may be generated by privately trained models. This motivates us to ask whether it is possible to attribute fake images to the source models' architectures even if they are finetuned or retrained under different configurations. In this work, we present the first study on Deepfake Network Architecture Attribution to attribute fake images at the architecture level. Based on the observation that a GAN architecture is likely to leave globally consistent fingerprints while traces left by model weights vary in different regions, we provide a simple yet effective solution named DNA-Det for this problem. Extensive experiments on multiple cross-test setups and a large-scale dataset demonstrate the effectiveness of DNA-Det.

* Accepted to AAAI'22

$ \text{T}^3 $OMVP: A Transformer-based Time and Team Reinforcement Learning Scheme for Observation-constrained Multi-Vehicle Pursuit in Urban Area

Mar 04, 2022

Zheng Yuan, Tianhao Wu, Qinwen Wang, Yiying Yang, Lei Li, Lin Zhang

Abstract: Smart Internet of Vehicles (IoVs) combined with Artificial Intelligence (AI) will contribute to vehicle decision-making in the Intelligent Transportation System (ITS). Multi-Vehicle Pursuit (MVP) games, in which multiple vehicles cooperate to capture mobile targets, are gradually becoming a hot research topic. Although there are some achievements in the field of MVP in open space environments, urban areas bring complicated road structures and restricted moving spaces as challenges to the resolution of MVP games. We define an Observation-constrained MVP (OMVP) problem in this paper and propose a Transformer-based Time and Team Reinforcement Learning scheme ($ \text{T}^3 $OMVP) to address the problem. First, a new multi-vehicle pursuit model is constructed based on decentralized partially observed Markov decision processes (Dec-POMDP) to instantiate this problem. Second, by introducing and modifying the transformer-based observation sequence, QMIX is redefined to adapt to the complicated road structure, restricted moving spaces and constrained observations, so as to control vehicles to pursue the target by combining the vehicles' observations. Third, a multi-intersection urban environment is built to verify the proposed scheme. Extensive experimental results demonstrate that the proposed $ \text{T}^3 $OMVP scheme achieves significant improvements of 9.66% to 106.25% relative to state-of-the-art QMIX approaches. Code is available at https://github.com/pipihaiziguai/T3OMVP.
