Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Qiang Wang

Towards Reliable Neural Machine Translation with Consistency-Aware Meta-Learning

Mar 20, 2023

Rongxiang Weng, Qiang Wang, Wensen Cheng, Changfeng Zhu, Min Zhang

Abstract:Neural machine translation (NMT) has achieved remarkable success in producing high-quality translations. However, current NMT systems suffer from a lack of reliability, as their outputs that are often affected by lexical or syntactic changes in inputs, resulting in large variations in quality. This limitation hinders the practicality and trustworthiness of NMT. A contributing factor to this problem is that NMT models trained with the one-to-one paradigm struggle to handle the source diversity phenomenon, where inputs with the same meaning can be expressed differently. In this work, we treat this problem as a bilevel optimization problem and present a consistency-aware meta-learning (CAML) framework derived from the model-agnostic meta-learning (MAML) algorithm to address it. Specifically, the NMT model with CAML (named CoNMT) first learns a consistent meta representation of semantically equivalent sentences in the outer loop. Subsequently, a mapping from the meta representation to the output sentence is learned in the inner loop, allowing the NMT model to translate semantically equivalent sentences to the same target sentence. We conduct experiments on the NIST Chinese to English task, three WMT translation tasks, and the TED M2O task. The results demonstrate that CoNMT effectively improves overall translation quality and reliably handles diverse inputs.

Via

Access Paper or Ask Questions

SVDE: Scalable Value-Decomposition Exploration for Cooperative Multi-Agent Reinforcement Learning

Mar 16, 2023

Shuhan Qi, Shuhao Zhang, Qiang Wang, Jiajia Zhang, Jing Xiao, Xuan Wang

Figure 1 for SVDE: Scalable Value-Decomposition Exploration for Cooperative Multi-Agent Reinforcement Learning

Figure 2 for SVDE: Scalable Value-Decomposition Exploration for Cooperative Multi-Agent Reinforcement Learning

Figure 3 for SVDE: Scalable Value-Decomposition Exploration for Cooperative Multi-Agent Reinforcement Learning

Figure 4 for SVDE: Scalable Value-Decomposition Exploration for Cooperative Multi-Agent Reinforcement Learning

Abstract:Value-decomposition methods, which reduce the difficulty of a multi-agent system by decomposing the joint state-action space into local observation-action spaces, have become popular in cooperative multi-agent reinforcement learning (MARL). However, value-decomposition methods still have the problems of tremendous sample consumption for training and lack of active exploration. In this paper, we propose a scalable value-decomposition exploration (SVDE) method, which includes a scalable training mechanism, intrinsic reward design, and explorative experience replay. The scalable training mechanism asynchronously decouples strategy learning with environmental interaction, so as to accelerate sample generation in a MapReduce manner. For the problem of lack of exploration, an intrinsic reward design and explorative experience replay are proposed, so as to enhance exploration to produce diverse samples and filter non-novel samples, respectively. Empirically, our method achieves the best performance on almost all maps compared to other popular algorithms in a set of StarCraft II micromanagement games. A data-efficiency experiment also shows the acceleration of SVDE for sample collection and policy convergence, and we demonstrate the effectiveness of factors in SVDE through a set of ablation experiments.

* 13 pages, 9 figures

Via

Access Paper or Ask Questions

Winning Solution of Real Robot Challenge III

Jan 30, 2023

Qiang Wang, Robert McCarthy, David Cordova Bulens, Stephen J. Redmond

Abstract:This report introduces our winning solution of the real-robot phase of the Real Robot Challenge (RRC) 2022. The goal of this year's challenge is to solve dexterous manipulation tasks with offline reinforcement learning (RL) or imitation learning. To this end, participants are provided with datasets containing dozens of hours of robotic data. For each task an expert and a mixed dataset are provided. In our experiments, when learning from the expert datasets, we find standard Behavioral Cloning (BC) outperforms state-of-the-art offline RL algorithms. When learning from the mixed datasets, BC performs poorly, as expected, while surprisingly offline RL performs suboptimally, failing to match the average performance of the baseline model used for collecting the datasets. To remedy this, motivated by the strong performance of BC on the expert datasets we elect to use a semi-supervised classification technique to filter the subset of expert data out from the mixed datasets, and subsequently perform BC on this extracted subset of data. To further improve results, in all settings we use a simple data augmentation method that exploits the geometric symmetry of the RRC physical robotic environment. Our submitted BC policies each surpass the mean return of their respective raw datasets, and the policies trained on the filtered mixed datasets come close to matching the performances of those trained on the expert datasets.

Via

Access Paper or Ask Questions

Behaviour Discriminator: A Simple Data Filtering Method to Improve Offline Policy Learning

Jan 27, 2023

Qiang Wang, Robert McCarthy, David Cordova Bulens, Kevin McGuinness, Noel E. O'Connor, Francisco Roldan Sanchez, Stephen J. Redmond

Abstract:This paper studies the problem of learning a control policy without the need for interactions with the environment; instead, learning purely from an existing dataset. Prior work has demonstrated that offline learning algorithms (e.g., behavioural cloning and offline reinforcement learning) are more likely to discover a satisfactory policy when trained using high-quality expert data. However, many real-world/practical datasets can contain significant proportions of examples generated using low-skilled agents. Therefore, we propose a behaviour discriminator (BD) concept, a novel and simple data filtering approach based on semi-supervised learning, which can accurately discern expert data from a mixed-quality dataset. Our BD approach was used to pre-process the mixed-skill-level datasets from the Real Robot Challenge (RRC) III, an open competition requiring participants to solve several dexterous robotic manipulation tasks using offline learning methods; the new BD method allowed a standard behavioural cloning algorithm to outperform other more sophisticated offline learning algorithms. Moreover, we demonstrate that the new BD pre-processing method can be applied to a number of D4RL benchmark problems, improving the performance of multiple state-of-the-art offline reinforcement learning algorithms.

Via

Access Paper or Ask Questions

Rethinking Disparity: A Depth Range Free Multi-View Stereo Based on Disparity

Dec 04, 2022

Qingsong Yan, Qiang Wang, Kaiyong Zhao, Bo Li, Xiaowen Chu, Fei Deng

Abstract:Existing learning-based multi-view stereo (MVS) methods rely on the depth range to build the 3D cost volume and may fail when the range is too large or unreliable. To address this problem, we propose a disparity-based MVS method based on the epipolar disparity flow (E-flow), called DispMVS, which infers the depth information from the pixel movement between two views. The core of DispMVS is to construct a 2D cost volume on the image plane along the epipolar line between each pair (between the reference image and several source images) for pixel matching and fuse uncountable depths triangulated from each pair by multi-view geometry to ensure multi-view consistency. To be robust, DispMVS starts from a randomly initialized depth map and iteratively refines the depth map with the help of the coarse-to-fine strategy. Experiments on DTUMVS and Tanks\&Temple datasets show that DispMVS is not sensitive to the depth range and achieves state-of-the-art results with lower GPU memory.

* Accepted at the Thirty-Seventh AAAI Conference on Artificial Intelligence (AAAI23)

Via

Access Paper or Ask Questions

The RoyalFlush System for the WMT 2022 Efficiency Task

Dec 03, 2022

Bo Qin, Aixin Jia, Qiang Wang, Jianning Lu, Shuqin Pan, Haibo Wang, Ming Chen

Abstract:This paper describes the submission of the RoyalFlush neural machine translation system for the WMT 2022 translation efficiency task. Unlike the commonly used autoregressive translation system, we adopted a two-stage translation paradigm called Hybrid Regression Translation (HRT) to combine the advantages of autoregressive and non-autoregressive translation. Specifically, HRT first autoregressively generates a discontinuous sequence (e.g., make a prediction every $k$ tokens, $k>1$) and then fills in all previously skipped tokens at once in a non-autoregressive manner. Thus, we can easily trade off the translation quality and speed by adjusting $k$. In addition, by integrating other modeling techniques (e.g., sequence-level knowledge distillation and deep-encoder-shallow-decoder layer allocation strategy) and a mass of engineering efforts, HRT improves 80\% inference speed and achieves equivalent translation performance with the same-capacity AT counterpart. Our fastest system reaches 6k+ words/second on the GPU latency setting, estimated to be about 3.1x faster than the last year's winner.

* Accepted by WMT 2022. arXiv admin note: text overlap with arXiv:2210.10416

Via

Access Paper or Ask Questions

Hybrid-Regressive Neural Machine Translation

Oct 19, 2022

Qiang Wang, Xinhui Hu, Ming Chen

Figure 1 for Hybrid-Regressive Neural Machine Translation

Figure 2 for Hybrid-Regressive Neural Machine Translation

Figure 3 for Hybrid-Regressive Neural Machine Translation

Figure 4 for Hybrid-Regressive Neural Machine Translation

Abstract:In this work, we empirically confirm that non-autoregressive translation with an iterative refinement mechanism (IR-NAT) suffers from poor acceleration robustness because it is more sensitive to decoding batch size and computing device setting than autoregressive translation (AT). Inspired by it, we attempt to investigate how to combine the strengths of autoregressive and non-autoregressive translation paradigms better. To this end, we demonstrate through synthetic experiments that prompting a small number of AT's predictions can promote one-shot non-autoregressive translation to achieve the equivalent performance of IR-NAT. Following this line, we propose a new two-stage translation prototype called hybrid-regressive translation (HRT). Specifically, HRT first generates discontinuous sequences via autoregression (e.g., make a prediction every k tokens, k>1) and then fills in all previously skipped tokens at once in a non-autoregressive manner. We also propose a bag of techniques to effectively and efficiently train HRT without adding any model parameters. HRT achieves the state-of-the-art BLEU score of 28.49 on the WMT En-De task and is at least 1.5x faster than AT, regardless of batch size and device. In addition, another bonus of HRT is that it successfully inherits the good characteristics of AT in the deep-encoder-shallow-decoder architecture. Concretely, compared to the vanilla HRT with a 6-layer encoder and 6-layer decoder, the inference speed of HRT with a 12-layer encoder and 1-layer decoder is further doubled on both GPU and CPU without BLEU loss.

* Submitted to ICLR 2023

Via

Access Paper or Ask Questions

Hierarchical reinforcement learning for in-hand robotic manipulation using Davenport chained rotations

Oct 03, 2022

Francisco Roldan Sanchez, Qiang Wang, David Cordova Bulens, Kevin McGuinness, Stephen Redmond, Noel O'Connor

Figure 1 for Hierarchical reinforcement learning for in-hand robotic manipulation using Davenport chained rotations

Figure 2 for Hierarchical reinforcement learning for in-hand robotic manipulation using Davenport chained rotations

Figure 3 for Hierarchical reinforcement learning for in-hand robotic manipulation using Davenport chained rotations

Figure 4 for Hierarchical reinforcement learning for in-hand robotic manipulation using Davenport chained rotations

Abstract:End-to-end reinforcement learning techniques are among the most successful methods for robotic manipulation tasks. However, the training time required to find a good policy capable of solving complex tasks is prohibitively large. Therefore, depending on the computing resources available, it might not be feasible to use such techniques. The use of domain knowledge to decompose manipulation tasks into primitive skills, to be performed in sequence, could reduce the overall complexity of the learning problem, and hence reduce the amount of training required to achieve dexterity. In this paper, we propose the use of Davenport chained rotations to decompose complex 3D rotation goals into a concatenation of a smaller set of more simple rotation skills. State-of-the-art reinforcement-learning-based methods can then be trained using less overall simulated experience. We compare its performance with the popular Hindsight Experience Replay method, trained in an end-to-end fashion using the same amount of experience in a simulated robotic hand environment. Despite a general decrease in performance of the primitive skills when being sequentially executed, we find that decomposing arbitrary 3D rotations into elementary rotations is beneficial when computing resources are limited, obtaining increases of success rates of approximately 10% on the most complex 3D rotations with respect to the success rates obtained by HER trained in an end-to-end fashion, and increases of success rates between 20% and 40% on the most simple rotations.

* 11 pages, 3 figures, 2 tables, submitted to AICS 2022

Via

Access Paper or Ask Questions

Learning Decoupled Retrieval Representation for Nearest Neighbour Neural Machine Translation

Sep 20, 2022

Qiang Wang, Rongxiang Weng, Ming Chen

Figure 1 for Learning Decoupled Retrieval Representation for Nearest Neighbour Neural Machine Translation

Figure 2 for Learning Decoupled Retrieval Representation for Nearest Neighbour Neural Machine Translation

Figure 3 for Learning Decoupled Retrieval Representation for Nearest Neighbour Neural Machine Translation

Figure 4 for Learning Decoupled Retrieval Representation for Nearest Neighbour Neural Machine Translation

Abstract:K-Nearest Neighbor Neural Machine Translation (kNN-MT) successfully incorporates external corpus by retrieving word-level representations at test time. Generally, kNN-MT borrows the off-the-shelf context representation in the translation task, e.g., the output of the last decoder layer, as the query vector of the retrieval task. In this work, we highlight that coupling the representations of these two tasks is sub-optimal for fine-grained retrieval. To alleviate it, we leverage supervised contrastive learning to learn the distinctive retrieval representation derived from the original context representation. We also propose a fast and effective approach to constructing hard negative samples. Experimental results on five domains show that our approach improves the retrieval accuracy and BLEU score compared to vanilla kNN-MT.

* Accepted by COLING 2022

Via

Access Paper or Ask Questions

SphereDepth: Panorama Depth Estimation from Spherical Domain

Aug 30, 2022

Qingsong Yan, Qiang Wang, Kaiyong Zhao, Bo Li, Xiaowen Chu, Fei Deng

Figure 1 for SphereDepth: Panorama Depth Estimation from Spherical Domain

Figure 2 for SphereDepth: Panorama Depth Estimation from Spherical Domain

Figure 3 for SphereDepth: Panorama Depth Estimation from Spherical Domain

Figure 4 for SphereDepth: Panorama Depth Estimation from Spherical Domain

Abstract:The panorama image can simultaneously demonstrate complete information of the surrounding environment and has many advantages in virtual tourism, games, robotics, etc. However, the progress of panorama depth estimation cannot completely solve the problems of distortion and discontinuity caused by the commonly used projection methods. This paper proposes SphereDepth, a novel panorama depth estimation method that predicts the depth directly on the spherical mesh without projection preprocessing. The core idea is to establish the relationship between the panorama image and the spherical mesh and then use a deep neural network to extract features on the spherical domain to predict depth. To address the efficiency challenges brought by the high-resolution panorama data, we introduce two hyper-parameters for the proposed spherical mesh processing framework to balance the inference speed and accuracy. Validated on three public panorama datasets, SphereDepth achieves comparable results with the state-of-the-art methods of panorama depth estimation. Benefiting from the spherical domain setting, SphereDepth can generate a high-quality point cloud and significantly alleviate the issues of distortion and discontinuity.

Via

Access Paper or Ask Questions