Jun Cao

Augmentation-Free Graph Contrastive Learning of Invariant-Discriminative Representations

Oct 15, 2022
Haifeng Li, Jun Cao, Jiawei Zhu, Qinyao Luo, Silu He, Xuyin Wang

The pretext tasks of graph contrastive learning are mainly built on mutual information estimation, which requires data augmentation to construct positive samples with similar semantics, to learn invariant signals, and negative samples with dissimilar semantics, to empower representation discriminability. However, an appropriate data augmentation configuration depends heavily on extensive empirical trials, such as choosing the composition of data augmentation techniques and the corresponding hyperparameter settings. We propose an augmentation-free graph contrastive learning method, invariant-discriminative graph contrastive learning (iGCL), that does not intrinsically require negative samples. iGCL designs an invariant-discriminative loss (ID loss) to learn invariant and discriminative representations. On the one hand, ID loss learns invariant signals by directly minimizing the mean squared error between target samples and positive samples in the representation space. On the other hand, ID loss keeps the representations discriminative through an orthonormal constraint that forces the different dimensions of the representations to be independent of each other, preventing the representations from collapsing to a point or a subspace. Our theoretical analysis explains the effectiveness of ID loss from the perspectives of the redundancy reduction criterion, canonical correlation analysis, and the information bottleneck principle. Experimental results demonstrate that iGCL outperforms all baselines on five node classification benchmark datasets. iGCL also shows superior performance across different label ratios and is capable of resisting graph attacks, indicating excellent generalization and robustness.
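The two terms of the ID loss described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: the function name and the weighting factor `lam` are my own assumptions, and the decorrelation term is written as the Frobenius-norm distance between the (scaled) Gram matrix and the identity.

```python
# Illustrative sketch of an ID-style loss: an invariance term (MSE between the
# target and positive views) plus a decorrelation term pushing Z^T Z / N
# towards the identity, which prevents collapse to a point or subspace.
# Names and the weighting factor `lam` are assumptions, not from the paper.
import numpy as np

def id_loss(z_target, z_positive, lam=1.0):
    n, d = z_target.shape
    # Invariance: positive pairs should map to nearby representations.
    invariance = np.mean((z_target - z_positive) ** 2)
    # Discriminability: dimensions of the representation should be
    # independent, i.e. the scaled Gram matrix should be close to identity.
    gram = z_target.T @ z_target / n
    decorrelation = np.sum((gram - np.eye(d)) ** 2)
    return invariance + lam * decorrelation
```

A representation whose dimensions are already orthonormal (in the scaled sense) and whose views agree incurs zero loss, while a collapsed representation (all samples identical) is penalized by the decorrelation term.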

* 11 pages, 8 figures 

Zero-shot Domain Adaptation for Neural Machine Translation with Retrieved Phrase-level Prompts

Sep 23, 2022
Zewei Sun, Qingnan Jiang, Shujian Huang, Jun Cao, Shanbo Cheng, Mingxuan Wang

Domain adaptation is an important challenge for neural machine translation. However, the traditional fine-tuning solution requires extra training for each domain and incurs a high cost. In this paper, we propose a non-tuning paradigm that resolves domain adaptation with a prompt-based method. Specifically, we construct a bilingual phrase-level database and retrieve relevant pairs from it as prompts for the input sentences. By utilizing Retrieved Phrase-level Prompts (RePP), we effectively boost translation quality. Experiments show that our method improves domain-specific machine translation by 6.2 BLEU and translation-constraint accuracy by 11.5% without additional training.
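The retrieve-and-prompt idea can be sketched with a toy in-memory phrase database. This is a hedged illustration only: the real RePP retrieval, matching, and prompt format are not specified here, and the separator token and function names below are my own assumptions.

```python
# Minimal sketch of phrase-level prompt retrieval for NMT domain adaptation,
# assuming a toy in-memory bilingual phrase database and substring matching.
# The actual RePP system's data structures and prompt format may differ.
def retrieve_prompts(source, phrase_db):
    """Collect bilingual pairs whose source phrase occurs in the input sentence."""
    return [(src, tgt) for src, tgt in phrase_db if src in source]

def build_prompted_input(source, phrase_db, sep=" <sep> "):
    """Prepend retrieved phrase pairs to the source sentence as a prompt."""
    prompts = retrieve_prompts(source, phrase_db)
    prefix = " ".join(f"{s} = {t}" for s, t in prompts)
    return (prefix + sep + source) if prompts else source
```

The prompted input then goes to the translation model unchanged, so no parameters need retraining when the domain shifts; only the phrase database is swapped.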


Battery and Hydrogen Energy Storage Control in a Smart Energy Network with Flexible Energy Demand using Deep Reinforcement Learning

Aug 26, 2022
Cephas Samende, Zhong Fan, Jun Cao

Smart energy networks provide an effective means to accommodate high penetrations of variable renewable energy sources, such as solar and wind, which are key to the deep decarbonisation of energy production. However, given the variability of both the renewables and the energy demand, it is imperative to develop effective control and energy storage schemes that manage the variable generation and achieve the desired system economics and environmental goals. In this paper, we introduce a hybrid energy storage system composed of battery and hydrogen energy storage to handle the uncertainties in electricity prices, renewable energy production, and consumption. We aim to improve renewable energy utilisation and minimise energy costs and carbon emissions while ensuring energy reliability and stability within the network. To achieve this, we propose a multi-agent deep deterministic policy gradient approach, a deep reinforcement learning-based control strategy that optimises the scheduling of the hybrid energy storage system and the energy demand in real time. The proposed approach is model-free and does not require explicit knowledge of, or a rigorous mathematical model for, the smart energy network environment. Simulation results based on real-world data show that (i) integrating and optimising the operation of the hybrid energy storage system and energy demand reduces carbon emissions by 78.69%, improves cost savings by 23.5%, and raises renewable energy utilisation by over 13.2% compared to other baseline models, and (ii) the proposed algorithm outperforms state-of-the-art self-learning algorithms such as deep Q-networks.
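The kind of storage dynamics such an RL agent interacts with can be sketched as a single environment step. This is an illustrative simplification under assumed values: the capacity, efficiency, and function signature are my own, and a real battery or hydrogen store (electrolyser plus fuel cell) would have asymmetric and state-dependent efficiencies.

```python
# Toy environment step for one store in a battery + hydrogen hybrid system,
# assuming a simple symmetric efficiency; all numbers are illustrative
# assumptions, not parameters from the paper.
def storage_step(soc, power_kw, dt_h=1.0, capacity_kwh=100.0, eff=0.9):
    """Charge (power_kw > 0) or discharge (power_kw < 0) for dt_h hours.
    Returns (new_state_of_charge, energy_delivered_to_grid_kwh)."""
    if power_kw >= 0:                                    # charging: losses on the way in
        stored = power_kw * dt_h * eff
        return min(1.0, soc + stored / capacity_kwh), 0.0
    drawn = min(-power_kw * dt_h, soc * capacity_kwh)    # cannot over-discharge
    return soc - drawn / capacity_kwh, drawn * eff       # losses on the way out
```

An agent's action at each step is essentially the `power_kw` for each store; the reward then combines the resulting energy cost, emissions, and reliability terms.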

* 13 pages, 10 figures 

GigaST: A 10,000-hour Pseudo Speech Translation Corpus

Apr 08, 2022
Rong Ye, Chengqi Zhao, Tom Ko, Chutong Meng, Tao Wang, Mingxuan Wang, Jun Cao

This paper introduces GigaST, a large-scale pseudo speech translation (ST) corpus. We create the corpus by translating the text in GigaSpeech, an English ASR corpus, into German and Chinese. The training set is translated by a strong machine translation system, and the test set is translated by humans. ST models trained with the addition of our corpus obtain new state-of-the-art results on the MuST-C English-German benchmark test set. We provide a detailed description of the translation process and verify its quality. We make the translated text data public to facilitate research in speech translation, and we also release the training scripts on NeurST to make it easy to replicate our systems. The GigaST dataset is available at https://st-benchmark.github.io/resources/GigaST.

* Submitted to Interspeech 2022. GigaST dataset is available at https://st-benchmark.github.io/resources/GigaST 

Renewable energy integration and microgrid energy trading using multi-agent deep reinforcement learning

Dec 05, 2021
Daniel J. B. Harrold, Jun Cao, Zhong Fan

In this paper, multi-agent reinforcement learning is used to control a hybrid energy storage system that works collaboratively to reduce the energy costs of a microgrid by maximising the value of renewable energy and trading. The agents must learn to control three types of energy storage system, suited to short-, medium-, and long-term storage, under fluctuating demand, dynamic wholesale energy prices, and unpredictable renewable generation. Two case studies are considered: the first examines how the energy storage systems can better integrate renewable generation under dynamic pricing, and the second examines how those same agents can work alongside an aggregator agent to sell energy to self-interested external microgrids seeking to reduce their own energy bills. This work found that centralised learning with decentralised execution, as in the multi-agent deep deterministic policy gradient and its state-of-the-art variants, allowed the multi-agent methods to perform significantly better than control by a single global agent. It was also found that the multi-agent approach with separate reward functions performed much better than a single control agent. Being able to trade with the other microgrids, rather than only selling back to the utility grid, was likewise found to greatly increase the grid's savings.
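The economics behind the trading result can be sketched in a few lines: any trade priced between the utility's sell-back tariff and its purchase tariff leaves both microgrids better off than trading only with the utility grid. The prices and function below are illustrative assumptions, not figures from the paper.

```python
# Sketch of why peer-to-peer microgrid trading beats selling back to the
# utility: a trade priced between the utility's buy and sell tariffs benefits
# both sides. All prices are illustrative assumptions (currency per kWh).
def trade_savings(energy_kwh, utility_buy, utility_sell, trade_price):
    """Savings for seller and buyer versus transacting only with the utility.
    Requires utility_sell < trade_price < utility_buy to benefit both sides."""
    seller_gain = (trade_price - utility_sell) * energy_kwh  # vs. selling to grid
    buyer_gain = (utility_buy - trade_price) * energy_kwh    # vs. buying from grid
    return seller_gain, buyer_gain
```

In the paper's second case study, the aggregator agent effectively learns where in this price window to transact; the sketch only shows why a window exists at all.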


Learning Kernel-Smoothed Machine Translation with Retrieved Examples

Sep 21, 2021
Qingnan Jiang, Mingxuan Wang, Jun Cao, Shanbo Cheng, Shujian Huang, Lei Li

How can neural machine translation (NMT) models be effectively adapted to emerging cases without retraining? Despite the great success of neural machine translation, updating deployed models online remains a challenge. Existing non-parametric approaches that retrieve similar examples from a database to guide the translation process are promising, but they are prone to overfitting the retrieved examples. In this work, we propose to learn Kernel-Smoothed Translation with Example Retrieval (KSTER), an effective approach to adapting neural machine translation models online. Experiments on domain adaptation and multi-domain machine translation datasets show that, even without expensive retraining, KSTER achieves improvements of 1.1 to 1.5 BLEU over the best existing online adaptation methods. The code and trained models are released at https://github.com/jiangqn/KSTER.
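The kernel-smoothing idea can be sketched as mixing the model's next-token distribution with a retrieval distribution whose weights come from a kernel over example distances. This is a hedged illustration: KSTER learns the kernel bandwidth and mixing weight, whereas both are fixed assumed constants here, and the names are my own.

```python
# Sketch of kernel-smoothed next-token prediction, assuming Gaussian kernel
# weights over retrieved-example distances and a fixed mixing weight `alpha`.
# The real KSTER learns the kernel and the mixture adaptively per example.
import math

def kernel_smoothed_probs(model_probs, retrieved, vocab, bandwidth=1.0, alpha=0.5):
    """Mix the NMT model's distribution with a retrieval-based distribution.
    `retrieved` is a list of (target_token, distance) pairs from the database."""
    weights = [math.exp(-d ** 2 / bandwidth) for _, d in retrieved]
    total = sum(weights) or 1.0
    retrieval_probs = {tok: 0.0 for tok in vocab}
    for (tok, _), w in zip(retrieved, weights):
        retrieval_probs[tok] += w / total
    return {tok: (1 - alpha) * model_probs[tok] + alpha * retrieval_probs[tok]
            for tok in vocab}
```

Smoothing over several retrieved examples, rather than copying the nearest one, is what guards against overfitting the retrieved examples.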

* EMNLP 2021 

Curvature Graph Neural Network

Jun 30, 2021
Haifeng Li, Jun Cao, Jiawei Zhu, Yu Liu, Qing Zhu, Guohua Wu

Graph neural networks (GNNs) have achieved great success in many graph-based tasks. Much work is dedicated to empowering GNNs with adaptive locality, which measures the importance of neighboring nodes to the target node via a node-specific mechanism. However, current node-specific mechanisms are deficient at distinguishing the importance of nodes in the topological structure. We believe that the structural importance of neighboring nodes is closely related to their importance in aggregation. In this paper, we introduce discrete graph curvature (the Ricci curvature) to quantify the strength of the structural connection between pairs of nodes, and we propose the Curvature Graph Neural Network (CGNN), which effectively improves the adaptive locality of GNNs by leveraging the structural property of graph curvature. To improve the adaptability of curvature to various datasets, we explicitly transform curvature into the weights of neighboring nodes via a Negative Curvature Processing Module and a Curvature Normalization Module. We then conduct extensive experiments on synthetic and real-world datasets. The results on synthetic datasets show that CGNN effectively exploits topological structure information and improves performance significantly, and CGNN outperforms the baselines on five dense node classification benchmark datasets. This study deepens our understanding of how to utilize advanced topological information and assign importance to neighboring nodes from the perspective of graph curvature, and it encourages bridging the gap between graph theory and neural networks.
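The curvature-to-weight pipeline can be sketched in a few lines. This is an assumption-laden illustration: the exponential map for handling negative curvatures and the sum normalization stand in for the paper's Negative Curvature Processing Module and Curvature Normalization Module, whose exact forms are not given here.

```python
# Sketch of turning per-edge Ricci curvatures of a node's neighbourhood into
# positive, normalized aggregation weights. The exponential transform (which
# maps negative curvatures into (0, 1)) and the sum normalization are my own
# stand-ins for the paper's processing and normalization modules.
import math

def curvature_weights(curvatures):
    transformed = [math.exp(c) for c in curvatures]  # negative curvature -> small positive weight
    total = sum(transformed)
    return [t / total for t in transformed]          # weights over the neighbourhood sum to 1
```

A message-passing layer would then aggregate neighbour features with these weights instead of the uniform (degree-normalized) weights of a vanilla GCN.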

* 16 pages, 9 figures, 4 tables 

Data-driven battery operation for energy arbitrage using rainbow deep reinforcement learning

Jun 10, 2021
Daniel J. B. Harrold, Jun Cao, Zhong Fan

As the world seeks to become more sustainable, intelligent solutions are needed to increase the penetration of renewable energy. In this paper, the model-free deep reinforcement learning algorithm Rainbow Deep Q-Networks is used to control a battery in a small microgrid, performing energy arbitrage and making more efficient use of solar and wind energy sources. The grid operates with its own demand and renewable generation, based on a dataset collected at Keele University, and uses dynamic energy pricing from a real wholesale energy market. Four scenarios are tested, including the use of demand and price forecasts produced from local weather data. The algorithm and its subcomponents are evaluated against two continuous control benchmarks, with Rainbow able to outperform all other methods. This research shows the importance of the distributional approach to reinforcement learning when working with complex environments and reward functions, as well as how it can be used to visualise and contextualise the agent's behaviour in real-world applications.
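Energy arbitrage itself can be sketched with the kind of simple price-threshold baseline an RL agent is typically compared against: buy and store energy when the price is low, discharge and sell when it is high. The thresholds and the lossless unit-capacity battery below are illustrative simplifications, not the paper's benchmark.

```python
# Simple price-threshold arbitrage baseline: charge when the price is at or
# below `low`, discharge when at or above `high`. A lossless battery with
# unit capacity/power is an illustrative simplification.
def arbitrage_profit(prices, low, high, capacity=1.0, power=1.0):
    soc, profit = 0.0, 0.0
    for p in prices:
        if p <= low and soc < capacity:   # buy energy cheaply and store it
            step = min(power, capacity - soc)
            soc += step
            profit -= p * step
        elif p >= high and soc > 0.0:     # sell stored energy at a high price
            step = min(power, soc)
            soc -= step
            profit += p * step
    return profit
```

A learned policy must beat this kind of fixed rule by adapting to forecasts, storage losses, and the shape of the price distribution, which is where the distributional value estimates of Rainbow help.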

* 13 pages, 9 figures (17 counting each subfigure) 

The Volctrans Neural Speech Translation System for IWSLT 2021

May 16, 2021
Chengqi Zhao, Zhicheng Liu, Jian Tong, Tao Wang, Mingxuan Wang, Rong Ye, Qianqian Dong, Jun Cao, Lei Li

This paper describes the systems submitted to IWSLT 2021 by the Volctrans team. We participate in the offline speech translation and text-to-text simultaneous translation tracks. For offline speech translation, our best end-to-end model achieves an 8.1 BLEU improvement over the benchmark on the MuST-C test set and even approaches the results of a strong cascade solution. For text-to-text simultaneous translation, we explore best practices for optimizing the wait-k model. As a result, our final submitted systems exceed the benchmark by around 7 BLEU under the same latency regime. We will publish our code and models to facilitate both future research and industrial applications.
