Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Hei Yi Mak

Low-Resource NMT: A Case Study on the Written and Spoken Languages in Hong Kong

May 23, 2025

Hei Yi Mak, Tan Lee

Abstract:The majority of inhabitants in Hong Kong are able to read and write in standard Chinese but use Cantonese as the primary spoken language in daily life. Spoken Cantonese can be transcribed into Chinese characters, which constitute the so-called written Cantonese. Written Cantonese exhibits significant lexical and grammatical differences from standard written Chinese. The rise of written Cantonese is increasingly evident in the cyber world. The growing interaction between Mandarin speakers and Cantonese speakers is leading to a clear demand for automatic translation between Chinese and Cantonese. This paper describes a transformer-based neural machine translation (NMT) system for written-Chinese-to-written-Cantonese translation. Given that parallel text data of Chinese and Cantonese are extremely scarce, a major focus of this study is on the effort of preparing good amount of training data for NMT. In addition to collecting 28K parallel sentences from previous linguistic studies and scattered internet resources, we devise an effective approach to obtaining 72K parallel sentences by automatically extracting pairs of semantically similar sentences from parallel articles on Chinese Wikipedia and Cantonese Wikipedia. We show that leveraging highly similar sentence pairs mined from Wikipedia improves translation performance in all test sets. Our system outperforms Baidu Fanyi's Chinese-to-Cantonese translation on 6 out of 8 test sets in BLEU scores. Translation examples reveal that our system is able to capture important linguistic transformations between standard Chinese and spoken Cantonese.

* Proceedings of the 2021 5th International Conference on Natural Language Processing and Information Retrieval

Via

Access Paper or Ask Questions

CAESAR: Enhancing Federated RL in Heterogeneous MDPs through Convergence-Aware Sampling with Screening

Mar 29, 2024

Hei Yi Mak, Flint Xiaofeng Fan, Luca A. Lanzendörfer, Cheston Tan, Wei Tsang Ooi, Roger Wattenhofer

Figure 1 for CAESAR: Enhancing Federated RL in Heterogeneous MDPs through Convergence-Aware Sampling with Screening

Figure 2 for CAESAR: Enhancing Federated RL in Heterogeneous MDPs through Convergence-Aware Sampling with Screening

Figure 3 for CAESAR: Enhancing Federated RL in Heterogeneous MDPs through Convergence-Aware Sampling with Screening

Figure 4 for CAESAR: Enhancing Federated RL in Heterogeneous MDPs through Convergence-Aware Sampling with Screening

Abstract:In this study, we delve into Federated Reinforcement Learning (FedRL) in the context of value-based agents operating across diverse Markov Decision Processes (MDPs). Existing FedRL methods typically aggregate agents' learning by averaging the value functions across them to improve their performance. However, this aggregation strategy is suboptimal in heterogeneous environments where agents converge to diverse optimal value functions. To address this problem, we introduce the Convergence-AwarE SAmpling with scReening (CAESAR) aggregation scheme designed to enhance the learning of individual agents across varied MDPs. CAESAR is an aggregation strategy used by the server that combines convergence-aware sampling with a screening mechanism. By exploiting the fact that agents learning in identical MDPs are converging to the same optimal value function, CAESAR enables the selective assimilation of knowledge from more proficient counterparts, thereby significantly enhancing the overall learning efficiency. We empirically validate our hypothesis and demonstrate the effectiveness of CAESAR in enhancing the learning efficiency of agents, using both a custom-built GridWorld environment and the classical FrozenLake-v1 task, each presenting varying levels of environmental heterogeneity.

Via

Access Paper or Ask Questions