Wei Pan


Few shot font generation via transferring similarity guided global style and quantization local style

Sep 14, 2023
Wei Pan, Anna Zhu, Xinyu Zhou, Brian Kenji Iwana, Shilin Li


Automatic few-shot font generation (AFFG), which aims at generating new fonts with only a few glyph references, reduces the labor cost of manually designing fonts. However, the traditional AFFG paradigm of style-content disentanglement cannot capture the diverse local details of different fonts, so many component-based approaches have been proposed to tackle this problem. The issue with component-based approaches is that they usually require special pre-defined glyph components, e.g., strokes and radicals, which is infeasible for AFFG across different languages. In this paper, we present a novel font generation approach that aggregates styles from character-similarity-guided global features and stylized component-level representations. We calculate similarity scores between the target character and the reference samples by measuring the distance along the corresponding channels of the content features, and assign these scores as the weights for aggregating the global style features. To better capture local styles, a cross-attention-based style transfer module is adopted to transfer the styles of the reference glyphs to the components, where the components are self-learned discrete latent codes obtained through vector quantization without manual definition. With these designs, our AFFG method can obtain a complete set of component-level style representations while also controlling the global glyph characteristics. Experimental results demonstrate the effectiveness and generalization of the proposed method on different linguistic scripts, as well as its superiority over other state-of-the-art methods. The source code can be found at https://github.com/awei669/VQ-Font.
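The similarity-guided aggregation of global styles can be sketched as follows. This is a minimal illustration assuming Euclidean channel distances and a softmax weighting; the function name and array shapes are hypothetical stand-ins, not the paper's implementation:

```python
import numpy as np

def aggregate_global_style(target_content, ref_contents, ref_styles, tau=1.0):
    """Weight each reference style by its content similarity to the target.

    target_content: (C,) content feature of the target character
    ref_contents:   (N, C) content features of the reference glyphs
    ref_styles:     (N, D) global style features of the reference glyphs
    """
    # Negative L2 distance along feature channels -> similarity logits
    dists = np.linalg.norm(ref_contents - target_content, axis=1)
    logits = -dists / tau
    weights = np.exp(logits - logits.max())
    weights /= weights.sum()          # softmax over the N references
    return weights @ ref_styles       # (D,) aggregated global style

rng = np.random.default_rng(0)
style = aggregate_global_style(rng.normal(size=8),
                               rng.normal(size=(4, 8)),
                               rng.normal(size=(4, 16)))
```

References whose content features sit closer to the target character thus contribute more to the aggregated style vector.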

* Accepted by ICCV 2023 

Cross-Utterance Conditioned VAE for Speech Generation

Sep 08, 2023
Yang Li, Cheng Yu, Guangzhi Sun, Weiqin Zu, Zheng Tian, Ying Wen, Wei Pan, Chao Zhang, Jun Wang, Yang Yang, Fanglei Sun


Speech synthesis systems powered by neural networks hold promise for multimedia production, but frequently face issues with producing expressive speech and seamless editing. In response, we present the Cross-Utterance Conditioned Variational Autoencoder speech synthesis (CUC-VAE S2) framework to enhance prosody and ensure natural speech generation. This framework leverages the powerful representational capabilities of pre-trained language models and the re-expression abilities of variational autoencoders (VAEs). The core component of the CUC-VAE S2 framework is the cross-utterance CVAE, which extracts acoustic, speaker, and textual features from surrounding sentences to generate context-sensitive prosodic features, more accurately emulating human prosody generation. We further propose two practical algorithms tailored for distinct speech synthesis applications: CUC-VAE TTS for text-to-speech and CUC-VAE SE for speech editing. CUC-VAE TTS is a direct application of the framework, designed to generate audio with contextual prosody derived from surrounding texts. The CUC-VAE SE algorithm, on the other hand, leverages real mel spectrogram sampling conditioned on contextual information, producing audio that closely mirrors real sound and thereby facilitating flexible text-based speech editing such as deletion, insertion, and replacement. Experimental results on the LibriTTS dataset demonstrate that our proposed models significantly enhance speech synthesis and editing, producing more natural and expressive speech.
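As a rough sketch of the context-conditioned latent sampling at the heart of a CVAE, the toy code below maps a context embedding to a latent prior and samples via the reparameterization trick. The linear prior network and all shapes are hypothetical simplifications of the paper's learned modules:

```python
import numpy as np

rng = np.random.default_rng(0)

def context_prior(context_embedding, W_mu, W_logvar):
    """Hypothetical linear prior network: map cross-utterance context
    features to the mean / log-variance of a latent prosody prior."""
    return W_mu @ context_embedding, W_logvar @ context_embedding

def sample_latent(mu, logvar):
    # Reparameterization trick: z = mu + sigma * eps, eps ~ N(0, I)
    eps = rng.normal(size=mu.shape)
    return mu + np.exp(0.5 * logvar) * eps

ctx = rng.normal(size=12)                 # features from surrounding utterances
W_mu = rng.normal(size=(4, 12)) * 0.1
W_logvar = rng.normal(size=(4, 12)) * 0.1
mu, logvar = context_prior(ctx, W_mu, W_logvar)
z = sample_latent(mu, logvar)             # context-sensitive prosody latent
```

Because the prior depends on the surrounding context, the sampled latent (and hence the prosody) varies with the neighboring utterances rather than being drawn from a fixed standard normal.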

* 13 pages; 

JiangJun: Mastering Xiangqi by Tackling Non-Transitivity in Two-Player Zero-Sum Games

Aug 09, 2023
Yang Li, Kun Xiong, Yingping Zhang, Jiangcheng Zhu, Stephen Mcaleer, Wei Pan, Jun Wang, Zonghong Dai, Yaodong Yang


This paper presents an empirical exploration of non-transitivity in perfect-information games, specifically focusing on Xiangqi, a traditional Chinese board game comparable in game-tree complexity to chess and shogi. By analyzing over 10,000 records of human Xiangqi play, we highlight the existence of both transitive and non-transitive elements within the game's strategic structure. To address non-transitivity, we introduce the JiangJun algorithm, an innovative combination of Monte-Carlo Tree Search (MCTS) and Policy Space Response Oracles (PSRO) designed to approximate a Nash equilibrium. We evaluate the algorithm empirically using a WeChat mini program and achieve a Master level with a 99.41% win rate against human players. The algorithm's effectiveness in overcoming non-transitivity is confirmed by a range of metrics, such as relative population performance and visualization results. Our project site is available at https://sites.google.com/view/jiangjun-site/.
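PSRO-style methods repeatedly solve a meta-game over the current strategy population; in a zero-sum setting, an approximate Nash mixture over the population can be obtained with fictitious play. The sketch below demonstrates this on rock-paper-scissors, the canonical non-transitive game — it illustrates the general idea only, not the JiangJun implementation:

```python
import numpy as np

def fictitious_play(payoff, iters=2000):
    """Approximate the Nash mixture of a zero-sum meta-game by having
    each player best-respond to the opponent's empirical mixture."""
    n, m = payoff.shape
    row_counts = np.zeros(n)
    col_counts = np.zeros(m)
    row_counts[0] = col_counts[0] = 1          # arbitrary initial plays
    for _ in range(iters):
        row_br = np.argmax(payoff @ (col_counts / col_counts.sum()))
        col_br = np.argmin((row_counts / row_counts.sum()) @ payoff)
        row_counts[row_br] += 1
        col_counts[col_br] += 1
    return row_counts / row_counts.sum(), col_counts / col_counts.sum()

# Rock-paper-scissors: purely non-transitive, Nash mixture is uniform
rps = np.array([[0, -1, 1], [1, 0, -1], [-1, 1, 0]])
p, q = fictitious_play(rps)
```

On this non-transitive payoff matrix the empirical frequencies converge toward the uniform mixture (1/3, 1/3, 1/3), whereas a transitive game would concentrate on the dominant strategy.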

* 28 pages, accepted by Transactions on Machine Learning Research (TMLR) 

Tackling Cooperative Incompatibility for Zero-Shot Human-AI Coordination

Jun 05, 2023
Yang Li, Shao Zhang, Jichen Sun, Wenhao Zhang, Yali Du, Ying Wen, Xinbing Wang, Wei Pan


Zero-Shot Human-AI Coordination aims to develop AI agents capable of working efficiently alongside previously unknown human teammates; achieving such coordination with previously unencountered humans remains a substantial obstacle. Traditional algorithms have aimed to collaborate with humans by optimizing fixed objectives within a population, fostering diversity in strategies and behaviors. However, these techniques may lead to learning loss and an inability to cooperate with specific strategies within the population, a phenomenon named cooperative incompatibility. To mitigate this issue, we introduce the Cooperative Open-ended LEarning (COLE) framework, which formulates open-ended objectives in two-player cooperative games from the perspective of graph theory to evaluate and pinpoint the cooperative capacity of each strategy. We put forth a practical algorithm incorporating insights from game theory and graph theory, e.g., the Shapley value and centrality. We also show, through theoretical and empirical analysis, that COLE can effectively overcome cooperative incompatibility. Subsequently, we created an online Overcooked human-AI experiment platform, the COLE platform, which enables easy customization of questionnaires, model weights, and other aspects. Using the COLE platform, we enlisted 130 participants for human experiments. Our findings reveal a preference for our approach over state-of-the-art methods across a variety of subjective metrics. Moreover, objective experimental outcomes in the Overcooked game environment indicate that our method surpasses existing ones when coordinating with previously unencountered AI agents and the human proxy model. Our code and demo are publicly available at https://sites.google.com/view/cole-2023.
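The Shapley value mentioned above can be computed exactly for small games by averaging each player's marginal contribution over all join orders. The toy cooperative game below is hypothetical and purely illustrative of the concept:

```python
from itertools import permutations
from math import factorial

def shapley_values(players, value):
    """Exact Shapley values: average each player's marginal contribution
    over every ordering in which the coalition can form (small games only)."""
    phi = {p: 0.0 for p in players}
    for order in permutations(players):
        coalition = frozenset()
        for p in order:
            # Marginal contribution of p to the coalition formed so far
            phi[p] += value(coalition | {p}) - value(coalition)
            coalition = coalition | {p}
    n_orders = factorial(len(players))
    return {p: total / n_orders for p, total in phi.items()}

# Toy game: a coalition only scores when player "a" participates
def v(coalition):
    return float(len(coalition)) if "a" in coalition else 0.0

phi = shapley_values(["a", "b", "c"], v)   # a: 2.0, b: 0.5, c: 0.5
```

Note that the values sum to the grand-coalition payoff v({a, b, c}) = 3, the efficiency property that makes the Shapley value a principled credit-assignment tool for scoring each strategy's cooperative contribution.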

* arXiv admin note: substantial text overlap with arXiv:2302.04831 

Reinforcement Learning for Safe Robot Control using Control Lyapunov Barrier Functions

May 16, 2023
Desong Du, Shaohang Han, Naiming Qi, Haitham Bou Ammar, Jun Wang, Wei Pan


Reinforcement learning (RL) exhibits impressive performance when managing complicated control tasks for robots. However, its wide application to physical robots is limited by the absence of strong safety guarantees. To overcome this challenge, this paper explores the control Lyapunov barrier function (CLBF) to analyze safety and reachability solely from data, without explicitly employing a dynamic model. We also propose the Lyapunov barrier actor-critic (LBAC), a model-free RL algorithm, to search for a controller that satisfies the data-based approximation of the safety and reachability conditions. The proposed approach is demonstrated through simulation and real-world robot control experiments, namely a 2D quadrotor navigation task. The experimental findings reveal the approach's effectiveness in reachability and safety, surpassing other model-free RL methods.
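A data-based check of the two CLBF conditions — unsafe states lie above a level set, and the function decreases along observed transitions — might look like the following sketch. The 1-D toy function, thresholds, and function name are illustrative assumptions, not the paper's construction:

```python
def clbf_conditions_hold(W, transitions, unsafe_states, alpha=0.01, c=1.0):
    """Check a candidate CLBF against collected data.

    W:             candidate function, W(state) -> float
    transitions:   (state, next_state) pairs observed under the policy
    unsafe_states: sampled states that must stay outside the safe set
    """
    # Barrier condition: unsafe states must lie above the level set c
    barrier_ok = all(W(x) > c for x in unsafe_states)
    # Decrease condition: W must shrink by at least alpha per transition
    decrease_ok = all(W(x2) - W(x1) <= -alpha for x1, x2 in transitions)
    return barrier_ok and decrease_ok

W = lambda x: x * x                       # toy candidate CLBF in 1-D
safe_rollout = [(2.0, 1.5), (1.5, 1.0), (1.0, 0.5)]
ok = clbf_conditions_hold(W, safe_rollout, unsafe_states=[3.0, -3.0], c=4.0)
```

In an actor-critic setting, violations of the decrease condition on sampled transitions would feed back into the critic's loss, steering the learned controller toward satisfying the safety and reachability conditions.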


Weighted Point Cloud Normal Estimation

May 06, 2023
Weijia Wang, Xuequan Lu, Di Shao, Xiao Liu, Richard Dazeley, Antonio Robles-Kelly, Wei Pan


Existing normal estimation methods for point clouds are often not robust to severe noise and complex geometric structures. They also usually ignore the contributions of different neighbouring points during normal estimation, which leads to less accurate results. In this paper, we introduce a weighted normal estimation method for 3D point cloud data. We make two key innovations: 1) we develop a novel weighted normal regression technique that predicts point-wise weights from local point patches and uses them for robust, feature-preserving normal regression; 2) we propose to conduct contrastive learning between point patches and the corresponding ground-truth normals of the patches' central points as a pre-training process to facilitate normal regression. Comprehensive experiments demonstrate that our method robustly handles noisy and complex point clouds, achieving state-of-the-art performance on both synthetic and real-world datasets.
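A classical, non-learned baseline for weight-driven normal estimation fits the patch with a weighted covariance and takes the eigenvector of the smallest eigenvalue. The sketch below (uniform weights, synthetic planar patch) only illustrates where learned point-wise weights would enter; it is not the paper's regression network:

```python
import numpy as np

def weighted_normal(points, weights):
    """Estimate a surface normal as the smallest-eigenvalue eigenvector
    of the weighted covariance of a local point patch (weighted PCA)."""
    w = weights / weights.sum()
    centroid = w @ points                       # weighted patch centroid
    centered = points - centroid
    cov = (centered * w[:, None]).T @ centered  # weighted covariance
    eigvals, eigvecs = np.linalg.eigh(cov)      # ascending eigenvalues
    return eigvecs[:, 0]                        # direction of least variance

# Noisy patch sampled from the z = 0 plane: normal should be ~ the z axis
rng = np.random.default_rng(1)
pts = np.column_stack([rng.uniform(-1, 1, 50),
                       rng.uniform(-1, 1, 50),
                       rng.normal(0, 0.01, 50)])
n = weighted_normal(pts, np.ones(50))
```

Down-weighting outliers or points across a sharp feature, as a learned weight predictor would, keeps the covariance (and hence the normal) from being polluted by unreliable neighbours.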

* Accepted by ICME 2023 

Sim-and-Real Reinforcement Learning for Manipulation: A Consensus-based Approach

Feb 26, 2023
Wenxing Liu, Hanlin Niu, Wei Pan, Guido Herrmann, Joaquin Carrasco


Sim-and-real training is a promising alternative to sim-to-real training for robot manipulation. However, current sim-and-real training is neither efficient, i.e., it converges slowly to the optimal policy, nor effective, i.e., it requires sizeable amounts of real-world robot data. Given limited time and hardware budgets, the performance of sim-and-real training is not satisfactory. In this paper, we propose a Consensus-based Sim-And-Real deep reinforcement learning algorithm (CSAR) for manipulator pick-and-place tasks, which shows comparable performance in both simulated and real worlds. In this algorithm, we train agents in simulators and in the real world to obtain optimal policies for both. We found two interesting phenomena: (1) the best policy in simulation is not necessarily the best for sim-and-real training; (2) the more simulation agents, the better the sim-and-real training. The experimental video is available at: https://youtu.be/mcHJtNIsTEQ.
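The consensus idea, stripped to its core, has each agent pull its policy parameters toward the population average. The following minimal sketch (fully connected agents, fixed step size, tiny parameter vectors) illustrates that mechanism only; it is not the CSAR algorithm itself:

```python
import numpy as np

def consensus_step(params, step=0.5):
    """One consensus update: every agent (simulated or real) nudges its
    policy parameters toward the population average."""
    mean = params.mean(axis=0)
    return params + step * (mean - params)

# Three hypothetical sim agents and one real agent with divergent parameters
params = np.array([[1.0, 0.0],
                   [0.0, 1.0],
                   [2.0, 2.0],
                   [4.0, -1.0]])
for _ in range(20):
    params = consensus_step(params)
# All rows converge to the shared average policy (1.75, 0.5)
```

Because each update is a contraction toward the mean, all agents agree on a single shared policy in the limit while the population average itself is preserved.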

* 7 pages, 8 figures, IEEE International Conference on Robotics and Automation (ICRA) 2023 

Cooperative Open-ended Learning Framework for Zero-shot Coordination

Feb 09, 2023
Yang Li, Shao Zhang, Jichen Sun, Yali Du, Ying Wen, Xinbing Wang, Wei Pan


Zero-shot coordination in cooperative artificial intelligence (AI) remains a significant challenge, namely, coordinating effectively with a wide range of unseen partners. Previous algorithms have attempted to address this challenge by optimizing fixed objectives within a population to improve strategy or behavior diversity. However, these approaches can result in a loss of learning and an inability to cooperate with certain strategies within the population, known as cooperative incompatibility. To address this issue, we propose the Cooperative Open-ended LEarning (COLE) framework, which constructs open-ended objectives in two-player cooperative games from the perspective of graph theory to assess and identify the cooperative ability of each strategy. We further specify the framework and propose a practical algorithm that leverages knowledge from game theory and graph theory. Furthermore, an analysis of the algorithm's learning process shows that it can efficiently overcome cooperative incompatibility. Experimental results in the Overcooked game environment demonstrate that our method outperforms current state-of-the-art methods when coordinating with partners of different levels. Our code and demo are available at https://sites.google.com/view/cole-2023.
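One simple graph-theoretic way to score each strategy's cooperative ability is eigenvector centrality on a cooperation graph whose edge weights record how well pairs of strategies play together. The matrix below is hypothetical, and the code is only a sketch of this idea, not the COLE objective:

```python
import numpy as np

def cooperative_centrality(coop_matrix, iters=200):
    """Eigenvector centrality via power iteration: a strategy scores highly
    if it cooperates well with other high-scoring strategies."""
    score = np.ones(coop_matrix.shape[0])
    for _ in range(iters):
        score = coop_matrix @ score
        score /= score.sum()        # normalize to a distribution
    return score

# Hypothetical symmetric cooperation payoffs among four strategies;
# strategy 2 cooperates poorly with everyone else.
coop = np.array([[1.0, 0.9, 0.1, 0.5],
                 [0.9, 1.0, 0.2, 0.4],
                 [0.1, 0.2, 1.0, 0.3],
                 [0.5, 0.4, 0.3, 1.0]])
ranking = cooperative_centrality(coop)
```

Low-centrality strategies are exactly the ones the population struggles to cooperate with, so such a score can flag cooperative incompatibility and prioritize which partners to train against next.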


Data-Adaptive Discriminative Feature Localization with Statistically Guaranteed Interpretation

Nov 18, 2022
Ben Dai, Xiaotong Shen, Lin Yee Chen, Chunlin Li, Wei Pan


In explainable artificial intelligence, discriminative feature localization is critical to reveal a blackbox model's decision-making process from raw data to prediction. In this article, we use two real datasets, the MNIST handwritten digits and MIT-BIH Electrocardiogram (ECG) signals, to motivate key characteristics of discriminative features, namely adaptiveness, predictive importance and effectiveness. Then, we develop a localization framework based on adversarial attacks to effectively localize discriminative features. In contrast to existing heuristic methods, we also provide a statistically guaranteed interpretability of the localized features by measuring a generalized partial $R^2$. We apply the proposed method to the MNIST and MIT-BIH datasets with a convolutional auto-encoder. On MNIST, the compact image regions localized by the proposed method are visually appealing. Similarly, on MIT-BIH, the identified ECG features are biologically plausible and consistent with cardiac electrophysiological principles, while locating subtle anomalies in a QRS complex that may not be discernible by the naked eye. Overall, the proposed method compares favorably with state-of-the-art competitors. Accompanying this paper is a Python library dnn-locate (https://dnn-locate.readthedocs.io/en/latest/) that implements the proposed approach.
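A toy reading of a partial-$R^2$-style score compares the model's predictive loss with and without the localized features available: the larger the loss inflation under masking, the more those features explain. This is an illustrative simplification, not the paper's exact definition of the generalized partial $R^2$:

```python
def partial_r2_score(loss_masked, loss_full):
    """Fraction of predictive loss attributable to the localized features:
    1 - (loss with features) / (loss with features masked out).

    loss_masked: model loss when the localized features are removed
    loss_full:   model loss when all features are available
    """
    return 1.0 - loss_full / loss_masked

# Hypothetical numbers: masking the localized region quadruples the loss,
# so those features account for 75% of the explained loss.
r2 = partial_r2_score(loss_masked=0.8, loss_full=0.2)
```

A score near 1 indicates the localized features carry nearly all of the model's discriminative signal, while a score near 0 means masking them barely matters.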

* The Annals of Applied Statistics, 2022  
* 27 pages, 11 figures 