The utilization of artificial intelligence (AI) in card games has been a well-explored subject within AI research for an extensive period. Recent advancements have propelled AI programs to showcase expertise in intricate card games such as Mahjong, DouDizhu, and Texas Hold'em. In this work, we aim to develop an AI program for an exceptionally complex and popular card game called GuanDan. This game involves four players engaging in both competitive and cooperative play throughout a long process to upgrade their level, posing great challenges for AI due to its expansive state and action space, long episode length, and complex rules. Employing reinforcement learning techniques, specifically Deep Monte Carlo (DMC), and a distributed training framework, we first put forward an AI program named DanZero for this game. Evaluation against baseline AI programs based on heuristic rules highlights the outstanding performance of our bot. Besides, in order to further enhance the AI's capabilities, we apply policy-based reinforcement learning algorithm to GuanDan. To address the challenges arising from the huge action space, which will significantly impact the performance of policy-based algorithms, we adopt the pre-trained model to facilitate the training process and the achieved AI program manages to achieve a superior performance.
* arXiv admin note: text overlap with arXiv:2210.17087
Card game AI has always been a hot topic in the research of artificial intelligence. In recent years, complex card games such as Mahjong, DouDizhu and Texas Hold'em have been solved and the corresponding AI programs have reached the level of human experts. In this paper, we are devoted to developing an AI program for a more complex card game, GuanDan, whose rules are similar to DouDizhu but much more complicated. To be specific, the characteristics of large state and action space, long length of one episode and the unsure number of players in the GuanDan pose great challenges for the development of the AI program. To address these issues, we propose the first AI program DanZero for GuanDan using reinforcement learning technique. Specifically, we utilize a distributed framework to train our AI system. In the actor processes, we carefully design the state features and agents generate samples by self-play. In the learner process, the model is updated by Deep Monte-Carlo Method. After training for 30 days using 160 CPUs and 1 GPU, we get our DanZero bot. We compare it with 8 baseline AI programs which are based on heuristic rules and the results reveal the outstanding performance of DanZero. We also test DanZero with human players and demonstrate its human-level performance.
Recent advances in vision transformers (ViTs) have achieved great performance in visual recognition tasks. Convolutional neural networks (CNNs) exploit spatial inductive bias to learn visual representations, but these networks are spatially local. ViTs can learn global representations with their self-attention mechanism, but they are usually heavy-weight and unsuitable for mobile devices. In this paper, we propose cross feature attention (XFA) to bring down computation cost for transformers, and combine efficient mobile CNNs to form a novel efficient light-weight CNN-ViT hybrid model, XFormer, which can serve as a general-purpose backbone to learn both global and local representation. Experimental results show that XFormer outperforms numerous CNN and ViT-based models across different tasks and datasets. On ImageNet1K dataset, XFormer achieves top-1 accuracy of 78.5% with 5.5 million parameters, which is 2.2% and 6.3% more accurate than EfficientNet-B0 (CNN-based) and DeiT (ViT-based) for similar number of parameters. Our model also performs well when transferring to object detection and semantic segmentation tasks. On MS COCO dataset, XFormer exceeds MobileNetV2 by 10.5 AP (22.7 -> 33.2 AP) in YOLOv3 framework with only 6.3M parameters and 3.8G FLOPs. On Cityscapes dataset, with only a simple all-MLP decoder, XFormer achieves mIoU of 78.5 and FPS of 15.3, surpassing state-of-the-art lightweight segmentation networks.
Morphology of mitochondria plays critical roles in mediating their physiological functions. Accurate segmentation of mitochondria from 3D electron microscopy (EM) images is essential to quantitative characterization of their morphology at the nanometer scale. Fully supervised deep learning models developed for this task achieve excellent performance but require substantial amounts of annotated data for training. However, manual annotation of EM images is laborious and time-consuming because of their large volumes, limited contrast, and low signal-to-noise ratios (SNRs). To overcome this challenge, we propose a semi-supervised deep learning model that segments mitochondria by leveraging the spatial continuity of their structural, morphological, and contextual information in both labeled and unlabeled images. We use random piecewise affine transformation to synthesize comprehensive and realistic mitochondrial morphology for augmentation of training data. Experiments on the EPFL dataset show that our model achieves performance similar as that of state-of-the-art fully supervised models but requires only ~20% of their annotated training data. Our semi-supervised model is versatile and can also accurately segment other spatially continuous structures from EM images. Data and code of this study are openly accessible at https://github.com/cbmi-group/MPP.
* 2022 IEEE 19th International Symposium on Biomedical Imaging
(ISBI). IEEE, 2022: 1-5 * 4 pages of main text, 5 pages of supplementary material and 1 page of
Recent years have witnessed the great breakthrough of deep reinforcement learning (DRL) in various perfect and imperfect information games. Among these games, DouDizhu, a popular card game in China, is very challenging due to the imperfect information, large state space, elements of collaboration and a massive number of possible moves from turn to turn. Recently, a DouDizhu AI system called DouZero has been proposed. Trained using traditional Monte Carlo method with deep neural networks and self-play procedure without the abstraction of human prior knowledge, DouZero has outperformed all the existing DouDizhu AI programs. In this work, we propose to enhance DouZero by introducing opponent modeling into DouZero. Besides, we propose a novel coach network to further boost the performance of DouZero and accelerate its training process. With the integration of the above two techniques into DouZero, our DouDizhu AI system achieves better performance and ranks top in the Botzone leaderboard among more than 400 AI agents, including DouZero.
Multi-agent reinforcement learning is difficult to be applied in practice, which is partially due to the gap between the simulated and real-world scenarios. One reason for the gap is that the simulated systems always assume that the agents can work normally all the time, while in practice, one or more agents may unexpectedly "crash" during the coordination process due to inevitable hardware or software failures. Such crashes will destroy the cooperation among agents, leading to performance degradation. In this work, we present a formal formulation of a cooperative multi-agent reinforcement learning system with unexpected crashes. To enhance the robustness of the system to crashes, we propose a coach-assisted multi-agent reinforcement learning framework, which introduces a virtual coach agent to adjust the crash rate during training. We design three coaching strategies and the re-sampling strategy for our coach agent. To the best of our knowledge, this work is the first to study the unexpected crashes in the multi-agent system. Extensive experiments on grid-world and StarCraft II micromanagement tasks demonstrate the efficacy of adaptive strategy compared with the fixed crash rate strategy and curriculum learning strategy. The ablation study further illustrates the effectiveness of our re-sampling strategy.