Pipeline parallelism is an essential technique in the training of large-scale Transformer models. However, it suffers from imbalanced memory consumption, leading to insufficient memory utilization. The BPipe technique was proposed to address this issue and has proven effective in the GPT-3 model. Nevertheless, our experiments have not yielded similar benefits for LLaMA training. Additionally, BPipe only yields negligible benefits for GPT-3 training when applying flash attention. We analyze the underlying causes of the divergent performance of BPipe on GPT-3 and LLaMA. Furthermore, we introduce a novel method to estimate the performance of BPipe.
The shapes and morphology of the organs and tissues are important prior knowledge in medical imaging recognition and segmentation. The morphological operation is a well-known method for morphological feature extraction. As the morphological operation is performed well in hand-crafted image segmentation techniques, it is also promising to design an approach to approximate morphological operation in the convolutional networks. However, using the traditional convolutional neural network as a black-box is usually hard to specify the morphological operation action. Here, we introduced a 3D morphological operation residual block to extract morphological features in end-to-end deep learning models for semantic segmentation. This study proposed a novel network block architecture that embedded the morphological operation as an infinitely strong prior in the convolutional neural network. Several 3D deep learning models with the proposed morphological operation block were built and compared in different medical imaging segmentation tasks. Experimental results showed the proposed network achieved a relatively higher performance in the segmentation tasks comparing with the conventional approach. In conclusion, the novel network block could be easily embedded in traditional networks and efficiently reinforce the deep learning models for medical imaging segmentation.
Anomaly detection is facing with emerging challenges in many important industry domains, such as cyber security and online recommendation and advertising. The recent trend in these areas calls for anomaly detection on time-evolving data with high-dimensional categorical features without labeled samples. Also, there is an increasing demand for identifying and monitoring irregular patterns at multiple resolutions. In this work, we propose a unified end-to-end approach to solve these challenges by combining the advantages of Adversarial Autoencoder and Recurrent Neural Network. The model learns data representations cross different scales with attention mechanisms, on which an enhanced two-resolution anomaly detector is developed for both instances and data blocks. Extensive experiments are performed over three types of datasets to demonstrate the efficacy of our method and its superiority over the state-of-art approaches.