Handwritten character recognition is a crucial task because of its abundant applications. The recognition task of Bangla handwritten characters is especially challenging because of the cursive nature of Bangla characters and the presence of compound characters with more than one way of writing. In this paper, a classification model based on the ensembling of several Convolutional Neural Networks (CNN), namely, BanglaNet is proposed to classify Bangla basic characters, compound characters, numerals, and modifiers. Three different models based on the idea of state-of-the-art CNN models like Inception, ResNet, and DenseNet have been trained with both augmented and non-augmented inputs. Finally, all these models are averaged or ensembled to get the finishing model. Rigorous experimentation on three benchmark Bangla handwritten characters datasets, namely, CMATERdb, BanglaLekha-Isolated, and Ekush has exhibited significant recognition accuracies compared to some recent CNN-based research. The top-1 recognition accuracies obtained are 98.40%, 97.65%, and 97.32%, and the top-3 accuracies are 99.79%, 99.74%, and 99.56% for CMATERdb, BanglaLekha-Isolated, and Ekush datasets respectively.
In recent years, medical image segmentation has become an important application in the field of computer-aided diagnosis. In this paper, we are the first to propose a new graph convolution-based decoder namely, Cascaded Graph Convolutional Attention Decoder (G-CASCADE), for 2D medical image segmentation. G-CASCADE progressively refines multi-stage feature maps generated by hierarchical transformer encoders with an efficient graph convolution block. The encoder utilizes the self-attention mechanism to capture long-range dependencies, while the decoder refines the feature maps preserving long-range information due to the global receptive fields of the graph convolution block. Rigorous evaluations of our decoder with multiple transformer encoders on five medical image segmentation tasks (i.e., Abdomen organs, Cardiac organs, Polyp lesions, Skin lesions, and Retinal vessels) show that our model outperforms other state-of-the-art (SOTA) methods. We also demonstrate that our decoder achieves better DICE scores than the SOTA CASCADE decoder with 80.8% fewer parameters and 82.3% fewer FLOPs. Our decoder can easily be used with other hierarchical encoders for general-purpose semantic and medical image segmentation tasks.
Transformers have shown great success in medical image segmentation. However, transformers may exhibit a limited generalization ability due to the underlying single-scale self-attention (SA) mechanism. In this paper, we address this issue by introducing a Multi-scale hiERarchical vIsion Transformer (MERIT) backbone network, which improves the generalizability of the model by computing SA at multiple scales. We also incorporate an attention-based decoder, namely Cascaded Attention Decoding (CASCADE), for further refinement of multi-stage features generated by MERIT. Finally, we introduce an effective multi-stage feature mixing loss aggregation (MUTATION) method for better model training via implicit ensembling. Our experiments on two widely used medical image segmentation benchmarks (i.e., Synapse Multi-organ, ACDC) demonstrate the superior performance of MERIT over state-of-the-art methods. Our MERIT architecture and MUTATION loss aggregation can be used with downstream medical image and semantic segmentation tasks.