Abstract:Accurate classification of second-trimester fetal ultrasound images remains challenging due to low image quality, high intra-class variability, and significant class imbalance. In this work, we introduce a simple yet powerful, biologically inspired deep learning ensemble framework that-unlike prior studies focused on only a handful of anatomical targets-simultaneously distinguishes 16 fetal structures. Drawing on the hierarchical, modular organization of biological vision systems, our model stacks two complementary branches (a "shallow" path for coarse, low-resolution cues and a "detailed" path for fine, high-resolution features), concatenating their outputs for final prediction. To our knowledge, no existing method has addressed such a large number of classes with a comparably lightweight architecture. We trained and evaluated on 5,298 routinely acquired clinical images (annotated by three experts and reconciled via Dawid-Skene), reflecting real-world noise and variability rather than a "cleaned" dataset. Despite this complexity, our ensemble (EfficientNet-B0 + EfficientNet-B6 with LDAM-Focal loss) identifies 90% of organs with accuracy > 0.75 and 75% of organs with accuracy > 0.85-performance competitive with more elaborate models applied to far fewer categories. These results demonstrate that biologically inspired modular stacking can yield robust, scalable fetal anatomy recognition in challenging clinical settings.