Abstract: Accurate classification of second-trimester fetal ultrasound images remains challenging due to low image quality, high intra-class variability, and significant class imbalance. In this work, we introduce a simple yet powerful, biologically inspired deep learning ensemble framework that, unlike prior studies focused on only a handful of anatomical targets, simultaneously distinguishes 16 fetal structures. Drawing on the hierarchical, modular organization of biological vision systems, our model stacks two complementary branches (a "shallow" path for coarse, low-resolution cues and a "detailed" path for fine, high-resolution features) and concatenates their outputs for the final prediction. To our knowledge, no existing method has addressed such a large number of classes with a comparably lightweight architecture. We trained and evaluated on 5,298 routinely acquired clinical images (annotated by three experts and reconciled via Dawid-Skene), reflecting real-world noise and variability rather than a "cleaned" dataset. Despite this complexity, our ensemble (EfficientNet-B0 + EfficientNet-B6 with LDAM-Focal loss) identifies 90% of organs with accuracy above 0.75 and 75% of organs with accuracy above 0.85, performance competitive with more elaborate models applied to far fewer categories. These results demonstrate that biologically inspired modular stacking can yield robust, scalable fetal anatomy recognition in challenging clinical settings.
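The abstract describes a two-branch stacked ensemble whose pooled features are concatenated before a 16-way classification head. The following is a minimal sketch of that idea, assuming a PyTorch/torchvision implementation; the branch input resolutions, pretraining, fusion head, and the LDAM-Focal loss are assumptions not detailed in the abstract.

# Hypothetical sketch of the two-branch ensemble: a "shallow" EfficientNet-B0
# path and a "detailed" EfficientNet-B6 path whose pooled features are
# concatenated and passed to a 16-way linear head. LDAM-Focal loss not shown.
import torch
import torch.nn as nn
from torchvision import models

class TwoBranchFetalClassifier(nn.Module):
    def __init__(self, num_classes: int = 16):
        super().__init__()
        b0 = models.efficientnet_b0(weights=None)   # coarse, low-resolution branch
        b6 = models.efficientnet_b6(weights=None)   # fine, high-resolution branch
        feat_dim = b0.classifier[1].in_features + b6.classifier[1].in_features
        b0.classifier = nn.Identity()                # keep pooled features only
        b6.classifier = nn.Identity()
        self.shallow, self.detailed = b0, b6
        self.head = nn.Linear(feat_dim, num_classes)

    def forward(self, x_low: torch.Tensor, x_high: torch.Tensor) -> torch.Tensor:
        # Each branch may receive a differently resized view of the same image.
        f = torch.cat([self.shallow(x_low), self.detailed(x_high)], dim=1)
        return self.head(f)

if __name__ == "__main__":
    model = TwoBranchFetalClassifier()
    logits = model(torch.randn(2, 3, 224, 224), torch.randn(2, 3, 528, 528))
    print(logits.shape)  # torch.Size([2, 16])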
Abstract: This paper introduces CognitiveDog, a pioneering quadruped robot driven by a Large Multi-modal Model (LMM) that is capable not only of communicating with humans verbally but also of physically interacting with the environment through object manipulation. The system was realized on a Unitree Go1 robot-dog equipped with a custom gripper and demonstrates autonomous decision-making, independently determining the most appropriate actions and interactions with various objects to fulfill user-defined tasks. These tasks do not necessarily include direct instructions, challenging the robot to comprehend and execute them based on natural language input and environmental cues. The paper delves into the intricacies of the system, the dataset characteristics, and the software architecture. Key to this development is the robot's proficiency in navigating space using Visual-SLAM, effectively manipulating and transporting objects, and providing insightful natural language commentary during task execution. Experimental results highlight the robot's advanced task comprehension and adaptability, underscoring its potential in real-world applications. The dataset used to fine-tune the robot-dog behavior generation model is provided at the following link: huggingface.co/datasets/ArtemLykov/CognitiveDog_dataset