Abstract: Deploying foundation models in embodied edge systems is fundamentally a systems problem, not merely one of model compression. Real-time control must operate within strict size, weight, and power (SWaP) constraints, where memory traffic, compute latency, timing variability, and safety margins interact directly. The Deployment Gauntlet organizes these constraints into eight coupled barriers that determine whether embodied foundation models can run reliably in practice. Across representative edge workloads, autoregressive Vision-Language-Action (VLA) policies are constrained primarily by memory bandwidth, whereas diffusion-based controllers are limited more by compute latency and sustained execution cost. Reliable deployment therefore depends on system-level co-design across memory, scheduling, communication, and model architecture, including decompositions that separate fast control from slower semantic reasoning.
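The memory-bound versus compute-bound distinction can be made concrete with a minimal roofline-style estimate. Every number here is an illustrative assumption, not a measurement from this work: a hypothetical 3B-parameter policy with int8 weights, a hypothetical accelerator sustaining 100 TFLOP/s with 200 GB/s of DRAM bandwidth, and 1024 tokens processed per denoising pass. The sketch also ignores activations, KV-cache traffic, and attention FLOPs.

```python
from dataclasses import dataclass


@dataclass
class Device:
    peak_flops: float  # sustained FLOP/s (hypothetical)
    mem_bw: float      # DRAM bandwidth in bytes/s (hypothetical)

    @property
    def ridge(self) -> float:
        """Arithmetic intensity (FLOP/byte) above which a pass is compute-bound."""
        return self.peak_flops / self.mem_bw


def forward_pass(params: float, tokens: int, bytes_per_param: float, dev: Device):
    """Roofline estimate for one forward pass that reads every weight once and
    processes `tokens` tokens in parallel. Activations, KV cache, and attention
    FLOPs are ignored (simplifying assumption)."""
    flops = 2.0 * params * tokens           # one multiply-add per weight per token
    bytes_moved = params * bytes_per_param  # weights streamed from DRAM once
    t_compute = flops / dev.peak_flops
    t_memory = bytes_moved / dev.mem_bw
    bound = "memory" if t_memory > t_compute else "compute"
    return max(t_compute, t_memory), bound


dev = Device(peak_flops=100e12, mem_bw=200e9)  # hypothetical edge accelerator
params = 3e9                                   # hypothetical 3B-parameter policy, int8

# Autoregressive VLA: one new token per pass, repeated for every action token.
t_ar, bound_ar = forward_pass(params, tokens=1, bytes_per_param=1, dev=dev)
# Diffusion controller: many tokens per pass, repeated for every denoising step.
t_df, bound_df = forward_pass(params, tokens=1024, bytes_per_param=1, dev=dev)

print(f"ridge point: {dev.ridge:.0f} FLOP/byte")
print(f"autoregressive step: {t_ar * 1e3:.1f} ms ({bound_ar}-bound)")
print(f"denoising step:      {t_df * 1e3:.1f} ms ({bound_df}-bound)")
```

Under these assumptions an autoregressive step performs about 2 FLOPs per byte of weights read, far below the device's ridge point of 500 FLOP/byte, so its latency is set by bandwidth. A denoising pass amortizes each weight read over many tokens, crosses the ridge, and becomes compute-bound; its total cost then scales with the number of denoising steps.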




Abstract: Object detection is vital in precision agriculture for plant monitoring, disease detection, and yield estimation. However, detectors such as YOLO struggle with occlusion, irregular plant structures, and background noise, which reduces detection accuracy. Spatial Transformer Networks (STNs) improve spatial invariance through learned transformations, but their affine mappings cannot capture non-rigid deformations such as bent leaves and overlapping plants. We propose CBAM-STN-TPS-YOLO, a model that integrates Thin-Plate Splines (TPS) into STNs to enable flexible, non-rigid spatial transformations that better align features. Performance is further enhanced by the Convolutional Block Attention Module (CBAM), which suppresses background noise and emphasizes relevant spatial and channel-wise features. On the occlusion-heavy Plant Growth and Phenotyping (PGP) dataset, our model outperforms STN-YOLO in precision, recall, and mAP, and it achieves a 12% reduction in false positives, highlighting the benefits of improved spatial flexibility and attention-guided refinement. We also examine how the TPS regularization parameter balances transformation smoothness against detection performance. The model is lightweight, improves spatial awareness, and supports real-time edge deployment, making it well suited to smart-farming applications that require accurate and efficient monitoring.
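The role of the TPS regularization parameter can be illustrated with a small NumPy sketch of a regularized 2-D thin-plate spline fit. It assumes the common formulation in which λ is added to the diagonal of the kernel matrix. The 3×3 control grid, the displaced center point standing in for a bent leaf, and the function names `tps_kernel`, `fit_tps`, and `apply_tps` are hypothetical; the paper's exact parameterization of λ and of the STN localization network may differ.

```python
import numpy as np


def tps_kernel(r2: np.ndarray) -> np.ndarray:
    """TPS radial basis U(r) = r^2 log(r^2), evaluated from squared distances."""
    safe = np.where(r2 > 0, r2, 1.0)  # log(1) = 0, so U(0) evaluates to 0
    return r2 * np.log(safe)


def fit_tps(src: np.ndarray, dst: np.ndarray, lam: float):
    """Fit a 2-D TPS mapping control points src (n, 2) toward dst (n, 2).

    lam = 0 gives exact interpolation; larger lam penalizes bending and
    pushes the warp toward an affine transform (assumed formulation)."""
    n = src.shape[0]
    d2 = ((src[:, None, :] - src[None, :, :]) ** 2).sum(-1)
    K = tps_kernel(d2) + lam * np.eye(n)
    P = np.hstack([np.ones((n, 1)), src])  # affine part: [1, x, y]
    A = np.zeros((n + 3, n + 3))
    A[:n, :n] = K
    A[:n, n:] = P
    A[n:, :n] = P.T
    b = np.zeros((n + 3, 2))
    b[:n] = dst
    sol = np.linalg.solve(A, b)
    return sol[:n], sol[n:]  # non-affine weights w (n, 2), affine coefficients a (3, 2)


def apply_tps(pts, src, w, a):
    """Map arbitrary points pts (m, 2) through the fitted TPS."""
    d2 = ((pts[:, None, :] - src[None, :, :]) ** 2).sum(-1)
    return tps_kernel(d2) @ w + np.hstack([np.ones((len(pts), 1)), pts]) @ a


# Hypothetical 3x3 control grid on [-1, 1]^2 (normalized image coordinates).
g = np.linspace(-1.0, 1.0, 3)
src = np.array([[x, y] for y in g for x in g])
dst = src.copy()
dst[4] += [0.2, -0.1]  # displace the center point, e.g. a locally bent leaf

for lam in (0.0, 0.1, 10.0):
    w, a = fit_tps(src, dst, lam)
    resid = np.linalg.norm(apply_tps(src, src, w, a) - dst)
    print(f"lambda={lam:5.1f}  residual at control points={resid:.4f}")
```

With λ = 0 the spline passes exactly through every target point; as λ grows the fit gives up some fidelity at the control points in exchange for a smoother warp that approaches an affine map, which is the smoothness trade-off the regularization parameter controls.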