Advances in lightweight neural networks have revolutionized computer vision in a broad range of IoT applications, encompassing remote monitoring and process automation. However, the detection of small objects, which is crucial for many of these applications, remains an underexplored area in current computer vision research, particularly for embedded devices. To address this gap, the paper proposes a novel adaptive tiling method that can be used on top of any existing object detector including the popular FOMO network for object detection on microcontrollers. Our experimental results show that the proposed tiling method can boost the F1-score by up to 225% while reducing the average object count error by up to 76%. Furthermore, the findings of this work suggest that using a soft F1 loss over the popular binary cross-entropy loss can significantly reduce the negative impact of imbalanced data. Finally, we validate our approach by conducting experiments on the Sony Spresense microcontroller, showcasing the proposed method's ability to strike a balance between detection performance, low latency, and minimal memory consumption.
Smart glasses are rapidly gaining advanced functionality thanks to cutting-edge computing technologies, accelerated hardware architectures, and tiny AI algorithms. Integrating AI into smart glasses featuring a small form factor and limited battery capacity is still challenging when targeting full-day usage for a satisfactory user experience. This paper illustrates the design and implementation of tiny machine-learning algorithms exploiting novel low-power processors to enable prolonged continuous operation in smart glasses. We explore the energy- and latency-efficient of smart glasses in the case of real-time object detection. To this goal, we designed a smart glasses prototype as a research platform featuring two microcontrollers, including a novel milliwatt-power RISC-V parallel processor with a hardware accelerator for visual AI, and a Bluetooth low-power module for communication. The smart glasses integrate power cycling mechanisms, including image and audio sensing interfaces. Furthermore, we developed a family of novel tiny deep-learning models based on YOLO with sub-million parameters customized for microcontroller-based inference dubbed TinyissimoYOLO v1.3, v5, and v8, aiming at benchmarking object detection with smart glasses for energy and latency. Evaluations on the prototype of the smart glasses demonstrate TinyissimoYOLO's 17ms inference latency and 1.59mJ energy consumption per inference while ensuring acceptable detection accuracy. Further evaluation reveals an end-to-end latency from image capturing to the algorithm's prediction of 56ms or equivalently 18 fps, with a total power consumption of 62.9mW, equivalent to a 9.3 hours of continuous run time on a 154mAh battery. These results outperform MCUNet (TinyNAS+TinyEngine), which runs a simpler task (image classification) at just 7.3 fps per second.
Event-based cameras, also called silicon retinas, potentially revolutionize computer vision by detecting and reporting significant changes in intensity asynchronous events, offering extended dynamic range, low latency, and low power consumption, enabling a wide range of applications from autonomous driving to longtime surveillance. As an emerging technology, there is a notable scarcity of publicly available datasets for event-based systems that also feature frame-based cameras, in order to exploit the benefits of both technologies. This work quantitatively evaluates a multi-modal camera setup for fusing high-resolution DVS data with RGB image data by static camera alignment. The proposed setup, which is intended for semi-automatic DVS data labeling, combines two recently released Prophesee EVK4 DVS cameras and one global shutter XIMEA MQ022CG-CM RGB camera. After alignment, state-of-the-art object detection or segmentation networks label the image data by mapping boundary boxes or labeled pixels directly to the aligned events. To facilitate this process, various time-based synchronization methods for DVS data are analyzed, and calibration accuracy, camera alignment, and lens impact are evaluated. Experimental results demonstrate the benefits of the proposed system: the best synchronization method yields an image calibration error of less than 0.90px and a pixel cross-correlation deviation of1.6px, while a lens with 8mm focal length enables detection of objects with size 30cm at a distance of 350m against homogeneous background.
In the ever-growing Internet of Things (IoT) landscape, smart power management algorithms combined with energy harvesting solutions are crucial to obtain self-sustainability. This paper presents an energy-aware adaptive sampling rate algorithm designed for embedded deployment in resource-constrained, battery-powered IoT devices. The algorithm, based on a finite state machine (FSM) and inspired by Transmission Control Protocol (TCP) Reno's additive increase and multiplicative decrease, maximizes sensor sampling rates, ensuring power self-sustainability without risking battery depletion. Moreover, we characterized our solar cell with data acquired over 48 days and used the model created to obtain energy data from an open-source world-wide dataset. To validate our approach, we introduce the EcoTrack device, a versatile device with global navigation satellite system (GNSS) capabilities and Long-Term Evolution Machine Type Communication (LTE-M) connectivity, supporting MQTT protocol for cloud data relay. This multi-purpose device can be used, for instance, as a health and safety wearable, remote hazard monitoring system, or as a global asset tracker. The results, validated on data from three different European cities, show that the proposed algorithm enables self-sustainability while maximizing sampled locations per day. In experiments conducted with a 3000 mAh battery capacity, the algorithm consistently maintained a minimum of 24 localizations per day and achieved peaks of up to 3000.
This paper introduces an effective solution for retrofitting construction power tools with low-power IoT to enable accurate activity classification. We address the challenge of distinguishing between when a power tool is being moved and when it is actually being used. To achieve classification accuracy and power consumption preservation a newly released algorithm called MiniRocket was employed. Known for its accuracy, scalability, and fast training for time-series classification, in this paper, it is proposed as a TinyML algorithm for inference on resource-constrained IoT devices. The paper demonstrates the portability and performance of MiniRocket on a resource-constrained, ultra-low power sensor node for floating-point and fixed-point arithmetic, matching up to 1% of the floating-point accuracy. The hyperparameters of the algorithm have been optimized for the task at hand to find a Pareto point that balances memory usage, accuracy and energy consumption. For the classification problem, we rely on an accelerometer as the sole sensor source, and BLE for data transmission. Extensive real-world construction data, using 16 different power tools, were collected, labeled, and used to validate the algorithm's performance directly embedded in the IoT device. Experimental results demonstrate that the proposed solution achieves an accuracy of 96.9% in distinguishing between real usage status and other motion statuses while consuming only 7kB of flash and 3kB of RAM. The final application exhibits an average current consumption of less than 15{\mu}W for the whole system, resulting in battery life performance ranging from 3 to 9 years depending on the battery capacity (250-500mAh) and the number of power tool usage hours (100-1500h).
Indoor Positioning System (IPS) is a crucial technology that enables medical staff and hospital managements to accurately locate and track persons or assets inside the medical buildings. Among other technologies, Bluetooth Low Energy (BLE) can be exploited for achieving an energy-efficient and low-cost solution. This work presents the design and implementation of an received signal strength indicator (RSSI)-based indoor localization system. The paper shows the implementation of a low complex weighted k-Nearest Neighbors algorithm that processes raw RSSI data from connection-less iBeacon's. The designed hardware and firmware are implemented around the low-power and low-cost nRF52832 from Nordic Semiconductor. Experimental evaluation with the real-time data processing has been evaluated and presented in a 7.2 m by 7.2 m room with furniture and 5 beacon nodes. The experimental results show an average error of only 0.72 m in realistic conditions. Finally, the overall power consumption of the fixed beacon with a periodic advertisement of 100 ms is only 50 uA at 3 V, which leads to a long-lasting solution of over one year with a 500 mAh coin battery.
Nano-sized unmanned aerial vehicles (UAVs) are well-fit for indoor applications and for close proximity to humans. To enable autonomy, the nano-UAV must be able to self-localize in its operating environment. This is a particularly-challenging task due to the limited sensing and compute resources on board. This work presents an online and onboard approach for localization in floor plans annotated with semantic information. Unlike sensor-based maps, floor plans are readily-available, and do not increase the cost and time of deployment. To overcome the difficulty of localizing in sparse maps, the proposed approach fuses geometric information from miniaturized time-of-flight sensors and semantic cues. The semantic information is extracted from images by deploying a state-of-the-art object detection model on a high-performance multi-core microcontroller onboard the drone, consuming only 2.5mJ per frame and executing in 38ms. In our evaluation, we globally localize in a real-world office environment, achieving 90% success rate. We also release an open-source implementation of our work.
Ultrasound is a key technology in healthcare, and it is being explored for non-invasive, wearable, continuous monitoring of vital signs. However, its widespread adoption in this scenario is still hindered by the size, complexity, and power consumption of current devices. Moreover, such an application demands adaptability to human anatomy, which is hard to achieve with current transducer technology. This paper presents a novel ultrasound system prototype based on a fully printed, lead-free, and flexible polymer ultrasound transducer, whose bending radius promises good adaptability to the human anatomy. Our application scenario focuses on continuous blood flow monitoring. We implemented a hardware envelope filter to efficiently transpose high-frequency ultrasound signals to a lower-frequency spectrum. This reduces computational and power demands with little to no degradation in the task proposed for this work. We validated our method on a setup that mimics human blood flow by using a flow phantom and a peristaltic pump simulating 3 different heartbeat rhythms: 60, 90 and 120 beats per minute. Our ultrasound setup reconstructs peristaltic pump frequencies with errors of less than 0.05 Hz (3 bpm) from the set pump frequency, both for the raw echo and the enveloped echo. The analog pre-processing showed a promising reduction of signal bandwidth of more than 6x: pulse-echo signals of transducers excited at 12.5 MHz were reduced to about 2 MHz. Thus, allowing consumer MCUs to acquire and elaborate signals within mW-power range in an inexpensive fashion.
In ski jumping, low repetition rates of jumps limit the effectiveness of training. Thus, increasing learning rate within every single jump is key to success. A critical element of athlete training is motor learning, which has been shown to be accelerated by feedback methods. In particular, a fine-grained control of the center of gravity in the in-run is essential. This is because the actual takeoff occurs within a blink of an eye ($\sim$300ms), thus any unbalanced body posture during the in-run will affect flight. This paper presents a smart, compact, and energy-efficient wireless sensor system for real-time performance analysis and biofeedback during ski jumping. The system operates by gauging foot pressures at three distinct points on the insoles of the ski boot at 100Hz. Foot pressure data can either be directly sent to coaches to improve their feedback, or fed into a ML model to give athletes instantaneous in-action feedback using a vibration motor in the ski boot. In the biofeedback scenario, foot pressures act as input variables for an optimized XGBoost model. We achieve a high predictive accuracy of 92.7% for center of mass predictions (dorsal shift, neutral stand, ventral shift). Subsequently, we parallelized and fine-tuned our XGBoost model for a RISC-V based low power parallel processor (GAP9), based on the PULP architecture. We demonstrate real-time detection and feedback (0.0109ms/inference) using our on-chip deployment. The proposed smart system is unobtrusive with a slim form factor (13mm baseboard, 3.2mm antenna) and a lightweight build (26g). Power consumption analysis reveals that the system's energy-efficient design enables sustained operation over multiple days (up to 300 hours) without requiring recharge.
Perceiving and mapping the surroundings are essential for enabling autonomous navigation in any robotic platform. The algorithm class that enables accurate mapping while correcting the odometry errors present in most robotics systems is Simultaneous Localization and Mapping (SLAM). Today, fully onboard mapping is only achievable on robotic platforms that can host high-wattage processors, mainly due to the significant computational load and memory demands required for executing SLAM algorithms. For this reason, pocket-size hardware-constrained robots offload the execution of SLAM to external infrastructures. To address the challenge of enabling SLAM algorithms on resource-constrained processors, this paper proposes NanoSLAM, a lightweight and optimized end-to-end SLAM approach specifically designed to operate on centimeter-size robots at a power budget of only 87.9 mW. We demonstrate the mapping capabilities in real-world scenarios and deploy NanoSLAM on a nano-drone weighing 44 g and equipped with a novel commercial RISC-V low-power parallel processor called GAP9. The algorithm is designed to leverage the parallel capabilities of the RISC-V processing cores and enables mapping of a general environment with an accuracy of 4.5 cm and an end-to-end execution time of less than 250 ms.