Accurate drone detection is strongly desired in drone collision avoidance, drone defense and autonomous Unmanned Aerial Vehicle (UAV) self-landing. With the recent emergence of the Vision Transformer (ViT), this critical task is reassessed in this paper using a UAV dataset composed of 1359 drone photos. We construct various CNN and ViT-based models, demonstrating that for single-drone detection, a basic ViT can achieve performance 4.6 times more robust than our best CNN-based transfer learning models. By implementing the state-of-the-art You Only Look Once (YOLO v7, 200 epochs) and the experimental ViT-based You Only Look At One Sequence (YOLOS, 20 epochs) in multi-drone detection, we attain impressive 98% and 96% mAP values, respectively. We find that ViT outperforms CNN at the same epoch, but also requires more training data, computational power, and sophisticated, performance-oriented designs to fully surpass the capabilities of cutting-edge CNN detectors. We summarize the distinct characteristics of ViT and CNN models to aid future researchers in developing more efficient deep learning models.
* 7 pages, 23 figures, IEEE Xplore, 2023 International Conference on
Computer Vision and Robotics Science
In recent years, advances in neuroscience and artificial intelligence have paved the way for unprecedented opportunities for understanding the complexity of the brain and its emulation by computational systems. Cutting-edge advancements in neuroscience research have revealed the intricate relationship between brain structure and function, while the success of artificial neural networks highlights the importance of network architecture. Now is the time to bring them together to better unravel how intelligence emerges from the brain's multiscale repositories. In this review, we propose the Digital Twin Brain (DTB) as a transformative platform that bridges the gap between biological and artificial intelligence. It consists of three core elements: the brain structure that is fundamental to the twinning process, bottom-layer models to generate brain functions, and its wide spectrum of applications. Crucially, brain atlases provide a vital constraint, preserving the brain's network organization within the DTB. Furthermore, we highlight open questions that invite joint efforts from interdisciplinary fields and emphasize the far-reaching implications of the DTB. The DTB can offer unprecedented insights into the emergence of intelligence and neurological disorders, which holds tremendous promise for advancing our understanding of both biological and artificial intelligence, and ultimately propelling the development of artificial general intelligence and facilitating precision mental healthcare.
Deep Reinforcement Learning (DRL) based controllers offer high performance in a variety of network environments. However, simulator-based training of DRL controllers using highly skewed datasets of real-world traces often results in poor performance in the wild. In this paper, we put forward a generalizable solution for training high-performance DRL controllers in simulators -- Prioritized Trace Selection (PTS). PTS employs an automated three-stage process. First, we identify critical features that determine trace behavior. Second, we classify the traces into clusters. Finally, we dynamically identify and prioritize the salient clusters during training. PTS does not require any changes to the DRL workflow. It can work across both on-policy and off-policy DRL algorithms. We use Adaptive Bit Rate selection and Congestion Control as representative applications to show that PTS offers better performance in simulation and real-world, across multiple controllers and DRL algorithms. Our novel ABR controller, Gelato, trained with PTS outperforms state-of-the-art controllers on the real-world live-streaming platform, Puffer, reducing stalls by 59% and significantly improving average video quality.