Abstract: Long videos contain a vast amount of information, making video-text retrieval an essential and challenging task in multimodal learning. However, existing benchmarks suffer from limited video duration, low-quality captions, and coarse annotation granularity, which hinder the evaluation of advanced video-text retrieval methods. To address these limitations, we introduce LoVR, a benchmark specifically designed for long video-text retrieval. LoVR contains 467 long videos and over 40,804 fine-grained clips with high-quality captions. To overcome the poor quality of machine-generated annotations, we propose an efficient caption generation framework that integrates automatic generation by a VLM, caption quality scoring, and dynamic refinement. This pipeline improves annotation accuracy while maintaining scalability. Furthermore, we introduce a semantic fusion method that generates coherent full-video captions without losing important contextual information. With longer videos, more detailed captions, and a larger scale than existing benchmarks, LoVR presents new challenges for video understanding and retrieval. Extensive experiments on various advanced embedding models demonstrate that LoVR is a challenging benchmark, revealing the limitations of current approaches and providing valuable insights for future research. The code and dataset are available at https://github.com/TechNomad-ds/LoVR-benchmark
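The generate, score, and refine stages named in the LoVR abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function `caption_with_refinement`, the callables `generate`, `score`, and `refine`, and the parameters `threshold` and `max_rounds` are all hypothetical names introduced here, and the stopping rule (a fixed quality threshold with a bounded number of rounds) is an assumption.

```python
from typing import Callable, Tuple


def caption_with_refinement(
    clip: str,
    generate: Callable[[str], str],            # hypothetical: VLM caption generator
    score: Callable[[str, str], float],        # hypothetical: quality scorer in [0, 1]
    refine: Callable[[str, str, float], str],  # hypothetical: caption refiner
    threshold: float = 0.8,                    # assumed acceptance threshold
    max_rounds: int = 3,                       # assumed cap on refinement rounds
) -> Tuple[str, float]:
    """Generate a caption, then refine it until it scores well enough."""
    caption = generate(clip)
    quality = score(clip, caption)
    rounds = 0
    while quality < threshold and rounds < max_rounds:
        candidate = refine(clip, caption, quality)
        candidate_quality = score(clip, candidate)
        # Keep the refined caption only if the scorer rates it higher.
        if candidate_quality > quality:
            caption, quality = candidate, candidate_quality
        rounds += 1
    return caption, quality
```

Only low-scoring captions are passed to the refiner in this sketch, so clips whose first caption already meets the threshold cost a single generation and a single scoring call.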
Abstract: This paper presents the design, modeling, and experimental validation of CapsuleBot, a compact hybrid aerial-ground vehicle designed for long-term covert reconnaissance. CapsuleBot combines the manoeuvrability of a bicopter in the air with the energy efficiency and low noise of a ground vehicle on the ground. To accomplish this, we design a structure named the actuated-wheel-rotor, which uses a single motor for both unilateral rotor tilting in the bicopter configuration and wheel movement in ground mode. CapsuleBot is equipped with two of these structures, enabling hybrid aerial-ground propulsion with just four motors. Importantly, the motion modes are decoupled without the need for additional drivers, enhancing the versatility and robustness of the system. Furthermore, we derive the full dynamics and design the controllers for aerial and ground locomotion based on the bicopter model and the two-wheeled self-balancing vehicle model. The performance of CapsuleBot has been validated through experiments. The results demonstrate that, in ground mode, CapsuleBot produces 40.53% less noise and consumes 99.35% less energy, highlighting its potential for long-term covert reconnaissance applications.
Abstract: Roller-Quadrotor is a novel hybrid terrestrial and aerial quadrotor that combines the high maneuverability of a quadrotor with the long endurance of a ground vehicle. This work presents the design, modeling, and experimental validation of Roller-Quadrotor. Flying is achieved through a quadrotor configuration, with four actuators providing thrust. Rolling is supported by a unicycle-driven, rotor-assisted turning structure. During terrestrial locomotion, the vehicle needs to overcome only rolling and turning resistance, thus saving energy compared to flight mode. This work addresses challenging problems of general rotorcraft: it reduces energy consumption and allows the vehicle to pass through special terrain, such as narrow gaps. By flying, it also solves the obstacle avoidance challenge faced by terrestrial robots. We design the models and controllers for the vehicle. The experimental results show that it can switch between aerial and terrestrial locomotion and can safely pass through a narrow gap half the size of its diameter. In addition, it can roll approximately 3.8 times as far as it can fly, or operate about 42.2 times as long as when flying. These results demonstrate the feasibility and effectiveness of the structure and control in rolling through special terrain and saving energy.