Abstract: Recent advancements in Large Vision-Language Models (LVLMs) have demonstrated impressive capabilities across various multimodal tasks. However, they continue to struggle with seemingly trivial scenarios such as reading values from Digital Measurement Devices (DMDs), particularly in real-world conditions involving clutter, occlusions, extreme viewpoints, and motion blur, all of which are common with head-mounted cameras and in Augmented Reality (AR) applications. Motivated by these limitations, this work introduces CAD2DMD-SET, a synthetic data generation tool designed to support visual question answering (VQA) tasks involving DMDs. By leveraging 3D CAD models, advanced rendering, and high-fidelity image composition, our tool produces diverse, VQA-labelled synthetic DMD datasets suitable for fine-tuning LVLMs. Additionally, we present DMDBench, a curated validation set of 1,000 annotated real-world images designed to evaluate model performance under practical constraints. Benchmarking three state-of-the-art LVLMs using Average Normalised Levenshtein Similarity (ANLS), and then fine-tuning LoRA adapters for these models on a dataset generated with CAD2DMD-SET, yielded substantial improvements, with InternVL showing a score increase of 200% without degrading on other tasks. This demonstrates that the CAD2DMD-SET training dataset substantially improves the robustness and performance of LVLMs under the challenging conditions described above. The CAD2DMD-SET tool is expected to be released as open source once the final version of this manuscript is prepared, allowing the community to add different measurement devices and generate their own datasets.
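For readers unfamiliar with the ANLS metric named in the abstract, the following is a minimal sketch of how such a score is commonly computed. It is not the authors' evaluation code. The 0.5 threshold, the lowercasing and whitespace stripping, and the function names `levenshtein` and `anls` are assumptions taken from the usual convention for this metric, not from the abstract.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance between two strings (two-row dynamic programme)."""
    if len(a) < len(b):
        a, b = b, a
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def anls(predictions, references, threshold=0.5):
    """Average Normalised Levenshtein Similarity.

    predictions: list of predicted answer strings.
    references:  list of lists of accepted answers, one list per prediction.
    threshold:   assumed cut-off; a normalised distance at or above it scores 0.
    """
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have equal length")
    if not predictions:
        return 0.0
    total = 0.0
    for pred, refs in zip(predictions, references):
        p = pred.strip().lower()  # assumed normalisation
        best = 0.0
        for ref in refs:
            r = ref.strip().lower()
            longest = max(len(p), len(r))
            distance = levenshtein(p, r) / longest if longest else 0.0
            score = 1.0 - distance if distance < threshold else 0.0
            best = max(best, score)
        total += best
    return total / len(predictions)
```

Under these assumptions, a reading that is off by one character out of six (for example `12.6 V` against `12.5 V`) still earns partial credit, while an unrelated string scores zero.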
Abstract: This paper describes Fields2Cover, a novel open-source library for coverage path planning (CPP) for agricultural vehicles. While several CPP solutions exist, there have been limited efforts to unify them into an open-source library and to provide benchmarking tools for comparing their performance. Fields2Cover provides a framework for planning coverage paths, developing novel techniques, and benchmarking state-of-the-art algorithms. The library features a modular and extensible architecture that supports various vehicles and can be used for a variety of applications, including farms. Its core modules are a headland generator, a swath generator, a route planner, and a path planner. An interface to the Robot Operating System (ROS) is also supplied as an add-on. In this paper, the functionalities of the library for planning a coverage path in agriculture are demonstrated using 8 state-of-the-art methods and 7 objective functions in simulation and field experiments.
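To illustrate how the four stages named in the abstract could fit together, here is a toy sketch on an axis-aligned rectangular field. It does not use the Fields2Cover API. Every name in it (`Rect`, `generate_headland`, `generate_swaths`, `plan_route`, `plan_path`) is hypothetical, and the boustrophedon ordering and straight-line turns are simplifying assumptions chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rect:
    """Axis-aligned rectangular field (hypothetical helper type)."""
    x_min: float
    y_min: float
    x_max: float
    y_max: float


def generate_headland(field: Rect, width: float) -> Rect:
    """Shrink the field by the headland width; return the inner area."""
    inner = Rect(field.x_min + width, field.y_min + width,
                 field.x_max - width, field.y_max - width)
    if inner.x_min >= inner.x_max or inner.y_min >= inner.y_max:
        raise ValueError("headland width leaves no inner field")
    return inner


def generate_swaths(inner: Rect, op_width: float):
    """Parallel swaths along the y axis, centred op_width apart."""
    swaths = []
    x = inner.x_min + op_width / 2
    while x <= inner.x_max - op_width / 2 + 1e-9:
        swaths.append(((x, inner.y_min), (x, inner.y_max)))
        x += op_width
    return swaths


def plan_route(swaths):
    """Boustrophedon order: reverse the direction of every second swath."""
    return [s if i % 2 == 0 else (s[1], s[0]) for i, s in enumerate(swaths)]


def plan_path(route):
    """Join swath endpoints with straight segments.

    A real planner would insert smooth, kinematically feasible turns here.
    """
    path = []
    for start, end in route:
        path.extend([start, end])
    return path


field = Rect(0, 0, 10, 20)
inner = generate_headland(field, width=1)
swaths = generate_swaths(inner, op_width=2)
path = plan_path(plan_route(swaths))
```

Keeping each stage as a separate function mirrors the modular structure the abstract describes: any one stage can be replaced, for example with a different swath ordering, without touching the others.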