Abstract: Visual Question Answering (VQA) is a challenging task that combines natural language processing and computer vision techniques, and it has gradually become a benchmark task for multimodal large language models (MLLMs). The goal of our survey is to provide an overview of the development of VQA and a detailed, up-to-date description of the latest models. For the core VQA tasks, this survey synthesizes recent work on the understanding of images and text and on knowledge reasoning modules built on image-question information. In addition, we elaborate on recent advances in extracting and fusing multimodal information with vision-language pretraining models and MLLMs in VQA. We also review progress in knowledge reasoning for VQA in detail, covering both the extraction of internal knowledge and the incorporation of external knowledge. Finally, we present VQA datasets and evaluation metrics and discuss possible directions for future work.
Abstract: Tree skeletons play an important role in tree structure analysis, forest inventory, and ecosystem monitoring. However, extracting a skeleton from the point cloud of a tree with complex branches is challenging. In this paper, an automatic and fast tree skeleton extraction method (FTSEM) based on voxel thinning is proposed. In this method, a wood-leaf classification algorithm first filters out leaf points to reduce leaf interference in skeleton generation; voxel thinning is then applied to quickly extract a raw tree skeleton; finally, a breakpoint connection algorithm improves the connectivity and completeness of the skeleton. Experiments were carried out in Haidian Park, Beijing, where 24 trees were scanned and processed to obtain their skeletons. For comparison, the graph search algorithm (GSA) was used to extract tree skeletons from the same datasets. Compared with the GSA method, FTSEM produced more complete tree skeletons. The time cost of both methods was evaluated using runtime and time per million points (TPMP). The runtime of FTSEM ranged from 1.0 s to 13.0 s, whereas that of GSA ranged from 6.4 s to 309.3 s. The average TPMP was 1.8 s for FTSEM and 22.3 s for GSA. The experimental results demonstrate that the proposed method is feasible, robust, and fast, and that it has good potential for tree skeleton extraction.
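Voxel thinning presumably begins by mapping the (leaf-filtered) points onto a voxel grid, and the speed comparison relies on TPMP. A minimal sketch in plain Python covering only those two pieces, under stated assumptions: the hypothetical `voxelize` helper maps each point to the index of the cubic voxel that contains it, and the hypothetical `time_per_million_points` helper assumes TPMP is the runtime divided by the point count expressed in millions. The wood-leaf classification, the thinning itself, and the breakpoint connection are not shown, and the voxel size is an illustrative parameter, not a value from the study.

```python
import math
from typing import Iterable, Set, Tuple

Point = Tuple[float, float, float]
Voxel = Tuple[int, int, int]


def voxelize(points: Iterable[Point], voxel_size: float) -> Set[Voxel]:
    """Return the set of occupied voxel indices for a 3D point cloud.

    Hypothetical helper: each point is assigned to the cubic voxel of edge
    length `voxel_size` that contains it. A thinning step (not shown) would
    then erode this occupied set towards a one-voxel-wide skeleton.
    """
    if voxel_size <= 0:
        raise ValueError("voxel_size must be positive")
    # math.floor keeps negative coordinates in the correct voxel (e.g. -0.2 -> -1).
    return {
        (math.floor(x / voxel_size), math.floor(y / voxel_size), math.floor(z / voxel_size))
        for x, y, z in points
    }


def time_per_million_points(runtime_s: float, n_points: int) -> float:
    """Runtime normalised by cloud size, in seconds per 10^6 points (assumed TPMP definition)."""
    if n_points <= 0:
        raise ValueError("n_points must be positive")
    return runtime_s / (n_points / 1_000_000)
```

For example, `voxelize([(0.01, 0.02, 0.03), (0.12, 0.0, 0.0)], 0.05)` returns `{(0, 0, 0), (2, 0, 0)}`, and `time_per_million_points(2.0, 4_000_000)` returns `0.5`. Normalising by point count is what makes runtimes measured on trees of very different sizes comparable.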
Abstract: The terrestrial laser scanner is a fast, high-precision data acquisition device that has been increasingly applied in forest inventory research. In this study, an automated, low-cost terrestrial laser scanner was designed and implemented based on a SICK LMS-511 two-dimensional laser radar sensor and a stepper motor. The new scanner, named BEE, can scan forest trees in three dimensions. The BEE scanner and its supporting software are specifically designed for forest inventory. Experiments were performed with the BEE scanner in a planted ginkgo forest located in Haidian District, Beijing. Four square plots were selected, and a single scan was acquired in each plot. The diameter at breast height (DBH), tree height, and position of the trees in the four plots were estimated and analyzed. For comparison, manual measurements were also collected in the four plots. Across all four plots, the tree stem detection rate was 92.75%, the root mean square error (RMSE) of DBH estimation was 1.27 cm, the RMSE of tree height estimation was 0.24 m, and the estimated tree positions were consistent with the actual positions. The experimental results show that the BEE scanner can efficiently estimate the structural parameters of forest trees and has good potential for practical forest inventory applications.
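The accuracy figures reported for the BEE scanner (stem detection rate and RMSE of DBH and tree height) follow standard definitions. A minimal sketch, assuming that RMSE is computed over trees matched between the scan and the manual measurements, and that the detection rate is the percentage of manually recorded stems found in the scan. Whether the reported 92.75% was pooled over all plots or averaged per plot is not stated, so these hypothetical helpers simply take whatever counts they are given; the function names and sample values are illustrative, not from the study.

```python
import math
from typing import Sequence


def rmse(estimated: Sequence[float], reference: Sequence[float]) -> float:
    """Root mean square error between paired scan estimates and manual measurements.

    Both sequences must list the same matched trees in the same order.
    """
    if len(estimated) != len(reference) or not estimated:
        raise ValueError("need two non-empty sequences of equal length")
    squared = [(e - r) ** 2 for e, r in zip(estimated, reference)]
    return math.sqrt(sum(squared) / len(squared))


def detection_rate(n_detected: int, n_reference: int) -> float:
    """Percentage of manually recorded stems that were detected in the scan."""
    if n_reference <= 0:
        raise ValueError("n_reference must be positive")
    return 100.0 * n_detected / n_reference
```

For example, `rmse([20.0, 25.0], [21.0, 24.0])` returns `1.0` (in cm, if the inputs are DBH values in cm), and `detection_rate(3, 4)` returns `75.0`. The RMSE carries the units of its inputs, which is why DBH error is reported in centimetres and height error in metres.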