Abstract: Vision Foundation Models (VFMs) pre-trained on large-scale unlabeled data have achieved remarkable success on general computer vision tasks, yet they typically suffer from significant domain gaps when applied to agriculture. To address this, we introduce $SPROUT$ ($S$calable $P$lant $R$epresentation model via $O$pen-field $U$nsupervised $T$raining), a multi-crop, multi-task agricultural foundation model trained via diffusion denoising. SPROUT leverages a VAE-free Pixel-space Diffusion Transformer to learn rich, structure-aware representations through denoising while enabling efficient end-to-end training. We pre-train SPROUT on a curated dataset of 2.6 million high-quality agricultural images spanning diverse crops, growth stages, and environments. Extensive experiments demonstrate that SPROUT consistently outperforms state-of-the-art web-pretrained and agricultural foundation models across a wide range of downstream tasks, while requiring substantially lower pre-training cost. The code and model are available at https://github.com/UTokyo-FieldPhenomics-Lab/SPROUT.




Abstract: Accurate estimation of the heading date of paddy rice helps breeders understand the adaptability of different crop varieties to a given location. The heading date also plays a vital role in determining grain yield in research experiments. Visual examination of the crop is laborious and time-consuming, so quick and precise estimation of the heading date is essential. In this work, we propose a simple pipeline to detect regions containing flowering panicles in ground-level RGB images of paddy rice. Given a fixed region size for an image, the number of regions containing flowering panicles is directly proportional to the number of flowering panicles present. Consequently, we use the flowering panicle region counts to estimate the heading date of the crop. The method is based on image classification using Convolutional Neural Networks (CNNs). We evaluated the performance of our algorithm on five time-series image sequences of three different varieties of rice crops. Compared to previous work on this dataset, the accuracy and general versatility of the method have been improved, and the heading date has been estimated with a mean absolute error of less than 1 day.
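
The region-counting idea described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the names `count_flowering_regions`, `estimate_heading_day`, and `mean_absolute_error` are hypothetical, the classifier is passed in as a plain function standing in for the CNN, the tiling is assumed to be non-overlapping, and the rule that maps daily counts to a heading date (first day reaching a fraction of the peak count) is an assumption, since the abstract does not state how counts are converted to a date.

```python
from typing import Callable, List, Sequence

Patch = List[List[float]]  # hypothetical image representation: rows of pixel values


def count_flowering_regions(
    image: Patch, region_size: int, is_flowering: Callable[[Patch], bool]
) -> int:
    """Split the image into non-overlapping square regions of a fixed size and
    count the regions that the classifier labels as containing flowering panicles.
    `is_flowering` is a stand-in for the CNN classifier mentioned in the abstract."""
    height, width = len(image), len(image[0])
    count = 0
    for top in range(0, height - region_size + 1, region_size):
        for left in range(0, width - region_size + 1, region_size):
            patch = [row[left:left + region_size]
                     for row in image[top:top + region_size]]
            if is_flowering(patch):
                count += 1
    return count


def estimate_heading_day(daily_counts: Sequence[int], fraction: float = 0.5) -> int:
    """Return the index of the first day whose region count reaches `fraction`
    of the peak count. ASSUMPTION: this thresholding rule is illustrative only."""
    peak = max(daily_counts)
    if peak == 0:
        raise ValueError("no flowering regions detected in the sequence")
    threshold = fraction * peak
    for day, count in enumerate(daily_counts):
        if count >= threshold:
            return day
    raise AssertionError("unreachable: the peak day always meets the threshold")


def mean_absolute_error(estimated: Sequence[float], observed: Sequence[float]) -> float:
    """Mean absolute difference, in days, between estimated and observed dates."""
    return sum(abs(e - o) for e, o in zip(estimated, observed)) / len(estimated)


if __name__ == "__main__":
    # Toy 4x4 "image": only the top-left 2x2 region is bright.
    toy_image = [
        [1.0, 1.0, 0.0, 0.0],
        [1.0, 1.0, 0.0, 0.0],
        [0.0, 0.0, 0.0, 0.0],
        [0.0, 0.0, 0.0, 0.0],
    ]

    def bright(patch: Patch) -> bool:
        # Placeholder classifier: mean brightness above 0.5.
        values = [v for row in patch for v in row]
        return sum(values) / len(values) > 0.5

    print(count_flowering_regions(toy_image, 2, bright))
    print(estimate_heading_day([0, 1, 4, 10, 7]))
```

In practice the placeholder `bright` function would be replaced by a trained CNN applied to each region, and the per-image counts from one time-series sequence would form the `daily_counts` input.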