Fumio Okura

Zero-shot Hierarchical Plant Segmentation via Foundation Segmentation Models and Text-to-image Attention

Sep 11, 2025

Junhao Xing, Ryohei Miyakawa, Yang Yang, Xinpeng Liu, Risa Shinoda, Hiroaki Santo, Yosuke Toda, Fumio Okura

Abstract: Foundation segmentation models achieve reasonable leaf instance extraction from top-view crop images without training (i.e., zero-shot). However, segmenting entire plant individuals, each consisting of multiple overlapping leaves, remains challenging. This problem is referred to as a hierarchical segmentation task and typically requires annotated training datasets, which are often species-specific and require considerable human labor. To address this, we introduce ZeroPlantSeg, a zero-shot segmentation method for rosette-shaped plant individuals from top-view images. We integrate a foundation segmentation model, which extracts leaf instances, and a vision-language model, which reasons about plant structure to extract plant individuals without additional training. Evaluations on datasets with multiple plant species, growth stages, and shooting environments demonstrate that our method surpasses existing zero-shot methods and achieves better cross-domain performance than supervised methods. Implementations are available at https://github.com/JunhaoXing/ZeroPlantSeg.

* WACV 2026 accepted
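The abstract does not say how leaf instances are assembled into plant individuals. As a rough illustration of the hierarchical step only, the sketch below groups leaf ids into plants with union-find, assuming a hypothetical pairwise score `same_plant_score` (for example, one derived from attention maps) and a hypothetical `threshold`. Neither is ZeroPlantSeg's actual mechanism.

```python
from typing import Dict, List, Tuple


def group_leaves(
    num_leaves: int,
    same_plant_score: Dict[Tuple[int, int], float],
    threshold: float = 0.5,
) -> List[List[int]]:
    """Group leaf instance ids into plant individuals via union-find.

    same_plant_score is a HYPOTHETICAL pairwise score in [0, 1] stating how
    likely leaves i and j belong to the same plant; pairs at or above
    `threshold` are merged.
    """
    parent = list(range(num_leaves))

    def find(i: int) -> int:
        # Path halving keeps the union-find trees shallow.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for (i, j), score in same_plant_score.items():
        if score >= threshold:
            root_i, root_j = find(i), find(j)
            if root_i != root_j:
                parent[root_j] = root_i

    # Collect leaves by their root: each group is one plant individual.
    plants: Dict[int, List[int]] = {}
    for leaf in range(num_leaves):
        plants.setdefault(find(leaf), []).append(leaf)
    return sorted(plants.values())


if __name__ == "__main__":
    scores = {(0, 1): 0.9, (1, 2): 0.8, (3, 4): 0.7, (2, 3): 0.1}
    print(group_leaves(5, scores))  # [[0, 1, 2], [3, 4]]
```

Union-find makes merging transitive: leaves 0 and 2 end up in the same plant even though only the pairs (0, 1) and (1, 2) scored high. That is the behavior a leaf-to-plant grouping needs, but it also means a single spurious high score can fuse two plants.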

Spectral Sensitivity Estimation with an Uncalibrated Diffraction Grating

Aug 01, 2025

Lilika Makabe, Hiroaki Santo, Fumio Okura, Michael S. Brown, Yasuyuki Matsushita

Abstract: This paper introduces a practical and accurate calibration method for camera spectral sensitivity using a diffraction grating. Accurate calibration of camera spectral sensitivity is crucial for various computer vision tasks, including color correction, illumination estimation, and material analysis. Unlike existing approaches that require specialized narrow-band filters or reference targets with known spectral reflectances, our method requires only an uncalibrated diffraction grating sheet, which is readily available off the shelf. By capturing images of the direct illumination and its diffracted pattern through the grating sheet, our method estimates both the camera spectral sensitivity and the diffraction grating parameters in closed form. Experiments on synthetic and real-world data demonstrate that our method outperforms conventional reference target-based methods, underscoring its effectiveness and practicality.
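A grating is useful for spectral calibration because the grating equation ties each wavelength to a diffraction angle. The sketch below uses the textbook form at normal incidence, d·sin(θ) = m·λ, with an assumed, known groove density `lines_per_mm`. The paper instead estimates the grating parameters jointly with the sensitivity, so this only illustrates the underlying relationship, not the paper's closed-form solver.

```python
import math


def diffraction_angle_deg(wavelength_nm: float, lines_per_mm: float, order: int = 1) -> float:
    """Angle of the given diffraction order at normal incidence: d*sin(theta) = m*lambda."""
    d_nm = 1e6 / lines_per_mm  # groove spacing in nanometres (1 mm = 1e6 nm)
    s = order * wavelength_nm / d_nm
    if abs(s) > 1.0:
        raise ValueError("this diffraction order does not propagate")
    return math.degrees(math.asin(s))


def wavelength_at_angle_nm(theta_deg: float, lines_per_mm: float, order: int = 1) -> float:
    """Inverse mapping: wavelength (nm) observed at angle theta for the given order."""
    d_nm = 1e6 / lines_per_mm
    return d_nm * math.sin(math.radians(theta_deg)) / order


if __name__ == "__main__":
    # A hypothetical 1000 lines/mm sheet (groove spacing d = 1000 nm).
    for wl in (450.0, 550.0, 650.0):
        print(wl, round(diffraction_angle_deg(wl, 1000.0), 2))
```

For a fixed capture geometry, sin(θ) is monotonic between 0 and 90 degrees, so each first-order position corresponds to a single wavelength. This is what lets one diffracted image sample the spectrum.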

HoGS: Unified Near and Far Object Reconstruction via Homogeneous Gaussian Splatting

Mar 25, 2025

Xinpeng Liu, Zeyi Huang, Fumio Okura, Yasuyuki Matsushita

Abstract: Novel view synthesis has made impressive progress recently, with 3D Gaussian splatting (3DGS) offering efficient training and photorealistic real-time rendering. However, reliance on Cartesian coordinates limits 3DGS's performance on distant objects, which are important for reconstructing unbounded outdoor environments. We found that, despite its simplicity, using homogeneous coordinates, a concept from projective geometry, in the 3DGS pipeline remarkably improves the rendering accuracy of distant objects. We therefore propose Homogeneous Gaussian Splatting (HoGS), which incorporates homogeneous coordinates into the 3DGS framework and provides a unified representation for both near and distant objects. By adopting projective geometry principles, HoGS effectively handles both expansive spatial positions and scales, particularly in unbounded outdoor environments. Experiments show that HoGS significantly improves accuracy in reconstructing distant objects while maintaining high-quality rendering of nearby objects, along with fast training and real-time rendering. Our implementations are available on our project page https://kh129.github.io/hogs/.

* Proc. IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR'25)
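The property borrowed from projective geometry is that a point stored with bounded components (x, y, z, w) maps to the Cartesian point (x/w, y/w, z/w). Letting w approach 0 therefore reaches arbitrarily distant positions without huge coordinate values. The sketch below shows only that conversion. How HoGS applies it to Gaussian positions and scales, and how w is optimized, are not shown and should be taken from the paper.

```python
from typing import Tuple

Vec3 = Tuple[float, float, float]
Vec4 = Tuple[float, float, float, float]


def homogeneous_to_cartesian(p: Vec4) -> Vec3:
    """Map (x, y, z, w) to (x/w, y/w, z/w); w = 0 encodes a point at infinity."""
    x, y, z, w = p
    if w == 0.0:
        raise ValueError("w = 0 is a point at infinity (a pure direction)")
    return (x / w, y / w, z / w)


if __name__ == "__main__":
    # Same bounded direction (0, 0, 1); shrinking w pushes the point outward.
    for w in (1.0, 0.1, 0.01, 0.001):
        print(w, homogeneous_to_cartesian((0.0, 0.0, 1.0, w)))
```

Halving w doubles the distance, and the derivative of z/w with respect to w grows as 1/w². A small step in w therefore moves a distant point much farther than the same step would move a nearby one, which is one intuition for why such a parameterization can suit far-away content.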

TreeFormer: Single-view Plant Skeleton Estimation via Tree-constrained Graph Generation

Nov 25, 2024

Xinpeng Liu, Hiroaki Santo, Yosuke Toda, Fumio Okura

Abstract: Accurate estimation of plant skeletal structure (e.g., branching structure) from images is essential for smart agriculture and plant science. Unlike human skeletons with fixed topology, plant skeleton estimation presents a unique challenge: estimating arbitrary tree graphs from images. While recent graph generation methods successfully infer thin structures from images, it is challenging to constrain the output graph strictly to a tree structure. To address this problem, we present TreeFormer, a plant skeleton estimator via tree-constrained graph generation. Our approach combines learning-based graph generation with traditional graph algorithms to impose the constraints during the training loop. Specifically, our method projects an unconstrained graph onto a minimum spanning tree (MST) during the training loop and incorporates this prior knowledge into the gradient descent optimization by suppressing unwanted feature values. Experiments show that our method accurately estimates target plant skeletal structures for multiple domains: synthetic tree patterns, real botanical roots, and grapevine branches. Our implementations are available at https://github.com/huntorochi/TreeFormer/.

* IEEE/CVF Winter Conference on Applications of Computer Vision (WACV 2025)
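Projecting a dense edge prediction onto a tree can be sketched with Kruskal's algorithm. Below, a hypothetical symmetric matrix `scores` of edge probabilities is turned into a maximum-score spanning tree (equivalently, a minimum spanning tree on negated scores), and non-tree entries are zeroed as a simple stand-in for "suppressing unwanted feature values". The edge weighting and the masking are assumptions. TreeFormer's actual in-loop projection is described in the paper and repository.

```python
from typing import List, Tuple

Matrix = List[List[float]]


def max_spanning_tree(scores: Matrix) -> List[Tuple[int, int]]:
    """Kruskal's algorithm on edges sorted by descending score.

    Equivalent to a minimum spanning tree with weights -scores[i][j].
    Assumes a complete graph (every node pair has a score).
    """
    n = len(scores)
    parent = list(range(n))

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    edges = sorted(
        ((scores[i][j], i, j) for i in range(n) for j in range(i + 1, n)),
        reverse=True,
    )
    tree: List[Tuple[int, int]] = []
    for _, i, j in edges:
        root_i, root_j = find(i), find(j)
        if root_i != root_j:  # skip edges that would close a cycle
            parent[root_j] = root_i
            tree.append((i, j))
            if len(tree) == n - 1:
                break
    return tree


def mask_to_tree(scores: Matrix, tree: List[Tuple[int, int]]) -> Matrix:
    """Keep scores on tree edges (symmetrically) and zero all other entries."""
    n = len(scores)
    masked = [[0.0] * n for _ in range(n)]
    for i, j in tree:
        masked[i][j] = scores[i][j]
        masked[j][i] = scores[j][i]
    return masked


if __name__ == "__main__":
    S = [
        [0.0, 0.9, 0.8, 0.1],
        [0.9, 0.0, 0.7, 0.2],
        [0.8, 0.7, 0.0, 0.6],
        [0.1, 0.2, 0.6, 0.0],
    ]
    tree = max_spanning_tree(S)
    print(tree)  # [(0, 1), (0, 2), (2, 3)]
    print(mask_to_tree(S, tree)[1][2])  # 0.0: the 1-2 edge would close a cycle
```

In this toy matrix, simply thresholding at 0.5 would keep edges 0-1, 0-2, and 1-2 and thus the cycle 0-1-2. The spanning-tree projection drops the weakest edge of that cycle, which is the kind of structural guarantee thresholding alone cannot give.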

NeRSP: Neural 3D Reconstruction for Reflective Objects with Sparse Polarized Images

Jun 11, 2024

Yufei Han, Heng Guo, Koki Fukai, Hiroaki Santo, Boxin Shi, Fumio Okura, Zhanyu Ma, Yunpeng Jia

Abstract: We present NeRSP, a Neural 3D reconstruction technique for Reflective surfaces with Sparse Polarized images. Reflective surface reconstruction is extremely challenging because specular reflections are view-dependent and thus violate the multiview consistency that multiview stereo relies on. Sparse image inputs, on the other hand, are a practical capture setting but commonly cause incomplete or distorted results due to the lack of correspondence matching. This paper jointly handles the challenges of sparse inputs and reflective surfaces by leveraging polarized images. We derive photometric and geometric cues from the polarimetric image formation model and multiview azimuth consistency, which are used to jointly optimize the surface geometry modeled via an implicit neural representation. In experiments on our synthetic and real datasets, we achieve state-of-the-art surface reconstruction results with only 6 views as input.

* 10 pages
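The polarimetric cues mentioned above start from per-pixel linear polarization. Assuming a camera that records intensities behind polarizers at 0, 45, 90, and 135 degrees (a common sensor layout, but an assumption here), the Stokes parameters give the degree (DoLP) and angle (AoLP) of linear polarization. The AoLP constrains the surface-normal azimuth up to ambiguities (π, plus a π/2 shift between diffuse- and specular-dominant reflection), which is the kind of quantity an azimuth-consistency term can use. NeRSP's exact formulation is in the paper.

```python
import math
from typing import Tuple


def linear_polarization(i0: float, i45: float, i90: float, i135: float) -> Tuple[float, float, float]:
    """Return (s0, DoLP, AoLP) from intensities behind 0/45/90/135-degree polarizers.

    AoLP is in radians, wrapped to [0, pi).
    """
    s0 = 0.5 * (i0 + i45 + i90 + i135)  # total intensity
    s1 = i0 - i90
    s2 = i45 - i135
    dolp = math.hypot(s1, s2) / s0 if s0 > 0 else 0.0
    aolp = (0.5 * math.atan2(s2, s1)) % math.pi
    return s0, dolp, aolp


if __name__ == "__main__":
    # Synthesized from I(theta) = (s0 + s1*cos(2 theta) + s2*sin(2 theta)) / 2
    # with s0 = 2, s1 = 0, s2 = 1, i.e. DoLP = 0.5 and AoLP = pi/4.
    s0, dolp, aolp = linear_polarization(1.0, 1.5, 1.0, 0.5)
    print(s0, dolp, math.degrees(aolp))
```

Wrapping AoLP into [0, π) reflects that a polarizer cannot distinguish an angle from the same angle plus π, which is one source of the azimuth ambiguity noted above.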

Multi-View Azimuth Stereo via Tangent Space Consistency

Mar 29, 2023

Xu Cao, Hiroaki Santo, Fumio Okura, Yasuyuki Matsushita

Abstract: We present a method for 3D reconstruction using only calibrated multi-view surface azimuth maps. Our method, multi-view azimuth stereo, is effective for textureless or specular surfaces, which are difficult for conventional multi-view stereo methods. We introduce the concept of tangent space consistency: multi-view azimuth observations of a surface point should be lifted to the same tangent space. Leveraging this consistency, we recover the shape by optimizing a neural implicit surface representation. Our method harnesses the robust azimuth estimation capabilities of photometric stereo methods or polarization imaging while bypassing potentially complex zenith angle estimation. Experiments using azimuth maps from various sources validate accurate shape recovery with our method, even without zenith angles.

* CVPR 2023 camera-ready. Appendices after references. 16 pages, 20 figures. Project page: https://xucao-42.github.io/mvas_homepage/
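Tangent space consistency can be checked numerically. Assume an orthographic camera whose world-to-camera rotation is `R` and an azimuth `phi` measured in the image plane from the camera x-axis (both sign conventions are assumptions). Then the vector t = R^T [-sin(phi), cos(phi), 0]^T lies in the surface's tangent plane, so the residual n · t vanishes for the true normal n in every view. This sketch only evaluates that residual; the paper's optimization over a neural implicit surface is not reproduced.

```python
import math
from typing import List, Sequence

Vec = Sequence[float]
Mat = Sequence[Sequence[float]]


def tangent_from_azimuth(R: Mat, phi: float) -> List[float]:
    """World-frame vector R^T [-sin(phi), cos(phi), 0]: perpendicular to the
    projected normal direction and to the viewing axis (orthographic camera)."""
    a, b = -math.sin(phi), math.cos(phi)
    # Rows R[0], R[1] are the camera x and y axes expressed in world coordinates.
    return [a * R[0][k] + b * R[1][k] for k in range(3)]


def azimuth_in_view(normal: Vec, R: Mat) -> float:
    """Azimuth of the normal's image-plane projection in this view (for synthetic tests)."""
    nx = sum(R[0][k] * normal[k] for k in range(3))
    ny = sum(R[1][k] * normal[k] for k in range(3))
    return math.atan2(ny, nx)


def tangent_residual(normal: Vec, R: Mat, phi: float) -> float:
    """n . t, which is zero when the observed azimuth is consistent with the normal."""
    t = tangent_from_azimuth(R, phi)
    return sum(n_k * t_k for n_k, t_k in zip(normal, t))


if __name__ == "__main__":
    identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
    rot_x90 = [[1.0, 0.0, 0.0], [0.0, 0.0, -1.0], [0.0, 1.0, 0.0]]
    norm = math.sqrt(14.0)
    n = [1.0 / norm, 2.0 / norm, 3.0 / norm]
    for R in (identity, rot_x90):
        phi = azimuth_in_view(n, R)
        # A consistent azimuth (and its pi-flipped twin) gives a ~0 residual.
        print(tangent_residual(n, R, phi), tangent_residual(n, R, phi + math.pi))
```

Flipping phi by π negates t and leaves the residual at zero, so the check is insensitive to the π ambiguity of azimuth estimates. The azimuth is undefined when the normal points straight along the viewing axis, so such pixels would need to be skipped.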

Text-Guided Scene Sketch-to-Photo Synthesis

Feb 14, 2023

AprilPyone MaungMaung, Makoto Shing, Kentaro Mitsui, Kei Sawada, Fumio Okura

Abstract: We propose a method for scene-level sketch-to-photo synthesis with text guidance. Although object-level sketch-to-photo synthesis has been widely studied, whole-scene synthesis is still challenging without reference photos that adequately reflect the target style. To this end, we leverage knowledge from recent large-scale pre-trained generative models, resulting in text-guided sketch-to-photo synthesis without the need for reference images. To train our model, we use self-supervised learning from a set of photographs. Specifically, we use a pre-trained edge detector that maps both color and sketch images into a standardized edge domain, which reduces the gap between photograph-based edge images (during training) and hand-drawn sketch images (during inference). We implement our method by fine-tuning a latent diffusion model (i.e., Stable Diffusion) with sketch and text conditions. Experiments show that the proposed method translates original sketch images that are not extracted from color images into photos with compelling visual quality.

Descriptor-Free Multi-View Region Matching for Instance-Wise 3D Reconstruction

Nov 27, 2020

Takuma Doi, Fumio Okura, Toshiki Nagahara, Yasuyuki Matsushita, Yasushi Yagi

Abstract: This paper proposes a multi-view extension of instance segmentation that does not rely on texture or shape descriptor matching. Multi-view instance segmentation becomes challenging for scenes with repetitive textures and shapes, e.g., plant leaves, because of the difficulty of multi-view matching using texture or shape descriptors. To this end, we propose a multi-view region matching method based on epipolar geometry, which does not rely on any feature descriptors. We further show that the epipolar region matching can be easily integrated into instance segmentation and is effective for instance-wise 3D reconstruction. Experiments demonstrate improved accuracy of multi-view instance matching and 3D reconstruction compared to the baseline methods.

* ACCV2020 Oral
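The abstract does not give the matching criterion, so the following is only a hypothetical illustration of descriptor-free matching with epipolar geometry. For a fundamental matrix `F` satisfying x2^T F x1 = 0, each pixel of a source region maps to an epipolar line F x1 in the other view. A candidate region is then scored by the fraction of source pixels whose epipolar line passes within `tol` pixels of the candidate. The scoring rule and `tol` are assumptions, not the paper's method.

```python
import math
from typing import Sequence, Tuple

Pixel = Tuple[float, float]
Mat3 = Sequence[Sequence[float]]


def epipolar_line(F: Mat3, p: Pixel) -> Tuple[float, float, float]:
    """Line (a, b, c) = F @ [u, v, 1] in the second view, so that x2^T F x1 = 0."""
    x = (p[0], p[1], 1.0)
    a, b, c = (sum(F[r][k] * x[k] for k in range(3)) for r in range(3))
    return a, b, c


def point_line_distance(line: Tuple[float, float, float], q: Pixel) -> float:
    """Perpendicular distance from pixel q to the line a*u + b*v + c = 0."""
    a, b, c = line
    return abs(a * q[0] + b * q[1] + c) / math.hypot(a, b)


def epipolar_match_score(
    F: Mat3, source: Sequence[Pixel], candidate: Sequence[Pixel], tol: float = 0.5
) -> float:
    """HYPOTHETICAL score: fraction of source pixels whose epipolar line
    comes within `tol` pixels of any pixel in the candidate region."""
    hits = 0
    for p in source:
        line = epipolar_line(F, p)
        if any(point_line_distance(line, q) <= tol for q in candidate):
            hits += 1
    return hits / len(source)


if __name__ == "__main__":
    # Rectified stereo pair: epipolar lines are image rows (v2 = v1).
    F = [[0.0, 0.0, 0.0], [0.0, 0.0, -1.0], [0.0, 1.0, 0.0]]
    source = [(10.0, 5.0), (11.0, 5.0), (10.0, 6.0)]
    candidates = {
        "rows 5-6": [(30.0, 5.0), (31.0, 6.0)],
        "row 20": [(30.0, 20.0)],
        "row 5 only": [(40.0, 5.0)],
    }
    for name, region in candidates.items():
        print(name, epipolar_match_score(F, source, region))
```

Because the score uses only geometry, repeated leaf textures do not confuse it. However, a single epipolar line can cross several candidate regions, so in practice such scores may need to be combined across more than two views or with other constraints.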

Probabilistic Plant Modeling via Multi-View Image-to-Image Translation

Apr 25, 2018

Takahiro Isokane, Fumio Okura, Ayaka Ide, Yasuyuki Matsushita, Yasushi Yagi

Abstract: This paper describes a method for inferring three-dimensional (3D) plant branch structures that are hidden under leaves from multi-view observations. Unlike previous geometric approaches that rely heavily on the visibility of the branches or use parametric branching models, our method makes statistical inferences of branch structures in a probabilistic framework. By inferring the probability of branch existence using a Bayesian extension of image-to-image translation applied to each of the multi-view images, our method generates a probabilistic 3D plant model, which represents the 3D branching pattern that cannot be directly observed. Experiments demonstrate the usefulness of the proposed approach in generating convincing branch structures compared with prior approaches.

* To appear in CVPR2018. The first two authors contributed equally. Project website: http://www.am.sanken.osaka-u.ac.jp/~okura/project/cvpr2018_plant.html
