Aaron Parness

Visual Foresight for Robotic Stow: A Diffusion-Based World Model from Sparse Snapshots

Feb 12, 2026

Lijun Zhang, Nikhil Chacko, Petter Nilsson, Ruinian Xu, Shantanu Thakar, Bai Lou, Harpreet Sawhney, Zhebin Zhang, Mudit Agrawal, Bhavana Chandrashekhar(+1 more)

Abstract: Automated warehouses execute millions of stow operations, in which robots place objects into storage bins. For these systems, it is valuable to anticipate, before execution, how a bin will look given the current observations and the planned stow behavior. We propose FOREST, a stow-intent-conditioned world model that represents bin states as item-aligned instance masks and uses a latent diffusion transformer to predict the post-stow configuration from the observed context. Our evaluation shows that FOREST substantially improves the geometric agreement between predicted and true post-stow layouts compared with heuristic baselines. We further evaluate the predicted post-stow layouts in two downstream tasks, load-quality assessment and multi-stow reasoning; in both, replacing the real post-stow masks with FOREST predictions causes only a modest performance loss, indicating that our model can provide useful foresight signals for warehouse planning.

* 20 pages, 16 figures
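The abstract does not say which metric measures "geometric agreement" between predicted and true layouts. As an illustration only, one common way to compare item-aligned instance masks is per-item intersection-over-union (IoU). The sketch below assumes a hypothetical encoding in which each mask is a 2D grid of integer item ids (0 = background) and the same id refers to the same item in the predicted and true masks; `per_item_iou` and `mean_item_iou` are illustrative names, not the authors' code.

```python
from typing import Dict, List

# Hypothetical mask encoding: 0 = background, k > 0 = pixel belongs to item k.
Mask = List[List[int]]


def per_item_iou(pred: Mask, true: Mask) -> Dict[int, float]:
    """IoU for every item id that appears in either mask (shapes must match)."""
    if len(pred) != len(true) or any(len(p) != len(t) for p, t in zip(pred, true)):
        raise ValueError("masks must have the same shape")
    inter: Dict[int, int] = {}
    union: Dict[int, int] = {}
    for prow, trow in zip(pred, true):
        for p, t in zip(prow, trow):
            # A pixel counts toward the union of each nonzero id it carries on either side.
            for k in {p, t} - {0}:
                union[k] = union.get(k, 0) + 1
            # It counts toward the intersection only when both sides agree on the item.
            if p == t and p != 0:
                inter[p] = inter.get(p, 0) + 1
    return {k: inter.get(k, 0) / union[k] for k in sorted(union)}


def mean_item_iou(pred: Mask, true: Mask) -> float:
    """Average per-item IoU; two masks with no items are treated as full agreement."""
    scores = per_item_iou(pred, true)
    return sum(scores.values()) / len(scores) if scores else 1.0


if __name__ == "__main__":
    true = [[1, 1, 0], [0, 2, 2]]
    pred = [[1, 0, 0], [0, 2, 2]]
    print(per_item_iou(pred, true))   # {1: 0.5, 2: 1.0}
    print(mean_item_iou(pred, true))  # 0.75
```

Averaging per item rather than per pixel keeps small items from being swamped by large ones, which matters in bins that mix item sizes; whether FOREST's evaluation weights items this way is not stated in the abstract.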

Stow: Robotic Packing of Items into Fabric Pods

May 07, 2025

Nicolas Hudson, Josh Hooks, Rahul Warrier, Curt Salisbury, Ross Hartley, Kislay Kumar, Bhavana Chandrashekhar, Paul Birkmeyer, Bosch Tang, Matt Frost(+27 more)

Abstract: This paper presents a compliant manipulation system capable of placing items onto densely packed shelves. The wide diversity of items and strict business requirements for high production rates and low defect generation have prevented warehouse robots from performing this task. Our innovations in hardware, perception, decision-making, motion planning, and control have enabled this system to perform over 500,000 stows in a large e-commerce fulfillment center. The system achieves human levels of packing density and speed while prioritizing work on overhead shelves to enhance the safety of humans working alongside the robots.

Improving Visual Feature Extraction in Glacial Environments

Aug 27, 2019

Steven D. Morad, Jeremy Nash, Shoya Higa, Russell Smith, Aaron Parness, Kobus Barnard

Abstract: Glacial science could benefit tremendously from autonomous robots, but previous glacial robots have had perception problems in these colorless and featureless environments, specifically with visual feature extraction. Glaciologists use near-infrared imagery to reveal the underlying heterogeneous spatial structure of snow and ice, and we theorize that this hidden near-infrared structure could produce more and higher-quality features than are available in visible light. We took a custom camera rig to Igloo Cave at Mt. St. Helens to test our theory. The camera rig contains two identical machine vision cameras, one of which was outfitted with multiple filters so that it sees only near-infrared light. We extracted features from short video clips taken inside Igloo Cave using three popular feature extractors (FAST, SIFT, and SURF). We quantified the number of features and their quality for visual navigation using feature correspondence and the epipolar constraint. Our results indicate that near-infrared imagery produces more features, which tend to be of higher quality than those from visible-light imagery.

* 6 pages, submitted to RA-L with ICRA option
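The abstract says feature quality was measured with feature correspondence and the epipolar constraint but does not give the scoring rule. A minimal sketch, assuming the fundamental matrix F relating two frames is already known (for example, estimated robustly from the matches): a correct correspondence (x1, x2) in homogeneous pixel coordinates satisfies x2^T F x1 = 0, so the distance from x2 to the epipolar line F x1 can serve as a per-match quality score. The function names and the 1-pixel threshold are hypothetical choices for illustration, not the authors' procedure.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]
Matrix3 = Sequence[Sequence[float]]


def epipolar_distance(F: Matrix3, p1: Point, p2: Point) -> float:
    """Distance in pixels from p2 to the epipolar line F @ [p1, 1] in image 2."""
    x1 = (float(p1[0]), float(p1[1]), 1.0)
    # Epipolar line l = F x1, written as a*u + b*v + c = 0 in image 2.
    a, b, c = (sum(F[r][k] * x1[k] for k in range(3)) for r in range(3))
    norm = math.hypot(a, b)
    if norm == 0.0:
        raise ValueError("degenerate epipolar line")
    # a*u2 + b*v2 + c equals x2^T F x1 (the algebraic residual);
    # dividing by |(a, b)| turns it into a point-to-line distance.
    return abs(a * p2[0] + b * p2[1] + c) / norm


def inlier_fraction(
    F: Matrix3, matches: List[Tuple[Point, Point]], threshold: float = 1.0
) -> float:
    """Fraction of matches whose epipolar distance is at most `threshold` pixels."""
    if not matches:
        return 0.0
    good = sum(1 for p1, p2 in matches if epipolar_distance(F, p1, p2) <= threshold)
    return good / len(matches)


if __name__ == "__main__":
    # Rectified pair translated along the x axis: epipolar lines are image rows.
    F = [[0, 0, 0], [0, 0, -1], [0, 1, 0]]
    matches = [((10, 20), (5, 20)), ((10, 20), (5, 23))]
    print(epipolar_distance(F, *matches[1]))  # 3.0
    print(inlier_fraction(F, matches))        # 0.5
```

For a rectified stereo pair translated along the image x axis, F = [[0, 0, 0], [0, 0, -1], [0, 1, 0]], every epipolar line is an image row, and the distance reduces to the vertical offset |v1 - v2|, which is what the usage example exercises.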
