Abstract: Robotic bin packing is widely deployed in warehouse automation, with current systems achieving robust performance through heuristic and learning-based strategies. These systems must balance compact placement against rapid execution; selecting an alternative item or reorienting an item can improve space utilization but adds operational time. We propose a selection-based formulation that explicitly reasons over this trade-off: at each step, the robot evaluates multiple candidate actions, weighing expected packing benefit against estimated operational time. This enables time-aware strategies that accept increased operational time only when it yields meaningful spatial improvements. Our method, STEP (Space-Time Efficient Packing), uses a preference-conditioned, Transformer-based reinforcement learning policy that generalizes across candidate set sizes and integrates with standard placement modules. It achieves a 44% reduction in operational time without compromising packing density. Additional material is available at https://step-packing.github.io.
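The space-time trade-off described in the abstract above can be illustrated with a small hand-written scoring rule. This is only a sketch under stated assumptions, not the STEP policy itself, which is a learned Transformer-based policy: the `Candidate` fields, the `time_weight` preference parameter, the `time_scale_s` normalizer, and the linear scoring form are all hypothetical and chosen purely for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    """One candidate action (hypothetical fields, not from the paper)."""
    name: str
    packing_gain: float  # assumed estimate of packing benefit, in [0, 1]
    time_cost_s: float   # assumed estimate of operational time, in seconds


def select_action(candidates, time_weight, time_scale_s=10.0):
    """Pick the candidate with the best preference-weighted score.

    time_weight in [0, 1] is an assumed preference: 0 means only packing
    benefit matters, 1 means only operational time matters.
    The candidate list may have any non-zero length.
    """
    if not candidates:
        raise ValueError("need at least one candidate")
    if not 0.0 <= time_weight <= 1.0:
        raise ValueError("time_weight must lie in [0, 1]")

    def score(c):
        # Linear trade-off: reward packing gain, penalize normalized time.
        normalized_time = c.time_cost_s / time_scale_s
        return (1.0 - time_weight) * c.packing_gain - time_weight * normalized_time

    return max(candidates, key=score)


candidates = [
    Candidate("place_next_item_as_is", packing_gain=0.60, time_cost_s=2.0),
    Candidate("reorient_next_item", packing_gain=0.70, time_cost_s=5.0),
    Candidate("pick_alternative_item", packing_gain=0.90, time_cost_s=8.0),
]

space_first = select_action(candidates, time_weight=0.0)
balanced = select_action(candidates, time_weight=0.5)
time_first = select_action(candidates, time_weight=1.0)
```

With these made-up numbers, a space-only preference selects the slower alternative item, while a balanced or time-only preference keeps the fast default placement. A learned policy would replace the fixed linear score with a value conditioned on the same preference.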
Abstract: Contact-rich assembly of complex, non-convex parts with tight tolerances remains a formidable challenge. Purely model-based methods struggle with discontinuous contact dynamics, while model-free methods require vast amounts of data and often lack precision. In this work, we introduce a hybrid framework that uses only contact-state information between a complex peg and its mating hole to recover the full SE(3) pose during assembly. In under 10 seconds of online execution, a sequence of primitive probing motions constructs a local contact submanifold, which is then aligned to a contact manifold precomputed offline to yield sub-millimeter and sub-degree pose estimates. To eliminate costly k-NN searches, we train a lightweight network that projects sparse contact observations onto the contact manifold; it is 95x faster and 18% more accurate than the k-NN search. Our method, evaluated on three industrially relevant geometries with clearances of 0.1-1.0 mm, achieves a success rate of 93.3%, a 4.1x improvement over primitive-only strategies without state estimation.
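The k-NN projection step that the lightweight network is trained to replace can be sketched on a toy problem. Everything below is an assumption made for illustration: the "manifold" is a densely sampled unit circle standing in for the precomputed contact manifold, the helper names are hypothetical, and the brute-force search is a generic nearest-neighbor average rather than the authors' implementation.

```python
import math


def build_toy_manifold(n_samples=360):
    """Sample a unit circle as a stand-in for an offline contact manifold."""
    return [
        (math.cos(2.0 * math.pi * i / n_samples),
         math.sin(2.0 * math.pi * i / n_samples))
        for i in range(n_samples)
    ]


def knn_project(query, manifold, k=3):
    """Project a query point onto the manifold by averaging its k nearest samples.

    Brute force: every manifold sample is compared against the query,
    which is the per-query cost a learned projection would avoid.
    """
    if k < 1 or k > len(manifold):
        raise ValueError("k must be between 1 and the number of samples")
    nearest = sorted(manifold, key=lambda p: math.dist(query, p))[:k]
    dim = len(query)
    return tuple(sum(p[d] for p in nearest) / k for d in range(dim))


manifold = build_toy_manifold()
observation = (1.2, 0.05)  # made-up noisy observation lying off the manifold
projected = knn_project(observation, manifold)
```

The sort makes each query cost O(N log N) in the number of stored samples, which hints at why a small network that maps observations directly to the manifold could be much faster at run time.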