Abstract: Object insertion under tight tolerances ($<1\,\mathrm{mm}$) is an important but challenging assembly task, as even small errors can result in undesirable contacts. Recent efforts have focused on Reinforcement Learning (RL), which often depends on the careful design of dense reward functions. This work proposes an effective strategy for such tasks that integrates traditional model-based control with RL to achieve improved insertion accuracy. The policy is trained exclusively in simulation and is zero-shot transferred to the real system. It employs a potential field-based controller to acquire a model-based policy for inserting a plug into a socket given full observability in simulation. This policy is then integrated with residual RL, which is trained in simulation given only a sparse, goal-reaching reward. A curriculum scheme over observation noise and action magnitude is used for training the residual RL policy. Both policy components take as input the SE(3) poses of the plug and the socket and return an SE(3) pose transform for the plug, which is executed by a robotic arm using a controller. The integrated policy is deployed on the real system without further training or fine-tuning, given a visual SE(3) object tracker. The proposed solution and alternatives are evaluated across a variety of objects and conditions in simulation and reality. The proposed approach outperforms recent RL-based methods in this domain as well as prior efforts with hybrid policies. Ablations highlight the impact of each component of the approach.
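The core idea of the hybrid policy described above, combining a model-based base action with a learned residual correction, can be sketched as follows. This is a minimal illustration under simplifying assumptions (translation-only poses, a simple attractive potential field, and a placeholder residual policy); the function names and gain are hypothetical, not from the paper.

```python
import numpy as np

def potential_field_action(plug_pos, socket_pos, gain=0.5):
    # Model-based base policy: an attractive potential field pulls the
    # plug toward the socket (translation only, for brevity).
    return gain * (socket_pos - plug_pos)

def combined_action(plug_pos, socket_pos, residual_policy):
    # Residual RL: the learned policy observes both poses and outputs
    # a small correction added to the model-based action.
    base = potential_field_action(plug_pos, socket_pos)
    obs = np.concatenate([plug_pos, socket_pos])
    return base + residual_policy(obs)
```

In a residual architecture like this, the base controller supplies a reasonable action everywhere, so the RL component only needs to learn corrections near contact, which is what makes training with a sparse reward feasible.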
Abstract: Multi-plane light converter (MPLC) designs supporting hundreds of modes are attractive for high-throughput optical communications. These photonic structures typically comprise >10 phase masks in free space, with millions of independent design parameters. Conventional MPLC design using wavefront matching updates one mask at a time while fixing the rest. Here we construct a physical neural network (PNN) to model the light propagation and phase modulation in MPLC, providing access to the entire parameter set for optimization, including not only the profiles of the phase masks but also the distances between them. PNN training supports flexible optimization sequences and is a superset of existing MPLC design methods. In addition, our method allows tuning of PNN training hyperparameters such as learning rate and batch size. Because PNN-based MPLC is found to be insensitive to the number of input and target modes in each training step, we have demonstrated a high-order MPLC design (45 modes) using mini-batches that fit into the available computing resources.
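The differentiable forward model underlying a PNN-based MPLC design alternates phase modulation at each mask with free-space propagation between masks. The sketch below illustrates this structure using the standard angular spectrum method; it is a simplified NumPy illustration (the paper's actual implementation, framework, and parameterization are not specified here), and the function names are hypothetical.

```python
import numpy as np

def angular_spectrum_propagate(field, dz, wavelength, dx):
    # Free-space propagation over distance dz via the angular spectrum method.
    n = field.shape[0]
    fx = np.fft.fftfreq(n, d=dx)
    FX, FY = np.meshgrid(fx, fx)
    # Longitudinal wavenumber; evanescent components are clamped to zero.
    kz = 2 * np.pi * np.sqrt(np.maximum(0.0, 1.0 / wavelength**2 - FX**2 - FY**2))
    return np.fft.ifft2(np.fft.fft2(field) * np.exp(1j * kz * dz))

def mplc_forward(field, phase_masks, distances, wavelength, dx):
    # MPLC stack: apply each phase mask, then propagate to the next plane.
    # In a PNN, both phase_masks and distances would be trainable parameters.
    for phase, dz in zip(phase_masks, distances):
        field = field * np.exp(1j * phase)
        field = angular_spectrum_propagate(field, dz, wavelength, dx)
    return field
```

Because every operation here (elementwise phase multiplication and FFT-based propagation) is differentiable, an automatic-differentiation framework can compute gradients with respect to all masks and distances jointly, which is what distinguishes this approach from one-mask-at-a-time wavefront matching.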