



Abstract: In this work we present a multi-armed bandit framework for online expert selection in Markov decision processes and demonstrate its use in high-dimensional settings. Our method takes a set of candidate expert policies and switches between them to rapidly identify the best-performing expert using a variant of the classical upper confidence bound (UCB) algorithm, thus ensuring low regret in the overall performance of the system. This is useful in applications where several expert policies may be available and one must be selected at run-time for the underlying environment.
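As an illustrative sketch of the expert-selection loop described above: a minimal UCB1 implementation over black-box experts. The paper uses a variant of UCB whose details may differ; the experts here are stand-in stochastic return generators, and `run_bandit` and its parameters are hypothetical names for this sketch.

```python
import math
import random

def ucb1_select(counts, means, t, c=2.0):
    """UCB1 index: empirical mean + sqrt(c * ln t / n); untried arms first."""
    for i, n in enumerate(counts):
        if n == 0:
            return i
    return max(range(len(counts)),
               key=lambda i: means[i] + math.sqrt(c * math.log(t) / counts[i]))

def run_bandit(experts, episodes, seed=0):
    """experts: list of callables, each returning one stochastic episode return."""
    rng = random.Random(seed)
    k = len(experts)
    counts, means = [0] * k, [0.0] * k
    for t in range(1, episodes + 1):
        i = ucb1_select(counts, means, t)
        r = experts[i](rng)
        counts[i] += 1
        means[i] += (r - means[i]) / counts[i]   # incremental mean update
    return counts, means
```

With two experts whose mean returns are 0.4 and 0.6, the loop concentrates its pulls on the better expert after a short exploration phase, which is what keeps regret low.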




Abstract: In collaborative human-robot scenarios, when a person is not satisfied with how a robot performs a task, they can intervene to correct it. Reward learning methods enable the robot to adapt its reward function online based on such human input. However, this online adaptation requires low sample complexity algorithms which rely on simple functions of handcrafted features. In practice, pre-specifying an exhaustive set of features the person might care about is impossible; what should the robot do when the human correction cannot be explained by the features it already has access to? Recent progress in deep Inverse Reinforcement Learning (IRL) suggests that the robot could fall back on demonstrations: ask the human for demonstrations of the task, and recover a reward defined over not just the known features, but also the raw state space. Our insight is that rather than implicitly learning about the missing feature(s) from task demonstrations, the robot should instead ask for data that explicitly teaches it about what it is missing. We introduce a new type of human input, in which the person guides the robot from areas of the state space where the feature she is teaching is highly expressed to states where it is not. We propose an algorithm for learning the feature from the raw state space and integrating it into the reward function. By focusing the human input on the missing feature, our method decreases sample complexity and improves generalization of the learned reward over the above deep IRL baseline. We show this in experiments with a 7DOF robot manipulator. Finally, we discuss our method's potential implications for deep reward learning more broadly: taking a divide-and-conquer approach that focuses on important features separately before learning from demonstrations can improve generalization in tasks where such features are easy for the human to teach.
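A minimal sketch of the core idea, under simplifying assumptions: since the human-guided trace runs from states where the missing feature is highly expressed to states where it is not, one can fit a feature function that decreases monotonically along the trace. Here the feature is linear in the raw state and trained with a hinge ranking loss over consecutive trace states; the actual method learns a richer function from the raw state space and integrates it into the reward.

```python
import numpy as np

def learn_feature(trace, lr=0.1, margin=0.1, iters=500):
    """Fit phi(s) = w . s so that phi decreases along the trace
    (the human guided the robot from high to low feature value).
    Uses subgradient descent on a pairwise hinge ranking loss."""
    d = trace.shape[1]
    w = np.zeros(d)
    for _ in range(iters):
        grad = np.zeros(d)
        for s_hi, s_lo in zip(trace[:-1], trace[1:]):
            # hinge: penalize unless phi(s_hi) >= phi(s_lo) + margin
            if w @ s_hi < w @ s_lo + margin:
                grad += s_lo - s_hi
        w -= lr * grad
    return w
```

On a toy trace whose first state coordinate carries the feature, the learned weights recover a function that is strictly decreasing along the trace, which is the ordering constraint the human input encodes.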




Abstract: Real-world navigation requires robots to operate in unfamiliar, dynamic environments, sharing spaces with humans. Navigating around humans is especially difficult because it requires predicting their future motion, which can be quite challenging. We propose a novel framework for navigation around humans which combines learning-based perception with model-based optimal control. Specifically, we train a Convolutional Neural Network (CNN)-based perception module which maps the robot's visual inputs to a waypoint, or next desired state. This waypoint is then fed into planning and control modules which convey the robot safely and efficiently to the goal. To train the CNN, we contribute a photo-realistic benchmarking dataset for autonomous robot navigation in the presence of humans. The CNN is trained using supervised learning on images rendered from this dataset. The proposed framework learns to anticipate and react to people's motion based only on a monocular RGB image, without explicitly predicting future human motion. Our method generalizes well to unseen buildings and humans in both simulation and real-world environments. Furthermore, our experiments demonstrate that combining model-based control with learning leads to better and more data-efficient navigation behaviors than a purely learning-based approach. Videos describing our approach and experiments are available on the project website.
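The modular structure described above can be sketched as follows, with a stub in place of the trained CNN and a cubic Hermite spline as a simplified stand-in for the model-based planner. Function names, the waypoint format, and the spline parameterization are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def perception_stub(image):
    """Stand-in for the CNN: maps an image to a waypoint (x, y, theta).
    A real system would run a trained network on the monocular RGB input."""
    return np.array([2.0, 1.0, 0.0])

def plan_spline(waypoint, n=50):
    """Cubic Hermite spline from the origin (heading 0) to the waypoint,
    matching heading at both ends -- a simplified stand-in for a planner
    that respects the robot's dynamics."""
    x1, y1, th1 = waypoint
    t = np.linspace(0.0, 1.0, n)
    scale = np.hypot(x1, y1)                      # tangent magnitude heuristic
    p0, p1 = np.zeros(2), np.array([x1, y1])
    m0 = scale * np.array([1.0, 0.0])             # start tangent along heading 0
    m1 = scale * np.array([np.cos(th1), np.sin(th1)])
    h00 = 2*t**3 - 3*t**2 + 1                     # Hermite basis functions
    h10 = t**3 - 2*t**2 + t
    h01 = -2*t**3 + 3*t**2
    h11 = t**3 - t**2
    return (np.outer(h00, p0) + np.outer(h10, m0)
            + np.outer(h01, p1) + np.outer(h11, m1))

traj = plan_spline(perception_stub(image=None))   # perception -> planning
```

The point of the split is visible in the interfaces: the learned module only has to output a low-dimensional waypoint, while safety and smoothness come from the model-based planner and controller downstream.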




Abstract: In Bansal et al. (2019), a novel visual navigation framework that combines learning-based and model-based approaches was proposed. Specifically, a Convolutional Neural Network (CNN) predicts a waypoint that is used by the dynamics model for planning and tracking a trajectory to that waypoint. However, the CNN inevitably makes prediction errors, ultimately leading to collisions, especially when the robot navigates through cluttered and tight spaces. In this paper, we present a novel Hamilton-Jacobi (HJ) reachability-based method to generate supervision for the CNN's waypoint prediction. By modeling the CNN's prediction error as a disturbance in the dynamics, the proposed method generates waypoints that are robust to this disturbance, and consequently to the prediction errors. Moreover, using globally optimal HJ reachability analysis leads to waypoints that are time-efficient and do not exhibit greedy behavior. Through simulations and experiments on a hardware testbed, we demonstrate the advantages of the proposed approach for tasks in which the robot must navigate through cluttered, narrow indoor environments.
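The idea of treating the CNN's prediction error as a bounded disturbance can be illustrated with a simple sampling-based robustness check: a candidate waypoint is accepted only if every waypoint within the modeled error radius keeps a safety clearance from obstacles. HJ reachability computes such worst cases exactly over the full dynamics; this sketch only approximates the spirit of that check on 2D positions, and all names and parameters are illustrative.

```python
import numpy as np

def robust_waypoint_ok(waypoint, obstacles, err_radius, clearance, n=64):
    """Accept a 2D waypoint only if every perturbation within err_radius
    (the modeled CNN prediction error) keeps `clearance` from all obstacles.
    Samples the boundary of the error disk as a cheap worst-case proxy."""
    angles = np.linspace(0, 2 * np.pi, n, endpoint=False)
    ring = waypoint + err_radius * np.stack(
        [np.cos(angles), np.sin(angles)], axis=1)
    pts = np.vstack([waypoint, ring])
    for center, radius in obstacles:          # circular obstacles
        d = np.linalg.norm(pts - center, axis=1) - radius
        if np.min(d) < clearance:
            return False
    return True
```

A waypoint well clear of an obstacle passes, while one whose error disk grazes the obstacle is rejected, which is exactly the conservatism that makes the supervision robust to prediction errors.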




Abstract: Here we present the design of an insect-scale microrobot that generates lift by spinning its wings, in contrast to most other microrobot designs at this size scale, which rely on flapping wings to produce lift. The robot has a wing span of 4 centimeters and weighs 133 milligrams. It spins its wings at 47 revolutions/second, generating $>$ 138 milligrams of lift while consuming approximately 60 milliwatts of total power and operating at a low voltage ($<$ 3 V). Of the total power consumed, 8.8 milliwatts is converted to mechanical power, part of which goes towards spinning the wings, while 51 milliwatts is lost to resistive Joule heating. With a lift-to-power ratio of 2.3 grams/W, its performance is on par with the best reported flapping-wing devices at the insect scale.
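The reported figures are internally consistent, as a quick back-of-envelope check shows (values taken from the abstract):

```python
lift_mg = 138.0          # lift produced, milligrams
total_power_mw = 60.0    # total electrical power, milliwatts

# mg/mW is numerically equal to g/W
lift_to_power = lift_mg / total_power_mw
print(round(lift_to_power, 1))   # 2.3 g/W, matching the reported ratio

# power budget: mechanical output + resistive Joule heating
mech_mw, joule_mw = 8.8, 51.0
print(mech_mw + joule_mw)        # 59.8 mW, consistent with ~60 mW total
```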




Abstract: Here we report the construction of the simplest transmission mechanism designed to date that is capable of converting the linear motion of any actuator to a $\pm$60$^\circ$ rotary wing stroke motion. It is planar, compliant, can be fabricated in a single step, and requires no assembly. Further, its design is universal: it can be used with any linear actuator capable of delivering sufficient power, irrespective of the magnitude of the actuator's displacement. We also report a novel passive wing pitch mechanism whose motion has little dependence on the aerodynamic loading on the wing. This greatly simplifies the designer's job by decoupling the otherwise tightly coupled wing morphology, wing kinematics, and flexure stiffness parameters. Like contemporary flexure-based methods, it is an add-on to a given wing stroke mechanism. Moreover, the intended wing pitch amplitude can easily be changed post-fabrication by tuning the resonance mass in the mechanism.




Abstract: Here we report the first sub-milligram flapping-wing vehicle able to mimic insect wing kinematics. A wing stroke amplitude of 90$^\circ$ and a wing pitch amplitude of 80$^\circ$ are demonstrated. This is also the smallest-wing-span (single wing length of 3.5mm) device reported to date and is at the same mass scale as a fruit fly. Assembly has been made simple, requiring only the gluing together of 5 components, in contrast to the higher part counts and intensive assembly of other milligram-scale microrobots. This increases the fabrication speed and success rate of the fully fabricated device. The low operational voltage (70mV) makes testing easy and will enable the eventual deployment of autonomous sub-milligram aerial vehicles.




Abstract: We design an insect-sized rolling microrobot driven by continuously rotating wheels. It measures 18mm$\times$8mm$\times$8mm. There are two versions of the robot: a 96mg laser-powered one and a 130mg supercapacitor-powered one. The robot can move at 27mm/s (1.5 body lengths per second) with its wheels rotating at 300$^\circ$/s, while consuming an average power of 2.5mW. Neither version has any electrical wires coming out of it, and the supercapacitor-powered robot is fully self-sufficient, able to roll freely for 8 seconds after a single charge. Low-voltage electromagnetic actuators (1V-3V) along with a novel double-ratcheting mechanism enable the operation of this device. It is, to the best of our knowledge, the lightest and fastest self-sufficient rolling microrobot reported to date.
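A back-of-envelope check of the reported locomotion figures; the energy estimate assumes constant average power over the 8-second run, which the abstract does not guarantee:

```python
speed_mm_s = 27.0
body_length_mm = 18.0   # longest dimension of the 18mm x 8mm x 8mm body
print(round(speed_mm_s / body_length_mm, 1))  # 1.5 body lengths per second

avg_power_mw = 2.5
run_time_s = 8.0
energy_mj = avg_power_mw * run_time_s         # mW x s = mJ
print(energy_mj)   # ~20 mJ drawn from the supercapacitor per charge (estimate)
```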




Abstract: We present the design of an insect-sized jumping microrobot measuring 17mm$\times$6mm$\times$14mm and weighing 75 milligrams. The microrobot consumes 6.4mW of power to jump to a height of 8mm. The tethered version of the robot can jump 6 times per minute, landing on its feet each time. The untethered version is powered by onboard photovoltaic cells illuminated by an external infrared laser source. It is, to the best of our knowledge, the lightest untethered jumping microrobot with an onboard power source reported to date.
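For scale, the potential energy gained at the apex of a jump is only a few microjoules, a back-of-envelope estimate from the reported mass and height:

```python
m_kg = 75e-6      # 75 milligrams
g = 9.81          # gravitational acceleration, m/s^2
h_m = 8e-3        # 8 mm jump height
pe_uj = m_kg * g * h_m * 1e6
print(round(pe_uj, 1))   # ~5.9 microjoules of potential energy at apex
```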




Abstract: To use neural networks in safety-critical settings, it is paramount to provide assurances on their runtime operation. Recent work on ReLU networks has sought to verify whether inputs belonging to a bounded box can ever yield some undesirable output. Input-splitting procedures, a particular type of verification mechanism, do so by recursively partitioning the input set into smaller sets. The efficiency of these methods is largely determined by the number of splits the box must undergo before the property can be verified. In this work, we propose a new technique based on shadow prices that fully exploits the information of the problem, yielding splits more efficiently than the state of the art. Results on the Airborne Collision Avoidance System (ACAS) benchmark verification tasks show a considerable reduction in the number of partitions generated, which substantially reduces computation times. These results open the door to improved verification methods for a wide variety of machine learning applications, including vision and control.
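A minimal sketch of an input-splitting verifier on a toy two-layer ReLU network, using interval bound propagation and naive widest-dimension bisection. The paper's contribution, the shadow-price split rule, is not reproduced here; this only illustrates the splitting mechanism whose split count that rule reduces.

```python
import numpy as np

def ibp(W1, b1, W2, b2, lo, hi):
    """Interval bounds on a 2-layer ReLU net over the input box [lo, hi]."""
    c, r = (lo + hi) / 2, (hi - lo) / 2
    c1 = W1 @ c + b1
    r1 = np.abs(W1) @ r
    lo1, hi1 = np.maximum(c1 - r1, 0), np.maximum(c1 + r1, 0)   # ReLU layer
    c2, r2 = (lo1 + hi1) / 2, (hi1 - lo1) / 2
    out_c = W2 @ c2 + b2
    out_r = np.abs(W2) @ r2
    return out_c - out_r, out_c + out_r

def verify(W1, b1, W2, b2, lo, hi, thresh, depth=0, max_depth=20):
    """Verify output < thresh on the box by recursive input splitting.
    Splits the widest input dimension; returns False if unresolved
    at max_depth (so False means 'not verified', not 'unsafe')."""
    out_lo, out_hi = ibp(W1, b1, W2, b2, lo, hi)
    if np.all(out_hi < thresh):
        return True
    if depth >= max_depth:
        return False
    d = int(np.argmax(hi - lo))
    mid = (lo[d] + hi[d]) / 2
    lo_b, hi_a = lo.copy(), hi.copy()
    lo_b[d], hi_a[d] = mid, mid
    return (verify(W1, b1, W2, b2, lo, hi_a, thresh, depth + 1, max_depth)
            and verify(W1, b1, W2, b2, lo_b, hi, thresh, depth + 1, max_depth))
```

On the identity network with summed outputs over $[-1,1]^2$ (true maximum 2), the loose threshold 2.5 is verified with no splits, while the violated threshold 1.5 exhausts the split budget; a smarter split rule changes only how fast the recursion resolves, not the interfaces above.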