Robots need to learn behaviors in intuitive and practical ways for widespread deployment in human environments. To learn a robot behavior end-to-end, we train a variant of the ResNet that maps eye-in-hand camera images to end-effector velocities. In our setup, a human teacher demonstrates the task via joystick. We show that a simple servoing task can be learned in less than an hour including data collection, model training and deployment time. Moreover, 16 minutes of demonstrations were enough for the robot to learn the task.
We present a robotic system capable of navigating autonomously by following a line and taking good quality pictures of people. When a group of people are detected, the robot rotates towards them and then back to line while continuously taking pictures from different angles. Each picture is processed in the cloud where its quality is estimated in a two-stage algorithm. First, features such as the face orientation and likelihood of facial emotions are input to a fully connected neural network to assign a quality score to each face. Second, a representation is extracted by abstracting faces from the image and it is input to a to Convolutional Neural Network (CNN) to classify the quality of the overall picture. We collected a dataset in which a picture was labeled as good quality if subjects are well-positioned in the image and oriented towards the camera with a pleasant expression. Our approach detected the quality of pictures with 78.4% accuracy in this dataset and received a better mean user rating (3.71/5) than a heuristic method that uses photographic composition procedures in a study where 97 human judges rated each picture. A statistical analysis against the state-of-the-art verified the quality of the resulting pictures.
We discuss the process of building semantic maps, how to interactively label entities in them, and how to use them to enable context-aware navigation behaviors in human environments. We utilize planar surfaces, such as walls and tables, and static objects, such as door signs, as features for our semantic mapping approach. Users can interactively annotate these features by having the robot follow him/her, entering the label through a mobile app, and performing a pointing gesture toward the landmark of interest. Our gesture based approach can reliably estimate which object is being pointed at and detect ambiguous gestures with probabilistic modeling. Our person following method attempts to maximize future utility by a search for future actions assuming constant velocity model for the human. We describe a method to extract metric goals from a semantic map landmark and to plan a human aware path that takes into account the personal spaces of people. Finally, we demonstrate context-awareness for person following in two scenarios: interactive labeling and door passing. We believe that future navigation approaches and service robotics applications can be made more effective by further exploiting the structure of human environments.
Driving is a social activity: drivers often indicate their intent to change lanes via motion cues. We consider mixed-autonomy traffic where a Human-driven Vehicle (HV) and an Autonomous Vehicle (AV) drive together. We propose a planning framework where the degree to which the AV considers the other agent's reward is controlled by a selfishness factor. We test our approach on a simulated two-lane highway where the AV and HV merge into each other's lanes. In a user study with 21 subjects and 6 different selfishness factors, we found that our planning approach was sound and that both agents had less merging times when a factor that balances the rewards for the two agents was chosen. Our results on double lane merging suggest it to be a non-zero-sum game and encourage further investigation on collaborative decision making algorithms for mixed-autonomy traffic.
This paper presents a learning from demonstration approach to programming safe, autonomous behaviors for uncommon driving scenarios. Simulation is used to re-create a targeted driving situation, one containing a road-side hazard creating a significant occlusion in an urban neighborhood, and collect optimal driving behaviors from 24 users. Paper employs a key-frame based approach combined with an algorithm to linearly combine models in order to extend the behavior to novel variations of the target situation. This approach is theoretically agnostic to the kind of LfD framework used for modeling data and our results suggest it generalizes well to variations containing an additional number of hazards occurring in sequence. The linear combination algorithm is informed by analysis of driving data, which also suggests that decision-making algorithms need to consider a trade-off between road-rules and immediate rewards to tackle some complex cases.
Deep reinforcement learning has emerged as a powerful tool for a variety of learning tasks, however deep nets typically exhibit forgetting when learning multiple tasks in sequence. To mitigate forgetting, we propose an experience replay process that augments the standard FIFO buffer and selectively stores experiences in a long-term memory. We explore four strategies for selecting which experiences will be stored: favoring surprise, favoring reward, matching the global training distribution, and maximizing coverage of the state space. We show that distribution matching successfully prevents catastrophic forgetting, and is consistently the best approach on all domains tested. While distribution matching has better and more consistent performance, we identify one case in which coverage maximization is beneficial - when tasks that receive less trained are more important. Overall, our results show that selective experience replay, when suitable selection algorithms are employed, can prevent catastrophic forgetting.
Providing an efficient strategy to navigate safely through unsignaled intersections is a difficult task that requires determining the intent of other drivers. We explore the effectiveness of Deep Reinforcement Learning to handle intersection problems. Using recent advances in Deep RL, we are able to learn policies that surpass the performance of a commonly-used heuristic approach in several metrics including task completion time and goal success rate and have limited ability to generalize. We then explore a system's ability to learn active sensing behaviors to enable navigating safely in the case of occlusions. Our analysis, provides insight into the intersection handling problem, the solutions learned by the network point out several shortcomings of current rule-based methods, and the failures of our current deep reinforcement learning system point to future research directions.
We view intersection handling on autonomous vehicles as a reinforcement learning problem, and study its behavior in a transfer learning setting. We show that a network trained on one type of intersection generally is not able to generalize to other intersections. However, a network that is pre-trained on one intersection and fine-tuned on another performs better on the new task compared to training in isolation. This network also retains knowledge of the prior task, even though some forgetting occurs. Finally, we show that the benefits of fine-tuning hold when transferring simulated intersection handling knowledge to a real autonomous vehicle.
We analyze how the knowledge to autonomously handle one type of intersection, represented as a Deep Q-Network, translates to other types of intersections (tasks). We view intersection handling as a deep reinforcement learning problem, which approximates the state action Q function as a deep neural network. Using a traffic simulator, we show that directly copying a network trained for one type of intersection to another type of intersection decreases the success rate. We also show that when a network that is pre-trained on Task A and then is fine-tuned on a Task B, the resulting network not only performs better on the Task B than an network exclusively trained on Task A, but also retained knowledge on the Task A. Finally, we examine a lifelong learning setting, where we train a single network on five different types of intersections sequentially and show that the resulting network exhibited catastrophic forgetting of knowledge on previous tasks. This result suggests a need for a long-term memory component to preserve knowledge.
Each year, millions of motor vehicle traffic accidents all over the world cause a large number of fatalities, injuries and significant material loss. Automated Driving (AD) has potential to drastically reduce such accidents. In this work, we focus on the technical challenges that arise from AD in urban environments. We present the overall architecture of an AD system and describe in detail the perception and planning modules. The AD system, built on a modified Acura RLX, was demonstrated in a course in GoMentum Station in California. We demonstrated autonomous handling of 4 scenarios: traffic lights, cross-traffic at intersections, construction zones and pedestrians. The AD vehicle displayed safe behavior and performed consistently in repeated demonstrations with slight variations in conditions. Overall, we completed 44 runs, encompassing 110km of automated driving with only 3 cases where the driver intervened the control of the vehicle, mostly due to error in GPS positioning. Our demonstration showed that robust and consistent behavior in urban scenarios is possible, yet more investigation is necessary for full scale roll-out on public roads.