This paper highlights several properties of large urban networks that can have an impact on machine learning methods applied to traffic signal control. In particular, we show that the average network flow tends to be independent of the signal control policy as density increases. This property, which has so far remained under the radar, implies that deep reinforcement learning (DRL) methods become ineffective when trained under congested conditions, and might explain DRL's limited success for traffic signal control. Our results apply to all possible grid networks thanks to a parametrization based on two network parameters: the ratio of the expected distance between consecutive traffic lights to the expected green time, and the turning probability at intersections. Networks with different parameters exhibit very different responses to traffic signal control. Notably, we found that no control (i.e., a random policy) can be an effective control strategy for a surprisingly large family of networks. The impact of the turning probability turned out to be very significant for both baseline and DRL policies. It also explains the loss of symmetry observed for these policies, which is not captured by existing theories that rely on corridor approximations without turns. Our findings also suggest that supervised learning methods have enormous potential, as they require very few examples to produce excellent policies.
This paper reviews machine learning methods for the motion planning of autonomous vehicles (AVs), with an exclusive focus on longitudinal behaviors and their impact on traffic congestion. An extensive survey of training data, model input/output, and learning methods for machine learning longitudinal motion planning (mMP) is first presented. Each of these major components is discussed and evaluated from the perspective of congestion impact. The emerging technologies adopted by leading AV companies such as Waymo and Tesla are highlighted in our review. We find that: i) the AV industry has been focusing on the long-tail problem caused by "corner errors" that threaten driving safety, ii) none of the existing public datasets provides sufficient data under congestion scenarios, and iii) although alternative and more advanced learning methods are available in the literature, the major mMP method adopted by industry is still behavior cloning (BC). The study also surveys the connections between mMP and traditional car-following (CF) models, and it reveals that: i) model equivalence exists only in simple settings, ii) studies have shown that mMP can significantly outperform CF models in long-term speed prediction, and iii) mMP's string stability remains analytically intractable and can only be analyzed by model approximation followed by numerical simulations. Future research needs are also identified.