Navigating urban environments as reliably as expert human drivers is a critical capability for autonomous vehicles. Traditional autonomous driving systems are assembled from many building blocks for perception, planning and control; their complex assumptions and interdependencies make them difficult to generalize to varied scenarios. In this paper, we develop an end-to-end trajectory generation method based on imitation learning. The method extracts spatiotemporal features from front-view camera images for scene understanding, then generates collision-free trajectories several seconds into the future. The proposed network consists of three sub-networks, which are selectively activated for three common driving tasks: keep straight, turn left and turn right. The experimental results suggest that, under various weather and lighting conditions, our network can reliably generate trajectories in different urban scenarios, such as turning at intersections and slowing down to avoid collisions. Furthermore, when the proposed network is integrated into a navigation system, it shows good generalization performance for autonomous driving in an unseen simulated world on different types of vehicles, such as cars and trucks.
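
The selective activation of one sub-network per driving task can be illustrated with a minimal sketch. This is not the paper's architecture: the class name `BranchedTrajectoryHead`, the command strings, the linear branches and the random placeholder feature vector are all assumptions made for illustration. In the actual method the features would come from a learned spatiotemporal encoder over front-view camera images.

```python
import numpy as np

# Hypothetical labels for the three driving tasks named in the abstract.
COMMANDS = ("keep_straight", "turn_left", "turn_right")


class BranchedTrajectoryHead:
    """Toy stand-in: one linear branch per driving command (an assumption)."""

    def __init__(self, feature_dim, horizon, seed=0):
        rng = np.random.default_rng(seed)
        self.horizon = horizon
        # Each branch maps a feature vector to `horizon` (x, y) waypoints.
        self.weights = {
            c: rng.normal(scale=0.1, size=(feature_dim, horizon * 2))
            for c in COMMANDS
        }
        # Count how often each branch runs, to show that only one is active.
        self.calls = {c: 0 for c in COMMANDS}

    def forward(self, features, command):
        if command not in self.weights:
            raise ValueError(f"unknown command: {command!r}")
        self.calls[command] += 1
        flat = features @ self.weights[command]
        return flat.reshape(self.horizon, 2)


def generate_trajectory(head, features, command):
    """Run only the sub-network selected by the high-level command."""
    return head.forward(features, command)


if __name__ == "__main__":
    head = BranchedTrajectoryHead(feature_dim=8, horizon=5)
    # Placeholder for encoder output; a real system would compute this
    # from a sequence of camera frames.
    features = np.random.default_rng(1).normal(size=8)
    trajectory = generate_trajectory(head, features, "turn_left")
    print(trajectory.shape)
```

In this sketch the high-level command acts as a switch, so the branches for the other two tasks are never evaluated for a given input. In a trained network of this kind, the same switch would typically also restrict parameter updates to the selected branch.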