Long-tail and rare event problems become crucial when autonomous driving algorithms are applied in the real world. For the purpose of evaluating systems in challenging settings, we propose a generative framework to create safety-critical scenarios for evaluating specific task algorithms. We first represent the traffic scenarios with a series of autoregressive building blocks and generate diverse scenarios by sampling from the joint distribution of these blocks. We then train the generative model as an agent (or a generator) to investigate the risky distribution parameters for a given driving algorithm being evaluated. We regard the task algorithm as an environment (or a discriminator) that returns a reward to the agent when a risky scenario is generated. Through the experiments conducted on several scenarios in the simulation, we demonstrate that the proposed framework generates safety-critical scenarios more efficiently than grid search or human design methods. Another advantage of this method is its adaptiveness to the routes and parameters.