Learning graphical structures based on Directed Acyclic Graphs (DAGs) is a challenging problem, partly owing to the large search space of possible graphs. Recently, NOTEARS (Zheng et al., 2018) formulated the structure search problem as a continuous optimization task using the least squares objective and a proper characterization of DAGs. However, that formulation requires a hard DAG constraint, which may lead to optimization difficulties. In this paper, we study the asymptotic roles of the sparsity and DAG constraints for learning DAG models in the linear Gaussian and non-Gaussian cases, and investigate their usefulness in the finite sample regime. Based on the theoretical results, we formulate a likelihood-based score function and show that one only has to apply sparsity and DAG regularization terms, rather than hard constraints, to recover the underlying DAGs. This leads to an unconstrained optimization problem that is much easier to solve. Using gradient-based optimization and GPU acceleration, our procedure can easily handle thousands of nodes while retaining high accuracy. Extensive experiments validate the effectiveness of the proposed method and show that the DAG-regularized likelihood objective is indeed preferable to the least squares objective with the hard DAG constraint.
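To make the idea of a regularized, unconstrained score concrete, the following is a minimal pure-Python sketch. It is an illustration under stated assumptions, not the paper's implementation: the DAG penalty uses the trace-of-matrix-exponential characterization introduced by NOTEARS, the likelihood term is assumed to be an equal-variance linear Gaussian negative log-likelihood (up to constants), and the names `dag_penalty`, `log_abs_det`, `regularized_score`, `lam_sparse`, and `lam_dag` are hypothetical.

```python
import math

def matmul(A, B):
    """Multiply two square matrices given as lists of lists."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def dag_penalty(W, terms=30):
    """h(W) = tr(exp(W o W)) - d, where o is the elementwise product.

    h(W) is zero exactly when W is the weighted adjacency matrix of a DAG.
    The matrix exponential is approximated by a truncated power series.
    """
    d = len(W)
    M = [[w * w for w in row] for row in W]
    power = [[float(i == j) for j in range(d)] for i in range(d)]  # M^0 / 0!
    trace = float(d)
    for k in range(1, terms):
        power = [[x / k for x in row] for row in matmul(power, M)]  # M^k / k!
        trace += sum(power[i][i] for i in range(d))
    return trace - d

def log_abs_det(A):
    """log|det(A)| via Gaussian elimination with partial pivoting."""
    A = [row[:] for row in A]
    n = len(A)
    total = 0.0
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(A[r][c]))
        if A[p][c] == 0.0:
            return -math.inf  # singular matrix
        A[c], A[p] = A[p], A[c]
        total += math.log(abs(A[c][c]))
        for r in range(c + 1, n):
            f = A[r][c] / A[c][c]
            for j in range(c, n):
                A[r][j] -= f * A[c][j]
    return total

def regularized_score(X, W, lam_sparse, lam_dag):
    """Assumed likelihood term plus soft sparsity (L1) and DAG penalties.

    X holds n samples as rows of length d; the model is X = X W + noise.
    """
    d = len(W)
    rss = 0.0  # residual sum of squares over all samples and variables
    for x in X:
        for j in range(d):
            pred = sum(x[i] * W[i][j] for i in range(d))
            rss += (x[j] - pred) ** 2
    i_minus_w = [[float(i == j) - W[i][j] for j in range(d)] for i in range(d)]
    nll = 0.5 * d * math.log(rss) - log_abs_det(i_minus_w)
    l1 = sum(abs(w) for row in W for w in row)
    return nll + lam_sparse * l1 + lam_dag * dag_penalty(W)

# Toy data: 3 samples of 2 variables, one acyclic and one cyclic candidate.
X = [[1.0, 2.0], [2.0, 1.0], [0.0, 1.0]]
W_dag = [[0.0, 0.5], [0.0, 0.0]]
W_cyc = [[0.0, 0.5], [0.5, 0.0]]
print(dag_penalty(W_dag), dag_penalty(W_cyc))
```

Because both penalties enter the score as additive terms, any candidate matrix can be scored and the objective can be minimized with an ordinary gradient-based optimizer. The penalty vanishes for the acyclic candidate and is strictly positive for the cyclic one, so increasing `lam_dag` only affects candidates that contain cycles.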