Crowd counting in images is a widely explored but challenging task. Although recent convolutional neural network (CNN) methods have made significant progress, it remains difficult to count accurately, and especially to localize people precisely, in very dense regions. A major issue is that dense regions usually contain many small instances and therefore exhibit density patterns very different from those of sparse regions. Localizing or detecting small, densely packed objects is also very challenging. In this paper, instead of processing image pyramids and aggregating multi-scale features, we propose a simple yet effective Learning to Scale (L2S) module to cope with significant scale variations in both regression and localization. Specifically, the L2S module automatically rescales dense regions to similar and reasonable scale levels. This alleviates the density pattern shift for density regression methods and facilitates the localization of small instances. In addition, we introduce a novel distance label map combined with a customized adapted cross-entropy loss for precise person localization. Extensive experiments on three widely used datasets demonstrate that the proposed method, termed AutoScale, consistently improves upon state-of-the-art methods on both regression and localization benchmarks. AutoScale also shows notable transferability in cross-dataset validation.