We propose a new end-to-end trainable model for lossy image compression which includes a number of novel components. This approach incorporates 1) a hierarchical auto-regressive model; 2)it also incorporates saliency in the images and focuses on reconstructing the salient regions better; 3) in addition, we empirically demonstrate that the popularly used evaluations metrics such as MS-SSIM and PSNR are inadequate for judging the performance of deep learned image compression techniques as they do not align well with human perceptual similarity. We, therefore propose an alternative metric, which is learned on perceptual similarity data specific to image compression. Our experiments show that this new metric aligns significantly better with human judgments when compared to other hand-crafted or learned metrics. The proposed compression model not only generates images that are visually better but also gives superior performance for subsequent computer vision tasks such as object detection and segmentation when compared to other engineered or learned codecs.