In augmented reality (AR), correct and precise estimations of user's visual fixations and head movements can enhance the quality of experience by allocating more computation resources for the analysing, rendering and 3D registration on the areas of interest. However, there is no research about understanding the visual exploration of users when using an AR system or modeling AR visual attention. To bridge the gap between the real-world scene and the scene augmented by virtual information, we construct the ARVR saliency dataset with 100 diverse videos evaluated by 20 people. The virtual reality (VR) technique is employed to simulate the real-world, and annotations of object recognition and tracking as augmented contents are blended into the omnidirectional videos. Users can get the sense of experiencing AR when watching the augmented videos. The saliency annotations of head and eye movements for both original and augmented videos are collected which constitute the ARVR dataset.