Sensor-based guidance is required for long-range platforms. To bypass the structural limitation of classical registration on reference image framework, we offer in this paper to encode a stack of images of the scene into a deep network. Relying on a stack is showed to be relevant on bimodal scene (e.g. when the scene can or can not be snowy).