



Given the recent success of Deep Learning applied to a variety of single tasks, it is natural to consider more human-realistic settings. Perhaps the most difficult of these settings is that of continual lifelong learning, where the model must learn online over a continuous stream of non-stationary data. A continual lifelong learning system must have three primary capabilities to succeed: it must learn and adapt over time, it must not forget what it has learned, and it must be efficient in both training time and memory. Recent techniques have focused their efforts largely on the first two capabilities while the third capability remains largely unexplored. In this paper, we consider the problem of efficient and effective storage of experiences over very large time-frames. In particular we consider the case where typical experiences are n bits and memories are limited to k bits for k << n. We present a novel scalable architecture and training algorithm in this challenging domain and provide an extensive evaluation of its performance. Our results show that we can achieve considerable gains on top of state-of-the-art methods such as GEM.