With data durability, high access speed, low power efficiency and byte addressability, NVMe and SSD, which are acknowledged representatives of emerging storage technologies, have been applied broadly in many areas. However, one key issue with high-performance adoption of these technologies is how to properly define intelligent cache layers such that the performance gap between emerging technologies and main memory can be well bridged. To this end, we propose Phoebe, a reuse-aware reinforcement learning framework for the optimal online caching that is applicable for a wide range of emerging storage models. By continuous interacting with the cache environment and the data stream, Phoebe is capable to extract critical temporal data dependency and relative positional information from a single trace, becoming ever smarter over time. To reduce training overhead during online learning, we utilize periodical training to amortize costs. Phoebe is evaluated on a set of Microsoft cloud storage workloads. Experiment results show that Phoebe is able to close the gap of cache miss rate from LRU and a state-of-the-art online learning based cache policy to the Belady's optimal policy by 70.3% and 52.6%, respectively.