Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Lianjie Cao

Hippocampus: An Efficient and Scalable Memory Module for Agentic AI

Feb 14, 2026

Yi Li, Lianjie Cao, Faraz Ahmed, Puneet Sharma, Bingzhe Li

Abstract:Agentic AI require persistent memory to store user-specific histories beyond the limited context window of LLMs. Existing memory systems use dense vector databases or knowledge-graph traversal (or hybrid), incurring high retrieval latency and poor storage scalability. We introduce Hippocampus, an agentic memory management system that uses compact binary signatures for semantic search and lossless token-ID streams for exact content reconstruction. Its core is a Dynamic Wavelet Matrix (DWM) that compresses and co-indexes both streams to support ultra-fast search in the compressed domain, thus avoiding costly dense-vector or graph computations. This design scales linearly with memory size, making it suitable for long-horizon agentic deployments. Empirically, our evaluation shows that Hippocampus reduces end-to-end retrieval latency by up to 31$\times$ and cuts per-query token footprint by up to 14$\times$, while maintaining accuracy on both LoCoMo and LongMemEval benchmarks.

Via

Access Paper or Ask Questions

Towards Easy and Realistic Network Infrastructure Testing for Large-scale Machine Learning

Apr 29, 2025

Jinsun Yoo, ChonLam Lao, Lianjie Cao, Bob Lantz, Minlan Yu, Tushar Krishna, Puneet Sharma

Figure 1 for Towards Easy and Realistic Network Infrastructure Testing for Large-scale Machine Learning

Figure 2 for Towards Easy and Realistic Network Infrastructure Testing for Large-scale Machine Learning

Figure 3 for Towards Easy and Realistic Network Infrastructure Testing for Large-scale Machine Learning

Abstract:This paper lays the foundation for Genie, a testing framework that captures the impact of real hardware network behavior on ML workload performance, without requiring expensive GPUs. Genie uses CPU-initiated traffic over a hardware testbed to emulate GPU to GPU communication, and adapts the ASTRA-sim simulator to model interaction between the network and the ML workload.

* Presented as a poster in NSDI 25

Via

Access Paper or Ask Questions