Atsuki Sato

Optimized Learned Count-Min Sketch

Dec 13, 2025

Kyosuke Nishishita, Atsuki Sato, Yusuke Matsui

Abstract: Count-Min Sketch (CMS) is a memory-efficient data structure for estimating the frequency of elements in a multiset. Learned Count-Min Sketch (LCMS) enhances CMS with a machine learning model to reduce estimation error under the same memory usage, but suffers from slow construction due to empirical parameter tuning and lacks theoretical guarantees on intolerable error probability. We propose Optimized Learned Count-Min Sketch (OptLCMS), which partitions the input domain and assigns each partition to its own CMS instance, with CMS parameters analytically derived for fixed thresholds, and thresholds optimized via dynamic programming with approximate feasibility checks. This reduces the need for empirical validation, enabling faster construction while providing theoretical guarantees under these assumptions. OptLCMS also allows explicit control of the allowable error threshold, improving flexibility in practice. Experiments show that OptLCMS builds faster, achieves lower intolerable error probability, and matches the estimation accuracy of LCMS.

* 4 pages, 3 figures. Accepted at NeurIPS 2025 Workshop on Machine Learning for Systems
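
For readers unfamiliar with the underlying structure, the following is a minimal sketch of a classical Count-Min Sketch, plus a simple wrapper that routes each item to one of several CMS instances by a score threshold. It is not the OptLCMS method from the paper: the names `CountMinSketch`, `PartitionedSketch`, `score_fn`, `thresholds`, and `shapes` are illustrative assumptions, and the thresholds and table sizes here are supplied by hand rather than derived analytically or optimized via dynamic programming.

```python
import bisect
import hashlib


class CountMinSketch:
    """Classical CMS: `depth` rows of `width` counters, one hash per row."""

    def __init__(self, width, depth):
        self.width = width
        self.depth = depth
        self.table = [[0] * width for _ in range(depth)]

    def _index(self, item, row):
        # Salting BLAKE2b with the row number gives one hash function per row.
        digest = hashlib.blake2b(
            item.encode("utf-8"), digest_size=8, salt=row.to_bytes(8, "little")
        ).digest()
        return int.from_bytes(digest, "little") % self.width

    def add(self, item, count=1):
        for row in range(self.depth):
            self.table[row][self._index(item, row)] += count

    def estimate(self, item):
        # Collisions only inflate counters, so the minimum never underestimates.
        return min(
            self.table[row][self._index(item, row)] for row in range(self.depth)
        )


class PartitionedSketch:
    """Hypothetical wrapper: one CMS per score interval (assumed design)."""

    def __init__(self, score_fn, thresholds, shapes):
        # `thresholds` must be sorted ascending; `shapes` holds one
        # (width, depth) pair per partition, i.e. len(thresholds) + 1 pairs.
        self.score_fn = score_fn
        self.thresholds = thresholds
        self.sketches = [CountMinSketch(w, d) for w, d in shapes]

    def _route(self, item):
        part = bisect.bisect_right(self.thresholds, self.score_fn(item))
        return self.sketches[part]

    def add(self, item, count=1):
        self._route(item).add(item, count)

    def estimate(self, item):
        return self._route(item).estimate(item)
```

In a learned variant, `score_fn` would typically be a model's predicted frequency score, so that items expected to be frequent and infrequent are counted in separately sized tables.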

Cascaded Learned Bloom Filter for Optimal Model-Filter Size Balance and Fast Rejection

Feb 06, 2025

Atsuki Sato, Yusuke Matsui

Abstract: Recent studies have demonstrated that learned Bloom filters, which combine machine learning with the classical Bloom filter, can achieve superior memory efficiency. However, existing learned Bloom filters face two critical unresolved challenges: the balance between the machine learning model size and the Bloom filter size is not optimal, and the reject time cannot be minimized effectively. We propose the Cascaded Learned Bloom Filter (CLBF) to address these issues. Our dynamic programming-based optimization automatically selects configurations that achieve an optimal balance between the model and filter sizes while minimizing reject time. Experiments on real-world datasets show that CLBF reduces memory usage by up to 24% and decreases reject time by up to 14 times compared to state-of-the-art learned Bloom filters.
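
As background for the abstract above, here is a minimal sketch of a basic, single-stage learned Bloom filter: a score function accepts items it is confident about, and a backup Bloom filter stores the keys the score function would otherwise miss. This is not the cascaded CLBF design and includes none of its dynamic programming; `BloomFilter`, `LearnedBloomFilter`, `score_fn`, and `threshold` are illustrative names, and `score_fn` stands in for a trained model.

```python
import hashlib


class BloomFilter:
    """Classical Bloom filter over strings, backed by a bit array."""

    def __init__(self, num_bits, num_hashes):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, item):
        # Salting BLAKE2b with the hash index yields `num_hashes` hashes.
        for i in range(self.num_hashes):
            digest = hashlib.blake2b(
                item.encode("utf-8"), digest_size=8, salt=i.to_bytes(8, "little")
            ).digest()
            yield int.from_bytes(digest, "little") % self.num_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item):
        return all(
            self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(item)
        )


class LearnedBloomFilter:
    """Single-stage learned Bloom filter (assumed, simplified design)."""

    def __init__(self, keys, score_fn, threshold, num_bits, num_hashes):
        self.score_fn = score_fn
        self.threshold = threshold
        self.backup = BloomFilter(num_bits, num_hashes)
        # Keys scored below the threshold would be rejected by the model,
        # so they go into the backup filter to avoid false negatives.
        for key in keys:
            if score_fn(key) < threshold:
                self.backup.add(key)

    def __contains__(self, item):
        if self.score_fn(item) >= self.threshold:
            return True  # accepted by the model; may be a false positive
        return item in self.backup
```

A usage example with a toy score function in place of a model:

```python
keys = ["evil.example", "bad.example", "ok-looking.org"]
toy_score = lambda s: 1.0 if s.endswith(".example") else 0.0
lbf = LearnedBloomFilter(keys, toy_score, threshold=0.5, num_bits=256, num_hashes=3)
print(all(k in lbf for k in keys))  # every inserted key is reported present
```

Every query pays for a model evaluation before it can be rejected, which is why the model-versus-filter size balance and the reject time matter in this family of structures.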
