Abstract:For many social, scientific, and commercial purposes, it is often important to estimate the distribution of the users' data regarding a sensitive attribute, e.g., their ages, locations, etc. To allow this estimation while protecting the users' privacy, every user applies a local privacy protection mechanism that releases a noisy (sanitized) version of their original datum to the data collector; then the original distribution is estimated using one of the known methods, such as the matrix inversion (INV), RAPPOR's estimator, and the iterative Bayesian update (IBU). Unlike the other estimators, the consistency of IBU, i.e., the convergence of its estimate to the real distribution as the amount of noisy data grows, has been either ignored or incorrectly proved in the literature. In this article, we use the fact that IBU is a maximum likelihood estimator to prove that IBU is consistent. We also show, through experiments on real datasets, that IBU significantly outperforms the other methods when the users' data are sanitized by geometric, Laplace, and exponential mechanisms, whereas it is comparable to the other methods in the case of the k-RR and RAPPOR mechanisms. Finally, we consider the case when the alphabet of the sensitive data is infinite, and we show a technique that allows IBU to operate in this case too.