Abstract:We study exact-regenerating codes for entanglement-assisted distributed storage systems. Consider an $(n,k,d,α,β_{\mathsf{q}},B)$ distributed system that stores a file of $B$ classical symbols across $n$ nodes with each node storing $α$ symbols. A data collector can recover the file by accessing any $k$ nodes. When a node fails, any $d$ surviving nodes share an entangled state, and each of them transmits a quantum system of $β_{\mathsf{q}}$ qudits to a newcomer. The newcomer then performs a measurement on the received quantum systems to generate its storage. Recent work [1] showed that, under functional repair where the regenerated content may differ from that of the failed node, there exists a unique optimal regenerating point that \emph{simultaneously minimizes both storage $α$ and repair bandwidth $d β_{\mathsf{q}}$} when $d \geq 2k-2$. In this paper, we show that, under \emph{exact repair}, where the newcomer reproduces exactly the same content as the failed node, this optimal point remains achievable. Our construction builds on the classical product-matrix framework and the Calderbank-Shor-Steane (CSS)-based stabilizer formalism.
Abstract:We rethink the definition of privacy in multi-server, graph-replicated private information retrieval (PIR) systems, and introduce a novel setting where the user's privacy is governed by the servers' storage structure. In particular, while retrieving a message from a server, the user is concerned with hiding their desired message index from the server, only if the server stores the corresponding message. We coin this privacy requirement as local user privacy and the resulting PIR problem as local PIR on the graph. Our goal is to measure the gain in communication efficiency of local PIR, compared to that of canonical PIR, by establishing its capacity, i.e., the maximum number of message symbols retrieved, per downloaded symbol. To this end, we observe a remarkable gain in the local PIR capacity of graphs, that are disjoint union of distinct graphs, which is multiplicative, compared to the PIR capacity, when the individual graphs are identical. For connected graphs, we propose schemes to establish capacity lower bounds for edge-transitive and bipartite graphs, which are greater than the best-known PIR capacity bounds. Finally, we derive the exact local PIR capacity for the cyclic graph, and the path graph with an odd number of vertices.
Abstract:We reformulate the definition of privacy in the private information retrieval (PIR) problem to accommodate flexible privacy requirements. We focus on graph-replicated PIR, with a generalized privacy requirement, instead of requiring all messages to be private from all servers, during retrieval. Towards this, we define a privacy requirement set for each server, which can be an arbitrary subset of all message indices, as long as the stored message indices are in their privacy requirement set. Since both the storage and privacy requirement sets have many possibilities, we focus on two specific storage settings, namely the path and cyclic graphs. We consider several privacy settings for each of them, which are not necessarily the same, to give different examples for privacy sets. Of particular interest are the privacy sets that comprise the indices of messages stored at servers within a neighborhood range. The neighborhood range parameter allows a transition from the recently introduced local PIR [1] to the standard graph-replicated PIR. In these cases, we derive bounds on the capacity or find the exact capacity.
Abstract:We consider the storage problem in an asymmetric $X$-secure private information retrieval (A-XPIR) setting. The A-XPIR setting considers the $X$-secure PIR problem (XPIR) when a given arbitrary set of servers is communicating. We focus on the trade-off region between the average storage at the servers and the average download cost. In the case of $N=4$ servers and two non-overlapping sets of communicating servers with $K=2$ messages, we characterize the achievable region and show that the three main inequalities compared to the no-security case collapse to two inequalities in the asymmetric security case. In the general case, we derive bounds that need to be satisfied for the general achievable region for an arbitrary number of servers and messages. In addition, we provide the storage and retrieval scheme for the case of $N=4$ servers with $K=2$ messages and two non-overlapping sets of communicating servers, such that the messages are not replicated (in the sense of a coded version of each symbol) and at the same time achieve the optimal achievable rate for the case of replication. Finally, we derive the exact capacity for the case of asymmetric security and asymmetric collusion for $N=4$ servers, with the communication links $\{1,2\}$ and $\{3,4\}$, which splits the servers into two groups, i.e., $g=2$, and with the collusion links $\{1,3\}$, $\{2,4\}$, as $C=\frac{1}{3}$. More generally, we derive a capacity result for a certain family of asymmetric collusion and asymmetric security cases.
Abstract:An important part of the information theory folklore had been about the output statistics of codes that achieve the capacity and how the empirical distributions compare to the output distributions induced by the optimal input in the channel capacity problem. Results for a variety of such empirical output distributions of good codes have been known in the literature, such as the comparison of the output distribution of the code to the optimal output distribution in vanishing and non-vanishing error probability cases. Motivated by these, we aim to achieve similar results for the quantum codes that are used for classical communication, that is the setting in which the classical messages are communicated through quantum codewords that pass through a noisy quantum channel. We first show the uniqueness of the optimal output distribution, to be able to talk more concretely about the optimal output distribution. Then, we extend the vanishing error probability results to the quantum case, by using techniques that are close in spirit to the classical case. We also extend non-vanishing error probability results to the quantum case on block codes, by using the second-order converses for such codes based on hypercontractivity results for the quantum generalized depolarizing semi-groups.
Abstract:This work investigates the use of quantum resources in distributed storage systems. Consider an $(n,k,d)$ distributed storage system in which a file is stored across $n$ nodes such that any $k$ nodes suffice to reconstruct the file. When a node fails, any $d$ helper nodes transmit information to a newcomer to rebuild the system. In contrast to the classical repair, where helper nodes transmit classical bits, we allow them to send classical information over quantum channels to the newcomer. The newcomer then generates its storage by performing appropriate measurements on the received quantum states. In this setting, we fully characterize the fundamental tradeoff between storage and repair bandwidth (total communication cost). Compared to classical systems, the optimal storage--bandwidth tradeoff can be significantly improved with the enhancement of quantum entanglement shared only among the surviving nodes, particularly at the minimum-storage regenerating point. Remarkably, we show that when $d \geq 2k-2$, there exists an operating point at which \textit{both storage and repair bandwidth are simultaneously minimized}. This phenomenon breaks the tradeoff in the classical setting and reveals a fundamentally new regime enabled by quantum communication.
Abstract:We consider the problem of finding the asymptotic capacity of symmetric private information retrieval (SPIR) with $B$ Byzantine servers. Prior to finding the capacity, a definition for the Byzantine servers is needed since in the literature there are two different definitions. In \cite{byzantine_tpir}, where it was first defined, the Byzantine servers can send any symbol from the storage, their received queries and some independent random symbols. In \cite{unresponsive_byzantine_1}, Byzantine servers send any random symbol independently of their storage and queries. It is clear that these definitions are not identical, especially when \emph{symmetric} privacy is required. To that end, we define Byzantine servers, inspired by \cite{byzantine_tpir}, as the servers that can share everything, before and after the scheme initiation. In this setting, we find an upper bound, for an infinite number of messages case, that should be satisfied for all schemes that protect against this setting and develop a scheme that achieves this upper bound. Hence, we identify the capacity of the problem.
Abstract:We study a linear computation problem over a quantum multiple access channel (LC-QMAC), where $S$ servers share an entangled state and separately store classical data streams $W_1,\cdots, W_S$ over a finite field $\mathbb{F}_d$. A user aims to compute $K$ linear combinations of these data streams, represented as $Y = \mathbf{V}_1 W_1 + \mathbf{V}_2 W_2 + \cdots + \mathbf{V}_S W_S \in \mathbb{F}_d^{K \times 1}$. To this end, each server encodes its classical information into its local quantum subsystem and transmits it to the user, who retrieves the desired computations via quantum measurements. In this work, we propose an achievable scheme for LC-QMAC based on the stabilizer formalism and the ideas from entanglement-assisted quantum error-correcting codes (EAQECC). Specifically, given any linear computation matrix, we construct a self-orthogonal matrix that can be implemented using the stabilizer formalism. Also, we apply precoding matrices to minimize the number of auxiliary qudits required. Our scheme achieves more computations per qudit, i.e., a higher computation rate, compared to the best-known methods in the literature, and attains the capacity in certain cases.
Abstract:We consider the quantum \emph{symmetric} private information retrieval (QSPIR) problem in a system with $N$ databases and $K$ messages, with $U$ unresponsive servers, $T$-colluding servers, and $X$-security parameter, under several fundamental threat models. In the first model, there are $\mathcal{E}_1$ eavesdropped links in the uplink direction (the direction from the user to the $N$ servers), $\mathcal{E}_2$ eavesdropped links in the downlink direction (the direction from the servers to the user), where $|\mathcal{E}_1|, |\mathcal{E}_2| \leq E$; we coin this eavesdropper setting as \emph{dynamic} eavesdroppers. We show that super-dense coding gain can be achieved for some regimes. In the second model, we consider the case with Byzantine servers, i.e., servers that can coordinate to devise a plan to harm the privacy and security of the system together with static eavesdroppers, by listening to the same links in both uplink and downlink directions. It is important to note the considerable difference between the two threat models, since the eavesdroppers can take huge advantage of the presence of the Byzantine servers. Unlike the previous works in SPIR with Byzantine servers, that assume that the Byzantine servers can send only random symbols independent of the stored messages, we follow the definition of Byzantine servers in \cite{byzantine_tpir}, where the Byzantine servers can send symbols that can be functions of the storage, queries, as well as the random symbols in a way that can produce worse harm to the system. In the third and the most novel threat model, we consider the presence of Byzantine servers and dynamic eavesdroppers together. We show that having dynamic eavesdroppers along with Byzantine servers in the same system model creates more threats to the system than having static eavesdroppers with Byzantine servers.
Abstract:In a classification task, counterfactual explanations provide the minimum change needed for an input to be classified into a favorable class. We consider the problem of privately retrieving the exact closest counterfactual from a database of accepted samples while enforcing that certain features of the input sample cannot be changed, i.e., they are \emph{immutable}. An applicant (user) whose feature vector is rejected by a machine learning model wants to retrieve the sample closest to them in the database without altering a private subset of their features, which constitutes the immutable set. While doing this, the user should keep their feature vector, immutable set and the resulting counterfactual index information-theoretically private from the institution. We refer to this as immutable private counterfactual retrieval (I-PCR) problem which generalizes PCR to a more practical setting. In this paper, we propose two I-PCR schemes by leveraging techniques from private information retrieval (PIR) and characterize their communication costs. Further, we quantify the information that the user learns about the database and compare it for the proposed schemes.