CSE5313 Coding and information theory for data science (Lecture 20)
Review for Private Information Retrieval
PIR from replicated databases
For 2 replicated databases, we have the following protocol:
- User has .
- User chooses .
- Two queries to each server:
- , .
- , .
- Server responds with and .
- Decoding?
- .
- .
PIR-rate is .
PIR from coded parity-check databases
For 3 coded parity-check databases, we have the following protocol:
- User has .
- User chooses .
- Three queries to each server:
- , , .
- , , .
- , , .
- Server responds with .
- Decoding?
- .
- .
- .
PIR-rate is .
Beyond z=1
Star-product theme
Given , over , the star-product is defined as:
Given two linear codes, , the star-product code is defined as:
Singleton bound for star-product:
PIR form a database coded with any MDS code and z>1
To generalize the previous scheme to need to encode multiple ‘s together.
- As in the ramp scheme.
Recall from the ramp scheme, we use as our key vector to avoid occlusion of the servers.
In the star-product scheme:
- Files are coded with an MDS code .
- The multiple ‘s are coded with an MDS code .
- The scheme is based on the minimum distance of .
To code the data:
- Let be an MDS code of dimension .
- For all , encode file using :
- For all , store (a column of the above matrix) in server .
Let .
To code the queries:
- Let be an MDS code of dimension .
- Encode the ‘s using .
To introduce the “errors in known locations” to the encoded ‘s:
- Let with some entries in its -th row equal to 1.
- These are the entries we will retrieve.
For every server send , where is the -th column of .
- This is similar to ramp scheme, where is the “message”.
- Privacy against collusion of servers.
Response from server: .
Decoding? Let be a matrix whose columns are the ‘s.
- The user has
where is a row of and is a codeword in (an MDS code).
We have:
- with some entries in its -th row equal to 1.
- Each is a row of
- For , is a codeword in
- Therefore:
Multiply by and get elements of .
- Recall that
- Repeat times to obtain elements of .
- Suffices to obtain , since is MDS code.
PIR-rate:
- = \frac{k}{# \text{ downloaded elements}} = \frac{k}{\frac{k}{d_{C \star D} - 1} \cdot n} = \frac{d_{C \star D} - 1}{n}
- Singleton bound for star-product: .
- Achieved with equality if and are Reed-Solomon codes.
- PIR-rate = .
- Intuition:
- “paying” for “reconstruction from any ”.
- “paying” for “protection against colluding sets of size ”.
- Capacity unknown! (as of 2022).
- Known for special cases, e.g., , certain types of schemes, etc.
PIR over graphs
Graph-based replication:
- Every file is replicated twice on two separate servers.
- Every two servers have at most one file in common.
- “file” = “granularity” of data, i.e., the smallest information unit shared by any two servers.
A server that stores receives , and replies with .
The idea:
- Consider a 2-server replicated PIR and “split” the queries between the servers.
- Sum the responses, unwanted files “cancel out”, while does not.
Problem: Collusion.
Solution: Add per server randomness.
Good for any graph, and any (for simplicity assume ).
The protocol:
- Choose random , , and .
- Queries:
- If node is incident with edge , send to node .
- I.e., if server stores file .
- Except one node that stores , which gets .
- Server responds with .
- Where x_{i, d}$ are the files adjacent with it.
Example
- Consider the following graph.
- .
- and .
- and .

Correctness:
- find .
Parameters:
- Storage overhead 2 (for any graph).
- Download .
- PIR rate 1/n.
Collusion resistance:
1-privacy: Each node sees an entirely random vector.
2-privacy:
- If no edge – as for 1-privacy.
- If edge exists – E.g.,
- and are independent.
- and are independent.
S-privacy:
- Let (e.g., ), and consider the query matrix of their mutual files:
- It can be shown that , regardless of perfect privacy.