CSE5313 Coding and information theory for data science (Lecture 9)
Explicit optimal codes
Explicit optimal codes?
- Singleton, Sphere-packing provide restrictions.
- Gilbert-Varshamov provides existence.
Are there explicit optimal codes? That is,
- Easily (polynomial time) encodable, decodable.
Yes! This lecture:
– Gustave Solomon [1930-1996] (Reed-Solomon code) – Irving S. Reed [1923-2012]. – David E. Muller [1924-2008] (Reed-Muller code)
Using Polynomials over
Reed-Solomon code
The fundamental theorem of algebra:
A polynomial of degree has at most roots.
We have two equivalent definitions of a Reed-Solomon code:
- As polynomial evaluations.
- As linear codes (from generator matrix)
Efficient encoding (as linear codes)
Efficient decoding (use Euclidean algorithm)
Definition of Reed-Solomon code from polynomial evaluations
We assume .
Every codeword corresponds to a polynomial of degree at most .
Let ( for all , ).
Fix distinct .
Definition of Reed-Solomon code
A Reed-Solomon code is .
- In words, the set of all evaluations at of polynomials of degree at most .
Example of Reed-Solomon code
Let , , .
Here .
Proposition: Reed-Solomon code is a linear code
A Reed-Solomon code is is a linear code.
Proof
First the code is closed under addition.
Let , then .
Then the code is closed under scalar multiplication.
Let , , then .
The dimension of the code is .
Corollary: The Reed-Solomon code attains the Singleton bound with equality
The Reed-Solomon code has minimum distance .
Proof
Let and .
Since , and is the minimum distance of the code
Let .
By the lemma for minimum distance, we have where .
So is the number of zeros (root) of the polynomial .
So if has more than roots, then .
So , .
Which is the Singleton bound.
Definition of Reed-Solomon code from generator matrix
Every Reed-Solomon code is of the form for some .
Observer that the evaluation map is a linear map.
So, every code word can be constructed by
The generator matrix for Reed-Solomon code is a Vandermonde matrix .
Fact: is invertible if and only if are distinct. (that’s how we choose )
The parity check matrix for Reed-Solomon code is also a Vandermonde matrix with scalar multiples of the columns.
Some technical lemmas:
Let and be the generator and parity-check matrices of (any) linear code . Then:
I. Then . II. Any matrix such that and is a parity-check matrix for (i.e. ).
Reed-Muller code
Reed-Solomon codes: Evaluations of univariate polynomials of deg ≤ .
Reed-Muller codes: Evaluations of multivariate polynomials of deg
Example:
This is a degree 4 polynomial.
Usually we use for binary codes.
So
Definition of Reed-Muller code (binary case)
Facts:
- Length .
- Minimum distance (not shown).
- Dimension = # of free coefficients in a multilinear polynomial of degree at most .
- Dimension = # of subsets of of size at most
- Dimension =
Exercises: Show that
- Parity code.
- Extended Hamming code.
- Augmented Hadamard.
Coding for storage
Requirements/Challenges in Storage Systems
- Challenge 1: Reconstruction.
- The data collector must be able to reconstruct the file, even if some are nonresponsive.
- Minimize reconstruction bandwidth.
- The data collector must be able to reconstruct the file, even if some are nonresponsive.
- Challenge 2: Repair.
- The system must maintain data consistency.
- Failed servers must be repaired:
- By contacting few other servers (locality, due to geographical constraints).
- By minimizing bandwidth.
- Challenge 3: Storage overhead.
- Minimize space consumption.
- Minimize redundancy.
- Minimize space consumption.
Naive solution: Replication
Fragment the file .
- Size of = Whatever fits in a storage server.
Hold copies of each .
- I.e., servers in the system.
Storage overhead?
- .
Repair?
- fails
- failures is lost data.
Reconstruction?
- Possible if any servers fail.
- Impossible for some failures.
Use codes to improve storage efficiency
Reconstruction?
-
Lecture 1: If , every pattern of at most erasures is recoverable.
-
Idea: Treat unavailable servers as erasures.
Is this better/worse than replication?
- Say we wish to reconstruct from any servers.
- What would be the redundancy in replication vs. coding?
Coding:
- Can reconstruct file from any servers.
- Resulting overhead (constant!).
Replication:
To reconstruct from any servers, need
Repair?
- Need low locality (repair by contacting few other servers).
- Need low bandwidth (repair by downloading as few bits as possible).
Repair in a replicated system:
- fails reconstruct from a different copy.
- Locality 1.
- Optimal bandwidth.
Repair in a coded system:
- repair one Reconstruct the entire file.
- Locality , high bandwidth.
- Much worse than replication.
New coding challenges: Minimize locality and bandwidth