CSE5313 Coding and information theory for data science (Lecture 23)
Coded Computing
Motivation
Some facts:
- Moore’s law is saturating.
- Improving CPU performance is hard.
- Modern datasets are growing remarkably large.
- E.g., TikTok, YouTube.
- Learning tasks are computationally heavy.
- E.g., training neural networks.
Solution: Distributed Computing for Scalability
- Offloading computation tasks to multiple computation nodes.
- Gather and accumulate computation results.
- E.g., Apache Hadoop, Apache Spark, MapReduce.
General Framework
- The system involves one master node and $N$ worker nodes.
- The master has a dataset $X$ and wants $f(X)$, where $f$ is some function.
- The master partitions $X$ into $X_1, \ldots, X_N$, and sends $X_i$ to node $i$.
- Every node $i$ computes $g(X_i)$, where $g$ is some function.
- Finally, the master collects $g(X_1), \ldots, g(X_N)$ and computes $f(X) = h(g(X_1), \ldots, g(X_N))$, where $h$ is some function (a minimal sketch follows).
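A minimal sketch of this framework, assuming $f(X)$ is a sum of squares that decomposes as $h(g(X_1), \ldots, g(X_N))$; the choice of $f$, $g$, $h$ and all sizes are illustrative, not part of the lecture:

```python
# Minimal master/worker framework: f(X) = h(g(X_1), ..., g(X_N)).
# Illustrative choice: f = sum of squares, g = per-block sum of
# squares, h = sum of the partial results.
import numpy as np

def g(block):                         # computed at worker i on its part X_i
    return np.sum(block ** 2)

def h(partials):                      # computed at the master
    return sum(partials)

X = np.arange(12, dtype=float)        # the master's dataset
blocks = np.array_split(X, 4)         # partition X and send X_i to node i
partials = [g(b) for b in blocks]     # every node i returns g(X_i)
assert h(partials) == np.sum(X ** 2)  # the master recovers f(X)
```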
Challenges
Stragglers
- Nodes that are significantly slower than the others.
Adversaries
- Nodes that return erroneous results.
- Computation/communication errors.
- Adversarial attacks.
Privacy
- Nodes may be curious about the dataset.
Resemblance to a communication channel
Suppose $f$ is the identity function, and let $x = (x_1, \ldots, x_N) \in \mathbb{F}^N$ be a message.
- Each $x_i$ is a field element.
- $\mathbb{F}$ could be $\mathbb{R}$ or a finite field $\mathbb{F}_q$.
Observation: This is a distributed storage system.
- An erasure corresponds to a node that does not respond.
- An error corresponds to a node that returns erroneous results.
Solution:
- Add redundancy to the message
- Error-correcting codes.
Coded Distributed Computing
- The master partitions $X$ and encodes it before sending to the workers.
- Workers perform computations on the coded data and generate coded results.
- The master decodes the coded results and obtains $f(X)$.
Outline
Matrix-Vector Multiplication
- MDS codes.
- Short-Dot codes.
Matrix-Matrix Multiplication
- Polynomial codes.
- MatDot codes.
Polynomial Evaluation
- Lagrange codes.
- Application to blockchain.
Trivial solution - replication
Consider computing $Ax$, with $A$ partitioned into $K$ blocks $A_1, \ldots, A_K$.
Why no straggler tolerance?
- An individual worker node is employed to compute each $A_i x$, so a single straggler stalls the entire computation.
Replicate the computation?
- Let $s + 1$ nodes compute every $A_i x$.
We need $K(s + 1)$ worker nodes to tolerate $s$ erasures and $K(2e + 1)$ worker nodes to tolerate $e$ adversaries (e.g., $K = 10$ and $s = 2$ already require $30$ nodes).
Use of MDS codes
Let $A \in \mathbb{F}^{m \times \ell}$ and $x \in \mathbb{F}^{\ell}$.
Let $A_1, A_2$ be submatrices of $A$ such that $A = \begin{pmatrix} A_1 \\ A_2 \end{pmatrix}$.
- Worker node 1 computes $A_1 x$.
- Worker node 2 computes $A_2 x$.
- Worker node 3 computes $(A_1 + A_2) x$.
Observation: $Ax$ can be obtained from the responses of any two worker nodes (a numeric check follows).
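A quick numeric check of this observation, with illustrative sizes:

```python
# Three workers, any two of whose responses determine A x.
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((4, 5))              # A stacked as [A1; A2]
x = rng.standard_normal(5)
A1, A2 = A[:2], A[2:]

r1, r2, r3 = A1 @ x, A2 @ x, (A1 + A2) @ x   # the three workers' responses

# Workers 1 and 3 respond: recover A2 x = r3 - r1.
assert np.allclose(np.concatenate([r1, r3 - r1]), A @ x)
# Workers 2 and 3 respond: recover A1 x = r3 - r2.
assert np.allclose(np.concatenate([r3 - r2, r2]), A @ x)
```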
Let $B \in \mathbb{F}^{N \times K}$ be (the transpose of) a generator matrix of an $[N, K]$ MDS code.
The master node computes $\tilde{A} = BA$ block-wise: $A$ is partitioned into $K$ blocks $A_1, \ldots, A_K$, and $\tilde{A}_i = \sum_{j=1}^{K} B_{i,j} A_j$.
Every worker node $i$ computes $\tilde{A}_i x$, where $\tilde{A}_i = b_i A$ (block-wise).
- $b_i$ is the $i$-th row of $B$.
Notice that $(\tilde{A}_1 x, \ldots, \tilde{A}_N x)$ is the codeword of $(A_1 x, \ldots, A_K x)$.
Node $i$ computes an entry in this codeword.
Response = entry of the codeword.
The master does not need all workers to respond to obtain $Ax$.
- The MDS property allows decoding from any $K$ of the $\tilde{A}_i x$'s.
- This scheme tolerates $N - K$ erasures, and the recovery threshold is $K$.
- We need $N = K + s$ worker nodes to tolerate $s$ stragglers, or $N = K + 2e$ to tolerate $e$ adversaries.
- With replication, we need $K(s + 1)$ worker nodes (a simulation sketch follows this list).
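The following sketch simulates the scheme over the reals, assuming a Vandermonde choice of $B$ (distinct evaluation points make every $K \times K$ submatrix invertible); the sizes and the set of responding workers are illustrative:

```python
# MDS-coded matrix-vector multiplication: K = 2 blocks, N = 4 workers,
# so up to N - K = 2 stragglers are tolerated.
import numpy as np

K, N = 2, 4
rng = np.random.default_rng(1)
A = rng.standard_normal((6, 5))     # 6 rows, split into K = 2 blocks
x = rng.standard_normal(5)

# N x K real Vandermonde: every K x K submatrix is invertible.
B = np.vander(np.arange(1, N + 1), K, increasing=True).astype(float)
blocks = np.split(A, K)             # A_1, ..., A_K
A_tilde = [sum(B[i, j] * blocks[j] for j in range(K)) for i in range(N)]

responses = {i: A_tilde[i] @ x for i in (0, 3)}  # only workers 0 and 3 answer
idx = sorted(responses)
Y = np.stack([responses[i] for i in idx])        # received codeword entries
blocks_of_Ax = np.linalg.solve(B[idx], Y)        # invert the K x K submatrix
assert np.allclose(np.concatenate(blocks_of_Ax), A @ x)
```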
Potential improvements for MDS codes
- The matrix $A$ is usually a (trained) model, and $x$ is the data (feature vector).
- $x$ is transmitted frequently, while the rows of $B$ (or $\tilde{A}$) are communicated in advance.
- Every worker needs to receive the entire $x$ and compute the dot product.
- Communication-heavy
- Can we design a scheme that allows every node to receive only a part of $x$?
Short-Dot codes
We want to create a matrix $F \in \mathbb{F}^{N \times \ell}$ from $A \in \mathbb{F}^{M \times \ell}$ such that:
- Every node $i$ computes $f_i \cdot x$, where $f_i$ is the $i$-th row of $F$.
- Every $K$ rows of $F$ linearly span the row space of $A$.
- Each row of $F$ contains at most $s$ non-zero entries.
In the MDS method, $F = BA$ for a generator matrix $B \in \mathbb{F}^{N \times M}$ of an $[N, M]$ MDS code.
- The recovery threshold is $K = M$.
- Every worker node needs to receive $\ell$ symbols (the entire $x$).
No free lunch
Can we trade the recovery threshold for a smaller $s$?
- Every worker node receives fewer than $\ell$ symbols.
- The master will need more than $M$ responses to recover the computation result.
Construction of Short-Dot codes
Choose a super-regular matrix $B \in \mathbb{F}^{N \times K}$, where $N$ is the number of worker nodes.
- A matrix is super-regular if every square submatrix of it is invertible.
- Lagrange/Cauchy matrices are super-regular (next lecture; a small numeric check follows this list).
Create a matrix $\tilde{A} \in \mathbb{F}^{K \times \ell}$ by stacking some $Z \in \mathbb{F}^{(K - M) \times \ell}$ below the matrix $A \in \mathbb{F}^{M \times \ell}$.
Let $F = B\tilde{A}$.
Short-Dot: create the matrix $Z$ such that:
- Every $K$ rows of $F$ linearly span the row space of $A$.
- Each row of $F$ contains at most $s = \lceil \ell(N - K + M)/N \rceil$ non-zero entries (sparse).
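As a sanity check of the super-regularity claim, here is a brute-force verification for one small real Cauchy matrix (a numeric spot check, not a proof; the parameters are illustrative):

```python
# Verify that every square submatrix of a small Cauchy matrix is
# invertible, i.e., has a non-zero determinant.
import numpy as np
from itertools import combinations

a = np.array([0.0, 1.0, 2.0, 3.0])      # row parameters
b = np.array([10.0, 11.0, 12.0])        # column parameters, disjoint from a
B = 1.0 / (a[:, None] - b[None, :])     # a 4 x 3 Cauchy matrix

for k in range(1, 4):
    for rows in combinations(range(4), k):
        for cols in combinations(range(3), k):
            sub = B[np.ix_(rows, cols)]
            assert abs(np.linalg.det(sub)) > 1e-12  # every minor is non-zero
```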
Recovery of Short-Dot codes
Claim: Every $K$ rows of $F$ linearly span the row space of $A$.
Proof
Since $B$ is super-regular, it is also MDS, i.e., every $K \times K$ submatrix of $B$ is invertible.
Hence, every row of $\tilde{A}$ can be represented as a linear combination of any $K$ rows of $F$.
That is, for every $\mathcal{K} \subseteq [N]$ with $|\mathcal{K}| = K$, we have $\tilde{A} = (B^{\mathcal{K}})^{-1} F^{\mathcal{K}}$, where $B^{\mathcal{K}}$ and $F^{\mathcal{K}}$ contain the rows indexed by $\mathcal{K}$; in particular, the rows of $A$ (the first $M$ rows of $\tilde{A}$) are spanned.
What about the sparsity of $F$?
- Want each row of $F$ to be sparse.
Sparsity of Short-Dot codes
Build a square $N \times N$ zero pattern in which each row and each column contains exactly $N - (K - M)$ non-zero entries (e.g., by cyclically shifting a first row that has $K - M$ zeros).
Concatenate $\lceil \ell / N \rceil$ such matrices horizontally to obtain the zero pattern of $F$; each row of $F$ then contains at most $\lceil \ell(N - K + M)/N \rceil$ non-zero entries.
[Missing slide 18]
We now investigate what $Z$ should look like to construct such a matrix $F$.
- Recall that each column of $F$ must contain $K - M$ zeros.
  - They are indexed by a set $\mathcal{U} \subseteq [N]$ with $|\mathcal{U}| = K - M$ (one such set per column).
  - Let $B^{\mathcal{U}} \in \mathbb{F}^{(K - M) \times K}$ be the submatrix of $B$ containing the rows indexed by $\mathcal{U}$.
- Since $F = B\tilde{A}$, it follows that $F_j = B\tilde{A}_j$, where $F_j$ and $\tilde{A}_j$ are the $j$-th columns of $F$ and $\tilde{A}$.
- Next, we have $B^{\mathcal{U}} \tilde{A}_j = 0_{(K - M) \times 1}$.
- Split $B^{\mathcal{U}} = [B^{\mathcal{U}}_{[1, M]}, B^{\mathcal{U}}_{[M + 1, K]}]$ and $\tilde{A}_j = [A_j^\top, Z_j^\top]^\top$.
- $B^{\mathcal{U}} \tilde{A}_j = B^{\mathcal{U}}_{[1, M]} A_j + B^{\mathcal{U}}_{[M + 1, K]} Z_j = 0_{(K - M) \times 1}$.
- $Z_j = -(B^{\mathcal{U}}_{[M + 1, K]})^{-1} B^{\mathcal{U}}_{[1, M]} A_j$.
- Note that $B^{\mathcal{U}}_{[M + 1, K]} \in \mathbb{F}^{(K - M) \times (K - M)}$ is invertible, since $B$ is super-regular (a code sketch follows).
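Putting the pieces together, here is a sketch of the full Short-Dot construction, assuming a real Cauchy matrix for $B$ and cyclically shifted zero sets $\mathcal{U}_j$; all sizes are illustrative:

```python
# Short-Dot over the reals: N = 5 workers, K = 4 recovery threshold,
# A has M = 2 rows and ell = 10 columns.
import numpy as np

N, K, M, ell = 5, 4, 2, 10
rng = np.random.default_rng(2)
A = rng.standard_normal((M, ell))
x = rng.standard_normal(ell)

a = np.arange(N, dtype=float)                # 0, ..., 4
b = np.arange(N, N + K, dtype=float)         # 5, ..., 8 (disjoint from a)
B = 1.0 / (a[:, None] - b[None, :])          # N x K Cauchy, super-regular

# Build Z column by column: Z_j = -(B^U_{[M+1,K]})^{-1} B^U_{[1,M]} A_j.
Z = np.zeros((K - M, ell))
for j in range(ell):
    U = [(j + t) % N for t in range(K - M)]  # rows of F zeroed in column j
    B_U = B[U]                               # (K-M) x K submatrix of B
    Z[:, j] = -np.linalg.solve(B_U[:, M:], B_U[:, :M] @ A[:, j])

A_tilde = np.vstack([A, Z])                  # stack Z below A
F = B @ A_tilde                              # N x ell

# Each row of F has at most ell(N-K+M)/N = 6 of 10 non-zero entries.
assert (np.abs(F) > 1e-9).sum(axis=1).max() <= ell * (N - K + M) // N

# Recovery: any K = 4 responses determine A_tilde x, hence A x.
idx = [0, 2, 3, 4]                           # responding workers
responses = F[idx] @ x                       # worker i returns f_i . x
decoded = np.linalg.solve(B[idx], responses)
assert np.allclose(decoded[:M], A @ x)
```

Worker $i$ then only needs the coordinates of $x$ on which $f_i$ is non-zero, i.e., at most $\ell(N - K + M)/N = 6$ of the $10$ symbols in this toy example.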