
CSE5313 Coding and information theory for data science (Lecture 23)

Coded Computing

Motivation

Some facts:

  • Moore’s law is saturating.
    • Improving CPU performance is hard.
  • Modern datasets are growing remarkably large.
    • E.g., TikTok, YouTube.
  • Learning tasks are computationally heavy.
    • E.g., training neural networks.

Solution: Distributed Computing for Scalability

  • Offloading computation tasks to multiple computation nodes.
  • Gather and accumulate computation results.
  • E.g., Apache Hadoop, Apache Spark, MapReduce.

General Framework

  • The system involves 1 master node and $P$ worker nodes.
  • The master has a dataset $D$ and wants $f(D)$, where $f$ is some function.
  • The master partitions $D=(D_1,\cdots,D_P)$ and sends $D_i$ to node $i$.
  • Every node $i$ computes $g(D_i)$, where $g$ is some function.
  • Finally, the master collects $g(D_1),\cdots,g(D_P)$ and computes $f(D)=h(g(D_1),\cdots,g(D_P))$, where $h$ is some function (a minimal sketch follows below).
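To make the roles of $f$, $g$, and $h$ concrete, here is a minimal Python sketch; the choice $f=$ total sum, $g=$ partial sum, $h=$ sum of the partial results is hypothetical, purely for illustration.

```python
import numpy as np

# Hypothetical instantiation of the framework:
# f(D) = sum of all entries, g(D_i) = partial sum, h = sum of partial results.
P = 4                                 # number of worker nodes
D = np.arange(20)                     # the master's dataset
parts = np.array_split(D, P)          # D = (D_1, ..., D_P)

def g(D_i):
    return D_i.sum()                  # computation performed at worker i

results = [g(D_i) for D_i in parts]   # master collects g(D_1), ..., g(D_P)
f_D = sum(results)                    # h: aggregate the partial results
assert f_D == D.sum()                 # equals f(D)
```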

Challenges

Stragglers

  • Nodes that are significantly slower than the others.

Adversaries

  • Nodes that return erroneous results.
    • Computation/communication errors.
    • Adversarial attacks.

Privacy

  • Nodes may be curious about the dataset.

Resemblance to communication channel

Suppose $f,g=\operatorname{id}$, and let $D=(D_1,\cdots,D_P)\in \mathbb{F}^P$ be a message.

  • $D_i$ is a field element.
  • $\mathbb{F}$ could be $\mathbb{R}$, $\mathbb{C}$, or $\mathbb{F}_q$.

Observation: This is a distributed storage system.

  • An erasure - a node that does not respond.
  • An error - a node that returns erroneous results.

Solution:

  • Add redundancy to the message
  • Error-correcting codes.

Coded Distributed Computing

  • The master partitions $D$ and encodes it before sending to $P$ workers.
  • Workers perform computations on coded data $\tilde{D}$ and generate coded results $g(\tilde{D})$.
  • The master decodes the coded results and obtains $f(D)=h(g(\tilde{D}))$.

Outline

Matrix-Vector Multiplication

  • MDS codes.
  • Short-Dot codes.

Matrix-Matrix Multiplication

  • Polynomial codes.
  • MatDot codes.

Polynomial Evaluation

  • Lagrange codes.
  • Application to blockchain.

Trivial solution - replication

Why no straggler tolerance?

  • The task is to compute $y=Ax$, where $A\in \mathbb{F}^{M\times N}$ and $x\in \mathbb{F}^{N}$.
  • We employ an individual worker node $i$ to compute $y_i=(a_{i1},\ldots,a_{iN})\cdot (x_1,\ldots,x_N)^T$.
  • If node $i$ does not respond, the entry $y_i$ is lost, so a single straggler stalls the computation.

Replicate the computation?

  • Let $r+1$ nodes compute every $y_i$.

We need $P=rM+M$ worker nodes to tolerate $r$ erasures or $\lfloor \frac{r}{2}\rfloor$ adversaries (e.g., tolerating $r=2$ stragglers for $M=10$ outputs already requires $P=30$ workers).

Use of MDS codes

Let $2|M$ and $P=3$.

Let $A_1,A_2$ be submatrices of $A$ such that $A=[A_1^\top|A_2^\top]^\top$.

  • Worker node 1 computes $A_1\cdot x$.
  • Worker node 2 computes $A_2\cdot x$.
  • Worker node 3 computes $(A_1+A_2)\cdot x$.

Observation: the results can be obtained from any two worker nodes.
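A quick numerical sanity check of this observation, sketched over the reals; the sizes and the choice of which worker straggles are illustrative assumptions.

```python
import numpy as np

# Toy instance: M = 4 (so 2|M), N = 3, P = 3 workers.
rng = np.random.default_rng(0)
A = rng.standard_normal((4, 3))
x = rng.standard_normal(3)
A1, A2 = A[:2], A[2:]              # A = [A1; A2]

r1 = A1 @ x                        # worker 1
r2 = A2 @ x                        # worker 2
r3 = (A1 + A2) @ x                 # worker 3

# Recover y = Ax from workers {1, 3} only (worker 2 straggles):
y = np.concatenate([r1, r3 - r1])
assert np.allclose(y, A @ x)
```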

Let $G\in \mathbb{F}^{M\times P}$ be the generator matrix of a $(P,M)$ MDS code.

The master node computes $F=G^\top A\in \mathbb{F}^{P\times N}$.

Every worker node $i$ computes $F_i\cdot x$.

  • $F_i=(G^\top A)_i$ is the $i$-th row of $G^\top A$.

Notice that $Fx=G^\top A\cdot x=G^\top y$ is the codeword encoding $y$.

Node $i$ computes an entry in this codeword.

1 response = 1 entry of the codeword.

The master does not need all workers to respond to obtain $y$.

  • The MDS property allows decoding $y$ from any $M$ of the $P$ responses.
  • This scheme tolerates $P-M$ erasures, and the recovery threshold is $K=M$.
  • We need $P=r+M$ worker nodes to tolerate $r$ stragglers or $\lfloor\frac{r}{2}\rfloor$ adversaries (see the sketch below).
    • With replication, we need $P=rM+M$ worker nodes.
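Below is a sketch of the whole scheme over $\mathbb{R}$, using a Vandermonde matrix as $G$; a Vandermonde matrix with distinct evaluation points generates an MDS code, so any $M$ responses determine $y$. The sizes and the surviving set are illustrative assumptions.

```python
import numpy as np

M, N, P = 3, 5, 6                  # tolerates P - M = 3 stragglers
rng = np.random.default_rng(1)
A = rng.standard_normal((M, N))
x = rng.standard_normal(N)

alphas = np.arange(1.0, P + 1)                 # P distinct evaluation points
G = np.vander(alphas, M, increasing=True).T    # G in F^{M x P}, MDS generator
F = G.T @ A                                    # row F_i is sent to worker i

responses = F @ x                  # worker i returns F_i @ x = (G^T y)_i
survivors = [0, 2, 5]              # any M responding workers suffice
y = np.linalg.solve(G.T[survivors, :], responses[survivors])
assert np.allclose(y, A @ x)       # decoded despite 3 missing responses
```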

Potential improvements for MDS codes

  • The matrix $A$ is usually a (trained) model, and $x$ is the data (feature vector).
  • $x$ is transmitted frequently, while the rows of $A$ (or $G^\top A$) are communicated in advance.
  • Every worker needs to receive the entire $x$ and compute the dot product.
  • Communication-heavy.
  • Can we design a scheme that allows every node to receive only a part of $x$?

Short-Dot codes

link to paper 

We want to create a matrix $F\in \mathbb{F}^{P\times N}$ from $A$ such that:

  • Every node computes $F_i\cdot x$.
  • Every $K$ rows of $F$ linearly span the row space of $A$.
  • Each row of $F$ contains at most $s$ non-zero entries.

In the MDS method, $F=G^\top A$.

  • The recovery threshold $K=M$.
  • Every worker node needs to receive $s=N$ symbols (the entire $x$).

No free lunch

Can we trade the recovery threshold $K$ for a smaller $s$?

  • Every worker node receives less than $N$ symbols.
  • The master will need more than $M$ responses to recover the computation result.

Construction of Short-Dot codes

Choose a super-regular matrix $B\in \mathbb{F}^{P\times K}$, where $P$ is the number of worker nodes.

  • A matrix is super-regular if every square submatrix is invertible.
  • Lagrange/Cauchy matrices are super-regular (next lecture).

Create matrix $\tilde{A}$ by stacking some $Z\in \mathbb{F}^{(K-M)\times N}$ below matrix $A$.

Let $F=B\cdot \tilde{A}\in \mathbb{F}^{P\times N}$.

Short-Dot: create matrix $F\in \mathbb{F}^{P\times N}$ such that:

  • Every $K$ rows linearly span the row space of $A$.
  • Each row of $F$ contains at most $s=\frac{P-K+M}{P}\cdot N$ non-zero entries (sparse); e.g., $P=6$, $K=4$, $M=2$ gives $s=\frac{2}{3}N$, so each worker receives only two thirds of $x$.

Recovery of Short-Dot codes

Claim: Every $K$ rows of $F$ linearly span the row space of $A$.

Proof

Since $B$ is super-regular, it is also MDS, i.e., every $K\times K$ submatrix of $B$ is invertible.

Hence, every row of $A$ can be represented as a linear combination of any $K$ rows of $F$.

That is, for every $\mathcal{X}\subseteq[P]$ with $|\mathcal{X}|=K$, we have $\tilde{A}=(B^{\mathcal{X}})^{-1}F^{\mathcal{X}}$, where $B^{\mathcal{X}}$ and $F^{\mathcal{X}}$ consist of the rows of $B$ and $F$ indexed by $\mathcal{X}$ (verified in the sketch below).
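A numerical sketch of this recovery, using a real Cauchy matrix as the super-regular $B$; here $Z$ is arbitrary, since recovery works for any $Z$ (the sizes and the row set $\mathcal{X}$ are illustrative assumptions).

```python
import numpy as np

# Every square submatrix of a Cauchy matrix is invertible, so B is
# super-regular; s_pts and t_pts just need to be distinct and disjoint.
P, K, M, N = 6, 4, 2, 5
s_pts = np.arange(P, dtype=float)
t_pts = np.arange(K, dtype=float) + 0.5
B = 1.0 / (s_pts[:, None] - t_pts[None, :])   # Cauchy, in F^{P x K}

rng = np.random.default_rng(2)
A = rng.standard_normal((M, N))
Z = rng.standard_normal((K - M, N))       # arbitrary here; chosen for sparsity later
A_tilde = np.vstack([A, Z])
F = B @ A_tilde                           # in F^{P x N}

X = [1, 2, 4, 5]                          # any K = 4 rows of F
A_tilde_rec = np.linalg.solve(B[X, :], F[X, :])
assert np.allclose(A_tilde_rec, A_tilde)  # top M rows recover A
```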

What about the sparsity of $F$?

  • Want each row of $F$ to be sparse.

Sparsity of Short-Dot codes

Build a $P\times P$ square matrix in which each row and each column contains $P-K+M$ non-zero entries.

Concatenate $\frac{N}{P}$ such matrices (assuming $P|N$) and obtain a $P\times N$ sparsity pattern in which each row contains $s=\frac{P-K+M}{P}\cdot N$ non-zero entries.

[Missing slide 18]

We now investigate what $Z$ should look like to construct such a matrix $F$.

  • Recall that each column of $F$ must contain $K-M$ zeros.
    • They are indexed by a set $\mathcal{U}\subseteq[P]$, where $|\mathcal{U}|=K-M$.
    • Let $B^{\mathcal{U}}\in \mathbb{F}^{(K-M)\times K}$ be the submatrix of $B$ containing the rows indexed by $\mathcal{U}$.
  • Since $F=B\tilde{A}$, it follows that $F_j=B\tilde{A}_j$, where $F_j$ and $\tilde{A}_j$ are the $j$-th columns of $F$ and $\tilde{A}$.
  • Next, we have $B^{\mathcal{U}}\tilde{A}_j=0_{(K-M)\times 1}$.
  • Split $B^{\mathcal{U}}=[B^{\mathcal{U}}_{[1,M]},B^{\mathcal{U}}_{[M+1,K]}]$ and $\tilde{A}_j=[A_j^T,Z_j^T]^T$.
  • $B^{\mathcal{U}}\tilde{A}_j=B^{\mathcal{U}}_{[1,M]}A_j+B^{\mathcal{U}}_{[M+1,K]}Z_j=0_{(K-M)\times 1}$.
  • $Z_j=-(B^{\mathcal{U}}_{[M+1,K]})^{-1}B^{\mathcal{U}}_{[1,M]}A_j$.
  • Note that $B^{\mathcal{U}}_{[M+1,K]}\in \mathbb{F}^{(K-M)\times(K-M)}$ is invertible.
    • Since $B$ is super-regular.
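Putting the pieces together, here is a sketch of the full construction over $\mathbb{R}$; the cyclic placement of the zero sets $\mathcal{U}_j$ is an illustrative way to spread the zeros evenly across rows, so each row of $F$ meets the sparsity bound $s=\frac{P-K+M}{P}\cdot N$.

```python
import numpy as np

# Prescribe K - M zero positions U_j in every column j of F = B @ A_tilde,
# then solve for Z column by column as derived above.
P, K, M, N = 6, 4, 2, 6
s_pts = np.arange(P, dtype=float)
t_pts = np.arange(K, dtype=float) + 0.5
B = 1.0 / (s_pts[:, None] - t_pts[None, :])   # super-regular Cauchy matrix

rng = np.random.default_rng(3)
A = rng.standard_normal((M, N))
Z = np.empty((K - M, N))
for j in range(N):
    U = [(j + i) % P for i in range(K - M)]   # zero positions in column j (cyclic)
    B_left, B_right = B[U, :M], B[U, M:]      # B^U = [B^U_[1,M] | B^U_[M+1,K]]
    Z[:, j] = -np.linalg.solve(B_right, B_left @ A[:, j])

F = B @ np.vstack([A, Z])
for j in range(N):
    U = [(j + i) % P for i in range(K - M)]
    assert np.allclose(F[U, j], 0)            # K - M zeros per column, as required
```

With this pattern, each worker $i$ only needs the entries of $x$ matching the non-zeros of its row $F_i$, and any $K$ responses recover $y=Ax$ as in the claim above.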
