
CSE5313 Coding and information theory for data science (Lecture 24)

Continue on coded computing

Coded computing scheme

Matrix-vector multiplication: $y=Ax$, where $A\in \mathbb{F}^{M\times N}$, $x\in \mathbb{F}^N$.

  • MDS codes.
    • Recovery threshold $K=M$.
  • Short-dot codes.
    • Recovery threshold $K\geq M$.
    • Every node receives at most $s=\frac{P-K+M}{P}\cdot N$ elements of $x$.

Matrix-matrix multiplication

Problem Formulation:

  • $A=[A_0, A_1, \ldots, A_{M-1}]\in \mathbb{F}^{L\times L}$, $B=[B_0, B_1, \ldots, B_{M-1}]\in \mathbb{F}^{L\times L}$.
  • $A_m, B_m$ are column-block submatrices of $A, B$, each of size $L\times \frac{L}{M}$.
  • We want to compute $C=A^\top B$.

Trivial solution:

  • Index each worker node by $(m,n)$ with $m,n\in [0,M-1]$.
  • Worker node $(m,n)$ performs the matrix multiplication $A_m^\top B_n$.
  • Needs $P=M^2$ nodes.
  • No erasure tolerance.
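The trivial scheme can be sketched in a few lines of numpy; this is a toy illustration over the reals (the lecture works over a finite field $\mathbb{F}$), with all names illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)
L, M = 4, 2                          # A, B are L x L, split into M column blocks
A = rng.integers(0, 5, (L, L)).astype(float)
B = rng.integers(0, 5, (L, L)).astype(float)

A_blocks = np.hsplit(A, M)           # each block is L x (L/M)
B_blocks = np.hsplit(B, M)

# Worker (m, n) computes the (m, n) block of C = A^T B.
# P = M^2 workers, and every single one must respond: no redundancy.
C = np.block([[A_blocks[m].T @ B_blocks[n] for n in range(M)]
              for m in range(M)])

assert np.allclose(C, A.T @ B)
```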

Can we do better?

1-D MDS Method

Create $[\tilde{A}_0,\tilde{A}_1,\ldots,\tilde{A}_{S-1}]$ by encoding $[A_0,A_1,\ldots,A_{M-1}]$ with some $(S,M)$ MDS code.

Need $P=SM$ worker nodes; index each one by $(s,n)$ with $s\in [0,S-1]$, $n\in [0,M-1]$.

Worker node $(s,n)$ performs the matrix multiplication $\tilde{A}_s^\top B_n$.

For example, with $M=2$ and $S=3$:

$$\begin{bmatrix} A_0^\top\\ A_1^\top\\ A_0^\top+A_1^\top \end{bmatrix} \begin{bmatrix} B_0 & B_1 \end{bmatrix}$$

Any $M$ responses from each column suffice, i.e., each column tolerates $S-M$ erasures.

In the worst case all erasures fall in one column, so the recovery threshold is $K=P-S+M$ nodes.

The example above is a single parity-check code ($S=M+1$), which tolerates one erasure per column.
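A minimal numpy sketch of the $M=2$, $S=3$ single-parity example, again over the reals rather than a finite field; the decoding line is specific to this parity code:

```python
import numpy as np

rng = np.random.default_rng(1)
L, M, S = 4, 2, 3
A = rng.integers(0, 5, (L, L)).astype(float)
B = rng.integers(0, 5, (L, L)).astype(float)
A0, A1 = np.hsplit(A, M)
B_blocks = np.hsplit(B, M)

# (3, 2) single-parity MDS encoding of the A blocks: [A0, A1, A0 + A1]
A_enc = [A0, A1, A0 + A1]

# Worker (s, n) computes A_enc[s]^T @ B_n; suppose workers (0, n) straggle
results = {(s, n): A_enc[s].T @ B_blocks[n]
           for s in range(S) for n in range(M) if s != 0}

# Decode column n: A0^T B_n = (A0 + A1)^T B_n - A1^T B_n
C = np.block([
    [results[(2, 0)] - results[(1, 0)], results[(2, 1)] - results[(1, 1)]],
    [results[(1, 0)], results[(1, 1)]],
])
assert np.allclose(C, A.T @ B)
```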

2-D MDS Method

Encode $[A_0,A_1,\ldots,A_{M-1}]$ with some $(S,M)$ MDS code.

Encode $[B_0,B_1,\ldots,B_{M-1}]$ with some $(S,M)$ MDS code.

Need $P=S^2$ nodes.

For example, with $M=2$ and $S=3$:

$$\begin{bmatrix} A_0^\top\\ A_1^\top\\ A_0^\top+A_1^\top \end{bmatrix} \begin{bmatrix} B_0 & B_1 & B_0+B_1 \end{bmatrix}$$

Decodability depends on the erasure pattern.

  • Consider an $S\times S$ bipartite graph (rows on the left, columns on the right).
  • Draw an $(i,j)$ edge if $\tilde{A}_i^\top\tilde{B}_j$ is missing.
  • Row $i$ is decodable if and only if the degree of the $i$'th left node is $\leq S-M$.
  • Column $j$ is decodable if and only if the degree of the $j$'th right node is $\leq S-M$.

Peeling algorithm:

  • Traverse the graph.
  • If there exists a node $v$ with $\deg v\leq S-M$, decode it and remove its edges.
  • Repeat until no such node exists.
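The peeling algorithm above can be sketched directly; this is a hypothetical helper, not code from the lecture:

```python
def decodable(missing, S, M):
    """Peeling decoder on the S x S bipartite erasure graph.

    `missing` is a set of (i, j) pairs meaning A~_i^T B~_j was not received.
    A row/column node can be peeled (decoded) once its degree is <= S - M.
    Returns True iff all edges can eventually be removed.
    """
    edges = set(missing)
    changed = True
    while changed and edges:
        changed = False
        for side in (0, 1):                   # side 0: row nodes, 1: column nodes
            for v in range(S):
                inc = [e for e in edges if e[side] == v]
                if 0 < len(inc) <= S - M:
                    edges -= set(inc)         # decode node v, remove its edges
                    changed = True
    return not edges

S, M = 3, 2
assert decodable({(0, 0), (1, 1)}, S, M)                      # scattered: OK
assert not decodable({(0, 0), (0, 1), (1, 0), (1, 1)}, S, M)  # (S-M+1)^2 grid
```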

Corollary:

  • A pattern is decodable if and only if the above graph does not contain a (nonempty) subgraph in which every node has degree larger than $S-M$.

Note

  1. $K_{1D\text{-}MDS}=P-S+M=\Theta(P)$ (scales linearly with $P$).
  2. $K_{2D\text{-}MDS}=P-(S-M+1)^2+1$.
    • Consider the $S\times S$ bipartite graph containing an $(S-M+1)\times (S-M+1)$ complete bipartite subgraph.
    • There exists a subgraph with all degrees larger than $S-M$ $\implies$ not decodable.
    • On the other hand: fewer than $(S-M+1)^2$ edges cannot form a subgraph with all degrees $>S-M$.
    • $K$ scales sub-linearly with $P$.
  3. $K_{product}<P-M^2=S^2-M^2=\Theta(\sqrt{P})$.

Our goal is a recovery threshold that does not depend on $P$.

Polynomial codes

Polynomial representation

Coefficient representation of a polynomial:

  • $f(x)=f_dx^d+f_{d-1}x^{d-1}+\cdots+f_1x+f_0$
  • Uniquely defined by its coefficients $[f_d,f_{d-1},\ldots,f_0]$.

Value representation of a polynomial:

  • Theorem: A polynomial of degree $d$ is uniquely determined by $d+1$ points.
  • Proof outline: First construct a polynomial of degree $d$ through the $d+1$ points using Lagrange interpolation, then show such a polynomial is unique.
  • Uniquely defined by the evaluations $[(\alpha_1,f(\alpha_1)),\ldots,(\alpha_{d+1},f(\alpha_{d+1}))]$.

Why should we want the value representation?

  • With the coefficient representation, multiplying two degree-$d$ polynomials takes $O(d^2)$ multiplications.
  • With the value representation, it takes only $2d+1$ pointwise multiplications (one per evaluation point, since the product has degree $2d$).
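A tiny numpy sketch of multiplying in the value representation, using $2d+1=3$ points for two degree-1 polynomials:

```python
import numpy as np

# Multiply f(x) = 1 + 2x and g(x) = 3 + x via the value representation.
d = 1                                  # both factors have degree d
xs = np.arange(2 * d + 1)              # 2d + 1 = 3 evaluation points
f_vals = 1 + 2 * xs
g_vals = 3 + xs
h_vals = f_vals * g_vals               # one multiplication per point

# Interpolate back to coefficients (listed lowest degree first)
h = np.polyfit(xs, h_vals, 2 * d)[::-1]
assert np.allclose(h, [3, 7, 2])       # (1 + 2x)(3 + x) = 3 + 7x + 2x^2
```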

Definition of a polynomial code

link to paper 

Problem formulation:

$A=[A_0,A_1,\ldots,A_{M-1}]\in \mathbb{F}^{L\times L}$, $B=[B_0,B_1,\ldots,B_{M-1}]\in \mathbb{F}^{L\times L}$.

We want to compute $C=A^\top B$.

Define matrix polynomials:

$p_A(x)=\sum_{i=0}^{M-1} A_i x^i$, of degree $M-1$,

$p_B(x)=\sum_{j=0}^{M-1} B_j x^{jM}$, of degree $M(M-1)$,

where each $A_i, B_j$ is a matrix.

We have

$$h(x)=p_A(x)^\top p_B(x)=\sum_{i=0}^{M-1}\sum_{j=0}^{M-1} A_i^\top B_j x^{i+jM}$$

$$\deg h(x)\leq M(M-1)+M-1=M^2-1$$

Observe that

$$x^{i_1+j_1M}=x^{i_2+j_2M}$$

if and only if $i_1=i_2$ and $j_1=j_2$ (for $i,j\in[0,M-1]$, since $i+jM$ is a base-$M$ representation).

Hence the coefficient of $x^{i+jM}$ is exactly $A_i^\top B_j$.

Computing $C=A^\top B$ is therefore equivalent to finding the coefficient representation of $h(x)$.

Encoding of polynomial codes

The master chooses distinct $\omega_0,\omega_1,\ldots,\omega_{P-1}\in \mathbb{F}$.

  • Note that this requires $|\mathbb{F}|\geq P$.

For every node $i\in [0,P-1]$, the master computes $\tilde{A}_i=p_A(\omega_i)$.

  • Equivalent to multiplying $[A_0,A_1,\ldots,A_{M-1}]$ by a Vandermonde matrix over $\omega_0,\omega_1,\ldots,\omega_{P-1}$.
  • Can be sped up using the FFT.

Similarly, the master computes $\tilde{B}_i=p_B(\omega_i)$ for every node $i\in [0,P-1]$.

Every node $i\in [0,P-1]$ computes and returns $c_i=p_A(\omega_i)^\top p_B(\omega_i)$ to the master.

$c_i$ is the evaluation of the polynomial $h(x)=p_A(x)^\top p_B(x)$ at $\omega_i$.

Recall that $h(x)=\sum_{i=0}^{M-1}\sum_{j=0}^{M-1} A_i^\top B_j x^{i+jM}$.

  • Computing $C=A^\top B$ is equivalent to finding the coefficient representation of $h(x)$.

Recall that a polynomial of degree $d$ is uniquely determined by $d+1$ points.

  • With $M^2$ evaluations of $h(x)$, we can recover the coefficient representation of $h(x)$.

The recovery threshold is $K=M^2$, independent of $P$, the number of worker nodes.

Done.
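The whole polynomial-code pipeline fits in a short numpy sketch; this is a toy model over the reals (floating point stands in for the finite field $\mathbb{F}$), and all variable names are illustrative:

```python
import numpy as np

rng = np.random.default_rng(2)
L, M, P = 4, 2, 6
K = M * M                                   # recovery threshold K = M^2
A = rng.integers(0, 3, (L, L)).astype(float)
B = rng.integers(0, 3, (L, L)).astype(float)
A_blk = np.hsplit(A, M)                     # column blocks A_0, A_1
B_blk = np.hsplit(B, M)

omegas = np.arange(1, P + 1, dtype=float)   # distinct evaluation points
pA = lambda x: sum(A_blk[i] * x**i for i in range(M))        # degree M-1
pB = lambda x: sum(B_blk[j] * x**(j * M) for j in range(M))  # degree M(M-1)

# Worker i returns c_i = pA(w_i)^T pB(w_i) = h(w_i)
c = [pA(w).T @ pB(w) for w in omegas]

# Master interpolates h from any K responses (here: the first K workers)
V = np.vander(omegas[:K], K, increasing=True)        # V[k, t] = w_k^t
coeffs = np.linalg.solve(V, np.stack(c[:K]).reshape(K, -1))
coeffs = coeffs.reshape(K, L // M, L // M)           # coeff of x^(i+jM) is A_i^T B_j
C = np.block([[coeffs[i + j * M] for j in range(M)] for i in range(M)])
assert np.allclose(C, A.T @ B)
```

Note that only $K=M^2=4$ of the $P=6$ responses are used: the other two workers may straggle.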

MatDot Codes

link to paper 

Problem formulation:

  • We want to compute $C=A^\top B$.

  • Unlike polynomial codes, we partition into row blocks: $A=\begin{bmatrix} A_0\\ A_1\\ \vdots\\ A_{M-1} \end{bmatrix}$ and $B=\begin{bmatrix} B_0\\ B_1\\ \vdots\\ B_{M-1} \end{bmatrix}$, where $A,B\in \mathbb{F}^{L\times L}$.

  • In polynomial codes, the partition was into column blocks: $A=[A_0, A_1, \ldots, A_{M-1}]$ and $B=[B_0, B_1, \ldots, B_{M-1}]$.

Key observation:

$A_m^\top$ is an $L\times \frac{L}{M}$ matrix, and $B_m$ is an $\frac{L}{M}\times L$ matrix. Hence, $A_m^\top B_m$ is an $L\times L$ matrix.

Then $C=A^\top B=\sum_{m=0}^{M-1} A_m^\top B_m$.

Let $p_A(x)=\sum_{m=0}^{M-1} A_m x^m$, of degree $M-1$.

Let $p_B(x)=\sum_{m=0}^{M-1} B_m x^{M-1-m}$, of degree $M-1$ (note the reversed order of the exponents).

And let $h(x)=p_A(x)^\top p_B(x)$.

$$\deg h(x)\leq (M-1)+(M-1)=2M-2$$

Key observation:

  • The coefficient of the term $x^{M-1}$ in $h(x)$ is $\sum_{m=0}^{M-1} A_m^\top B_m$: the term $A_m^\top B_{m'} x^{m+(M-1-m')}$ contributes to $x^{M-1}$ exactly when $m=m'$.

Recall that $C=A^\top B=\sum_{m=0}^{M-1} A_m^\top B_m$.

Finding this coefficient is equivalent to finding the result of $A^\top B$.

Here we trade network bandwidth for computational power: each response is a full $L\times L$ matrix.

General Scheme for MatDot Codes

The master chooses distinct $\omega_0,\omega_1,\ldots,\omega_{P-1}\in \mathbb{F}$.

  • Note that this requires $|\mathbb{F}|\geq P$.

For every node $i\in [0,P-1]$, the master computes $\tilde{A}_i=p_A(\omega_i)$ and $\tilde{B}_i=p_B(\omega_i)$.

  • $p_A(x)=\sum_{m=0}^{M-1} A_m x^m$, of degree $M-1$.
  • $p_B(x)=\sum_{m=0}^{M-1} B_m x^{M-1-m}$, of degree $M-1$.

The master sends $\tilde{A}_i,\tilde{B}_i$ to node $i$.

Every node $i\in [0,P-1]$ computes and returns $c_i=p_A(\omega_i)^\top p_B(\omega_i)$ to the master.

The master needs $\deg h(x)+1=2M-1$ evaluations to obtain $h(x)$.

  • The recovery threshold is $K=2M-1$.
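The MatDot scheme in a short numpy sketch, mirroring the polynomial-code toy above (reals standing in for $\mathbb{F}$, names illustrative):

```python
import numpy as np

rng = np.random.default_rng(3)
L, M, P = 4, 2, 5
A = rng.integers(0, 3, (L, L)).astype(float)
B = rng.integers(0, 3, (L, L)).astype(float)
A_blk = np.vsplit(A, M)                      # row blocks, each (L/M) x L
B_blk = np.vsplit(B, M)

omegas = np.arange(1, P + 1, dtype=float)
pA = lambda x: sum(A_blk[m] * x**m for m in range(M))            # degree M-1
pB = lambda x: sum(B_blk[m] * x**(M - 1 - m) for m in range(M))  # degree M-1

c = [pA(w).T @ pB(w) for w in omegas]        # each an L x L evaluation of h

# Any K = 2M - 1 responses determine h; its x^(M-1) coefficient is A^T B.
K = 2 * M - 1
V = np.vander(omegas[:K], K, increasing=True)
coeffs = np.linalg.solve(V, np.stack(c[:K]).reshape(K, -1))
C = coeffs[M - 1].reshape(L, L)
assert np.allclose(C, A.T @ B)
```

Note the bandwidth cost: every $c_i$ is $L\times L$, while in polynomial codes each response is only $\frac{L}{M}\times\frac{L}{M}$.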

Recap on Matrix-Matrix multiplication

$A,B\in \mathbb{F}^{L\times L}$; we want to compute $C=A^\top B$ with $P$ nodes.

Every node receives $\frac{1}{M}$ of $A$ and $\frac{1}{M}$ of $B$.

| Code | Recovery threshold $K$ |
| --- | --- |
| 1D-MDS | $\Theta(P)$ |
| 2D-MDS | $\leq \Theta(\sqrt{P})$ |
| Polynomial codes | $\Theta(M^2)$ |
| MatDot codes | $\Theta(M)$ |

Polynomial Evaluation

Problem formulation:

  • We have $K$ datasets $X_1,X_2,\ldots,X_K$.
  • Want to compute some polynomial function $f$ of degree $d$ on each dataset.
    • That is, we want $f(X_1),f(X_2),\ldots,f(X_K)$.
  • Examples:
    • $X_1,X_2,\ldots,X_K$ are points in $\mathbb{F}^{M\times M}$, and $f(X)=X^8+3X^2+1$.
    • $X_k=(X_k^{(1)},X_k^{(2)})$, both in $\mathbb{F}^{M\times M}$, and $f(X_k)=X_k^{(1)}X_k^{(2)}$.
    • Gradient computation.

$P$ worker nodes:

  • Some are stragglers, i.e., not responsive.
  • Some are adversaries, i.e., return erroneous results.
  • Privacy: we do not want to expose the datasets to the worker nodes.

Replication code

Suppose $P=(r+1)\cdot K$.

  • Partition the $P$ nodes into $K$ groups of size $r+1$ each.
  • Every node in group $i$ computes and returns $f(X_i)$ to the master.
  • Replication tolerates $r$ stragglers, or $\lfloor \frac{r}{2} \rfloor$ adversaries (by majority vote within each group).
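The majority-vote decoder for replication can be sketched as follows; `decode_replication` is a hypothetical helper, not from the lecture:

```python
from collections import Counter

def decode_replication(responses, K, r):
    """responses[i][t] = value returned by the t-th node of group i
    (None for a straggler). With r + 1 copies per group, majority vote
    survives r stragglers, or floor(r/2) adversaries: a wrong value would
    need more corrupt copies than correct ones to win the vote."""
    out = []
    for group in responses:
        votes = Counter(v for v in group if v is not None)
        out.append(votes.most_common(1)[0][0])
    return out

# K = 2 groups with r = 2 extra copies each:
# group 0 has a straggler, group 1 has an adversarial value (99)
resp = [[7, None, 7], [4, 4, 99]]
assert decode_replication(resp, K=2, r=2) == [7, 4]
```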

Linear codes

Recall previous linear computations (matrix-vector):

  • $[\tilde{A}_1,\tilde{A}_2,\tilde{A}_3]=[A_1,A_2,A_1+A_2]$ is the codeword corresponding to $[A_1,A_2]$.
  • Every worker node $i$ computes $f(\tilde{A}_i)=\tilde{A}_i x$.
  • $[\tilde{A}_1x, \tilde{A}_2x, \tilde{A}_3x]=[A_1x,A_2x,A_1x+A_2x]$ is the codeword corresponding to $[A_1x,A_2x]$.
    • This enables decoding $[A_1x,A_2x]$ from $[\tilde{A}_1x,\tilde{A}_2x,\tilde{A}_3 x]$.

However, $f$ is a polynomial of degree $d$, not a linear transformation unless $d=1$.

  • $f(cX)\neq cf(X)$ in general, where $c$ is a constant.
  • $f(X_1+X_2)\neq f(X_1)+f(X_2)$ in general.

Caution

$[f(\tilde{X}_1),f(\tilde{X}_2),\ldots,f(\tilde{X}_P)]$ is not the codeword corresponding to $[f(X_1),f(X_2),\ldots,f(X_K)]$ in any linear code.

Our goal is to create an encoder/decoder such that:

  • Linear encoding: $[\tilde{X}_1,\tilde{X}_2,\ldots,\tilde{X}_P]$ is the codeword of $[X_1,X_2,\ldots,X_K]$ under some linear code.
    • I.e., $[\tilde{X}_1,\tilde{X}_2,\ldots,\tilde{X}_P]=[X_1,X_2,\ldots,X_K]G$ for some generator matrix $G$.
    • Every $\tilde{X}_i$ is some linear combination of $X_1,\ldots,X_K$.
  • The $f(X_i)$ are decodable from some subset of the $f(\tilde{X}_i)$'s.
    • Even when some of the coded results are missing or erroneous.
  • The $X_i$'s are kept private.

Lagrange Coded Computing

Let $\ell(z)$ be a polynomial whose evaluations at $\omega_1,\ldots,\omega_{K}$ are $X_1,\ldots,X_K$.

  • That is, $\ell(\omega_i)=X_i$ for every $\omega_i\in \mathbb{F}$, $i\in [K]$.

Construction by Lagrange interpolation:

Given $X_1,\ldots,X_K$ with corresponding $\omega_1,\ldots,\omega_K$,

  • $\ell(z)=\sum_{i=1}^K X_iL_i(z)$, where $L_i(z)=\prod_{j\in[K],j\neq i} \frac{z-\omega_j}{\omega_i-\omega_j}$ satisfies $L_i(\omega_j)=\begin{cases} 0 & \text{if } j\neq i \\ 1 & \text{if } j=i \end{cases}$.

Then every $f(X_i)=f(\ell(\omega_i))$ is an evaluation of the polynomial $f\circ \ell$ at $\omega_i$.

If the master obtains the composition $h=f\circ \ell$, it can obtain every $f(X_i)=h(\omega_i)$.

Goal: the master wishes to obtain the polynomial $h(z)=f(\ell(z))$.

Intuition:

  • Encoding is performed by evaluating $\ell(z)$ at $\alpha_1,\ldots,\alpha_P\in \mathbb{F}$, with $P>K$ for redundancy.
  • Nodes apply $f$ to an evaluation of $\ell$ and thereby obtain an evaluation of $h$.
  • The master receives some potentially noisy subset of the evaluations and recovers $h$.
  • The master evaluates $h$ at $\omega_1,\ldots,\omega_K$ to obtain $f(X_1),\ldots,f(X_K)$.

Encoding for Lagrange coded computing

Need a polynomial $\ell(z)$ such that:

  • $X_k=\ell(\omega_k)$ for every $k\in [K]$.

Having obtained such an $\ell$, we let $\tilde{X}_i=\ell(\alpha_i)$ for every $i\in [P]$.

Tool: Lagrange interpolation.

  • $\ell_k(z)=\prod_{j\neq k} \frac{z-\omega_j}{\omega_k-\omega_j}$.
  • $\ell_k(\omega_k)=1$ and $\ell_k(\omega_j)=0$ for every $j\neq k$.
  • $\deg \ell_k(z)=K-1$.

Let $\ell(z)=\sum_{k=1}^K X_k\ell_k(z)$.

  • $\deg \ell\leq K-1$.
  • $\ell(\omega_k)=X_k$ for every $k\in [K]$.

Let $\tilde{X}_i=\ell(\alpha_i)=\sum_{k=1}^K X_k\ell_k(\alpha_i)$.

Every $\tilde{X}_i$ is a linear combination of $X_1,\ldots,X_K$:

$$(\tilde{X}_1,\tilde{X}_2,\ldots,\tilde{X}_P)=(X_1,\ldots,X_K)\cdot G=(X_1,\ldots,X_K)\begin{bmatrix} \ell_1(\alpha_1) & \ell_1(\alpha_2) & \cdots & \ell_1(\alpha_P) \\ \ell_2(\alpha_1) & \ell_2(\alpha_2) & \cdots & \ell_2(\alpha_P) \\ \vdots & \vdots & \ddots & \vdots \\ \ell_K(\alpha_1) & \ell_K(\alpha_2) & \cdots & \ell_K(\alpha_P) \end{bmatrix}$$

This $G$ is called a Lagrange matrix with respect to

  • $\omega_1,\ldots,\omega_K$ (interpolation points), and
  • $\alpha_1,\ldots,\alpha_P$ (evaluation points).
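The Lagrange coded computing pipeline can be sketched end to end in numpy. This toy uses scalar datasets over the reals (the scheme works over any large enough field, and on matrix-valued $X_k$ as well); all names and the choice of points are illustrative:

```python
import numpy as np

K, d = 3, 2                                   # K datasets, f of degree d
X = np.array([2.0, 5.0, 7.0])                 # scalar datasets, for simplicity
f = lambda x: x**2 + 1.0                      # the polynomial to compute

omega = np.arange(1.0, K + 1)                 # interpolation points w_1..w_K
P = d * (K - 1) + 2                           # workers: deg(h)+1, plus one spare
alpha = np.arange(4.0, 4 + P)                 # evaluation points, distinct from omega

def ell_k(k, z, pts):
    """Lagrange basis polynomial: 1 at pts[k], 0 at the other points."""
    out = 1.0
    for j, w in enumerate(pts):
        if j != k:
            out *= (z - w) / (pts[k] - w)
    return out

# Linear encoding via the Lagrange matrix G: X~ = X G, G[k, i] = ell_k(alpha_i)
G = np.array([[ell_k(k, a, omega) for a in alpha] for k in range(K)])
X_enc = X @ G

# Each worker applies f; f(X~_i) is an evaluation of h = f∘ell, deg(h) <= d(K-1)
h_vals = f(X_enc)

# Master interpolates h from deg(h)+1 responses, then reads f(X_k) = h(omega_k)
need = d * (K - 1) + 1
h = np.polyfit(alpha[:need], h_vals[:need], need - 1)
assert np.allclose(np.polyval(h, omega), f(X))
```

With $P$ workers and $\deg h = d(K-1)$, up to $P - d(K-1) - 1$ stragglers can be tolerated (one, in this toy).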

Continued in the next lecture.
