Skip to Content
CSE5313CSE5313 Coding and information theory for data science (Lecture 13)

CSE5313 Coding and information theory for data science (Lecture 13)

Recap from last lecture

Local recoverable codes:

An [n,k]q[n, k]_q code is called rr-locally recoverable if

  • every codeword symbol yjy_j has a recovering set Rj[n]jR_j \subseteq [n] \setminus j ([n]={1,2,,n}[n]=\{1,2,\ldots,n\}),
  • such that yjy_j is computable from yiy_i for all iRji \in R_j.
  • Rjr|R_j| \leq r for every jnj \in n.

Bounds for locally recoverable codes

Bound 1: knrr+1\frac{k}{n}\leq \frac{r}{r+1}.

Bound 2: dnkkr+2d\leq n-k-\lceil\frac{k}{r}\rceil +2. (r=kr=k gives singleton bound)

Turan’s Lemma

Let GG be a graph with nn vertices. Then there exists an induced directed acyclic subgraph (DAG) of GG on at least n1+avgi(diout)\frac{n}{1+\operatorname{avg}_i(d^{out}_i)} nodes, where dioutd^{out}_i is the out-degree of vertex ii.

Proof using probabilistic method.

Proof via the probabilistic method

Useful for showing the existence of a large acyclic subgraph, but not for finding it.

Tip

Show that E[X]something\mathbb{E}[X]\geq something, and therefore there exists UπU_\pi with Uπsomething|U_\pi|\geq something, using pigeonhole principle.

For a permutation π\pi of [n][n], define Uπ={π(i):i[n]}U_\pi = \{\pi(i): i \in [n]\}.

Let iUπi\in U_\pi if each of the dioutd_i^{out} outgoing edges from ii connect to a node jj with π(j)>π(i)\pi(j)>\pi(i).

In other words, we select a subset of nodes UπU_\pi such that each node in UπU_\pi has an outgoing edge to a node in UπU_\pi with a larger index. All edges going to right.

This graph is clearly acyclic.

Choose π\pi at random and Let X=UπX=|U_\pi| be a random variable.

Let XiX_i be the indicator random variable for iUπi\in U_\pi.

So X=i=1nXiX=\sum_{i=1}^{n} X_i.

Using linearity of expectation, we have

E[X]=i=1nE[Xi]E[X]=\sum_{i=1}^{n} E[X_i]

E[Xi]E[X_i] is the probability that π\pi places ii before any of its out-neighbors.

For each node, there are (diout+1)!(d_i^{out}+1)! ways to place the node and its out-neighbors.

For each node, there are diout!d_i^{out}! ways to place the out-neighbors.

So, E[Xi]=diout!(diout+1)!=1diout+1E[X_i]=\frac{d_i^{out}!}{(d_i^{out}+1)!}=\frac{1}{d_i^{out}+1}.

Since X=i=1nXiX=\sum_{i=1}^{n} X_i, we have

Tip

Recall Arithmetic mean (1ni=1nxi\frac{1}{n}\sum_{i=1}^{n} x_i) is greater than or equal to harmonic mean (ni=1n1xi\frac{n}{\sum_{i=1}^{n} \frac{1}{x_i}}).

E[X]=i=1nE[Xi]=i=1n1diout+1=1ni=1n1diout+1n1n(1+i=1ndiout)=n1+avgi(diout)\begin{aligned} E[X]&=\sum_{i=1}^{n} E[X_i]\\ &=\sum_{i=1}^{n} \frac{1}{d_i^{out}+1}\\ &=\frac{1}{n}\sum_{i=1}^{n} \frac{1}{d_i^{out}+1}\\ &\geq \frac{n}{\frac{1}{n}(1+\sum_{i=1}^{n} d_i^{out})}\\ &=\frac{n}{1+\operatorname{avg}_i(d_i^{out})}\\ \end{aligned}

Locally recoverable codes

Bound 1

knrr+1\frac{k}{n}\leq \frac{r}{r+1}

Proof via Turan's Lemma

Let GG be a directed graph such that iji\to j is an edge if jRij\in R_i. (jj required to repair ii)

dioutrd_i^{out}\leq r for every ii, so avgi(diout)r\operatorname{avg}_i(d_i^{out})\leq r.

By Turan’s Lemma, there exists an induced directed acyclic subgraph (DAG) GUG_U on GG with UU nodes such that Un1+avgi(diout)n1+r|U|\geq \frac{n}{1+\operatorname{avg}_i(d_i^{out})}\geq \frac{n}{1+r}.

Since GUG_U is a DAG, it is acyclic. So there exists a vertex u1Uu_1\in U with no outgoing edges inside GUG_U. (leaf node in GUG_U)

Note that if we remove u1u_1 from GUG_U, the remaining graph is still a DAG.

Let GU1G_{U_1} be the induced graph on U1=U{u1}U_1=U\setminus \{u_1\}.

We can find a vertex u2U1u_2\in U_1 with no outgoing edges inside GU1G_{U_1}.

We can continue this process until we have all the vertices in UU removed.

So all symbols in UU are redundant.

Note the number of redundant symbols =nkn1+r=n-k\geq \frac{n}{1+r}.

So krnr+1k\leq \frac{rn}{r+1}.

So knrr+1\frac{k}{n}\leq \frac{r}{r+1}.

Bound 2

dnkkr+2d\leq n-k-\lceil\frac{k}{r}\rceil +2

Definition for code reduction

For C=[n,k]qC=[n,k]_q code, let I[n]I\subseteq [n] be a set of indices, e.g., I={1,3,5,6,7,8}I=\{1,3,5,6,7,8\}.

Let CIC_I be the code obtained by deleting the symbols that is not in II. (only keep the symbols with indices in II)

Then CIC_I is a code of length I|I|, CIC_I is generated by GIFqk×IG_I\in \mathbb{F}^{k\times |I|}_q (only keep the rows of GG with indices in II).

CI=qrank(GI)|C_I|=q^{\operatorname{rank}(G_I)}.

Note GIG_I not necessarily of rank kk.

Reduction lemma

Let C\mathcal{C} be an [n,k]q[n, k]_q code. Then d=nmax{I:CI<qk,I[n]}d=n-\max\{|I|:C_I<q^k,I\subseteq [n]\}.

Proof for reduction lemma

We proceed from two inequalities.

First dnmax{I:CI<qk,I[n]}d\leq n-\max\{|I|:C_I<q^k,I\subseteq [n]\}.

Let I[n]I\subseteq [n] with CIqk|C_I|\leq q^k.

Then there exists m1,m2Fqkm_1,m_2\in \mathbb{F}_q^k such that m1GI=m2GIm_1G_I=m_2G_I.

Let y1=m1GIy_1=m_1G_I and y2=m2GIy_2=m_2G_I.

Then y1,y2y_1,y_2 have at least I|I| entries in common.

So dH(y1,y2)nId_H(y_1,y_2)\leq n-|I|.

So dnId\leq n-|I| for every II such that CIqk|C_I|\leq q^k.

So dnmax{I:CI<qk,I[n]}d\leq n-\max\{|I|:C_I<q^k,I\subseteq [n]\}.


Now we show the other inequality.

dnmax{I:CI<qk,I[n]}d\geq n-\max\{|I|:C_I<q^k,I\subseteq [n]\}.

Let y1,y2Cy_1,y_2\in \mathcal{C} such that dH(y1,y2)=dd_H(y_1,y_2)=d.

Let JJ be the set on which y1y_1 and y2y_2 identical.

So J=nd|J|=n-d.

CJ<qk|C_J|<q^k since y1y_1 and y2y_2 have at least dd entries in common.

So max{I:CI<qk,I[n]}nd\max\{|I|:C_I<q^k,I\subseteq [n]\}\geq n-d. (by existence of JJ)

So dnmax{I:CI<qk,I[n]}d\geq n-\max\{|I|:C_I<q^k,I\subseteq [n]\}.

We prove bound 2 using the same subgraph from proof of bound 1.

Proof for bound 2

Let GG be a directed graph such that iji\to j is an edge if jRij\in R_i. (jj required to repair ii)

dioutrd_i^{out}\leq r for every ii, so avgi(diout)r\operatorname{avg}_i(d_i^{out})\leq r.

By Turan’s Lemma, there exists an induced directed acyclic subgraph (DAG) GUG_U on GG with UU nodes such that Un1+avgi(diout)n1+r|U|\geq \frac{n}{1+\operatorname{avg}_i(d_i^{out})}\geq \frac{n}{1+r}.

Let UUU'\subset U with size k1r\lfloor\frac{k-1}{r}\rfloor.

UU' exists since nr+1>kr\frac{n}{r+1}>\frac{k}{r} by bound 1, and kr>k1r\frac{k}{r}>\lfloor\frac{k-1}{r}\rfloor.

So GUG_{U'} is also acyclic.

Let N[n]UN\subseteq [n]\setminus U' be the set of neighbors of UU' in GUG_U.

NrUk1|N|\leq r|U'|\leq k-1.

Complete nn to be of the size k1k-1 by adding elements not in UU'.

CNqk1|C_N|\leq q^{k-1}

Also NU=k1+k1r|N\cup U'|=k-1+\lfloor\frac{k-1}{r}\rfloor

Note that all nodes in GUG_U can be recovered from nodes in NN.

So CNU=CNqk1|C_{N\cup U'}|=|C_N|\leq q^{k-1}.

Therefore, max{I:CI<qk,I[n]}NU=k1+k1r\max\{|I|:C_I<q^k,I\subseteq [n]\}\geq |N\cup U'|=k-1+\lfloor\frac{k-1}{r}\rfloor.

Using reduction lemma, we have d=nmax{I:CI<qk,I[n]}nk1k1r+2=nkkr+2d= n-\max\{|I|:C_I<q^k,I\subseteq [n]\}\leq n-k-1-\lfloor\frac{k-1}{r}\rfloor+2=n-k-\lceil\frac{k}{r}\rceil +2.

Construction of locally recoverable codes

Recall Reed-Solomon code:

{f(a1),f(a2),,f(an)f(x)Fqk1[x]}\{f(a_1),f(a_2),\ldots,f(a_n)|f(x)\in \mathbb{F}_q^{k-1}[x]\}.

  • dim=k\dim=k
  • d=nk+1d=n-k+1
  • Need nqn\geq q.
  • To encode a=a1,a2,,ana=a_1,a_2,\ldots,a_n, we need to evaluate fa(x)=i=0k1fa(ai)xif_a(x)=\sum_{i=0}^{k-1}f_a(a_i)x^i.

Assume r+1nr+1|n and rkr|k.

Choose A={α1,α2,,αn}A=\{\alpha_1,\alpha_2,\ldots,\alpha_n\}

Partition AA into nr+1\frac{n}{r+1} subsets of size r+1r+1.

Definition for good polynomial

A polynomial gg over Fq\mathbb{F}_q is called good if

  • deg(g)=r+1\deg(g)=r+1
  • gg is constant for every AiA_i., i.e., x,yAi,g(x)=g(y)\forall x,y\in A_i, g(x)=g(y).

To encode a=(a0,a1,,an)a=(a_0,a_1,\ldots,a_n), we consider aFqr×(k/r)a\in \mathbb{F}_q^{r\times (k/r)}

Let fa(x)=i=0r1fa,i(x)xif_a(x)=\sum_{i=0}^{r-1} f_{a,i}(x)\cdot x^i

fa,i(x)=j=0k/r1ai,jg(x)jf_{a,i}(x)=\sum_{j=0}^{k/r-1} a_{i,j}\cdot g(x)^j, where gg is a good polynomial.

Proof for valid codeword

If aba\neq b, then (fa(αi))i=1n(fb(αi))i=1n(f_a(\alpha_i))_{i=1}^{n}\neq (f_b(\alpha_i))_{i=1}^{n}.

  • degfa=k2\deg f_a = k-2
  • Maximum degree monomial in faf_a: ar1,k/r1g(x)k/r1xr1a_{r-1,k/r-1}\cdot g(x)^{k/r-1}\cdot x^{r-1}.
  • degfakr1+kr1=k2\deg f_a \leq \frac{k}{r}-1+\frac{k}{r}-1=k-2.

By Bound 1: k2n2k-2\leq n-2.

If fa(αi)i=1n=fb(αi)i=1nf_a(\alpha_i)_{i=1}^{n}=f_b(\alpha_i)_{i=1}^{n}. then

  • faf_a and fbf_b agree on more points than their degree.
  • a=ba=b, a contradiction.

Corollary: C=qk|C|=q^k.

In other words,

C={(fa(αi))i=1naFqk}\mathcal{C}=\{(f_a(\alpha_i))_{i=1}^{n}|a\in \mathbb{F}_q^k\}

For data aFqka\in \mathbb{F}_q^k, the iith server stores fa(αi)f_a(\alpha_i).

Note C\mathcal{C} is a linear code. and dimC=k\dim \mathcal{C}=k.

Let α{α1,α2,,αn}\alpha\in \{\alpha_1,\alpha_2,\ldots,\alpha_n\}.

Suppose αA1\alpha\in A_1.

Locality is ensured by recovering failed server α\alpha by fa(αi)f_a(\alpha_i) from {fa(β):βA1{α}}\{f_a(\beta):\beta\in A_1\setminus \{\alpha\}\}.

Note gg is constant on A1A_1.

Denote the constant by fi(A1)f_i(A_1).

Let δ1(x)=i=0r1fa,i(A1)xi\delta_1(x)=\sum_{i=0}^{r-1} f_{a,i}(A_1)x^i.

δ1(x)=fa(x)\delta_1(x)=f_a(x) for all xA1x\in A_1.

Constructing good polynomial

Let (Fq,)(\mathbb{F}_q^*,\cdot) be the multiplicative group of Fq\mathbb{F}_q.

For a subgroup HH of this group, a multiplicative coset is Ha={hahH,aFq}Ha=\{ha|h\in H,a\in \mathbb{F}_q^*\}.

The family of all cosets of H:{HaaFq}H:\{Ha|a\in \mathbb{F}_q^*\}.

This is a partition of Fq\mathbb{F}_q^*.

Definition of annihilator

The annihilator of HH is g(x)=hH(xh)g(x)=\prod_{h\in H}(x-h).

Note that g(h)=0g(h)=0 for all hHh\in H.

Annihilator is a good polynomial

Fix HH, consider the partition {HaaFq}\{Ha|a\in \mathbb{F}_q^*\} and let g(x)=aFq(xa)g(x)=\prod_{a\in \mathbb{F}_q^*}(x-a). Then gg is a good polynomial.

Proof

Need to show if x,yHax,y\in Ha, then g(x)=g(y)g(x)=g(y).

Since xHax\in Ha, there exists h,hHh',h''\in H such that x=hax=h'a and y=hay=h''a.

So g(x)=g(ha)=hH(hah)=(h)HhH(ah/h)g(x)=g(h'a)=\prod_{h\in H}(h'a-h)=(h')^{|H|}\prod_{h\in H}(a-h/h').

By Fermat’s Little Theorem, (h)H1(h')^{|H|}\equiv 1 for every hHh'\in H.

So {h/hhH}=H\{h/h'|h\in H\}=H for every hHh'\in H.

So g(x)=(h)HhH(ah/h)=hH(ah/h)=g(a)g(x)=(h')^{|H|}\prod_{h\in H}(a-h/h')=\prod_{h\in H}(a-h/h')=g(a).

Similarly, g(y)=g(ha)=hH(hah)=(h)HhH(ah/h)=g(a)g(y)=g(h''a)=\prod_{h\in H}(h''a-h)=(h'')^{|H|}\prod_{h\in H}(a-h/h'')=g(a).

Last updated on