CSE5313 Coding and information theory for data science (Lecture 13)

Recap from last lecture

Local recoverable codes:

An $[n, k]_q$ code is called $r$ -locally recoverable if

every codeword symbol $y_j$ has a recovering set $R_j \subseteq [n] \setminus j$ ( $[n]=\{1,2,\ldots,n\}$ ),
such that $y_j$ is computable from $y_i$ for all $i \in R_j$ .
$|R_j| \leq r$ for every $j \in n$ .

Bounds for locally recoverable codes

Bound 1: $\frac{k}{n}\leq \frac{r}{r+1}$ .

Bound 2: $d\leq n-k-\lceil\frac{k}{r}\rceil +2$ . ( $r=k$ gives singleton bound)

Turan’s Lemma

Let $G$ be a graph with $n$ vertices. Then there exists an induced directed acyclic subgraph (DAG) of $G$ on at least $\frac{n}{1+\operatorname{avg}_i(d^{out}_i)}$ nodes, where $d^{out}_i$ is the out-degree of vertex $i$ .

Proof using probabilistic method.

Proof via the probabilistic method

Useful for showing the existence of a large acyclic subgraph, but not for finding it.

Tip

Show that $\mathbb{E}[X]\geq something$ , and therefore there exists $U_\pi$ with $|U_\pi|\geq something$ , using pigeonhole principle.

For a permutation $\pi$ of $[n]$ , define $U_\pi = \{\pi(i): i \in [n]\}$ .

Let $i\in U_\pi$ if each of the $d_i^{out}$ outgoing edges from $i$ connect to a node $j$ with $\pi(j)>\pi(i)$ .

In other words, we select a subset of nodes $U_\pi$ such that each node in $U_\pi$ has an outgoing edge to a node in $U_\pi$ with a larger index. All edges going to right.

This graph is clearly acyclic.

Choose $\pi$ at random and Let $X=|U_\pi|$ be a random variable.

Let $X_i$ be the indicator random variable for $i\in U_\pi$ .

So $X=\sum_{i=1}^{n} X_i$ .

Using linearity of expectation, we have

E[X]=\sum_{i=1}^{n} E[X_i]

$E[X_i]$ is the probability that $\pi$ places $i$ before any of its out-neighbors.

For each node, there are $(d_i^{out}+1)!$ ways to place the node and its out-neighbors.

For each node, there are $d_i^{out}!$ ways to place the out-neighbors.

So, $E[X_i]=\frac{d_i^{out}!}{(d_i^{out}+1)!}=\frac{1}{d_i^{out}+1}$ .

Since $X=\sum_{i=1}^{n} X_i$ , we have

Tip

Recall Arithmetic mean ( $\frac{1}{n}\sum_{i=1}^{n} x_i$ ) is greater than or equal to harmonic mean ( $\frac{n}{\sum_{i=1}^{n} \frac{1}{x_i}}$ ).

\begin{aligned} E[X]&=\sum_{i=1}^{n} E[X_i]\\ &=\sum_{i=1}^{n} \frac{1}{d_i^{out}+1}\\ &=\frac{1}{n}\sum_{i=1}^{n} \frac{1}{d_i^{out}+1}\\ &\geq \frac{n}{\frac{1}{n}(1+\sum_{i=1}^{n} d_i^{out})}\\ &=\frac{n}{1+\operatorname{avg}_i(d_i^{out})}\\ \end{aligned}

Locally recoverable codes

Bound 1

\frac{k}{n}\leq \frac{r}{r+1}

Proof via Turan's Lemma

Let $G$ be a directed graph such that $i\to j$ is an edge if $j\in R_i$ . ( $j$ required to repair $i$ )

$d_i^{out}\leq r$ for every $i$ , so $\operatorname{avg}_i(d_i^{out})\leq r$ .

By Turan’s Lemma, there exists an induced directed acyclic subgraph (DAG) $G_U$ on $G$ with $U$ nodes such that $|U|\geq \frac{n}{1+\operatorname{avg}_i(d_i^{out})}\geq \frac{n}{1+r}$ .

Since $G_U$ is a DAG, it is acyclic. So there exists a vertex $u_1\in U$ with no outgoing edges inside $G_U$ . (leaf node in $G_U$ )

Note that if we remove $u_1$ from $G_U$ , the remaining graph is still a DAG.

Let $G_{U_1}$ be the induced graph on $U_1=U\setminus \{u_1\}$ .

We can find a vertex $u_2\in U_1$ with no outgoing edges inside $G_{U_1}$ .

We can continue this process until we have all the vertices in $U$ removed.

So all symbols in $U$ are redundant.

Note the number of redundant symbols $=n-k\geq \frac{n}{1+r}$ .

So $k\leq \frac{rn}{r+1}$ .

So $\frac{k}{n}\leq \frac{r}{r+1}$ .

Bound 2

d\leq n-k-\lceil\frac{k}{r}\rceil +2

Definition for code reduction

For $C=[n,k]_q$ code, let $I\subseteq [n]$ be a set of indices, e.g., $I=\{1,3,5,6,7,8\}$ .

Let $C_I$ be the code obtained by deleting the symbols that is not in $I$ . (only keep the symbols with indices in $I$ )

Then $C_I$ is a code of length $|I|$ , $C_I$ is generated by $G_I\in \mathbb{F}^{k\times |I|}_q$ (only keep the rows of $G$ with indices in $I$ ).

$|C_I|=q^{\operatorname{rank}(G_I)}$ .

Note $G_I$ not necessarily of rank $k$ .

Reduction lemma

Let $\mathcal{C}$ be an $[n, k]_q$ code. Then $d=n-\max\{|I|:C_I<q^k,I\subseteq [n]\}$ .

Proof for reduction lemma

We proceed from two inequalities.

First $d\leq n-\max\{|I|:C_I<q^k,I\subseteq [n]\}$ .

Let $I\subseteq [n]$ with $|C_I|\leq q^k$ .

Then there exists $m_1,m_2\in \mathbb{F}_q^k$ such that $m_1G_I=m_2G_I$ .

Let $y_1=m_1G_I$ and $y_2=m_2G_I$ .

Then $y_1,y_2$ have at least $|I|$ entries in common.

So $d_H(y_1,y_2)\leq n-|I|$ .

So $d\leq n-|I|$ for every $I$ such that $|C_I|\leq q^k$ .

So $d\leq n-\max\{|I|:C_I<q^k,I\subseteq [n]\}$ .

Now we show the other inequality.

$d\geq n-\max\{|I|:C_I<q^k,I\subseteq [n]\}$ .

Let $y_1,y_2\in \mathcal{C}$ such that $d_H(y_1,y_2)=d$ .

Let $J$ be the set on which $y_1$ and $y_2$ identical.

So $|J|=n-d$ .

$|C_J|<q^k$ since $y_1$ and $y_2$ have at least $d$ entries in common.

So $\max\{|I|:C_I<q^k,I\subseteq [n]\}\geq n-d$ . (by existence of $J$ )

So $d\geq n-\max\{|I|:C_I<q^k,I\subseteq [n]\}$ .

We prove bound 2 using the same subgraph from proof of bound 1.

Proof for bound 2

Let $G$ be a directed graph such that $i\to j$ is an edge if $j\in R_i$ . ( $j$ required to repair $i$ )

$d_i^{out}\leq r$ for every $i$ , so $\operatorname{avg}_i(d_i^{out})\leq r$ .

By Turan’s Lemma, there exists an induced directed acyclic subgraph (DAG) $G_U$ on $G$ with $U$ nodes such that $|U|\geq \frac{n}{1+\operatorname{avg}_i(d_i^{out})}\geq \frac{n}{1+r}$ .

Let $U'\subset U$ with size $\lfloor\frac{k-1}{r}\rfloor$ .

$U'$ exists since $\frac{n}{r+1}>\frac{k}{r}$ by bound 1, and $\frac{k}{r}>\lfloor\frac{k-1}{r}\rfloor$ .

So $G_{U'}$ is also acyclic.

Let $N\subseteq [n]\setminus U'$ be the set of neighbors of $U'$ in $G_U$ .

$|N|\leq r|U'|\leq k-1$ .

Complete $n$ to be of the size $k-1$ by adding elements not in $U'$ .

$|C_N|\leq q^{k-1}$

Also $|N\cup U'|=k-1+\lfloor\frac{k-1}{r}\rfloor$

Note that all nodes in $G_U$ can be recovered from nodes in $N$ .

So $|C_{N\cup U'}|=|C_N|\leq q^{k-1}$ .

Therefore, $\max\{|I|:C_I<q^k,I\subseteq [n]\}\geq |N\cup U'|=k-1+\lfloor\frac{k-1}{r}\rfloor$ .

Using reduction lemma, we have $d= n-\max\{|I|:C_I<q^k,I\subseteq [n]\}\leq n-k-1-\lfloor\frac{k-1}{r}\rfloor+2=n-k-\lceil\frac{k}{r}\rceil +2$ .

Construction of locally recoverable codes

Recall Reed-Solomon code:

$\{f(a_1),f(a_2),\ldots,f(a_n)|f(x)\in \mathbb{F}_q^{k-1}[x]\}$ .

$\dim=k$
$d=n-k+1$
Need $n\geq q$ .
To encode $a=a_1,a_2,\ldots,a_n$ , we need to evaluate $f_a(x)=\sum_{i=0}^{k-1}f_a(a_i)x^i$ .

Assume $r+1|n$ and $r|k$ .

Choose $A=\{\alpha_1,\alpha_2,\ldots,\alpha_n\}$

Partition $A$ into $\frac{n}{r+1}$ subsets of size $r+1$ .

Definition for good polynomial

A polynomial $g$ over $\mathbb{F}_q$ is called good if

$\deg(g)=r+1$
$g$ is constant for every $A_i$ ., i.e., $\forall x,y\in A_i, g(x)=g(y)$ .

To encode $a=(a_0,a_1,\ldots,a_n)$ , we consider $a\in \mathbb{F}_q^{r\times (k/r)}$

Let $f_a(x)=\sum_{i=0}^{r-1} f_{a,i}(x)\cdot x^i$

$f_{a,i}(x)=\sum_{j=0}^{k/r-1} a_{i,j}\cdot g(x)^j$ , where $g$ is a good polynomial.

Proof for valid codeword
If $a\neq b$ , then $(f_a(\alpha_i))_{i=1}^{n}\neq (f_b(\alpha_i))_{i=1}^{n}$ .

$\deg f_a = k-2$

Maximum degree monomial in $f_a$ : $a_{r-1,k/r-1}\cdot g(x)^{k/r-1}\cdot x^{r-1}$ .

$\deg f_a \leq \frac{k}{r}-1+\frac{k}{r}-1=k-2$ .

By Bound 1: $k-2\leq n-2$ .

If $f_a(\alpha_i)_{i=1}^{n}=f_b(\alpha_i)_{i=1}^{n}$ . then

$f_a$ and $f_b$ agree on more points than their degree.

$a=b$ , a contradiction.

Corollary: $|C|=q^k$ .

In other words,

\mathcal{C}=\{(f_a(\alpha_i))_{i=1}^{n}|a\in \mathbb{F}_q^k\}

For data $a\in \mathbb{F}_q^k$ , the $i$ th server stores $f_a(\alpha_i)$ .

Note $\mathcal{C}$ is a linear code. and $\dim \mathcal{C}=k$ .

Let $\alpha\in \{\alpha_1,\alpha_2,\ldots,\alpha_n\}$ .

Suppose $\alpha\in A_1$ .

Locality is ensured by recovering failed server $\alpha$ by $f_a(\alpha_i)$ from $\{f_a(\beta):\beta\in A_1\setminus \{\alpha\}\}$ .

Note $g$ is constant on $A_1$ .

Denote the constant by $f_i(A_1)$ .

Let $\delta_1(x)=\sum_{i=0}^{r-1} f_{a,i}(A_1)x^i$ .

$\delta_1(x)=f_a(x)$ for all $x\in A_1$ .

Constructing good polynomial

Let $(\mathbb{F}_q^*,\cdot)$ be the multiplicative group of $\mathbb{F}_q$ .

For a subgroup $H$ of this group, a multiplicative coset is $Ha=\{ha|h\in H,a\in \mathbb{F}_q^*\}$ .

The family of all cosets of $H:\{Ha|a\in \mathbb{F}_q^*\}$ .

This is a partition of $\mathbb{F}_q^*$ .

Definition of annihilator

The annihilator of $H$ is $g(x)=\prod_{h\in H}(x-h)$ .

Note that $g(h)=0$ for all $h\in H$ .

Annihilator is a good polynomial

Fix $H$ , consider the partition $\{Ha|a\in \mathbb{F}_q^*\}$ and let $g(x)=\prod_{a\in \mathbb{F}_q^*}(x-a)$ . Then $g$ is a good polynomial.

Proof

Need to show if $x,y\in Ha$ , then $g(x)=g(y)$ .

Since $x\in Ha$ , there exists $h',h''\in H$ such that $x=h'a$ and $y=h''a$ .

So $g(x)=g(h'a)=\prod_{h\in H}(h'a-h)=(h')^{|H|}\prod_{h\in H}(a-h/h')$ .

By Fermat’s Little Theorem, $(h')^{|H|}\equiv 1$ for every $h'\in H$ .

So $\{h/h'|h\in H\}=H$ for every $h'\in H$ .

So $g(x)=(h')^{|H|}\prod_{h\in H}(a-h/h')=\prod_{h\in H}(a-h/h')=g(a)$ .

Similarly, $g(y)=g(h''a)=\prod_{h\in H}(h''a-h)=(h'')^{|H|}\prod_{h\in H}(a-h/h'')=g(a)$ .