CSE5313 Coding and information theory for data science (Lecture 8)

Review on Linear codes

The code	Dimension $k$ (effective message length)	Minimum distance $d$	Dual code	Minimum distance of dual code
$\mathbb{F}^n$	$n$	$1$	$\{0\}$	$0$
Parity code	$n-1$	$2$	Repetition code	$n$
Hamming code	$2^m-m-1$	$3$	Punctured Hadamard code	$2^{m-1}$

More on linear codes

Extended Hamming code

Consider the Hamming code $[2^m-1,2^m-m-1,3]_{\mathbb{F}_2}$ .

Extend it to a cod eof length $2^m$ by adding a parity bit.

Recall the hamming code $[7,4,3]_{2}$ .

H_{HAM}= \begin{pmatrix} 1 & 0 & 0 & 1 & 1 & 0 & 1\\ 0 & 1 & 0 & 1 & 0 & 1 & 1\\ 0 & 0 & 1 & 0 & 1 & 1 & 1\\ \end{pmatrix}

The extended Hamming code is:

H_{EXT}= \begin{pmatrix} 1 & 1 & 1 & 1 & 1 & 1 & 1 & 1\\ 1 & 0 & 0 & 1 & 1 & 0 & 1 & 0\\ 0 & 1 & 0 & 1 & 0 & 1 & 1 & 0\\ 0 & 0 & 1 & 0 & 1 & 1 & 1 & 0\\ \end{pmatrix}

The minimum distance of the extended Hamming code is 4.

Proof for minimum distance

It is sufficient to show that every 3 columns are linearly independent.

Using the lemma for minimum distance, we have that the minimum distance is 4.

Notice that in $\mathbb{F}_2$ , multiplication is equivalent to AND operation.

By simple observations, we know that every 2 columns are linearly independent.

Consider the following linear combination:

a_1\begin{pmatrix} 1\\ \vdots\\ \end{pmatrix}+a_2\begin{pmatrix} 1\\ \vdots\\ \end{pmatrix}+a_3\begin{pmatrix} 1\\ \vdots\\ \end{pmatrix}=0

Therefore, every 3 columns are linearly independent since the top row will always be 1. if $a_1,a_2,a_3$ are $1$ .

Therefore, the minimum distance is 4.

For the dimension of this code, we have $k=2^m-m-1$ , with total code length $m$ .

Augmented Hadamard code

Consider what is generated by the parity check matrix of the extended Hamming code.

Let $xH_{EXT}$

If $xH_{EXT}=0$ , then $(0,x_2,\ldots, x_m)H_{HAM}=(y,0)$ , where $y$ is the punctured hadamard codeword.
If $xH_{EXT}=1$ , then $(1,x_2,\ldots, x_m)H_{HAM}=(1,1,\ldots,1)+(y,0)$ , where $y$ is the punctured hadamard codeword.

Therefore, the code generated by $H_{EXT}$ is given by:

Let $\mathcal{C}$ be the hadamard code.

Let $\mathcal{C}+\mathbb{I}$ be its shift by the all 1’s vector (flip all bits of all words).

– Augmented Hadamard code = $\mathcal{C} \cup \mathcal{C} + \mathbb{I}$ .

The length of the code is $2^m$ .

The dimension of the code is $m+1$ .

Since the code is still linear, the minimum distance is the minimum weight of the codewords.

So minimum distance of $\mathcal{C}+\mathbb{I}$ is $2^{m-1}$ .

Summary for simple linear codes

The code	Dimension $k$ (effective message length)	Minimum distance $d$	Dual code	Minimum distance of dual code
$\mathbb{F}^n$	$n$	$1$	$\{0\}$	$0$
Parity code	$n-1$	$2$	Repetition code	$n$
Hamming code	$2^m-m-1$ (length $2^m-1$ )	$3$	Punctured Hadamard code	$2^{m-1}$
Extended Hamming code	$2^m-m-1$ (length $2^m$ )	$4$	Augmented Hadamard code	$2^{m-1}$

Boundary of linear codes

Natural questions:

Can we extend the table of linear codes infinitely?
What set of configuration $(n,k,d)_q$ are impossible?
What set of configuration $(n,k,d)_q$ are possible, even if we don’t know how to construct them?

Boundary I: Singleton bound

Singleton is a name for the person who discovered this bound.

Theorem: For any linear code $\mathcal{C}\subseteq \mathbb{F}^n_q$ , we have $d\leq n-k+1$ .

Proof

Idea: Using the Pigeonhole principle.

Assume an code $[n,k,d]_q$ exists.

Pigeons: All $q^k$ possible code word of $\mathcal{C}$ .

Holes: All $q^\ell$ values of the first $\ell$ entries of a codeword (for some $\ell<n$ ).

If $q^\ell<q^k$ , then by Pigeonhole principle, there exists two codewords in $\mathcal{C}$ that agrees on the first $\ell$ entries.

So $d\leq n-\ell$ .

So the largest $\ell$ value for which this arguments works is $n-k+1$ .

Definition of Maximum Distance Separable (MDS) code

A code $\mathcal{C} = [n,k,d]_q$ with $d = n - k + 1$ is called a Maximum Distance Separable (MDS) code.

Examples for singleton bound

$\mathbb{F}^n$ : any $n, k = n, d = 1$ .

Attains with equality!

Parity: any $n, k = n - 1, d = 2$ .

Attains with equality!

Hamming: $n = 2^m - 1, k = 2^m - m - 1, d = 3$ .

$n - k + 1 = m + 1 > 3$ .

This creates some trade-off between $k$ and $d$ .

Boundary II: The Sphere Packing Bound

Let $r=\lfloor \frac{d-1}{2}\rfloor$ , then $\sum_{i=0}^{r}\binom{n}{i}(q-1)^i\leq q^{n-k}$ .

Proof

Let $c=(c_1,c_2,\ldots,c_n)\in \mathbb{F}^n_q$ , and let $B(c,r)=\{y\in \mathbb{F}^n_q: d_H(c,y)\leq r\}$ for some $r\leq n$ .

Computer $|B(c,r)|$ .

$|B(c,0)|=1$

$|\{y\in \mathbb{F}^n_q: d_H(c,y)=1\}|=n(q-1)$ .

$|\{y\in \mathbb{F}^n_q: d_H(c,y)=2\}|=\binom{n}{2}(q-1)^2=\frac{n(n-1)}{2}(q-1)^2$ .

So, $|B(c,r)|=\sum_{i=0}^{r}\binom{n}{i}(q-1)^i$ .

Recall that $\mathcal{C}$ of minimum distance $d$ if and only if $\forall c_1,c_2\in \mathcal{C}, B(c_1,\lfloor \frac{d-1}{2}\rfloor)\cap B(c_2,\lfloor \frac{d-1}{2}\rfloor)=\emptyset$ .

Therefore, let $r=\lfloor \frac{d-1}{2}\rfloor$ , we have $\sum_{i=0}^{r}\binom{n}{i}(q-1)^i\leq q^{n-k}$ .

Definition for perfect code

A code $\mathcal{C}$ is called a perfect code if $|C|=q^{n-k}$ .

Examples for sphere packing bound

Let $q=2$ .

$\mathbb{F}_2^n$ : any $n, k = n, d = 1$ .

$r = \frac{d-1}{2} = 0$ .
$\Rightarrow \sum_{i=0}^{0}\binom{n}{i}(q-1)^i = 1 \leq q^{n-k} = 2^{n-n} = 1$ .
Attained with equality!

Parity: any $n, k = n - 1, d = 2$ .

$r = \frac{d-1}{2} = 0$ .
$\Rightarrow \sum_{i=0}^{0}\binom{n}{i}(q-1)^i = 1 \leq q^{n-k} = 2^{n-k} = 2^{n-(n-1)} = 2$ .
$q \geq 2 \Rightarrow$ NOT attained with equality.

Exercise: Equality is attained for the repetition code (dual of parity) for odd $n$ .

Hamming: $n = 2^m - 1, k = 2^m - m - 1, d = 3$ .

$r = \frac{d-1}{2} = 1$ .
$\Rightarrow \sum_{i=0}^{1}\binom{n}{i}(q-1)^i = 1 + (2^{m}-1) = 2^{m}$ .
$\Rightarrow q^{n-k} = 2^{m}$ .
Attained with equality!

• Attained with equality!

Fortunately, there are only 4 types of binary linear perfect codes:

$\mathbb{F}^n$
Repetition code
Hamming code
$[23,12,7]_2$ Golay code

Boundary III: The Gilbert-Varshamov Bound

Let $n,k,d,q$ such that $V_q(n-k, d-2)\leq q^{n-k}$ , then there exists an $[n,k,d]_q$ code.

Singleton, sphere-packing provide necessary conditions for existence of codes.

Are there sufficient conditions?

Recall:

Lemma: The minimum distance of $\mathcal{C}$ is the maximum integer such that every $d-1$ columns of the parity-check matrix $H$ are linearly independent.

Idea:

Construct $H$ column by column, ensuring that no dependencies occur.

Idea:

Construct $H$ column by column, ensuring that no dependencies occur.

Algorithm:

Begin with $(n-k)\times (n-k)$ identity matrix.
Assume we choose columns $h_1,h_2,\ldots,h_\ell$ (each $h_i$ is in $\mathbb{F}^n_q$ )
Then next column $h_{\ell}$ $h_{ℓ}$ must not be in the space of any previous $d-2$ $d - 2$ columns.
- $h_{\ell}$ cannot be written as $[h_1,h_2,\ldots,h_{\ell-1}]x^\top$ for $x$ of Hamming weight at most $d-2$ .
- So the ineligible candidates for $h_{\ell}$ $h_{ℓ}$ is:
  - $B_{\ell-1}(0,d-2)=\{x\in \mathbb{F}^{\ell-1}_q: d_H(0,x)\leq d-2\}$ .
  - $|B_{\ell-1}(0,d-2)|=\sum_{i=0}^{d-2}\binom{\ell-1}{i}(q-1)^i$ , denoted by $V_q(\ell-1, d-2)$ .
- So the candidates for $h_{\ell}$ $h_{ℓ}$ is:
  - Out of which $V_q(\ell-1, d-2)$ are ineligible.
  - Need $n$ columns overall, so we need $V_q(n-k, d-2)\leq q^{n-k}$ .