CSE5313 Coding and information theory for data science (Lecture 6)

Recap

Vector spaces and subspaces over finite fields

$\mathbb{F}^n$ is a vector space over $\mathbb{F}$ .

With point-wise vector addition and scalar multiplication.

Example

$\mathbb{F}_2^4$ is a vector space over $\mathbb{F}_2$ .

Let $v=\begin{pmatrix} 1 & 1 & 1 & 1 \end{pmatrix}$

Then $v$ is a vector in $\mathbb{F}_2^4$ that’s “orthogonal” to itself.

$v\cdot v=1+1+1+1=4=0$ in $\mathbb{F}_2$ .

In general field, the dual space and space may intersect non-trivially.

Let $V$ be a subspace of $\mathbb{F}^n$ .

$V$ is a subgroup of $\mathbb{F}^n$ under vector addition $(\mathbb{F}^n,+)$ .

Apply the theorem: If $H$ is finite, non-empty, and closed under the operation of $G$ , then $H$ is a subgroup of $G$ .

Proof

Since $H\subseteq G$, $H$ is non-empty and closed under the operation of $G$ and finite, then $H\leq G$.

left to show:

Associativity: inherited from $G$ .

Unit element: $0\in H$ .

Consider $a\in H$ , $a,a^2,a^3,\cdots$ are in $H$ . Since $H$ is finite, there exists $i,j\in\mathbb{N}$ such that $a^i=a^j$ .

Then $a^i=a^j\iff a^{i-j}=e\in H$ .

Inverses: $a^{-1}\in H$ .

Automatically holds for unit element traversing.

Is every subgroup of $\mathbb{F}^n$ a subspace?

Answer

No.

Consider $F_4=\{0,1,x,x+1\}$ (field extension of $\mathbb{F}_2$ with $p(x)=x$ ).

$F_4^2=\{(a,b):a,b\in F_4\}$ , $\{(0,0),(1,1)\}$ is a subgroup of $(F_4^2,+)$ .

But the span of $F_4\{(1,1)\}$ is $\{(0,0),(1,1),(x,x),(x+1,x+1)\}\neq \{(0,0),(1,1)\}$ , which is not a subspace of $F_4^2$ .

Cosets in this definition are called Affine subspaces.

V+a=\{v+a:v\in V\}\text{ for some }a\in \mathbb{F}^n

New content

Linear codes

A linear code $\mathcal{C}$ is a subspace of $\mathbb{F}^n$ over $\mathbb{F}$ .

The dimension of $\mathcal{C}$ is denoted by $k$ .
The minimum Hamming distance of $\mathcal{C}$ is denoted by $d$ .
Notation $\mathcal{C}= [n,k,d]_{\mathbb{F}}$ .

Two equivalent ways to constructing a linear code:

A generator matrix $G\in \mathbb{F}^{k\times n}$ with $k$ rows and $n$ columns.
$\mathcal{C}=\{xG:x\in \mathbb{F}^k\}$
- The left image of $G$ is $\mathcal{C}$ .
- The rows of $G$ are a basis for $\mathcal{C}$ .
A parity check matrix $H\in \mathbb{F}^{(n-k)\times n}$ with $(n-k)$ rows and $n$ columns.
$\mathcal{C}=\{c\in \mathbb{F}^n:Hc^\top=0\}$
- The right kernel of $H$ is $\mathcal{C}$ .
- Multiplying $c^\top$ by $H$ “checks” if $c\in \mathcal{C}$ .

Encoding of linear codes

Reminder:

Encoding is the process of mapping a message $u\in \mathbb{F}^k$ to a codeword $c\in \mathcal{C}\subseteq \mathbb{F}^n$ .

$E: \mathbb{F}^k\to \mathcal{C}$ is a linear map.

Let $\mathcal{C}= [n,k,d]_{\mathbb{F}}$ be a linear code with generator matrix $G\in \mathbb{F}^{k\times n}$ .

Encoding is given by $E(x)=xG$ .
It is injective (1-1). Suppose otherwise, then there exists $x_1,x_2\in \mathbb{F}^k$ such that $x_1G=x_2G$ . Then $x_1G-x_2G=0\implies (x_1-x_2)G=0\implies x_1-x_2=0\implies x_1=x_2$ . Therefore, $E$ is injective.

So linear codes implies linear encoding: $E(x)+E(y)=E(x+y)$ .

Systematic codes

Fact: Every $G\in \mathbb{F}^{k\times n}$ can be brought to the form $G_{sys}=(I|A)$ by

Row operations.
Permutation of columns.

Fact $\{xG|x\in \mathbb{F}^k\}$ and $\{xG_{sys}|x\in \mathbb{F}^k\}$ are equivalent.

Same length $n$ .
Same dimension $k$ .
Same minimum Hamming distance $d$ .

Encoding a systematic code:

The input is a part of the output.
Efficient encoding
Immediate decoding (if no errors).

Codes, cosets, encoding, decoding

Linear code $[n,k,d]_{\mathbb{F}}$ is a $k$ dimensional subspace of $\mathbb{F}^n$ .

Size of the code is $|\mathbb{F}|^k$ .

Encoding: $x\to xG$ .

Decoding: $(y+e)\to x$ , $y=xG$ .

Use syndrome to identify which coset $\mathcal{C}_i$ that the noisy-code to $\mathcal{C}_i+e$ belongs to.

H(y+e)^\top=H(y+e)=Hx+He=He

Syndrome decoding

Heavily depends onn the linear structure of the code.

Linear code $\mathcal{C}= [n,k,d]_{\mathbb{F}}$ is a $k$ -dimensional subspace of $(\mathbb{F}^n,+)$ .

Shift of Linear code $[n,k,d]_{\mathbb{F}}$ is a $k$ -dimensional affine subspace of $\mathbb{F}^n$ .

All cosets of the same size

If $w_H(e)\leq \lfloor \frac{d-1}{2}\rfloor$ , then it is possible to extract $y$ from $y+e$ .

by syndrome decoding, we can do better than exhaustive search.

Idea:

Let $y+e$ belogns to the coset $\mathcal{C}+e$ .

Moreover, $y_1+e$ and $y_2+e$ are in the same coset.

Standard Array

Let $\mathcal{C}= [n,k,d]_{\mathbb{F}}$ and denote $|F|=q$ .

Then $|\mathcal{C}|=q^k$ .
The number of cosets is $q^{n-k}$ .

Then we arrange all $q^n$ elements of $\mathbb{F}^n$ into a $q^{n-k}\times q^k$ array.

So that every row is a coset (including $\mathcal{C}$ itself)
Lightest word in each cosets on the leftmost column

Example

Let $\mathbb{F}=\mathbb{Z}_2$ and $C=\{xG|x\in \mathbb{F}_2\}$

G=\begin{pmatrix} 1 & 0 & 1 & 1 & 0\\ 0 & 1 & 1 & 0 & 1 \end{pmatrix}

So $\mathcal{C}=\{00000,10110,01011,11101\}$ .

Then $G=[5,2,3]_2$ .

The standard array is:

First row is $\mathcal{C}$ .

Second row is $\mathcal{C}+(00001)$ ,

Third row is $\mathcal{C}+(00010)$ .

Fourth row is $\mathcal{C}+(00100)$ .

00000	10110	01011	11101
00001	10111	01010	11100
00010	10100	01001	11110
00100	10010	01101	11000

Any two elements in a row are of the form $y_1'=y_1+e$ and $y_2'=y_2+e$ for some $e\in \mathbb{F}^n$ .

Same syndrome if $H(y_1'+e)^\top=H(y_2'+e)^\top$ .

Entries in different rows have different syndrome.

Proof

Choose the lightest word in each coset on the leftmost column.

Time complexity: $O(n(n-k))$ . Space complexity: $n|F|^n$ space.

Compare with exhaustive search: Time: $O(|F|^n)$ .

Syndrome decoding - Intuition

Given $y'$ , we identify the set $\mathcal{C} + e$ to which $y'$ belongs by computing the syndrome.

We identify $e$ as the coset leader (leftmost entry) of the row $\mathcal{C} + e$ .
We output the codeword in $\mathcal{C}$ which is closest ( $c'$ ) by subtracting $e$ from $y'$ .

Syndrome decoding - Formal

Given $y'\in \mathbb{F}^n$ , we identify the set $\mathcal{C}+e$ to which $y'$ belongs by computing the syndrome.

We identify $e$ as the coset leader (leftmost entry) of the row $\mathcal{C}+e$ .

We output the codeword in $\mathcal{C}$ which is closest (example $c_3$ ) by subtracting $e$ from $y'$ .