Skip to Content
CSE5313CSE5313 Coding and information theory for data science (Lecture 6)

CSE5313 Coding and information theory for data science (Lecture 6)

Recap

Vector spaces and subspaces over finite fields

Fn\mathbb{F}^n is a vector space over F\mathbb{F}.

With point-wise vector addition and scalar multiplication.

Example

F24\mathbb{F}_2^4 is a vector space over F2\mathbb{F}_2.

Let v=(1111)v=\begin{pmatrix} 1 & 1 & 1 & 1 \end{pmatrix}

Then vv is a vector in F24\mathbb{F}_2^4 that’s “orthogonal” to itself.

vv=1+1+1+1=4=0v\cdot v=1+1+1+1=4=0 in F2\mathbb{F}_2.

In general field, the dual space and space may intersect non-trivially.

Let VV be a subspace of Fn\mathbb{F}^n.

VV is a subgroup of Fn\mathbb{F}^n under vector addition (Fn,+)(\mathbb{F}^n,+).

  • Apply the theorem: If HH is finite, non-empty, and closed under the operation of GG, then HH is a subgroup of GG.

Proof

Since $H\subseteq G$, $H$ is non-empty and closed under the operation of $G$ and finite, then $H\leq G$.

left to show:

Associativity: inherited from GG.

Unit element: 0H0\in H.

Consider aHa\in H, a,a2,a3,a,a^2,a^3,\cdots are in HH. Since HH is finite, there exists i,jNi,j\in\mathbb{N} such that ai=aja^i=a^j.

Then ai=aj    aij=eHa^i=a^j\iff a^{i-j}=e\in H.

Inverses: a1Ha^{-1}\in H.

Automatically holds for unit element traversing.

Is every subgroup of Fn\mathbb{F}^n a subspace?

Answer

No.

Consider F4={0,1,x,x+1}F_4=\{0,1,x,x+1\} (field extension of F2\mathbb{F}_2 with p(x)=xp(x)=x).

F42={(a,b):a,bF4}F_4^2=\{(a,b):a,b\in F_4\}, {(0,0),(1,1)}\{(0,0),(1,1)\} is a subgroup of (F42,+)(F_4^2,+).

But the span of F4{(1,1)}F_4\{(1,1)\} is {(0,0),(1,1),(x,x),(x+1,x+1)}{(0,0),(1,1)}\{(0,0),(1,1),(x,x),(x+1,x+1)\}\neq \{(0,0),(1,1)\}, which is not a subspace of F42F_4^2.

Cosets in this definition are called Affine subspaces.

V+a={v+a:vV} for some aFnV+a=\{v+a:v\in V\}\text{ for some }a\in \mathbb{F}^n

New content

Linear codes

A linear code C\mathcal{C} is a subspace of Fn\mathbb{F}^n over F\mathbb{F}.

  • The dimension of C\mathcal{C} is denoted by kk.
  • The minimum Hamming distance of C\mathcal{C} is denoted by dd.
  • Notation C=[n,k,d]F\mathcal{C}= [n,k,d]_{\mathbb{F}}.

Two equivalent ways to constructing a linear code:

  • A generator matrix GFk×nG\in \mathbb{F}^{k\times n} with kk rows and nn columns.

    C={xG:xFk}\mathcal{C}=\{xG:x\in \mathbb{F}^k\}
    • The left image of GG is C\mathcal{C}.
    • The rows of GG are a basis for C\mathcal{C}.
  • A parity check matrix HF(nk)×nH\in \mathbb{F}^{(n-k)\times n} with (nk)(n-k) rows and nn columns.

    C={cFn:Hc=0}\mathcal{C}=\{c\in \mathbb{F}^n:Hc^\top=0\}
    • The right kernel of HH is C\mathcal{C}.
    • Multiplying cc^\top by HH “checks” if cCc\in \mathcal{C}.

Encoding of linear codes

Reminder:

  • Encoding is the process of mapping a message uFku\in \mathbb{F}^k to a codeword cCFnc\in \mathcal{C}\subseteq \mathbb{F}^n.

E:FkCE: \mathbb{F}^k\to \mathcal{C} is a linear map.

Let C=[n,k,d]F\mathcal{C}= [n,k,d]_{\mathbb{F}} be a linear code with generator matrix GFk×nG\in \mathbb{F}^{k\times n}.

  • Encoding is given by E(x)=xGE(x)=xG.
  • It is injective (1-1). Suppose otherwise, then there exists x1,x2Fkx_1,x_2\in \mathbb{F}^k such that x1G=x2Gx_1G=x_2G. Then x1Gx2G=0    (x1x2)G=0    x1x2=0    x1=x2x_1G-x_2G=0\implies (x_1-x_2)G=0\implies x_1-x_2=0\implies x_1=x_2. Therefore, EE is injective.

So linear codes implies linear encoding: E(x)+E(y)=E(x+y)E(x)+E(y)=E(x+y).

Systematic codes

Fact: Every GFk×nG\in \mathbb{F}^{k\times n} can be brought to the form Gsys=(IA)G_{sys}=(I|A) by

  • Row operations.
  • Permutation of columns.

Fact {xGxFk}\{xG|x\in \mathbb{F}^k\} and {xGsysxFk}\{xG_{sys}|x\in \mathbb{F}^k\} are equivalent.

  • Same length nn.
  • Same dimension kk.
  • Same minimum Hamming distance dd.

Encoding a systematic code:

  • The input is a part of the output.
  • Efficient encoding
  • Immediate decoding (if no errors).

Codes, cosets, encoding, decoding

Linear code [n,k,d]F[n,k,d]_{\mathbb{F}} is a kk dimensional subspace of Fn\mathbb{F}^n.

Size of the code is Fk|\mathbb{F}|^k.

Encoding: xxGx\to xG.

Decoding: (y+e)x(y+e)\to x, y=xGy=xG.

Use syndrome to identify which coset Ci\mathcal{C}_i that the noisy-code to Ci+e\mathcal{C}_i+e belongs to.

H(y+e)=H(y+e)=Hx+He=HeH(y+e)^\top=H(y+e)=Hx+He=He

Syndrome decoding

  • Heavily depends onn the linear structure of the code.

Linear code C=[n,k,d]F\mathcal{C}= [n,k,d]_{\mathbb{F}} is a kk-dimensional subspace of (Fn,+)(\mathbb{F}^n,+).

Shift of Linear code [n,k,d]F[n,k,d]_{\mathbb{F}} is a kk-dimensional affine subspace of Fn\mathbb{F}^n.

All cosets of the same size

If wH(e)d12w_H(e)\leq \lfloor \frac{d-1}{2}\rfloor, then it is possible to extract yy from y+ey+e.

by syndrome decoding, we can do better than exhaustive search.

Idea:

Let y+ey+e belogns to the coset C+e\mathcal{C}+e.

Moreover,y1+ey_1+e and y2+ey_2+e are in the same coset.

Standard Array

Let C=[n,k,d]F\mathcal{C}= [n,k,d]_{\mathbb{F}} and denote F=q|F|=q.

  • Then C=qk|\mathcal{C}|=q^k.
  • The number of cosets is qnkq^{n-k}.

Then we arrange all qnq^n elements of Fn\mathbb{F}^n into a qnk×qkq^{n-k}\times q^k array.

  • So that every row is a coset (including C\mathcal{C} itself)
  • Lightest word in each cosets on the leftmost column

Example

Let F=Z2\mathbb{F}=\mathbb{Z}_2 and C={xGxF2}C=\{xG|x\in \mathbb{F}_2\}

G=(1011001101)G=\begin{pmatrix} 1 & 0 & 1 & 1 & 0\\ 0 & 1 & 1 & 0 & 1 \end{pmatrix}

So C={00000,10110,01011,11101}\mathcal{C}=\{00000,10110,01011,11101\}.

Then G=[5,2,3]2G=[5,2,3]_2.

The standard array is:

First row is C\mathcal{C}.

Second row is C+(00001)\mathcal{C}+(00001),

Third row is C+(00010)\mathcal{C}+(00010).

Fourth row is C+(00100)\mathcal{C}+(00100).

00000101100101111101
00001101110101011100
00010101000100111110
00100100100110111000

Any two elements in a row are of the form y1=y1+ey_1'=y_1+e and y2=y2+ey_2'=y_2+e for some eFne\in \mathbb{F}^n.

Same syndrome if H(y1+e)=H(y2+e)H(y_1'+e)^\top=H(y_2'+e)^\top.

Entries in different rows have different syndrome.

Proof

Choose the lightest word in each coset on the leftmost column.

Time complexity: O(n(nk))O(n(n-k)). Space complexity: nFnn|F|^n space.

Compare with exhaustive search: Time: O(Fn)O(|F|^n).

Syndrome decoding - Intuition

Given yy', we identify the set C+e\mathcal{C} + e to which yy' belongs by computing the syndrome.

  • We identify ee as the coset leader (leftmost entry) of the row C+e\mathcal{C} + e.
  • We output the codeword in C\mathcal{C} which is closest (cc') by subtracting ee from yy'.

Syndrome decoding - Formal

Given yFny'\in \mathbb{F}^n, we identify the set C+e\mathcal{C}+e to which yy' belongs by computing the syndrome.

We identify ee as the coset leader (leftmost entry) of the row C+e\mathcal{C}+e.

We output the codeword in C\mathcal{C} which is closest (example c3c_3) by subtracting ee from yy'.

Last updated on