
CSE5313 Coding and information theory for data science (Lecture 2)

Review of channel coding

Let $F$ be the input alphabet and $\Phi$ be the output alphabet.

e.g. $F=\{0,1\}$, $\Phi=\mathbb{R}$.

The channel introduces noise: $\operatorname{Pr}(c'\text{ received}\mid c\text{ transmitted})$.

We use $u$ to denote the information word to be transmitted,

$c$ to denote the codeword,

$c'$ to denote the received word, which is given to the decoder,

and $u'$ to denote the decoded information word.

An error occurs if $u' \neq u$.

Example:

Binary symmetric channel (BSC)

$F=\Phi=\{0,1\}$

Every bit of $c$ is flipped independently with probability $p$.

Binary erasure channel (BEC)

$F=\{0,1\}$, $\Phi=\{0,1,*\}$; very common in practice when we are unsure whether a bit was received.

$c$ is transmitted, $c'$ is received.

Each entry of $c'$ equals the corresponding entry of $c$ with probability $1-p$, and is erased (replaced by $*$) with probability $p$.
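Below is a minimal simulation sketch of these two channels, assuming independent per-symbol noise; the codeword and the value of $p$ are only illustrative.

```python
import random

def bsc(c, p):
    """Binary symmetric channel: flip each bit of c independently with probability p."""
    return [bit ^ 1 if random.random() < p else bit for bit in c]

def bec(c, p):
    """Binary erasure channel: erase each bit of c (replace it by '*') with probability p."""
    return ['*' if random.random() < p else bit for bit in c]

c = [0, 1, 1, 0, 1]
print(bsc(c, 0.1))  # e.g. [0, 1, 0, 0, 1]
print(bec(c, 0.1))  # e.g. [0, '*', 1, 0, 1]
```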

Encoding

An encoder $E$ is a function from $F^k$ to $F^n$,

where $E(u)=c$ is the codeword.

Assume $n\geq k$; we do not compress the information.

A code $\mathcal{C}$ is a subset of $F^n$.

Encoding is a one-to-one mapping from $F^k$ to $\mathcal{C}$.

In practice, we usually choose $\mathcal{C}\subseteq F^n$ of size $|F|^k$.

Decoding

A decoder $D$ is a function from $\Phi^n$ to $\mathcal{C}$.

$D(c')=\hat{c}$

The decoder then outputs the unique $u'$ such that $E(u')=\hat{c}$.

Our aim is to have $u'=u$.

Decoding error probability: $P_{err}=\max_{c\in \mathcal{C}}P_{err}(c)$,

where $P_{err}(c)=\sum_{y\,:\,D(y)\neq c}\operatorname{Pr}(y\text{ received}\mid c\text{ transmitted})$.

Our goal is to construct a decoder $D$ such that $P_{err}$ is as small as possible.

Example:

Repetition code over the binary symmetric channel:

Let $F=\Phi=\{0,1\}$. Every bit of $c$ is flipped with probability $p$.

Say $k=1$, $n=3$, and let $\mathcal{C}=\{000,111\}$.

Let the encoder be $E(u)=uuu$.

The decoder is $D(000)=D(100)=D(010)=D(001)=0$ and $D(110)=D(101)=D(011)=D(111)=1$.

Exercise: Compute the error probability of the repetition code over the binary symmetric channel.

Solution

Recall that $P_{err}(c)=\sum_{y\,:\,D(y)\neq c}\operatorname{Pr}(y\text{ received}\mid c\text{ transmitted})$.

Using a binomial random variable:

$$\begin{aligned} P_{err}(000)&=\sum_{y\,:\,D(y)\neq 000}\operatorname{Pr}(y\text{ received}\mid 000\text{ transmitted})\\ &=\operatorname{Pr}(2\text{ flips or more})\\ &=\binom{n}{2}p^2(1-p)+\binom{n}{3}p^3\\ &=3p^2(1-p)+p^3 \end{aligned}$$

The computation is identical for $111$.

$P_{err}=\max\{P_{err}(000),P_{err}(111)\}=P_{err}(000)=3p^2(1-p)+p^3$.
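As a sanity check, here is a small sketch that recomputes this probability by enumerating all received words; majority-vote decoding and $p=0.1$ are assumptions used only for illustration.

```python
from itertools import product

def p_err_repetition(p):
    """Exact decoding error probability of the (n=3, k=1) repetition code
    over a BSC(p) under majority-vote decoding."""
    def pr_received(y, c):
        # Probability of receiving word y given that codeword c was transmitted.
        prob = 1.0
        for yi, ci in zip(y, c):
            prob *= p if yi != ci else 1 - p
        return prob

    def decode(y):
        return 0 if y.count(0) >= 2 else 1  # majority vote

    c = (0, 0, 0)  # by symmetry, P_err(000) = P_err(111)
    return sum(pr_received(y, c) for y in product([0, 1], repeat=3) if decode(y) != 0)

p = 0.1
print(p_err_repetition(p))        # ~0.028
print(3 * p**2 * (1 - p) + p**3)  # ~0.028, matches the closed form
```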

Maximum likelihood principle

For $p\leq 1/2$, the decoder in the last example is a maximum likelihood decoder.

Notice that $\operatorname{Pr}(c'=000\mid c=000)=(1-p)^3$ and $\operatorname{Pr}(c'=000\mid c=111)=p^3$.

  • If $p\leq 1/2$, then $(1-p)^3\geq p^3$, so given $c'=000$, $c=000$ is more likely to have been transmitted than $c=111$.

Similarly, $\operatorname{Pr}(c'=001\mid c=000)=(1-p)^2p$ and $\operatorname{Pr}(c'=001\mid c=111)=p^2(1-p)$.

  • If $p\leq 1/2$, then $(1-p)^2p\geq p^2(1-p)$, so given $c'=001$, $c=000$ is more likely to have been transmitted than $c=111$.

For $p>1/2$, the inequalities (and hence the decisions) are reversed.

In general, the maximum likelihood decoder is $D(c')=\arg\max_{c\in \mathcal{C}}\operatorname{Pr}(c'\text{ received}\mid c\text{ transmitted})$.
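A minimal sketch of this rule for the BSC follows; the code and the value of $p$ are illustrative, and for $p\leq 1/2$ this coincides with nearest-neighbor decoding.

```python
def ml_decode_bsc(received, code, p):
    """Maximum likelihood decoding over a BSC(p):
    return the codeword c maximizing Pr(received | c transmitted)."""
    def likelihood(c):
        flips = sum(ri != ci for ri, ci in zip(received, c))
        return (p ** flips) * ((1 - p) ** (len(c) - flips))
    return max(code, key=likelihood)

code = [(0, 0, 0), (1, 1, 1)]
print(ml_decode_bsc((0, 0, 1), code, 0.1))  # (0, 0, 0)
```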

Defining a “good” code

Two metrics:

  • How many redundant bits are needed?
    • e.g. repetition code: $k=1$, $n=3$ sends $2$ redundant bits.
  • What is the resulting error probability?
    • Depends on the decoding function.
    • Normally, maximum likelihood decoding is assumed.
    • Should go to zero as $n$ grows.

Definition: the rate of a code is $\frac{k}{n}$.

More generally, the rate is $\frac{\log_{|F|}|\mathcal{C}|}{n}$.

Definition of information entropy

Let $X$ be a random variable over a discrete set $\mathcal{X}$.

  • That is, every $x\in \mathcal{X}$ has a probability $\operatorname{Pr}(X=x)$.

The entropy $H(X)$ of a discrete random variable $X$ is defined as:

$$H(X)=\mathbb{E}_{x\sim X}\left[\log \frac{1}{\operatorname{Pr}(x)}\right]=-\sum_{x\in \mathcal{X}}\operatorname{Pr}(x)\log \operatorname{Pr}(x)$$

When $X=\operatorname{Bernoulli}(p)$, we denote $H(X)=H(p)=-p\log p-(1-p)\log (1-p)$.

A deeper explanation will be given later in the course.
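A small sketch of the binary entropy function, measured in bits (a base-2 logarithm is assumed here, matching the BSC discussion below):

```python
import math

def binary_entropy(p):
    """H(p) = -p*log2(p) - (1-p)*log2(1-p), with H(0) = H(1) = 0."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

print(binary_entropy(0.5))   # 1.0  (a fair coin carries one bit)
print(binary_entropy(0.11))  # ~0.5
```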

Which rates are possible?

Claude Shannon ’48: coding theorem for the BSC (binary symmetric channel).

Recall $r=\frac{k}{n}$.

Let $H(\cdot)$ be the binary entropy function.

For every $0\leq r<1-H(p)$,

  • there exists a sequence of codes $\mathcal{C}_1, \mathcal{C}_2,\ldots$ of rates $r_1,r_2,\ldots$ and lengths $n_1,n_2,\ldots$ with $r_i\geq r$,
  • that with maximum likelihood decoding satisfies $P_{err}\to 0$ as $i\to \infty$.

For any $R\geq 1-H(p)$,

  • any sequence $\mathcal{C}_1, \mathcal{C}_2,\ldots$ of rates $r_1,r_2,\ldots$ and lengths $n_1,n_2,\ldots$ with $r_i\geq R$,
  • under any decoding algorithm, satisfies $P_{err}\to 1$ as $i\to \infty$.

$1-H(p)$ is the capacity of the BSC.

  • Informally, the capacity is the best possible rate of a code (asymptotically).
  • This is a special case of a broader theorem (Shannon’s coding theorem).
  • We will see this later in the course.

Polar codes give an explicit construction of codes with rate arbitrarily close to capacity.

BSC capacity - Intuition

The capacity of the binary symmetric channel with crossover probability $p$ is $1-H(p)$.

A correct decoder $c'\to c$ essentially identifies two objects:

  • the codeword $c$,
  • the error word $e=c'-c$ (subtraction $\bmod 2$),
  • $c$ and $e$ are independent of each other.

A typical $e$ has $\approx np$ ones (law of large numbers), say $n(p\pm \delta)$.

Exercise:

$\operatorname{Pr}(e)=p^{n(p\pm \delta)}(1-p)^{n(1-p\mp \delta)}=2^{-n(H(p)+\epsilon)}$ for some $\epsilon$ that goes to zero as $\delta\to 0$.

Intuition

There exist $\approx 2^{nH(p)}$ typical error words.

To index those typical error words, we need $\log_2 (2^{nH(p)})=nH(p)+O(1)$ bits to identify the error word $e$.

To encode the message, we need $\log_2 |\mathcal{C}|=k$ bits.

Since we send only $n$ bits, we need $k+nH(p)+O(1)\leq n$, so $\frac{k}{n}\leq 1-H(p)$ asymptotically.

So the rate cannot exceed $1-H(p)$.
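A back-of-the-envelope instance of this counting argument; $n=1000$ and $p=0.1$ are assumptions chosen only for illustration.

```python
import math

n, p = 1000, 0.1
H = -p * math.log2(p) - (1 - p) * math.log2(1 - p)  # binary entropy H(p)

bits_for_error = n * H                 # ~469 bits to index a typical error word
bits_for_message = n - bits_for_error  # what remains of the n transmitted bits
print(bits_for_error, bits_for_message)  # ~469.0, ~531.0
print(bits_for_message / n, 1 - H)       # rate bound ~0.531 = 1 - H(p)
```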

Formal proof

$$\begin{aligned} \operatorname{Pr}(e)&=p^{n(p\pm \delta)}(1-p)^{n(1-p\mp \delta)}\\ &=p^{np}(1-p)^{n(1-p)}\,p^{\pm n\delta}(1-p)^{\mp n\delta} \end{aligned}$$

And

$$\begin{aligned} 2^{-n(H(p)+\epsilon)}&=2^{-n(-p\log p-(1-p)\log (1-p)+\epsilon)}\\ &=2^{np\log p}\,2^{n(1-p)\log (1-p)}\,2^{-n\epsilon}\\ &=p^{np}(1-p)^{n(1-p)}\,2^{-n\epsilon} \end{aligned}$$

So we need to check that the remaining factor $p^{\pm n\delta}(1-p)^{\mp n\delta}$ can be written as $2^{-n\epsilon}$ with $\epsilon$ going to zero as $\delta\to 0$.

Solving for $\epsilon$ (taking the $+\delta$ branch):

$$\begin{aligned} p^{n\delta}(1-p)^{-n\delta}&=2^{-n\epsilon}\\ -n\epsilon&=\delta n\log p-\delta n\log (1-p)\\ \epsilon&=\delta (\log (1-p)-\log p) \end{aligned}$$
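A quick numeric check of this identity (again with illustrative $n$ and $p$): the per-symbol log-probability of a typical error word matches $-H(p)$.

```python
import math

n, p = 1000, 0.1
H = -p * math.log2(p) - (1 - p) * math.log2(1 - p)  # binary entropy H(p)

ones = round(n * p)  # a typical error word has about np ones
log2_pr_e = ones * math.log2(p) + (n - ones) * math.log2(1 - p)
print(log2_pr_e / n)  # ~ -0.469
print(-H)             # ~ -0.469, i.e. Pr(e) ~ 2^{-n H(p)}
```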

Hamming distance

How to quantify the noise in the channel?

  • Number of flipped bits.

Definition of Hamming distance:

  • Denote $c=(c_1,c_2,\ldots,c_n)$ and $c'=(c'_1,c'_2,\ldots,c'_n)$.
  • $d_H(c,c')=\sum_{i=1}^n [c_i\neq c'_i]$, the number of positions in which $c$ and $c'$ differ.

Minimum Hamming distance:

  • Let $\mathcal{C}$ be a code.
  • $d_H(\mathcal{C})=\min_{c_1,c_2\in \mathcal{C},\,c_1\neq c_2}d_H(c_1,c_2)$.

Hamming distance is a metric.

  • $d_H(x,y)\geq 0$, with equality iff $x=y$.
  • $d_H(x,y)=d_H(y,x)$.
  • Triangle inequality: $d_H(x,y)\leq d_H(x,z)+d_H(z,y)$.
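A short sketch of these two definitions; the repetition code below is used only as an example.

```python
from itertools import combinations

def hamming_distance(x, y):
    """Number of positions in which x and y differ."""
    return sum(xi != yi for xi, yi in zip(x, y))

def minimum_distance(code):
    """d_H(C): minimum Hamming distance over all pairs of distinct codewords."""
    return min(hamming_distance(c1, c2) for c1, c2 in combinations(code, 2))

code = [(0, 0, 0), (1, 1, 1)]
print(hamming_distance((0, 0, 0), (0, 1, 1)))  # 2
print(minimum_distance(code))                  # 3
```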

Levels of error handling

  • error detection
  • erasure correction
  • error correction

Erasure: replacement of an entry by $*\not\in F$.

Error: substitution of one entry by a different one.

In the examples below, suppose $d_H(\mathcal{C})=d$.

Error detection

Theorem: If $d_H(\mathcal{C})=d$, then there exists $f:F^n\to \mathcal{C}\cup \{\text{"error detected"}\}$ that detects every pattern of at most $d-1$ errors.

  • That is, we can tell whether the channel introduced errors, provided it introduced at most $d-1$ of them.
  • No decoding is needed.

Idea:

Since $d_H(\mathcal{C})=d$, at least $d$ errors are needed to turn one codeword into another, i.e., to cause “confusion”.

Proof

The function

$$f(y)=\begin{cases} y & \text{if }y\in \mathcal{C}\\ \text{"error detected"} & \text{otherwise} \end{cases}$$

fails only if there are $\geq d$ errors.
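A direct sketch of this function $f$; the repetition code is an assumed example with $d=3$, so up to two errors are detected.

```python
def detect_errors(y, code):
    """Return y if it is a codeword, otherwise report that errors were detected.
    Correct whenever at most d-1 errors occurred."""
    return tuple(y) if tuple(y) in code else "error detected"

code = {(0, 0, 0), (1, 1, 1)}
print(detect_errors((1, 1, 1), code))  # (1, 1, 1)
print(detect_errors((1, 0, 1), code))  # error detected
```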

Erasure correction

Theorem: If $d_H(\mathcal{C})=d$, then there exists $f:(F\cup \{*\})^n\to \mathcal{C}\cup \{\text{"failed"}\}$ that recovers every pattern of at most $d-1$ erasures.

Idea:

Suppose $d=4$.

If $4$ erasures occurred, there might be two possible codewords $c,c'\in \mathcal{C}$ that agree with the received word on all non-erased positions.

If $\leq 3$ erasures occurred, there is only one such codeword $c\in \mathcal{C}$.
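A minimal sketch of such a recovery map; positions marked '*' are erased, and the repetition code ($d=3$, so up to two erasures) is illustrative.

```python
def correct_erasures(y, code):
    """Return the unique codeword agreeing with y on all non-erased positions,
    or "failed" if that codeword is not unique."""
    candidates = [c for c in code
                  if all(yi == ci for yi, ci in zip(y, c) if yi != '*')]
    return candidates[0] if len(candidates) == 1 else "failed"

code = [(0, 0, 0), (1, 1, 1)]
print(correct_erasures((0, '*', '*'), code))    # (0, 0, 0)
print(correct_erasures(('*', '*', '*'), code))  # failed
```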

Error correction

Define the Hamming ball of radius $r$ centered at $c$ as:

$$B_H(c,r)=\{y\in F^n:d_H(c,y)\leq r\}$$

Theorem: If $d_H(\mathcal{C})\geq d$, then there exists $f:F^n\to \mathcal{C}$ that corrects every pattern of at most $\lfloor \frac{d-1}{2}\rfloor$ errors.

Idea:

The balls $\{B_H(c,\lfloor \frac{d-1}{2}\rfloor)\mid c\in \mathcal{C}\}$ are disjoint.

Use nearest-neighbor decoding; the disjointness follows from the triangle inequality.
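A sketch of nearest-neighbor (minimum-distance) decoding, again with the repetition code as an assumed example; it corrects any single error since $d=3$.

```python
def nearest_neighbor_decode(y, code):
    """Return the codeword closest to y in Hamming distance.
    Corrects every pattern of at most floor((d-1)/2) errors."""
    def hamming_distance(x, z):
        return sum(xi != zi for xi, zi in zip(x, z))
    return min(code, key=lambda c: hamming_distance(y, c))

code = [(0, 0, 0), (1, 1, 1)]
print(nearest_neighbor_decode((0, 1, 0), code))  # (0, 0, 0)
```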

Intro to linear codes

Summary: a code of minimum Hamming distance $d$ can

  • detect $\leq d-1$ errors,
  • correct $\leq d-1$ erasures,
  • correct $\leq \lfloor \frac{d-1}{2}\rfloor$ errors.

Problems:

  • How to construct good codes, with $k/n$ and $d$ both large?
  • How good can these codes possibly be?
  • How to encode?
  • How to decode over a noisy channel?

Tools

  • Linear algebra over finite fields.

Linear codes

Consider $F^n$ as a vector space, and let $\mathcal{C}\subseteq F^n$ be a subspace.

Since $F,\Phi$ are finite, we use finite fields (algebraic objects over which $F^n$ “imitates” $\mathbb{R}^n$, $\mathbb{C}^n$).

Formally, they satisfy the field axioms.

Next lectures:

  • Field axioms
  • Prime fields ($\mathbb{F}_p$)
  • Field extensions (e.g. $\mathbb{F}_{p^t}$)