CSE5313 Coding and information theory for data science (Lecture 2)
Review of channel coding
Let $\mathcal{X}$ be the input alphabet and $\mathcal{Y}$ be the output alphabet.
e.g., $\mathcal{X} = \mathcal{Y} = \{0, 1\}$.
The channel introduces noise: a transmitted $x \in \mathcal{X}^n$ is received as some $y \in \mathcal{Y}^n$.
We use $u \in \{0,1\}^k$ to denote the information word to be transmitted,
$x = E(u)$ to be the codeword.
$y$ is the received word, given to the decoder.
$\hat{u}$ is the decoded information word.
An error occurs if $\hat{u} \neq u$.
Example:
Binary symmetric channel (BSC)
Every bit of $x$ is flipped with probability $p$.
Binary erasure channel (BEC)
The output alphabet is $\{0, 1, ?\}$; very common in practice when we are unsure what the transmitted bit was.
$x \in \{0,1\}^n$ is transmitted, $y \in \{0,1,?\}^n$ is received.
Each $y_i$ is $x_i$ with probability $1-p$, and $?$ with probability $p$.
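As a concrete illustration, here is a minimal simulation sketch of the two channel models (the names `bsc`/`bec` and the use of `-1` for the erasure symbol are my own choices, not from the lecture):

```python
# Minimal sketch of the BSC and BEC channel models (illustrative only).
import numpy as np

rng = np.random.default_rng(0)

def bsc(x, p):
    """Binary symmetric channel: flip each bit of x independently with probability p."""
    flips = rng.random(len(x)) < p
    return np.bitwise_xor(x, flips.astype(int))

def bec(x, p):
    """Binary erasure channel: replace each bit of x by '?' (here: -1) with probability p."""
    y = x.copy()
    y[rng.random(len(x)) < p] = -1  # -1 stands for the erasure symbol '?'
    return y

x = np.array([0, 1, 1, 0, 1])
print(bsc(x, 0.2), bec(x, 0.2))
```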
Encoding
Encoding is a function $E$ from $\{0,1\}^k$ to $\{0,1\}^n$,
where $E(u)$ is the codeword.
Assume $n \geq k$; we don't compress the information.
A code $\mathcal{C}$ is a subset of $\{0,1\}^n$.
Encoding is a one-to-one mapping from $\{0,1\}^k$ to $\mathcal{C}$.
In practice, we usually choose $2^k$ to be the size of $\mathcal{C}$.
Decoding
Decoding $D$ is a function from $\{0,1\}^n$ to $\mathcal{C}$.
The decoder then outputs the unique $\hat{u}$ such that $E(\hat{u}) = D(y)$.
Our aim is to have $\hat{u} = u$.
Decoding error probability: $P_{err} = \Pr(\hat{u} \neq u)$,
where the probability is taken over the randomness of the channel.
Our goal is to construct a decoder such that $P_{err}$ is bounded.
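The setup above can also be tested numerically; below is a hedged sketch that estimates $P_{err}$ by Monte Carlo simulation for any encoder/decoder pair (the function names and the trivial length-1 example are illustrative, not from the lecture):

```python
# Monte Carlo estimate of P_err = Pr(u_hat != u) for a given encoder, channel,
# and decoder (all passed as Python callables). Illustrative sketch only.
import random

def estimate_p_err(encode, channel, decode, k, trials=10_000):
    errors = 0
    for _ in range(trials):
        u = tuple(random.randint(0, 1) for _ in range(k))  # random information word
        y = channel(encode(u))                              # transmit over the channel
        errors += (decode(y) != u)                          # count decoding failures
    return errors / trials

# Example: trivial length-1 "code" (no redundancy) over BSC(0.1): P_err is about 0.1.
bsc = lambda x, p=0.1: tuple(b ^ (random.random() < p) for b in x)
print(estimate_p_err(lambda u: u, bsc, lambda y: y, k=1))
```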
Example:
Repetition code in the binary symmetric channel:
Let the channel be BSC($p$) with $p < 1/2$: every bit of $x$ is flipped with probability $p$.
Say $k = 1$, and let $n$ be odd.
Let the encoder be $E(0) = 00\cdots0$ and $E(1) = 11\cdots1$ (each of length $n$).
The decoder is majority vote: $D(y) = 0$ if $y$ contains more $0$'s than $1$'s, and $D(y) = 1$ otherwise.
Exercise: Compute the error probability of the repetition code in the binary symmetric channel.
Solution
Recall that $P_{err} = \Pr(\hat{u} \neq u)$; by symmetry we may assume $u = 0$, i.e., $x = 00\cdots0$ was transmitted.
Use a binomial random variable: the decoder errs iff more than $n/2$ bits are flipped, which has probability $\Pr(\mathrm{Bin}(n, p) > n/2)$.
The computation is identical for $u = 1$.
$P_{err} = \sum_{i = \lceil n/2 \rceil}^{n} \binom{n}{i} p^i (1-p)^{n-i} \to 0$ as $n \to \infty$ (for $p < 1/2$).
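For reference, a short sketch computing this binomial tail exactly (the function name `repetition_error_prob` is illustrative):

```python
# Exact error probability of the length-n repetition code over BSC(p) with
# majority decoding: P_err = P(Bin(n, p) > n/2).
from math import comb

def repetition_error_prob(n, p):
    """P_err for the n-fold repetition code (n odd) under majority decoding."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(n // 2 + 1, n + 1))

for n in [3, 11, 101]:
    print(n, repetition_error_prob(n, 0.1))  # decreases to 0 as n grows (p < 1/2)
```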
Maximum likelihood principle
For $p < 1/2$, the decoder in the last example is the maximum likelihood decoder.
Notice that $\Pr(y \mid x = 00\cdots0) = p^{w}(1-p)^{n-w}$ and $\Pr(y \mid x = 11\cdots1) = p^{n-w}(1-p)^{w}$, where $w$ is the number of $1$'s in $y$.
- If $w < n/2$, then $\Pr(y \mid x = 00\cdots0) > \Pr(y \mid x = 11\cdots1)$: $00\cdots0$ is more likely to have been transmitted than $11\cdots1$.
(This uses $p < 1/2$, i.e., $p < 1 - p$.)
- If $w > n/2$, then $\Pr(y \mid x = 11\cdots1) > \Pr(y \mid x = 00\cdots0)$: $11\cdots1$ is more likely to have been transmitted than $00\cdots0$.
For $p > 1/2$, we just negate the above.
In general, the maximum likelihood decoder is $D(y) = \arg\max_{x \in \mathcal{C}} \Pr(y \mid x)$.
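A brute-force sketch of this rule for the BSC (the toy code `C` and function names are illustrative): over BSC($p$) with $p < 1/2$, maximizing $\Pr(y \mid x)$ is the same as minimizing the Hamming distance to a codeword.

```python
# Maximum likelihood decoding over BSC(p): Pr(y | x) = p^d (1-p)^(n-d),
# where d is the number of positions in which x and y differ.
def hamming(x, y):
    return sum(a != b for a, b in zip(x, y))

def ml_decode(y, code, p):
    """Return the codeword x maximizing Pr(y | x) over BSC(p)."""
    n = len(y)
    def likelihood(x):
        d = hamming(x, y)
        return p**d * (1 - p)**(n - d)
    return max(code, key=likelihood)

C = ["000", "111"]               # repetition code of length 3
print(ml_decode("010", C, 0.1))  # -> "000" (the closest codeword)
```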
Defining a “good” code
Two metrics:
- How many redundant bits are needed?
- e.g., repetition code: $k = 1$, sends $n - 1$ redundant bits.
- What is the resulting error probability?
- Depends on the decoding function.
- Normally, maximum likelihood decoding is assumed.
- Should go to zero as $n \to \infty$.
Definition: the rate of the code is $R = k/n$.
More generally, $R = \frac{\log_2 |\mathcal{C}|}{n}$.
Definition of information entropy
Let $X$ be a random variable over a discrete set $\mathcal{A}$.
- That is, every $a \in \mathcal{A}$ has a probability $\Pr(X = a)$.
The entropy of a discrete random variable is defined as $H(X) = -\sum_{a \in \mathcal{A}} \Pr(X = a) \log_2 \Pr(X = a)$.
When $X \sim \mathrm{Bernoulli}(p)$, we denote $H(p) = -p \log_2 p - (1-p) \log_2 (1-p)$ (the binary entropy function).
A deeper explanation will be given later in the course.
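A small helper for the binary entropy function (assuming the usual convention $0 \log 0 = 0$), which also evaluates the BSC capacity $1 - H(p)$ used in the next part:

```python
# Binary entropy H(p) and the BSC capacity 1 - H(p). Illustrative sketch.
from math import log2

def binary_entropy(p):
    """H(p) = -p log2 p - (1-p) log2 (1-p), with H(0) = H(1) = 0."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * log2(p) - (1 - p) * log2(1 - p)

print(binary_entropy(0.5))        # 1.0, maximal uncertainty
print(1 - binary_entropy(0.11))   # about 0.5: the capacity of BSC(0.11)
```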
Which rates are possible?
Claude Shannon '48: Coding theorem for the BSC (binary symmetric channel).
Recall the BSC with crossover probability $p$, and the rate $R = k/n$.
Let $H(p) = -p\log_2 p - (1-p)\log_2(1-p)$ be the binary entropy function.
For every $R < 1 - H(p)$:
- There exists a sequence of codes of rates at least $R$ and lengths $n \to \infty$
- that, with maximum likelihood decoding, satisfies $P_{err} \to 0$ as $n \to \infty$.
For any $R > 1 - H(p)$:
- Any sequence of codes of rates at least $R$ and lengths $n \to \infty$,
- under any decoding algorithm, has $P_{err}$ bounded away from $0$ as $n \to \infty$.
$1 - H(p)$ is the capacity of the BSC.
- Informally, the capacity is the best possible rate of the code (asymptotically).
- A special case of a broader theorem (Shannon’s coding theorem).
- We will see later in this course.
Polar codes give an explicit construction of codes with rate arbitrarily close to capacity.
BSC capacity - Intuition
The capacity of the binary symmetric channel with crossover probability $p$ is $1 - H(p)$.
A correct decoder essentially identifies two objects:
- The codeword $x$.
- The error word $e = y \oplus x$ (the difference between received and transmitted words).
- $x$ and $e$ are independent of each other.
A typical $e$ has $\approx pn$ $1$'s (law of large numbers); say exactly $pn$ for simplicity.
Exercise:
Show that $\Pr\bigl(|w_H(e) - pn| > \epsilon n\bigr)$ goes to zero as $n \to \infty$, for any $\epsilon > 0$ (here $w_H(e)$ is the number of $1$'s in $e$).
Intuition
There exist $\approx \binom{n}{pn} \approx 2^{nH(p)}$ typical error words.
To index those typical error words, we need $\approx nH(p)$ bits to identify the error word $e$.
To encode the message, we need $k$ bits.
Since we send $n$ bits in total, $k + nH(p) \lesssim n$; the rate is $R = k/n$, so $R \lesssim 1 - H(p)$.
So the rate cannot exceed $1 - H(p)$.
Formal proof (sketch)
By Stirling's approximation, $\binom{n}{pn} \approx 2^{nH(p)}$, and the decoder must identify both the codeword (one of $2^k$) and the typical error word (one of $\approx 2^{nH(p)}$) from the $2^n$ possible received words.
So we need to check that there exists a code such that $2^k \cdot 2^{nH(p)} \leq 2^n$, i.e., $R \leq 1 - H(p)$.
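A quick numerical check of the counting step (a sketch, not from the lecture): $\frac{1}{n}\log_2 \binom{n}{pn}$ approaches $H(p)$ as $n$ grows.

```python
# Numerical check that the number of typical error words, binom(n, pn),
# grows like 2^(n H(p)), i.e. (1/n) log2 binom(n, pn) -> H(p).
from math import comb, log2

def H(p):
    return -p * log2(p) - (1 - p) * log2(1 - p)

p = 0.1
for n in [100, 1000, 2000]:
    print(n, log2(comb(n, round(p * n))) / n, H(p))
```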
Hamming distance
How to quantify the noise in the channel?
- Number of flipped bits.
Definition of Hamming distance:
- Denote $x = (x_1, \ldots, x_n)$ and $y = (y_1, \ldots, y_n)$.
- $d_H(x, y) = |\{i : x_i \neq y_i\}|$.
Minimum Hamming distance:
- Let $\mathcal{C} \subseteq \{0,1\}^n$ be a code.
- $d(\mathcal{C}) = \min_{c_1, c_2 \in \mathcal{C},\, c_1 \neq c_2} d_H(c_1, c_2)$.
Hamming distance is a metric:
- $d_H(x, y) \geq 0$, with equality iff $x = y$.
- Triangle inequality: $d_H(x, z) \leq d_H(x, y) + d_H(y, z)$.
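A direct sketch of these two definitions (the example code `C` is illustrative):

```python
# Hamming distance between two words and minimum distance of a code.
from itertools import combinations

def d_H(x, y):
    """Number of positions where x and y differ."""
    return sum(a != b for a, b in zip(x, y))

def min_distance(code):
    """d(C) = minimum Hamming distance over all pairs of distinct codewords."""
    return min(d_H(c1, c2) for c1, c2 in combinations(code, 2))

C = ["0000", "0111", "1011"]
print(d_H("0000", "0111"), min_distance(C))  # 3, 2
```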
Levels of error handling:
- error detection
- erasure correction
- error correction
Erasure: replacement of an entry by $?$.
Error: substitution of one entry by a different one.
Example: if $0110$ is transmitted, receiving $01{?}0$ is an erasure in the third position, while receiving $0100$ is an error in the third position.
Error detection
Theorem: If $d(\mathcal{C}) \geq d$, then there exists a detector for $\mathcal{C}$ that detects every pattern of at most $d - 1$ errors correctly.
- That is, we can identify that the channel introduced errors, as long as it introduced at most $d - 1$ of them.
- No decoding is needed.
Idea:
Since $d(\mathcal{C}) \geq d$, one needs at least $d$ errors to cause "confusion" (to turn one codeword into another).
Proof
The function that declares "no error" if $y \in \mathcal{C}$ and "error" otherwise
will only fail if at least $d$ errors occurred (enough to carry one codeword onto another).
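A minimal sketch of this detector (names are illustrative):

```python
# Error detection: declare "error" iff the received word is not a codeword.
# This misses an error pattern only if the channel moved one codeword exactly
# onto another, which requires at least d errors.
def detect(y, code):
    return "ok" if y in code else "error detected"

C = {"000", "111"}   # minimum distance 3: detects any 1 or 2 errors
print(detect("010", C), detect("000", C))
```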
Erasure correction
Theorem: If $d(\mathcal{C}) \geq d$, then there exists a decoder for $\mathcal{C}$ that recovers every pattern of at most $d - 1$ erasures.
Idea:
Suppose $d(\mathcal{C}) = d$.
If $d$ erasures occurred, there might be two possible codewords that agree with $y$ on the non-erased positions.
If at most $d - 1$ erasures occurred, there is only one possible codeword that agrees with $y$ on the non-erased positions.
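A brute-force sketch of erasure correction under these assumptions (using `?` for the erasure symbol; names are illustrative):

```python
# Erasure correction by exhaustive search: keep the codewords that agree with y
# on all non-erased positions. With at most d-1 erasures, exactly one survives.
def erasure_decode(y, code):
    candidates = [c for c in code
                  if all(yi == "?" or yi == ci for yi, ci in zip(y, c))]
    return candidates[0] if len(candidates) == 1 else None  # None: ambiguous

C = ["000", "111"]               # d = 3: any <= 2 erasures are recoverable
print(erasure_decode("0??", C))  # -> "000"
```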
Error correction
Define the Hamming ball of radius $r$ centered at $x$ as $B_r(x) = \{y \in \{0,1\}^n : d_H(x, y) \leq r\}$.
Theorem: If $d(\mathcal{C}) \geq d$, then there exists a decoder that corrects every pattern of at most $\lfloor (d-1)/2 \rfloor$ errors.
Ideas:
The balls $B_{\lfloor (d-1)/2 \rfloor}(c)$ for $c \in \mathcal{C}$ are disjoint.
Use nearest-neighbor (closest codeword) decoding; disjointness of the balls follows from the triangle inequality.
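A brute-force sketch of nearest-neighbor decoding (the example code `C` is illustrative):

```python
# Nearest-neighbor (minimum Hamming distance) decoding: since the balls of
# radius floor((d-1)/2) around codewords are disjoint, any pattern of at most
# floor((d-1)/2) errors is corrected.
def d_H(x, y):
    return sum(a != b for a, b in zip(x, y))

def nearest_neighbor_decode(y, code):
    return min(code, key=lambda c: d_H(c, y))

C = ["00000", "11111"]                       # d = 5: corrects up to 2 errors
print(nearest_neighbor_decode("01001", C))   # -> "00000"
```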
Intro to linear codes
Summary: a code of minimum Hamming distance $d$ can
- detect $d - 1$ errors,
- correct $d - 1$ erasures,
- correct $\lfloor (d-1)/2 \rfloor$ errors.
Problems:
- How to construct good codes, with both the minimum distance $d$ and the rate large?
- How good can these codes possibly be?
- How to encode?
- How to decode over a noisy channel?
Tools
- Linear algebra over finite fields.
Linear codes
Consider $\{0,1\}^n$ as a vector space, and let $\mathcal{C} \subseteq \{0,1\}^n$ be a subspace.
Since the alphabets are finite, we use finite fields (algebraic objects that "imitate" $\mathbb{Q}$ and $\mathbb{R}$).
Formally, finite fields satisfy the field axioms.
Next Lectures:
- Field axioms
- Prime fields ($\mathbb{F}_p$ for prime $p$)
- Field extensions (e.g., $\mathbb{F}_{2^m}$)