Coding and Information Theory Crash Course
Encoding
Let $A$ and $B$ be two finite sets of size $a$ and $b$ respectively.
Let $W(B)$ be the word semigroup generated by $B$, i.e. the set of all finite non-empty strings over $B$.
A one-to-one mapping $f: A \to W(B)$ is called a code with message alphabet $A$ and encoded alphabet $B$.
Example:
- RGB color space: take $A$ to be the set of 24-bit RGB colors and $B = \{0, 1, \dots, 9, \mathrm{A}, \dots, \mathrm{F}\}$ the hexadecimal digits.
- The map $f$ sending each color to its six-digit hexadecimal representation is a code.
For example, pure red is encoded as $\mathtt{FF0000}$ and pure blue as $\mathtt{0000FF}$.
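As a small illustration (a sketch in Python; the helper name rgb_to_hex is ours, not a standard API), each RGB triple maps to a unique six-symbol word over the hexadecimal alphabet:

def rgb_to_hex(r, g, b):
    # Each channel value (0-255) becomes two hexadecimal digits,
    # so every color corresponds to exactly one word of length 6 over B.
    return f"{r:02X}{g:02X}{b:02X}"

print(rgb_to_hex(255, 0, 0))  # FF0000 (pure red)
print(rgb_to_hex(0, 0, 255))  # 0000FF (pure blue)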
Uniquely decipherable codes
A code $f: A \to W(B)$ is called uniquely decipherable if the extension code $f^*: W(A) \to W(B)$, defined by $f^*(x_1 x_2 \cdots x_n) = f(x_1) f(x_2) \cdots f(x_n)$,
is one-to-one.
Example:
- $A = \{x, y\}$, $B = \{0, 1\}$, $f(x) = 0$, $f(y) = 01$:
$f$ is uniquely decipherable.
- $A = \{x, y, z\}$, $B = \{0, 1\}$, $f(x) = 0$, $f(y) = 1$, $f(z) = 01$:
$f$ is not uniquely decipherable,
since $f^*(xy) = 01 = f^*(z)$.
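The failure in the second example can be checked directly; here is a minimal sketch using that codeword table:

code = {"x": "0", "y": "1", "z": "01"}

def encode(message):
    # The extension code f*: concatenate the codewords of the symbols.
    return "".join(code[s] for s in message)

print(encode("xy"))  # 01
print(encode("z"))   # 01 -- two different messages, one encoding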
Irreducible codes
A code $f$ is called irreducible (or prefix-free) if there are no distinct $x, y \in A$ with $f(x) = f(y)w$ for some $w \in W(B)$; that is, no codeword is a proper prefix of another.
This condition ensures that the code is uniquely decipherable.
It means that there is no ambiguity in the decoding process: each codeword is recognized as soon as its last symbol is read.
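To make the absence of ambiguity concrete, here is a minimal decoding sketch for a prefix-free code (the codeword table is illustrative, not taken from the text):

code = {"x": "0", "y": "10", "z": "110"}  # no codeword is a prefix of another
inverse = {w: s for s, w in code.items()}

def decode(encoded):
    # Scan left to right and emit a symbol as soon as the buffer matches
    # a codeword; prefix-freeness guarantees the match is never premature.
    message, buffer = [], ""
    for symbol in encoded:
        buffer += symbol
        if buffer in inverse:
            message.append(inverse[buffer])
            buffer = ""
    return "".join(message)

print(decode("0101100"))  # xyzx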
Theorem 1.1.1 (Kraft–McMillan inequality)
Let $A$, $B$ be alphabet sets of size $a$, $b$ respectively.
Let $f: A \to W(B)$ be a code that is uniquely decipherable.
Then
$$\sum_{x \in A} b^{-\ell(x)} \le 1,$$
where $\ell(x)$ is the length of the codeword $f(x)$ (the number of elements of $B$ used in the codeword for $x$).
Proof:
Let $L$ denote the maximum length of the codewords $f(x)$, $x \in A$.
Let $n_k$ be the number of codewords of length $k$.
Then, for any positive integer $r$,
$$\left( \sum_{x \in A} b^{-\ell(x)} \right)^r = \left( \sum_{k=1}^{L} n_k b^{-k} \right)^r = \sum_{m=r}^{rL} N_m b^{-m},$$
where $N_m$ is the number of ways to write a word of length $m$ over $B$ as a concatenation of $r$ codewords.
The maximum number of words of length $m$ over $B$ is $b^m$, and since $f$ is uniquely decipherable, distinct $r$-tuples of codewords yield distinct words, so $N_m \le b^m$.
This gives
$$\left( \sum_{x \in A} b^{-\ell(x)} \right)^r \le \sum_{m=r}^{rL} b^m b^{-m} \le rL$$
for every $r$. If the sum were greater than 1, the left-hand side would grow exponentially in $r$ while the right-hand side grows only linearly, a contradiction.
So the sum $\sum_{x \in A} b^{-\ell(x)}$ must be less than or equal to 1. ∎
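As a numerical sanity check (a sketch using the binary examples from earlier), the uniquely decipherable code satisfies the bound, while the ambiguous one violates it:

def kraft_sum(codewords, b=2):
    # Sum of b^(-length) over all codewords.
    return sum(b ** -len(w) for w in codewords)

print(kraft_sum(["0", "01"]))       # 0.75 <= 1
print(kraft_sum(["0", "1", "01"]))  # 1.25 >  1, so this code cannot be uniquely decipherable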
Sardinas and Patterson Algorithm
Let $A$ be the message alphabet and $B$ be the encoded alphabet.
We test whether a code $f: A \to W(B)$ is uniquely decipherable, working directly with its set of codewords $\{ f(x) : x \in A \}$.
def is_uniquely_decipherable(codewords):
    # Sardinas-Patterson test on the set of codewords {f(x) : x in A}.
    words = list(codewords)
    C = set(words)
    if len(C) < len(words):              # repeated codewords: f is not one-to-one
        return False

    def dangling(S, T):
        # Suffixes left over when a word of S is a proper prefix of a word of T.
        return {t[len(s):] for s in S for t in T if t != s and t.startswith(s)}

    current = dangling(C, C)             # first set of dangling suffixes
    seen = set()
    while current:
        if current & C:                  # a dangling suffix equals a codeword:
            return False                 # two different messages share an encoding
        if current <= seen:              # no new suffixes can ever appear
            break
        seen |= current
        current = dangling(C, current) | dangling(current, C)
    return True
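A short usage sketch with a few illustrative codeword lists:

print(is_uniquely_decipherable(["0", "01"]))         # True
print(is_uniquely_decipherable(["0", "1", "01"]))    # False: "0" + "1" collides with "01"
print(is_uniquely_decipherable(["0", "10", "110"]))  # True: prefix-free, hence uniquely decipherable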
Shannon’s source coding theorem
Definition 1.1.4
An elementary information source is a pair $(A, p)$ where $A$ is an alphabet and $p$ is a probability distribution on $A$; that is, $p: A \to [0, 1]$ is a function such that $\sum_{x \in A} p(x) = 1$.
The mean code word length of an information source $(A, p)$ given a code $f: A \to W(B)$ is defined as
$$\ell_f = \sum_{x \in A} p(x)\,\ell(x),$$
where $\ell(x)$ is the length of the codeword $f(x)$.
The minimum of the mean code word length over all uniquely decipherable codes is defined as
$$\ell_{\min}(A, p) = \min \{ \ell_f : f \text{ is a uniquely decipherable code } A \to W(B) \}.$$
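The mean code word length is simply a probability-weighted average of the codeword lengths; a minimal sketch with an illustrative source and code:

p = {"x": 0.5, "y": 0.25, "z": 0.25}
code = {"x": "0", "y": "10", "z": "11"}

mean_length = sum(p[s] * len(code[s]) for s in p)
print(mean_length)  # 0.5*1 + 0.25*2 + 0.25*2 = 1.5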
Lemma: Jensen’s inequality
Let $\varphi$ be a convex function on an interval $I$. Then for any $x_1, \dots, x_n \in I$ and $\lambda_1, \dots, \lambda_n \ge 0$ such that $\sum_{i=1}^n \lambda_i = 1$, we have
$$\varphi\!\left( \sum_{i=1}^n \lambda_i x_i \right) \le \sum_{i=1}^n \lambda_i\, \varphi(x_i).$$
Proof:
If $\varphi$ is a convex function, there are three properties that are useful for the proof:
- $\varphi''(x) \ge 0$ for all $x \in I$ (when $\varphi$ is twice differentiable).
- For any $x_0 \in I$ there is a slope $m$ with $\varphi(x) \ge \varphi(x_0) + m(x - x_0)$ for all $x \in I$ (take the tangent, or a supporting line, at $x_0$).
- For any $x_1, x_2 \in I$ and $\lambda \in [0, 1]$, we have $\varphi(\lambda x_1 + (1 - \lambda) x_2) \le \lambda \varphi(x_1) + (1 - \lambda)\varphi(x_2)$ (take the line connecting $(x_1, \varphi(x_1))$ and $(x_2, \varphi(x_2))$).
We use the second property at $x_0 = \sum_i \lambda_i x_i$: replacing $x$ by each $x_i$, we have $\varphi(x_i) \ge \varphi(x_0) + m(x_i - x_0)$.
We multiply the $i$-th inequality by $\lambda_i$ and sum over all $i$; since $\sum_i \lambda_i (x_i - x_0) = 0$, we have
$$\sum_i \lambda_i \varphi(x_i) \ge \varphi(x_0) = \varphi\!\left( \sum_i \lambda_i x_i \right). \qquad ∎$$
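A quick numerical check of the lemma with the convex function $\varphi(x) = x^2$ (the points and weights are arbitrary):

def phi(x):
    # A convex function.
    return x ** 2

xs = [1.0, 3.0, 6.0]
weights = [0.5, 0.3, 0.2]

lhs = phi(sum(w * x for w, x in zip(weights, xs)))  # phi of the weighted average
rhs = sum(w * phi(x) for w, x in zip(weights, xs))  # weighted average of phi
print(lhs, "<=", rhs)  # approximately 6.76 <= 10.4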
Theorem 1.1.5
Shannon’s source coding theorem
Let $(A, p)$ be an elementary information source and let $b$ be the size of the encoded alphabet $B$.
Then
$$H_b(p) \le \ell_{\min} < H_b(p) + 1,$$
where $\ell_{\min}$ is the minimum mean code word length over all uniquely decipherable codes for $(A, p)$ and $H_b(p) = -\sum_{x \in A} p(x) \log_b p(x)$.
Proof
First, we show that $H_b(p) \le \ell_{\min}$.
Let $f: A \to W(B)$ be any uniquely decipherable code and write $\ell(x)$ for the length of $f(x)$.
We define the probability distribution $q$ on $A$ as
$$q(x) = \frac{b^{-\ell(x)}}{c}, \qquad \text{where } c = \sum_{y \in A} b^{-\ell(y)},$$
so that $\sum_{x \in A} q(x) = 1$.
Since $f$ is uniquely decipherable, $c \le 1$ by Theorem 1.1.1.
We compare $-\sum_{x} p(x) \log_b p(x)$ with $-\sum_{x} p(x) \log_b q(x)$.
Without loss of generality, we assume that $p(x) > 0$ for all $x \in A$.
The quantity $D(p \,\|\, q) = \sum_{x \in A} p(x) \log_b \frac{p(x)}{q(x)}$ is also called the Kullback–Leibler divergence or relative entropy.
Since $\log$ is a concave function, by Jensen’s inequality we have
$$-D(p \,\|\, q) = \sum_{x} p(x) \log_b \frac{q(x)}{p(x)} \le \log_b\!\left( \sum_{x} p(x) \cdot \frac{q(x)}{p(x)} \right) = \log_b\!\left( \sum_{x} q(x) \right) = \log_b 1 = 0.$$
So $D(p \,\|\, q) \ge 0$, i.e.
$$-\sum_{x} p(x) \log_b p(x) \le -\sum_{x} p(x) \log_b q(x).$$
(This is also known as Gibbs’ inequality: put in words, the information entropy of a distribution $p$ is less than or equal to its cross entropy with any other distribution $q$.)
Since $q(x) = b^{-\ell(x)}/c$ and $c \le 1$, we have
$$H_b(p) \le -\sum_{x} p(x) \log_b q(x) = \sum_{x} p(x)\,\ell(x) + \log_b c \le \ell_f.$$
As $f$ was an arbitrary uniquely decipherable code, $H_b(p) \le \ell_{\min}$.
Second, we show that $\ell_{\min} < H_b(p) + 1$.
Set $\ell(x) = \lceil -\log_b p(x) \rceil$ for each $x \in A$. Then $\sum_{x} b^{-\ell(x)} \le \sum_{x} p(x) = 1$, so by Kraft’s inequality (the converse direction of Theorem 1.1.1) there exists an irreducible, hence uniquely decipherable, code $g$ such that $|g(x)| = \ell(x)$ for all $x \in A$.
So
$$\ell_{\min} \le \ell_g = \sum_{x} p(x) \left\lceil -\log_b p(x) \right\rceil < \sum_{x} p(x)\left( -\log_b p(x) + 1 \right) = H_b(p) + 1. \qquad ∎$$
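A small numerical sketch of both bounds for $b = 2$, using the codeword lengths $\lceil -\log_2 p(x) \rceil$ from the second half of the proof and an illustrative dyadic distribution:

import math

p = {"x": 0.5, "y": 0.25, "z": 0.125, "w": 0.125}

H = sum(-q * math.log2(q) for q in p.values())                 # entropy, base 2
lengths = {s: math.ceil(-math.log2(q)) for s, q in p.items()}  # Shannon code lengths
mean_len = sum(p[s] * lengths[s] for s in p)

print(H, mean_len)            # 1.75 1.75 (for a dyadic p the lower bound is attained)
print(H <= mean_len < H + 1)  # True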
Entropy
Let $A$ be an alphabet and let $\Delta(A)$ be the set of all probability distributions on $A$. Any element $p \in \Delta(A)$ is thus a map from $A$ to $[0, 1]$ such that $\sum_{x \in A} p(x) = 1$.
Thus $\Delta(A)$ is a compact convex subset of the real linear space $\mathbb{R}^A$.
We define the entropy map $H: \Delta(A) \to \mathbb{R}$ as
$$H(p) = -\sum_{x \in A} p(x) \log p(x),$$
with the convention $0 \log 0 = 0$.
Basic properties of $H$ (checked numerically in the sketch below):
- $H(p) \ge 0$, with $H(p) = 0$ if and only if $p$ is a point mass.
- $H(p)$ is maximal, with value $\log |A|$, if and only if $p$ is the uniform distribution.
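A minimal numerical check of these two properties, with a base-2 logarithm and illustrative distributions over a four-letter alphabet:

import math

def entropy(p):
    # H(p) = -sum p(x) log2 p(x), with the convention 0 log 0 = 0.
    return sum(-q * math.log2(q) for q in p if q > 0)

point_mass = [1.0, 0.0, 0.0, 0.0]
uniform = [0.25, 0.25, 0.25, 0.25]

print(entropy(point_mass))             # 0.0
print(entropy(uniform), math.log2(4))  # 2.0 2.0 (the maximum value log|A|)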
Huffman’s coding algorithm
Huffman’s coding algorithm is a greedy algorithm that constructs an optimal prefix code for a given probability distribution.
The algorithm is as follows:
- Sort the symbols by their probabilities.
- Merge the two symbols with the smallest probabilities into a single new symbol whose probability is their sum.
- Repeat step 2 until only one symbol is left; reading the sequence of merges as a binary tree gives each original symbol its codeword.
import heapq

def huffman(frequencies):
    class Node:
        def __init__(self, symbol=None, freq=0, left=None, right=None):
            self.symbol = symbol
            self.freq = freq
            self.left = left
            self.right = right
        def __lt__(self, other):  # For priority queue
            return self.freq < other.freq
        def is_leaf(self):
            return self.symbol is not None
    # Build the Huffman tree
    heap = [Node(sym, freq) for sym, freq in frequencies.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        left = heapq.heappop(heap)
        right = heapq.heappop(heap)
        merged = Node(freq=left.freq + right.freq, left=left, right=right)
        heapq.heappush(heap, merged)
    root = heap[0]
    # Assign codes by traversing the tree
    codebook = {}
    def assign_codes(node, code=""):
        if node.is_leaf():
            codebook[node.symbol] = code
        else:
            assign_codes(node.left, code + "0")
            assign_codes(node.right, code + "1")
    assign_codes(root)
    # Helper to pretty print the binary tree
    def tree_repr(node, prefix=""):
        if node.is_leaf():
            return f"{prefix}→ {repr(node.symbol)} ({node.freq})\n"
        else:
            s = f"{prefix}• ({node.freq})\n"
            s += tree_repr(node.left, prefix + " 0 ")
            s += tree_repr(node.right, prefix + " 1 ")
            return s
    print("Huffman Codebook:")
    for sym in sorted(codebook, key=lambda s: frequencies[s], reverse=True):
        print(f"{repr(sym)}: {codebook[sym]}")
    print("\nCoding Tree:")
    print(tree_repr(root))
    return codebook
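A short usage sketch, with an illustrative frequency table (any mapping from symbols to positive weights works):

freqs = {"a": 0.45, "b": 0.25, "c": 0.15, "d": 0.10, "e": 0.05}
codebook = huffman(freqs)
# More probable symbols receive shorter codewords; here "a" gets the shortest code.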
Definition 1.2.1
Let $A$ be an alphabet and let $p$ be a probability distribution on $A$. A code for $(A, p)$ is called optimal if its mean code word length is minimal among all uniquely decipherable codes for $(A, p)$.
We claim that Huffman’s algorithm produces an optimal code.
Proof:
We use mathematical induction on the number of symbols $n = |A|$.
Base Case: $n = 2$.
Only two symbols: assign one the codeword "0" and the other "1". This is clearly optimal (it is the only possible choice up to relabeling).
Inductive Step:
Assume that Huffman’s algorithm produces an optimal code for any source with $n - 1$ symbols.
Now, given $n$ symbols, Huffman merges the two least probable symbols $x$ and $y$ into a single symbol $z$ with $p(z) = p(x) + p(y)$, and constructs a code recursively for the resulting source $A'$ with $n - 1$ symbols.
By the inductive hypothesis, the code on $A'$ is optimal.
By Step 2 above, assigning the two merged symbols $x$ and $y$ the codewords $c(z)0$ and $c(z)1$ (where $c(z)$ is the codeword of $z$ in the code for $A'$) increases the mean code word length of Definition 1.1.4 by exactly $p(x) + p(y)$; since $x$ and $y$ are the two least probable symbols, this yields an optimal code for $A$.
Therefore, by induction, Huffman’s algorithm gives an optimal prefix code for any $n \ge 2$.