CSE4303 Introduction to Computer Security (Lecture 9)

Cryptographic Hash Functions

What is a Hash Function

A hash function maps a variable-length input to a fixed-length output.

$h : X \to Y$

Typical examples:

Java hashCode(): input is an Object, output is a 4-byte integer.
String polynomial hash example: $h("cs433s") = 'c' \cdot 31^5 + 's' \cdot 31^4 + \dots + 's'$
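The polynomial string hash can be sketched in Python as follows. This is a minimal illustration that mimics the arithmetic of Java's String.hashCode() (multiplier 31, 32-bit wraparound). The function name `java_string_hash` is a hypothetical choice for this example, and agreement with Java is assumed only for ASCII strings, since Java hashes UTF-16 code units.

```python
def java_string_hash(s: str) -> int:
    """Polynomial hash s[0]*31^(n-1) + ... + s[n-1], reduced to a signed 32-bit int."""
    h = 0
    for ch in s:
        # Horner's rule; the mask keeps only the low 32 bits, like Java int overflow.
        h = (31 * h + ord(ch)) & 0xFFFFFFFF
    # Reinterpret the unsigned 32-bit value as a signed integer.
    return h - (1 << 32) if h >= (1 << 31) else h


print(java_string_hash("cs433s"))
print(java_string_hash("Aa"), java_string_hash("BB"))  # two inputs, same output
```

The strings "Aa" and "BB" both hash to 2112, a small reminder that a non-cryptographic hash makes collisions easy to find on purpose.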

Key property:

The domain size $|X|$ is much larger than the range size $|Y|$ .
Collisions are unavoidable in principle since $|X| > |Y|$ (pigeonhole principle).

Main uses:

Compact numerical representation
Hash tables (Set, Map, dictionaries)
Object comparison
Integrity checking (fingerprint)

Security Properties

Let $h : X \to Y$ .

Preimage Resistance (One-way)
Given $y \in Y$ , it is computationally infeasible to find $x \in X$ such that
$h(x) = y$ .
Second Preimage Resistance (Weak collision resistance)
Given a specific $x \in X$ , it is computationally infeasible to find $x' \neq x$ such that
$h(x') = h(x)$ .
Collision Resistance (Strong collision resistance)
It is computationally infeasible to find any two distinct values $x, x' \in X$ such that
$h(x) = h(x')$ .

Adversarial definition:

Let $H : M \to T$ where $|M|$ is much larger than $|T|$ .
$H$ is collision resistant if for all efficient algorithms $A$ :

$Adv_{CR}[A, H] = \Pr[A \text{ outputs a collision for } H]$

is negligible.

Generic Collision Attack (Birthday Attack)

Let $H : M \to \{0,1\}^n$ .

Generic algorithm to find a collision in time on the order of $2^{n/2}$ :

Choose $2^{n/2}$ random messages $m_1, \dots, m_{2^{n/2}}$ .
Compute $t_i = H(m_i)$ .
Look for $t_i = t_j$ .
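The generic attack above can be tried on a deliberately small hash. The sketch below is an illustration under stated assumptions: `truncated_hash` (a hypothetical helper) keeps only the top $n$ bits of SHA-256 to play the role of $H$, and the messages are sequential counters rather than random strings so that the run is reproducible.

```python
import hashlib


def truncated_hash(msg: bytes, n_bits: int) -> int:
    """Top n_bits of SHA-256(msg), used here as a small n-bit hash H."""
    digest = int.from_bytes(hashlib.sha256(msg).digest(), "big")
    return digest >> (256 - n_bits)


def find_collision(n_bits: int):
    """Hash distinct messages until two share a tag; return (m1, m2, tries)."""
    seen = {}  # tag -> first message that produced it
    counter = 0
    while True:
        msg = counter.to_bytes(8, "big")
        tag = truncated_hash(msg, n_bits)
        if tag in seen:
            return seen[tag], msg, counter + 1
        seen[tag] = msg
        counter += 1


m1, m2, tries = find_collision(20)
print(f"n = 20: collision after {tries} hashes (2^(n/2) = {2 ** 10})")
```

For $n = 20$ the number of hashes needed is typically on the order of $2^{10}$, far below the $2^{20}$ possible outputs. The table `seen` also shows the memory cost of this simple version of the attack.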

Birthday phenomenon:

If the output space has size $B$ ,
then about $1.2\sqrt{B}$ random samples suffice for a collision probability above $50\%$ .

Thus:

A 128-bit hash admits a generic collision attack in about $2^{64}$ hash evaluations.
A 256-bit hash admits a generic collision attack in about $2^{128}$ hash evaluations.
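The birthday bound can be checked numerically for small output spaces. The sketch below assumes uniformly random, independent samples; the function names are illustrative.

```python
def collision_probability(samples: int, space_size: int) -> float:
    """Probability that `samples` uniform draws from `space_size` values contain a repeat."""
    p_all_distinct = 1.0
    for i in range(samples):
        # The (i+1)-th draw must avoid the i values already seen.
        p_all_distinct *= 1 - i / space_size
    return 1 - p_all_distinct


def samples_for_half(space_size: int) -> int:
    """Smallest number of samples whose collision probability exceeds 50%."""
    k = 1
    while collision_probability(k, space_size) <= 0.5:
        k += 1
    return k


for B in (365, 2 ** 16):
    k = samples_for_half(B)
    print(B, k, round(k / B ** 0.5, 2))  # last column: k relative to sqrt(B)
```

For $B = 365$ this gives the classic answer of 23 people, and the ratio of the threshold to $\sqrt{B}$ stays close to 1.2 as $B$ grows.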

Practical Hash Functions

From the performance and security table (AMD Opteron 2.2 GHz):

MD5: 128 bits, collision resistance broken (collisions found in 2004)
SHA-1: 160 bits, practical collision attack demonstrated (2017)
SHA-256: 256 bits
SHA-512: 512 bits
Whirlpool: 512 bits

SHA-1 collision example: SHAttered attack (Google and CWI).
Two different PDF files were produced with identical SHA-1 hash.

Construction of Cryptographic Hash Functions

Merkle-Damgard Construction

Given compression function:

$h : T \times X \to T$

We build:

$H : X^{\le L} \to T$

Process:

Split the padded message into blocks $m[0], m[1], \dots, m[L-1]$ .
Use fixed initialization vector $IV$ .
Iterate chaining:

$H_0 = IV$
$H_1 = h(H_0, m[0])$
$H_2 = h(H_1, m[1])$
$\dots$
$H_L = h(H_{L-1}, m[L-1])$
Output $H(m) = H_L$ .
Padding: before splitting, append $1000\ldots0$ followed by the message length (64 bits) to the message.
If the last block has no room for the padding, add another block.

Theorem:
If compression function $h$ is collision resistant,
then $H$ is collision resistant.
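The construction can be sketched in a few lines of Python. This is a toy illustration, not a real hash design: the compression function `compress` is a stand-in built from SHA-256 purely for convenience, and the block size (64 bytes), the all-zero `IV`, and the function names are assumptions made for this example.

```python
import hashlib

BLOCK = 64        # bytes per message block (assumed for this sketch)
IV = bytes(32)    # fixed initialization vector (all zeros here)


def compress(chain: bytes, block: bytes) -> bytes:
    """Stand-in compression function h : T x X -> T."""
    return hashlib.sha256(chain + block).digest()


def md_pad(msg: bytes) -> bytes:
    """Append 1000...0 and the 64-bit message length, filling whole blocks."""
    bit_length = (8 * len(msg)).to_bytes(8, "big")
    padded = msg + b"\x80"                        # a single 1 bit, then zeros
    padded += bytes((-len(padded) - 8) % BLOCK)   # zero fill, leaving 8 bytes
    return padded + bit_length


def md_hash(msg: bytes) -> bytes:
    """Iterate the compression function over the padded blocks."""
    chain = IV
    padded = md_pad(msg)
    for i in range(0, len(padded), BLOCK):
        chain = compress(chain, padded[i:i + BLOCK])
    return chain


print(len(md_pad(b"a" * 55)), len(md_pad(b"a" * 56)))
print(md_hash(b"hello").hex())
```

A 55-byte message still fits in one block after padding, while a 56-byte message leaves no room for the length field and forces a second block.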

Davies-Meyer Compression from Block Cipher

Given block cipher:

$E : K \times \{0,1\}^n \to \{0,1\}^n$

Define compression function:

$h(H, m) = E(m, H) \oplus H$

If $E$ behaves like an ideal cipher,
finding a collision in $h$ takes about $2^{n/2}$ evaluations.

This is optimal for $n$ -bit output.
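The Davies-Meyer formula can be illustrated with a toy cipher. Python's standard library has no block cipher, so the sketch below assumes a small Feistel network (`toy_encrypt`, a hypothetical stand-in that is a permutation for each key but offers no real security). Only the last function is the Davies-Meyer step itself.

```python
import hashlib


def _round_function(key: bytes, round_no: int, half: bytes) -> bytes:
    """Keyed round function for the toy Feistel cipher."""
    return hashlib.sha256(key + bytes([round_no]) + half).digest()[:len(half)]


def toy_encrypt(key: bytes, block: bytes, rounds: int = 4) -> bytes:
    """Toy Feistel cipher E(key, block); block must have even length."""
    n = len(block) // 2
    left, right = block[:n], block[n:]
    for r in range(rounds):
        f = _round_function(key, r, right)
        left, right = right, bytes(a ^ b for a, b in zip(left, f))
    return left + right


def davies_meyer(chain: bytes, msg_block: bytes) -> bytes:
    """h(H, m) = E(m, H) XOR H: the message block is used as the cipher key."""
    encrypted = toy_encrypt(msg_block, chain)
    return bytes(a ^ b for a, b in zip(encrypted, chain))


H0 = bytes(range(16))  # 128-bit chaining value for the demo
print(davies_meyer(H0, b"message block").hex())
```

Note the role reversal: the chaining value is the plaintext and the message block is the key. The final XOR with $H$ is what makes the function hard to invert even though $E$ itself is invertible.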

Example: SHA-256

Built using:

Merkle-Damgard construction
Davies-Meyer style compression
Block cipher-like core: SHACAL-2

Structure:

512-bit message block
256-bit chaining value
256-bit output
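These sizes can be confirmed from Python's hashlib module, which exposes the block size and digest size of SHA-256 in bytes. The chaining value is internal state and is not exposed directly; the helper name `sha256_parameters` is an illustrative choice.

```python
import hashlib


def sha256_parameters() -> dict:
    """Return SHA-256 block and output sizes in bits, as reported by hashlib."""
    h = hashlib.sha256()
    return {
        "block_bits": h.block_size * 8,    # size of one message block
        "digest_bits": h.digest_size * 8,  # size of the final output
    }


print(sha256_parameters())
```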

Applications for Integrity and Authentication

Standalone Usage: Message Integrity

Application 1: Delayed Knowledge Verification

Idea: Publish $h(secret)$ first.
Later reveal secret.
Anyone can recompute hash and verify.

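Justification: Preimage resistance keeps the secret hidden until it is revealed, provided the secret is hard to guess. Collision resistance prevents the publisher from later revealing a different secret with the same hash.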

Example: Stock market prediction commitment.

Example for delayed knowledge verification

Publish $H("Stock will rise on May 1")$ .
On May 1, reveal the prediction string.
Anyone computes hash and checks equality.
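The publish-then-reveal steps can be sketched as follows. SHA-256 is an assumed choice of $H$ here, and the function names are illustrative.

```python
import hashlib
import hmac


def commit(prediction: str) -> str:
    """Value to publish in advance: the hex digest of the prediction."""
    return hashlib.sha256(prediction.encode("utf-8")).hexdigest()


def verify(published: str, revealed: str) -> bool:
    """Recompute the hash of the revealed string and compare with the published value."""
    return hmac.compare_digest(published, commit(revealed))


published = commit("Stock will rise on May 1")
print(verify(published, "Stock will rise on May 1"))
print(verify(published, "Stock will fall on May 1"))
```

A short, guessable prediction like this one can be recovered by hashing likely candidates, so the scheme hides the secret only when the committed string is hard to guess.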

Application 2: Password Storage

Model: System must verify password but not store plaintext.

Solution: Store hash of password.
During login:

Hash input
Compare with stored value

Example: Linux stores hashed passwords in the /etc/shadow file.
Each entry includes:

Salt
Password hash
Metadata

Security relies on:

One-way property
Salting to prevent precomputed attacks
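A salted store-and-verify flow might look like the sketch below. It is not the scheme used in /etc/shadow: PBKDF2-HMAC-SHA256 from hashlib is swapped in as an illustrative one-way function, and the iteration count and salt length are assumptions chosen for the example.

```python
import hashlib
import hmac
import os

ITERATIONS = 100_000  # illustrative work factor (an assumption)


def hash_password(password: str):
    """Return (salt, digest) to store in place of the plaintext password."""
    salt = os.urandom(16)  # fresh random salt per user
    digest = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    return salt, digest


def verify_password(password: str, salt: bytes, stored: bytes) -> bool:
    """Hash the login attempt with the stored salt and compare with the stored digest."""
    candidate = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    return hmac.compare_digest(candidate, stored)


salt, stored = hash_password("correct horse")
print(verify_password("correct horse", salt, stored))
print(verify_password("wrong guess", salt, stored))
```

Because each user gets a different salt, two users with the same password end up with different stored digests, which is what defeats precomputed tables.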

Application 3: Trusted Timestamping and Blockchains

Goal: Prove document existed before a given date.

Methods:

Publish document hash in newspaper.
Time Stamping Authority signs hash.
Publish hash in blockchain block.

Blockchain relies on:

One-way hash functions
Linking blocks via hash pointers
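Hash pointers can be illustrated with a minimal chain of records. This sketch leaves out everything else a real blockchain has (consensus, signatures, proof of work); the block layout and the names `build_chain` and `chain_is_valid` are assumptions for the example.

```python
import hashlib

GENESIS = "0" * 64  # placeholder "previous hash" for the first block


def block_hash(prev_hash: str, data: str) -> str:
    """Hash of a block: covers its data and the pointer to the previous block."""
    return hashlib.sha256((prev_hash + "|" + data).encode("utf-8")).hexdigest()


def build_chain(items):
    """Link each item to its predecessor through a hash pointer."""
    chain, prev = [], GENESIS
    for data in items:
        current = block_hash(prev, data)
        chain.append({"prev_hash": prev, "data": data, "hash": current})
        prev = current
    return chain


def chain_is_valid(chain) -> bool:
    """Recompute every hash and check that every pointer matches."""
    prev = GENESIS
    for block in chain:
        if block["prev_hash"] != prev:
            return False
        if block["hash"] != block_hash(block["prev_hash"], block["data"]):
            return False
        prev = block["hash"]
    return True


chain = build_chain(["doc A", "doc B", "doc C"])
print(chain_is_valid(chain))
chain[1]["data"] = "doc B (edited)"
print(chain_is_valid(chain))
```

Changing an old block changes its hash, which breaks the pointer stored in the next block, so rewriting history requires rewriting every later block as well.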

Application 4: Software Integrity with Secure Read-Only Space

Context: Trusted read-only public space (for example official website).

Process:

Publisher computes $H(F_1), H(F_2), \dots, H(F_n)$ .
Publish hashes publicly.
User downloads file $F_i$ and verifies hash.

If $H$ is collision resistant:
Attacker cannot modify file without detection.

No encryption required.
Public verifiability works if read-only space is trusted.
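The user-side check can be sketched as below, assuming SHA-256 as $H$ and a hex digest copied from the trusted read-only space. The file name and function names are hypothetical.

```python
import hashlib
import hmac


def file_sha256(path: str) -> str:
    """Hex SHA-256 digest of a file, read in chunks so large files fit in memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()


def verify_download(path: str, published_hex: str) -> bool:
    """Compare the downloaded file's digest with the published one."""
    return hmac.compare_digest(file_sha256(path), published_hex.lower())


# Demo with a locally created file standing in for a download.
with open("example_download.bin", "wb") as f:
    f.write(b"release contents")
published = file_sha256("example_download.bin")  # what the publisher would post
print(verify_download("example_download.bin", published))
```

The check is only as trustworthy as the channel that delivered `published`: if an attacker can alter the posted hash as well as the file, the comparison proves nothing.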

Symmetric Crypto Authentication: MACs and AE

This section is also covered in CSE442T Introduction to Cryptography (Lecture 18).

Message Authentication Codes (MACs)

Definition: MAC $I = (S, V)$ over $(K, M, T)$

$S(k, m) \to t$
$V(k, m, t) \to$ yes or no

Security model: Attacker can query $S(k, m_i)$ .
Goal: produce new $(m, t)$ not previously seen such that $V$ accepts.

$Adv_{MAC}[A, I]$ must be negligible.

MAC from PRF

Given PRF:

$F : K \times X \to Y$

Define MAC:

$S(k, m) = F(k, m)$
$V(k, m, t)$ accepts if $t = F(k, m)$

Theorem: If $F$ is a secure PRF and $|Y|$ is large,
then the derived MAC is secure.

Condition: $1 / |Y|$ must be negligible.
Example: $|Y| = 2^{80}$ .
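The PRF-based MAC can be sketched with HMAC-SHA256 standing in for $F$. Treating HMAC-SHA256 as a PRF is an assumption of this example, not something stated in the notes; its 256-bit output makes $1/|Y| = 2^{-256}$. The names `prf`, `sign`, and `verify` are illustrative.

```python
import hashlib
import hmac
import os


def prf(key: bytes, msg: bytes) -> bytes:
    """Stand-in for F(k, m): HMAC-SHA256, assumed here to behave like a PRF."""
    return hmac.new(key, msg, hashlib.sha256).digest()


def sign(key: bytes, msg: bytes) -> bytes:
    """S(k, m) = F(k, m)."""
    return prf(key, msg)


def verify(key: bytes, msg: bytes, tag: bytes) -> bool:
    """V(k, m, t) accepts iff t = F(k, m); the comparison is constant time."""
    return hmac.compare_digest(tag, prf(key, msg))


key = os.urandom(32)
tag = sign(key, b"transfer 10 to Bob")
print(verify(key, b"transfer 10 to Bob", tag))
print(verify(key, b"transfer 1000 to Bob", tag))
```

If tags were truncated to very few bits, an attacker could guess a valid tag with probability $1/|Y|$, which is why the theorem requires $|Y|$ to be large.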