
CSE5313 Coding and information theory for data science (Lecture 25)

Polynomial Evaluation

Problem formulation:

  • We have $K$ datasets $X_1,X_2,\ldots,X_K$.
  • Want to compute some polynomial function $f$ of degree $d$ on each dataset.
    • Want $f(X_1),f(X_2),\ldots,f(X_K)$.
  • Examples:
    • $X_1,X_2,\ldots,X_K$ are points in $\mathbb{F}^{M\times M}$, and $f(X)=X^8+3X^2+1$.
    • $X_k=(X_k^{(1)},X_k^{(2)})$, both in $\mathbb{F}^{M\times M}$, and $f(X_k)=X_k^{(1)}X_k^{(2)}$.
    • Gradient computation.

$P$ worker nodes:

  • Some are stragglers, i.e., not responsive.
  • Some are adversaries, i.e., return erroneous results.
  • Privacy: we do not want to expose the datasets to the worker nodes.

Lagrange Coded Computing

Let $\ell(z)$ be a polynomial whose evaluations at $\omega_1,\ldots,\omega_K$ are $X_1,\ldots,X_K$.

  • That is, $\ell(\omega_i)=X_i$ for every $i\in[K]$, where each $\omega_i\in\mathbb{F}$.

An example construction:

Given $X_1,\ldots,X_K$ with corresponding $\omega_1,\ldots,\omega_K$:

  • $\ell(z)=\sum_{i=1}^K X_i\ell_i(z)$, where $\ell_i(z)=\prod_{j\in[K],j\neq i} \frac{z-\omega_j}{\omega_i-\omega_j}$, so that $\ell_i(\omega_j)=\begin{cases} 0 & \text{if } j\neq i \\ 1 & \text{if } j=i \end{cases}$.

Then every $f(X_i)=f(\ell(\omega_i))$ is an evaluation of the polynomial $f\circ \ell$ at $\omega_i$.

If the master obtains the composition $h=f\circ \ell$, it can obtain every $f(X_i)=h(\omega_i)$.

Goal: the master wishes to obtain the polynomial $h(z)=f(\ell(z))$.

Intuition:

  • Encoding is performed by evaluating $\ell(z)$ at $\alpha_1,\ldots,\alpha_P\in\mathbb{F}$, with $P>K$ for redundancy.
  • Nodes apply $f$ to an evaluation of $\ell$ and obtain an evaluation of $h$.
  • The master receives some potentially noisy evaluations and finds $h$.
  • The master evaluates $h$ at $\omega_1,\ldots,\omega_K$ to obtain $f(X_1),\ldots,f(X_K)$.

Encoding for Lagrange coded computing

Need a polynomial $\ell(z)$ such that:

  • $X_k=\ell(\omega_k)$ for every $k\in[K]$.

Having obtained such an $\ell$, we let $\tilde{X}_i=\ell(\alpha_i)$ for every $i\in[P]$.

$\mathrm{span}\{\tilde{X}_1,\tilde{X}_2,\ldots,\tilde{X}_P\}=\mathrm{span}\{\ell_1(x),\ell_2(x),\ldots,\ell_P(x)\}$.

Want $X_k=\ell(\omega_k)$ for every $k\in[K]$.

Tool: Lagrange interpolation.

  • $\ell_k(z)=\prod_{j\neq k} \frac{z-\omega_j}{\omega_k-\omega_j}$.
  • $\ell_k(\omega_k)=1$ and $\ell_k(\omega_j)=0$ for every $j\neq k$.
  • $\deg \ell_k(z)=K-1$.

Let $\ell(z)=\sum_{k=1}^K X_k\ell_k(z)$.

  • $\deg \ell\leq K-1$.
  • $\ell(\omega_k)=X_k$ for every $k\in[K]$.

Let $\tilde{X}_i=\ell(\alpha_i)=\sum_{k=1}^K X_k\ell_k(\alpha_i)$.

Every $\tilde{X}_i$ is a linear combination of $X_1,\ldots,X_K$.

$(\tilde{X}_1,\tilde{X}_2,\ldots,\tilde{X}_P)=(X_1,\ldots,X_K)\cdot G=(X_1,\ldots,X_K)\begin{bmatrix} \ell_1(\alpha_1) & \ell_1(\alpha_2) & \cdots & \ell_1(\alpha_P) \\ \ell_2(\alpha_1) & \ell_2(\alpha_2) & \cdots & \ell_2(\alpha_P) \\ \vdots & \vdots & \ddots & \vdots \\ \ell_K(\alpha_1) & \ell_K(\alpha_2) & \cdots & \ell_K(\alpha_P) \end{bmatrix}$

This $G$ is called a Lagrange matrix with respect to

  • $\omega_1,\ldots,\omega_K$ (interpolation points, rows), and
  • $\alpha_1,\ldots,\alpha_P$ (evaluation points, columns).

Basically, a modification of a Reed-Solomon code.
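To make the construction concrete, here is a minimal Python sketch of building $G$ and encoding over a prime field (the field size $p=97$, the integer choices of $\omega_i$ and $\alpha_i$, and the sample data are illustrative assumptions, not fixed by the lecture):

```python
# Minimal sketch: Lagrange matrix G and encoding over F_p (p prime).
# All concrete numbers here are illustrative assumptions.
p = 97                                   # prime field size
K, P = 3, 7                              # K datasets, P worker nodes
omegas = list(range(1, K + 1))           # interpolation points ω_1..ω_K
alphas = list(range(K + 1, K + P + 1))   # evaluation points α_1..α_P (disjoint)

def ell(k, z, pts):
    """Evaluate the Lagrange basis polynomial ℓ_k at z, modulo p."""
    num, den = 1, 1
    for j, w in enumerate(pts):
        if j != k:
            num = num * (z - w) % p
            den = den * (pts[k] - w) % p
    return num * pow(den, -1, p) % p     # pow(·, -1, p): modular inverse

# Lagrange matrix: row k, column i holds ℓ_{k+1}(α_{i+1})
G = [[ell(k, a, omegas) for a in alphas] for k in range(K)]

# Encoding: (X̃_1,...,X̃_P) = (X_1,...,X_K) · G.  Scalar datasets here;
# matrix-valued X_k would be encoded entrywise in the same way.
X = [10, 20, 30]
X_tilde = [sum(X[k] * G[k][i] for k in range(K)) % p for i in range(P)]
```

Since $\sum_k \ell_k(z)=1$ identically, every column of $G$ sums to $1$, which is a handy sanity check on an implementation.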

Decoding for Lagrange coded computing

Say the system has $S$ stragglers (erasures) and $A$ adversaries (errors).

The master receives $P-S$ computation results $f(\tilde{X}_{i_1}),\ldots,f(\tilde{X}_{i_{P-S}})$.

  • By design, these are evaluations of $h$: $h(\alpha_{i_1})=f(\ell(\alpha_{i_1})),\ldots,h(\alpha_{i_{P-S}})=f(\ell(\alpha_{i_{P-S}}))$.
  • $A$ of the evaluations are noisy.
  • $\deg h=\deg f\cdot \deg \ell=(K-1)\deg f$.

Which process enables interpolating a polynomial from noisy evaluations?

Reed-Solomon (RS) decoding.

Fact: Reed-Solomon decoding succeeds if and only if the number of erasures plus $2\times$ the number of errors is at most $d-1$, where $d$ is the minimum distance of the code.

Imagine $h$ as the “message” in a Reed-Solomon code with parameters $[P,(K-1)\deg f +1,P-(K-1)\deg f]_q$.

  • Interpolating $h$ is possible if and only if $S+2A\leq P-(K-1)\deg f-1$.

Once the master interpolates $h$:

  • The evaluations $h(\omega_i)=f(\ell(\omega_i))=f(X_i)$ provide the desired results.
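The whole pipeline can be sketched end to end for the straggler-only case ($A=0$); the prime $p$, the polynomial $f(x)=x^2+1$, and all parameters below are illustrative assumptions chosen to satisfy $(K-1)\deg f+S+1\leq P$:

```python
# End-to-end LCC sketch over F_p with S stragglers and no adversaries.
# Parameters are illustrative: (K-1)*d + S + 1 = 4 + 2 + 1 = 7 <= P.
p = 97
K, P, S = 3, 7, 2
d = 2                                    # deg f, with f(x) = x^2 + 1
f = lambda x: (x * x + 1) % p
omegas = list(range(1, K + 1))           # ω_1..ω_K
alphas = list(range(K + 1, K + P + 1))   # α_1..α_P

def interpolate(z, pts, vals):
    """Evaluate at z the unique polynomial of degree < len(pts) through (pts, vals)."""
    total = 0
    for k in range(len(pts)):
        num, den = 1, 1
        for j, w in enumerate(pts):
            if j != k:
                num = num * (z - w) % p
                den = den * (pts[k] - w) % p
        total = (total + vals[k] * num * pow(den, -1, p)) % p
    return total

# Encoding: X̃_i = ℓ(α_i), where ℓ(ω_k) = X_k and deg ℓ = K - 1
X = [10, 20, 30]
X_tilde = [interpolate(a, omegas, X) for a in alphas]

# Workers compute h(α_i) = f(ℓ(α_i)); S of them never respond
responses = {a: f(x) for a, x in zip(alphas, X_tilde)}
received = dict(list(responses.items())[: P - S])   # any P-S responses work

# Master interpolates h (deg h = (K-1)*d = 4, so 5 points suffice),
# then evaluates it at the ω's to recover the results
pts, vals = list(received), list(received.values())
f_X = [interpolate(w, pts, vals) for w in omegas]
assert f_X == [f(x) for x in X]
```

Handling $A>0$ adversaries would replace the plain interpolation in the last step with a Reed-Solomon error-correcting decoder, as described above.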

Theorem of Lagrange coded computing

Lagrange coded computing enables computing $\{f(X_i)\}_{i=1}^K$ for any polynomial $f$ in the presence of at most $S$ stragglers and at most $A$ adversaries if

$(K-1)\deg f+S+2A+1\leq P$

Interpolation of the result does not depend on $P$ (the number of worker nodes).
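The bound is easy to keep at hand as a helper (the function name and signature are my own, not from the lecture); the optional $T$ anticipates the privacy extension below:

```python
def min_workers(K, deg_f, S=0, A=0, T=0):
    """Smallest P satisfying (K+T-1)·deg_f + S + 2A + 1 <= P.
    T = 0 gives the non-private bound above; T > 0 anticipates the
    privacy extension of the following section."""
    return (K + T - 1) * deg_f + S + 2 * A + 1

# e.g., K = 3 datasets, quadratic f, 2 stragglers, 1 adversary:
assert min_workers(3, 2, S=2, A=1) == 9
```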

Privacy for Lagrange coded computing

Currently, any size-$K$ group of colluding nodes can reveal the entire dataset.

Q: Can an individual node $i$ learn anything about the data?

A: Yes, since $\tilde{X}_i$ is a linear combination of $X_1,\ldots,X_K$ (partial knowledge, a linear combination of the private data).

Can we provide perfect privacy given that at most $T$ nodes collude?

  • That is, $I(X;\tilde{X}_\mathcal{T})=0$ for every $\mathcal{T}\subseteq [P]$ of size at most $T$, where
    • $X=(X_1,\ldots,X_K)$, and
    • $\tilde{X}_\mathcal{T}=(\tilde{X}_{i_1},\ldots,\tilde{X}_{i_{|\mathcal{T}|}})$.

Solution: a slight change of the encoding in LCC.

This only applies to $\mathbb{F}=\mathbb{F}_q$ (there is no perfect privacy over $\mathbb{R}$ or $\mathbb{C}$, where no uniform distribution can be defined).

The master chooses

  • $T$ keys $Z_1,\ldots,Z_T$ uniformly at random ($|Z_j|=|X_i|$ for all $i,j$), and
  • interpolation points $\omega_1,\ldots,\omega_{K+T}$.

Find the Lagrange polynomial $\ell(z)$ such that

  • $\ell(\omega_i)=X_i$ for $i\in[K]$, and
  • $\ell(\omega_{K+j})=Z_j$ for $j\in[T]$.

Lagrange interpolation:

$\ell(z)=\sum_{i=1}^{K} X_i\ell_i(z)+\sum_{j=1}^{T} Z_j\ell_{K+j}(z)$

$(\tilde{X}_1,\ldots,\tilde{X}_P)=(X_1,\ldots,X_K,Z_1,\ldots,Z_T)\cdot G$

where

$G=\begin{bmatrix} \ell_1(\alpha_1) & \ell_1(\alpha_2) & \cdots & \ell_1(\alpha_P) \\ \ell_2(\alpha_1) & \ell_2(\alpha_2) & \cdots & \ell_2(\alpha_P) \\ \vdots & \vdots & \ddots & \vdots \\ \ell_K(\alpha_1) & \ell_K(\alpha_2) & \cdots & \ell_K(\alpha_P) \\ \vdots & \vdots & \ddots & \vdots \\ \ell_{K+T}(\alpha_1) & \ell_{K+T}(\alpha_2) & \cdots & \ell_{K+T}(\alpha_P) \end{bmatrix}$

For analysis, we denote $G=\begin{bmatrix}G^{top}\\G^{bot}\end{bmatrix}$, where $G^{top}\in \mathbb{F}^{K\times P}$ and $G^{bot}\in \mathbb{F}^{T\times P}$.

The proof of privacy is almost the same as that of the ramp scheme.

Proof

We have $(\tilde{X}_1,\ldots,\tilde{X}_P)=(X_1,\ldots,X_K)\cdot G^{top}+(Z_1,\ldots,Z_T)\cdot G^{bot}$.

Without loss of generality, $\mathcal{T}=[T]$ is the colluding set.

The nodes in $\mathcal{T}$ hold $(\tilde{X}_1,\ldots,\tilde{X}_T)=(X_1,\ldots,X_K)\cdot G^{top}_\mathcal{T}+(Z_1,\ldots,Z_T)\cdot G^{bot}_\mathcal{T}$.

  • $G^{top}_\mathcal{T}$ and $G^{bot}_\mathcal{T}$ contain the first $T$ columns of $G^{top}$ and $G^{bot}$, respectively.

Note that $G^{bot}\in \mathbb{F}^{T\times P}_q$ generates an MDS code, and hence $G^{bot}_\mathcal{T}$ is a $T\times T$ invertible matrix.

Since $Z=(Z_1,\ldots,Z_T)$ is chosen uniformly at random, $Z\cdot G^{bot}_\mathcal{T}$ is uniformly distributed and acts as a one-time pad.

Decoding works the same as before; since $\deg \ell$ is now $K+T-1$, interpolating $h$ requires $(K+T-1)\deg f+1$ clean evaluations.
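A sketch of the private variant (all parameters illustrative): $T$ uniformly random keys are appended as extra interpolation values, and decoding proceeds exactly as before with the larger degree:

```python
# Privacy-augmented LCC sketch over F_p: T uniform keys mask the data.
# Illustrative parameters: (K+T-1)*d + 1 = 7 clean evaluations needed, P = 9.
import random

p = 97
K, T, P = 3, 1, 9
d = 2                                    # deg f, with f(x) = x^2 + 1
f = lambda x: (x * x + 1) % p
omegas = list(range(1, K + T + 1))               # ω_1..ω_{K+T}
alphas = list(range(K + T + 1, K + T + P + 1))   # α_1..α_P

def interpolate(z, pts, vals):
    """Evaluate at z the unique polynomial of degree < len(pts) through (pts, vals)."""
    total = 0
    for k in range(len(pts)):
        num, den = 1, 1
        for j, w in enumerate(pts):
            if j != k:
                num = num * (z - w) % p
                den = den * (pts[k] - w) % p
        total = (total + vals[k] * num * pow(den, -1, p)) % p
    return total

X = [10, 20, 30]
Z = [random.randrange(p) for _ in range(T)]      # uniformly random keys

# ℓ(ω_i) = X_i for i <= K and ℓ(ω_{K+j}) = Z_j, so deg ℓ = K + T - 1;
# each share ℓ(α_i) is masked by the key contribution (one-time pad)
shares = [interpolate(a, omegas, X + Z) for a in alphas]

# Workers apply f; deg h = (K+T-1)*d = 6, so any 7 clean results suffice
answers = [f(s) for s in shares]
f_X = [interpolate(w, alphas[:7], answers[:7]) for w in omegas[:K]]
assert f_X == [f(x) for x in X]                  # keys drop out at ω_1..ω_K
```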

Conclusion

  • Theorem: Lagrange Coded Computing is resilient against $S$ stragglers, $A$ adversaries, and $T$ colluding nodes if $P\geq (K+T-1)\deg f+S+2A+1$
    • Privacy costs more than straggler and adversary resilience: $T$ is multiplied by $\deg f$, whereas $S$ and $A$ enter linearly.
  • Caveat: requires finite field arithmetic!
  • Some follow-up works analyzed information leakage over the reals.

Side note for Blockchain

Blockchain: A decentralized system for trust management.

Blockchain maintains a chain of blocks.

  • A block contains a set of transactions.
  • Transaction = value transfer between clients.
  • The chain is replicated on each node.

Periodically, a new block is proposed and appended to each local chain.

  • The block must not contain invalid transactions.
  • Nodes must agree on proposed block.

Existing systems:

  • All nodes perform the same set of tasks.
  • Every node must receive every block.

Performance does not scale with the number of nodes.

Improving performance of blockchain

The performance of blockchain is inherently limited by its design.

  • All nodes perform the same set of tasks.
  • Every node must receive every block.

Idea: Combine blockchain with distributed computing.

  • Node tasks should complement each other.

Sharding (notion from databases):

  • Nodes are partitioned into groups of equal size.
  • Each group maintains a local chain.
  • More nodes, more groups, more transactions can be processed.
  • Better performance.

Security Problem

Biggest problem in blockchains: Adversarial (Byzantine) nodes.

  • Malicious actors wish to include invalid transactions.

Solution in traditional blockchains: Consensus mechanisms.

  • Algorithms for decentralized agreement.
  • Tolerates up to $1/3$ Byzantine nodes.

Problem: Consensus conflicts with sharding.

  • Traditional consensus mechanisms tolerate $\approx 1/3$ Byzantine nodes.
  • If we partition $P$ nodes into $K$ groups, each group can tolerate only $P/3K$ Byzantine nodes.
    • Down from $P/3$ in non-sharded systems.
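In numbers (the node count is an illustrative assumption):

```python
# Naive sharding divides per-shard Byzantine tolerance by the group count K.
P, K = 3000, 10
assert P // 3 == 1000        # tolerance of a single unsharded consensus
assert P // (3 * K) == 100   # per-group tolerance after partitioning
```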

Goal: Solve the consensus problem in sharded systems.

Tool: Coded computing.

Problem formulation

At epoch $t$ of a sharded blockchain system, we have

  • $K$ local chains $Y_1^{t-1},\ldots,Y_K^{t-1}$.
  • $K$ new blocks $X_1(t),\ldots,X_K(t)$.
  • A polynomial verification function $f(X_k(t),Y_k^{t-1})$, which validates $X_k(t)$.

Examples

Balance check function: $f(X_k(t),Y_k^{t-1})=\sum_\tau Y_k(\tau)-X_k(t)$.

More commonly, a (polynomial) hash function. Used to:

  • Verify the sender’s public key.
  • Verify the ownership of the transferred funds.

Need: apply a polynomial function to $K$ datasets.

Lagrange coded computing!

Blockchain with Lagrange coded computing

At epoch $t$:

  • A leader is elected (using a secure election mechanism).
  • The leader receives the new blocks $X_1(t),\ldots,X_K(t)$.
  • The leader disperses the encoded blocks $\tilde{X}_1(t),\ldots,\tilde{X}_P(t)$ to the nodes.
    • This needs a secure information dispersal mechanism.

Every node $i\in[P]$:

  • Locally stores a coded chain $\tilde{Y}_i^t$ (encoded using LCC).
  • Receives $\tilde{X}_i(t)$.
  • Computes $f(\tilde{X}_i(t),\tilde{Y}_i^t)$ and sends it to the leader.

The leader decodes to obtain $\{f(X_i(t),Y_i^t)\}_{i=1}^K$ and disperses the results securely to the nodes.

Node $i$ appends the coded block $\tilde{X}_i(t)$ to its coded chain $\tilde{Y}_i^t$ (zeroing invalid transactions).

This guarantees security if $P\geq (K+T-1)\deg f+S+2A+1$.

  • $A$ adversaries and a verification polynomial $f$ of degree $d$.

Sharding without sharding:

  • Computations are done on (coded) partial chains/blocks.
    • Good performance!
  • Since blocks/chains are coded, they are “dispersed” among many nodes.
    • The security problem of sharding is solved!
  • Since the encoding is done (securely) through a leader, there is no need to send every block to all nodes.
    • Reduced communication (the main bottleneck)!

Novelties:

  • The first decentralized verification system with communication less than the size of the blocks times the number of nodes.
  • Coded consensus – Reach consensus on coded data.