
CSE5313 Coding and information theory for data science (Lecture 25)

Polynomial Evaluation

Problem formulation:

  • We have $K$ datasets $X_1,X_2,\ldots,X_K$.
  • Want to compute some polynomial function $f$ of degree $d$ on each dataset.
    • Want $f(X_1),f(X_2),\ldots,f(X_K)$.
  • Examples:
    • $X_1,X_2,\ldots,X_K$ are points in $\mathbb{F}^{M\times M}$, and $f(X)=X^8+3X^2+1$.
    • $X_k=(X_k^{(1)},X_k^{(2)})$, both in $\mathbb{F}^{M\times M}$, and $f(X_k)=X_k^{(1)}X_k^{(2)}$.
    • Gradient computation.

$P$ worker nodes:

  • Some are stragglers, i.e., not responsive.
  • Some are adversaries, i.e., return erroneous results.
  • Privacy: we do not want to expose the datasets to the worker nodes.

Lagrange Coded Computing

Let $\ell(z)$ be a polynomial whose evaluations at $\omega_1,\ldots,\omega_K$ are $X_1,\ldots,X_K$.

  • That is, $\ell(\omega_i)=X_i$ for every $i\in[K]$, where each $\omega_i\in\mathbb{F}$.

An example construction:

Given $X_1,\ldots,X_K$ with corresponding $\omega_1,\ldots,\omega_K$:

  • $\ell(z)=\sum_{i=1}^K X_i\ell_i(z)$, where $\ell_i(z)=\prod_{j\in[K],j\neq i} \frac{z-\omega_j}{\omega_i-\omega_j}$, so that $\ell_i(\omega_j)=\begin{cases} 0 & \text{if } j\neq i \\ 1 & \text{if } j=i \end{cases}$.

Then every $f(X_i)=f(\ell(\omega_i))$ is an evaluation of the polynomial $f\circ \ell$ at $\omega_i$.

If the master obtains the composition $h=f\circ \ell$, it can obtain every $f(X_i)=h(\omega_i)$.

Goal: the master wishes to obtain the polynomial $h(z)=f(\ell(z))$.

Intuition:

  • Encoding is performed by evaluating $\ell(z)$ at $\alpha_1,\ldots,\alpha_P\in\mathbb{F}$, with $P>K$ for redundancy.
  • Nodes apply $f$ to an evaluation of $\ell$ and obtain an evaluation of $h$.
  • The master receives some potentially noisy evaluations and finds $h$.
  • The master evaluates $h$ at $\omega_1,\ldots,\omega_K$ to obtain $f(X_1),\ldots,f(X_K)$.

Encoding for Lagrange coded computing

Need a polynomial $\ell(z)$ such that:

  • $X_k=\ell(\omega_k)$ for every $k\in[K]$.

Having obtained such an $\ell$, we let $\tilde{X}_i=\ell(\alpha_i)$ for every $i\in[P]$.

$\mathrm{span}\{\tilde{X}_1,\tilde{X}_2,\ldots,\tilde{X}_P\}=\mathrm{span}\{\ell_1(x),\ell_2(x),\ldots,\ell_P(x)\}$.

Want $X_k=\ell(\omega_k)$ for every $k\in[K]$.

Tool: Lagrange interpolation.

  • $\ell_k(z)=\prod_{j\neq k} \frac{z-\omega_j}{\omega_k-\omega_j}$.
  • $\ell_k(\omega_k)=1$ and $\ell_k(\omega_j)=0$ for every $j\neq k$.
  • $\deg \ell_k(z)=K-1$.

Let $\ell(z)=\sum_{k=1}^K X_k\ell_k(z)$.

  • $\deg \ell\leq K-1$.
  • $\ell(\omega_k)=X_k$ for every $k\in[K]$.

Let $\tilde{X}_i=\ell(\alpha_i)=\sum_{k=1}^K X_k\ell_k(\alpha_i)$.

Every $\tilde{X}_i$ is a linear combination of $X_1,\ldots,X_K$.

$(\tilde{X}_1,\tilde{X}_2,\ldots,\tilde{X}_P)=(X_1,\ldots,X_K)\cdot G=(X_1,\ldots,X_K)\begin{bmatrix} \ell_1(\alpha_1) & \ell_1(\alpha_2) & \cdots & \ell_1(\alpha_P) \\ \ell_2(\alpha_1) & \ell_2(\alpha_2) & \cdots & \ell_2(\alpha_P) \\ \vdots & \vdots & \ddots & \vdots \\ \ell_K(\alpha_1) & \ell_K(\alpha_2) & \cdots & \ell_K(\alpha_P) \end{bmatrix}$

This $G$ is called a Lagrange matrix with respect to

  • $\omega_1,\ldots,\omega_K$ (interpolation points, rows), and
  • $\alpha_1,\ldots,\alpha_P$ (evaluation points, columns).

Basically, a modification of a Reed-Solomon code.
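To make the construction concrete, here is a minimal Python sketch of building $G$ and encoding over a prime field (the field size $p=97$, the integer choices of $\omega_i$ and $\alpha_i$, and the sample data are illustrative assumptions, not fixed by the lecture):

```python
# Minimal sketch: Lagrange matrix G and encoding over F_p (p prime).
# All concrete numbers here are illustrative assumptions.
p = 97                                   # prime field size
K, P = 3, 7                              # K datasets, P worker nodes
omegas = list(range(1, K + 1))           # interpolation points ω_1..ω_K
alphas = list(range(K + 1, K + P + 1))   # evaluation points α_1..α_P (disjoint)

def ell(k, z, pts):
    """Evaluate the Lagrange basis polynomial ℓ_k at z, modulo p."""
    num, den = 1, 1
    for j, w in enumerate(pts):
        if j != k:
            num = num * (z - w) % p
            den = den * (pts[k] - w) % p
    return num * pow(den, -1, p) % p     # pow(·, -1, p): modular inverse

# Lagrange matrix: row k, column i holds ℓ_{k+1}(α_{i+1})
G = [[ell(k, a, omegas) for a in alphas] for k in range(K)]

# Encoding: (X̃_1,...,X̃_P) = (X_1,...,X_K) · G.  Scalar datasets here;
# matrix-valued X_k would be encoded entrywise in the same way.
X = [10, 20, 30]
X_tilde = [sum(X[k] * G[k][i] for k in range(K)) % p for i in range(P)]
```

Since $\sum_k \ell_k(z)=1$ identically, every column of $G$ sums to $1$, which is a handy sanity check on an implementation.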

Decoding for Lagrange coded computing

Say the system has $S$ stragglers (erasures) and $A$ adversaries (errors).

The master receives $P-S$ computation results $f(\tilde{X}_{i_1}),\ldots,f(\tilde{X}_{i_{P-S}})$.

  • By design, these are evaluations of $h$: $h(\alpha_{i_1})=f(\ell(\alpha_{i_1})),\ldots,h(\alpha_{i_{P-S}})=f(\ell(\alpha_{i_{P-S}}))$.
  • $A$ of the evaluations are noisy.
  • $\deg h=\deg f\cdot \deg \ell=(K-1)\deg f$.

Which process enables interpolating a polynomial from noisy evaluations?

Reed-Solomon (RS) decoding.

Fact: Reed-Solomon decoding succeeds if and only if the number of erasures plus $2\times$ the number of errors is at most $d-1$, where $d$ is the minimum distance of the code.

Imagine $h$ as the “message” in a Reed-Solomon code with parameters $[P,(K-1)\deg f +1,P-(K-1)\deg f]_q$.

  • Interpolating $h$ is possible if and only if $S+2A\leq P-(K-1)\deg f-1$.

Once the master interpolates $h$:

  • The evaluations $h(\omega_i)=f(\ell(\omega_i))=f(X_i)$ provide the desired results.
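The whole pipeline can be sketched end to end for the straggler-only case ($A=0$); the prime $p$, the polynomial $f(x)=x^2+1$, and all parameters below are illustrative assumptions chosen to satisfy $(K-1)\deg f+S+1\leq P$:

```python
# End-to-end LCC sketch over F_p with S stragglers and no adversaries.
# Parameters are illustrative: (K-1)*d + S + 1 = 4 + 2 + 1 = 7 <= P.
p = 97
K, P, S = 3, 7, 2
d = 2                                    # deg f, with f(x) = x^2 + 1
f = lambda x: (x * x + 1) % p
omegas = list(range(1, K + 1))           # ω_1..ω_K
alphas = list(range(K + 1, K + P + 1))   # α_1..α_P

def interpolate(z, pts, vals):
    """Evaluate at z the unique polynomial of degree < len(pts) through (pts, vals)."""
    total = 0
    for k in range(len(pts)):
        num, den = 1, 1
        for j, w in enumerate(pts):
            if j != k:
                num = num * (z - w) % p
                den = den * (pts[k] - w) % p
        total = (total + vals[k] * num * pow(den, -1, p)) % p
    return total

# Encoding: X̃_i = ℓ(α_i), where ℓ(ω_k) = X_k and deg ℓ = K - 1
X = [10, 20, 30]
X_tilde = [interpolate(a, omegas, X) for a in alphas]

# Workers compute h(α_i) = f(ℓ(α_i)); S of them never respond
responses = {a: f(x) for a, x in zip(alphas, X_tilde)}
received = dict(list(responses.items())[: P - S])   # any P-S responses work

# Master interpolates h (deg h = (K-1)*d = 4, so 5 points suffice),
# then evaluates it at the ω's to recover the results
pts, vals = list(received), list(received.values())
f_X = [interpolate(w, pts, vals) for w in omegas]
assert f_X == [f(x) for x in X]
```

Handling $A>0$ adversaries would replace the plain interpolation in the last step with a Reed-Solomon error-correcting decoder, as described above.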

Theorem of Lagrange coded computing

Lagrange coded computing enables computing $\{f(X_i)\}_{i=1}^K$ for any polynomial $f$ in the presence of at most $S$ stragglers and at most $A$ adversaries if

$(K-1)\deg f+S+2A+1\leq P$

Interpolation of the result does not depend on $P$ (the number of worker nodes).
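The bound is easy to keep at hand as a helper (the function name and signature are my own, not from the lecture); the optional $T$ anticipates the privacy extension below:

```python
def min_workers(K, deg_f, S=0, A=0, T=0):
    """Smallest P satisfying (K+T-1)·deg_f + S + 2A + 1 <= P.
    T = 0 gives the non-private bound above; T > 0 anticipates the
    privacy extension of the following section."""
    return (K + T - 1) * deg_f + S + 2 * A + 1

# e.g., K = 3 datasets, quadratic f, 2 stragglers, 1 adversary:
assert min_workers(3, 2, S=2, A=1) == 9
```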

Privacy for Lagrange coded computing

Currently, any size-$K$ group of colluding nodes can reveal the entire dataset.

Q: Can an individual node $i$ learn anything about the data?

A: Yes, since $\tilde{X}_i$ is a linear combination of $X_1,\ldots,X_K$ (partial knowledge, a linear combination of the private data).

Can we provide perfect privacy given that at most $T$ nodes collude?

  • That is, $I(X;\tilde{X}_\mathcal{T})=0$ for every $\mathcal{T}\subseteq [P]$ of size at most $T$, where
    • $X=(X_1,\ldots,X_K)$, and
    • $\tilde{X}_\mathcal{T}=(\tilde{X}_{i_1},\ldots,\tilde{X}_{i_{|\mathcal{T}|}})$.

Solution: a slight change of the encoding in LCC.

This only applies to $\mathbb{F}=\mathbb{F}_q$ (there is no perfect privacy over $\mathbb{R}$ or $\mathbb{C}$, where no uniform distribution can be defined).

The master chooses

  • $T$ keys $Z_1,\ldots,Z_T$ uniformly at random ($|Z_j|=|X_i|$ for all $i,j$), and
  • interpolation points $\omega_1,\ldots,\omega_{K+T}$.

Find the Lagrange polynomial $\ell(z)$ such that

  • $\ell(\omega_i)=X_i$ for $i\in[K]$, and
  • $\ell(\omega_{K+j})=Z_j$ for $j\in[T]$.

Lagrange interpolation:

$\ell(z)=\sum_{i=1}^{K} X_i\ell_i(z)+\sum_{j=1}^{T} Z_j\ell_{K+j}(z)$

$(\tilde{X}_1,\ldots,\tilde{X}_P)=(X_1,\ldots,X_K,Z_1,\ldots,Z_T)\cdot G$

where

$G=\begin{bmatrix} \ell_1(\alpha_1) & \ell_1(\alpha_2) & \cdots & \ell_1(\alpha_P) \\ \ell_2(\alpha_1) & \ell_2(\alpha_2) & \cdots & \ell_2(\alpha_P) \\ \vdots & \vdots & \ddots & \vdots \\ \ell_K(\alpha_1) & \ell_K(\alpha_2) & \cdots & \ell_K(\alpha_P) \\ \vdots & \vdots & \ddots & \vdots \\ \ell_{K+T}(\alpha_1) & \ell_{K+T}(\alpha_2) & \cdots & \ell_{K+T}(\alpha_P) \end{bmatrix}$

For analysis, we denote $G=\begin{bmatrix}G^{top}\\G^{bot}\end{bmatrix}$, where $G^{top}\in \mathbb{F}^{K\times P}$ and $G^{bot}\in \mathbb{F}^{T\times P}$.

The proof of privacy is almost the same as that of the ramp scheme.

Proof

We have $(\tilde{X}_1,\ldots,\tilde{X}_P)=(X_1,\ldots,X_K)\cdot G^{top}+(Z_1,\ldots,Z_T)\cdot G^{bot}$.

Without loss of generality, $\mathcal{T}=[T]$ is the colluding set.

The nodes in $\mathcal{T}$ hold $(\tilde{X}_1,\ldots,\tilde{X}_T)=(X_1,\ldots,X_K)\cdot G^{top}_\mathcal{T}+(Z_1,\ldots,Z_T)\cdot G^{bot}_\mathcal{T}$.

  • $G^{top}_\mathcal{T}$ and $G^{bot}_\mathcal{T}$ contain the first $T$ columns of $G^{top}$ and $G^{bot}$, respectively.

Note that $G^{bot}\in \mathbb{F}^{T\times P}_q$ generates an MDS code, and hence $G^{bot}_\mathcal{T}$ is a $T\times T$ invertible matrix.

Since $Z=(Z_1,\ldots,Z_T)$ is chosen uniformly at random, $Z\cdot G^{bot}_\mathcal{T}$ is uniformly distributed and acts as a one-time pad.

Decoding works the same as before; since $\deg \ell$ is now $K+T-1$, interpolating $h$ requires $(K+T-1)\deg f+1$ clean evaluations.
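A sketch of the private variant (all parameters illustrative): $T$ uniformly random keys are appended as extra interpolation values, and decoding proceeds exactly as before with the larger degree:

```python
# Privacy-augmented LCC sketch over F_p: T uniform keys mask the data.
# Illustrative parameters: (K+T-1)*d + 1 = 7 clean evaluations needed, P = 9.
import random

p = 97
K, T, P = 3, 1, 9
d = 2                                    # deg f, with f(x) = x^2 + 1
f = lambda x: (x * x + 1) % p
omegas = list(range(1, K + T + 1))               # ω_1..ω_{K+T}
alphas = list(range(K + T + 1, K + T + P + 1))   # α_1..α_P

def interpolate(z, pts, vals):
    """Evaluate at z the unique polynomial of degree < len(pts) through (pts, vals)."""
    total = 0
    for k in range(len(pts)):
        num, den = 1, 1
        for j, w in enumerate(pts):
            if j != k:
                num = num * (z - w) % p
                den = den * (pts[k] - w) % p
        total = (total + vals[k] * num * pow(den, -1, p)) % p
    return total

X = [10, 20, 30]
Z = [random.randrange(p) for _ in range(T)]      # uniformly random keys

# ℓ(ω_i) = X_i for i <= K and ℓ(ω_{K+j}) = Z_j, so deg ℓ = K + T - 1;
# each share ℓ(α_i) is masked by the key contribution (one-time pad)
shares = [interpolate(a, omegas, X + Z) for a in alphas]

# Workers apply f; deg h = (K+T-1)*d = 6, so any 7 clean results suffice
answers = [f(s) for s in shares]
f_X = [interpolate(w, alphas[:7], answers[:7]) for w in omegas[:K]]
assert f_X == [f(x) for x in X]                  # keys drop out at ω_1..ω_K
```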

Conclusion

  • Theorem: Lagrange Coded Computing is resilient against $S$ stragglers, $A$ adversaries, and $T$ colluding nodes if $P\geq (K+T-1)\deg f+S+2A+1$
    • Privacy costs more than straggler and adversary resilience: $T$ is multiplied by $\deg f$, whereas $S$ and $A$ enter linearly.
  • Caveat: requires finite field arithmetic!
  • Some follow-up works analyzed information leakage over the reals.

Side note for Blockchain

Blockchain: A decentralized system for trust management.

Blockchain maintains a chain of blocks.

  • A block contains a set of transactions.
  • Transaction = value transfer between clients.
  • The chain is replicated on each node.

Periodically, a new block is proposed and appended to each local chain.

  • The block must not contain invalid transactions.
  • Nodes must agree on proposed block.

Existing systems:

  • All nodes perform the same set of tasks.
  • Every node must receive every block.

Performance does not scale with the number of nodes.

Improving performance of blockchain

The performance of blockchain is inherently limited by its design.

  • All nodes perform the same set of tasks.
  • Every node must receive every block.

Idea: Combine blockchain with distributed computing.

  • Node tasks should complement each other.

Sharding (notion from databases):

  • Nodes are partitioned into groups of equal size.
  • Each group maintains a local chain.
  • More nodes, more groups, more transactions can be processed.
  • Better performance.

Security Problem

Biggest problem in blockchains: Adversarial (Byzantine) nodes.

  • Malicious actors wish to include invalid transactions.

Solution in traditional blockchains: Consensus mechanisms.

  • Algorithms for decentralized agreement.
  • Tolerates up to $1/3$ Byzantine nodes.

Problem: Consensus conflicts with sharding.

  • Traditional consensus mechanisms tolerate $\approx 1/3$ Byzantine nodes.
  • If we partition $P$ nodes into $K$ groups, each group can tolerate only $P/3K$ Byzantine nodes.
    • Down from $P/3$ in non-sharded systems.
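In numbers (the node count is an illustrative assumption):

```python
# Naive sharding divides per-shard Byzantine tolerance by the group count K.
P, K = 3000, 10
assert P // 3 == 1000        # tolerance of a single unsharded consensus
assert P // (3 * K) == 100   # per-group tolerance after partitioning
```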

Goal: Solve the consensus problem in sharded systems.

Tool: Coded computing.

Problem formulation

At epoch $t$ of a sharded blockchain system, we have

  • $K$ local chains $Y_1^{t-1},\ldots,Y_K^{t-1}$.
  • $K$ new blocks $X_1(t),\ldots,X_K(t)$.
  • A polynomial verification function $f(X_k(t),Y_k^{t-1})$, which validates $X_k(t)$.

Examples

Balance check function: $f(X_k(t),Y_k^{t-1})=\sum_\tau Y_k(\tau)-X_k(t)$.

More commonly, a (polynomial) hash function. Used to:

  • Verify the sender’s public key.
  • Verify the ownership of the transferred funds.

Need: apply a polynomial function to $K$ datasets.

Lagrange coded computing!

Blockchain with Lagrange coded computing

At epoch $t$:

  • A leader is elected (using a secure election mechanism).
  • The leader receives the new blocks $X_1(t),\ldots,X_K(t)$.
  • The leader disperses the encoded blocks $\tilde{X}_1(t),\ldots,\tilde{X}_P(t)$ to the nodes.
    • This needs a secure information dispersal mechanism.

Every node $i\in[P]$:

  • Locally stores a coded chain $\tilde{Y}_i^t$ (encoded using LCC).
  • Receives $\tilde{X}_i(t)$.
  • Computes $f(\tilde{X}_i(t),\tilde{Y}_i^t)$ and sends it to the leader.

The leader decodes to obtain $\{f(X_i(t),Y_i^t)\}_{i=1}^K$ and disperses the results securely to the nodes.

Node $i$ appends the coded block $\tilde{X}_i(t)$ to its coded chain $\tilde{Y}_i^t$ (zeroing invalid transactions).

This guarantees security if $P\geq (K+T-1)\deg f+S+2A+1$.

  • $A$ adversaries and a verification polynomial $f$ of degree $d$.

Sharding without sharding:

  • Computations are done on (coded) partial chains/blocks.
    • Good performance!
  • Since blocks/chains are coded, they are “dispersed” among many nodes.
    • The security problem of sharding is solved!
  • Since the encoding is done (securely) through a leader, there is no need to send every block to all nodes.
    • Reduced communication (the main bottleneck)!

Novelties:

  • The first decentralized verification system with communication less than the size of the blocks times the number of nodes.
  • Coded consensus – Reach consensus on coded data.