CSE5313 Coding and information theory for data science (Lecture 25)
Polynomial Evaluation
Problem formulation:
- We have datasets $X_1, \ldots, X_k$.
- Want to compute some polynomial function $f$ of degree $\deg f$ on each dataset.
- Want $f(X_1), \ldots, f(X_k)$.
- Examples:
- $X_i$ are points in $\mathbb{F}_q^n$, and $f(X_i) = X_i X_i^\top$.
- $X_i = (A_i, B_i)$, both in $\mathbb{F}_q^{n \times n}$, and $f(X_i) = A_i B_i$.
- Gradient computation.
$N$ worker nodes:
- Some are stragglers, i.e., not responsive.
- Some are adversaries, i.e., return erroneous results.
- Privacy: We do not want to expose datasets to worker nodes.
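To fix ideas, here is a minimal sketch of the (uncoded) target computation, with a toy degree-2 function $f(X) = X X^\top$ over a small prime field; all sizes and values below are arbitrary illustrative choices, not from the lecture:

```python
import numpy as np

p = 257                    # a small prime; all arithmetic is mod p
k = 4                      # number of datasets
rng = np.random.default_rng(0)
datasets = [rng.integers(0, p, size=3) for _ in range(k)]   # X_1, ..., X_k in F_p^3

def f(X):
    """A degree-2 polynomial function of the entries of X: f(X) = X X^T (mod p)."""
    return np.outer(X, X) % p

results = [f(X) for X in datasets]   # the master wants f(X_1), ..., f(X_k)
print(results[0])
```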
Lagrange Coded Computing
Let $u$ be a polynomial whose evaluations at $\beta_1, \ldots, \beta_k$ are $X_1, \ldots, X_k$.
- That is, $u(\beta_i) = X_i$ for every $i \in [k]$.
Some example constructions:
Given $X_1, \ldots, X_k$ with corresponding interpolation points $\beta_1, \ldots, \beta_k$:
- $u(z) = \sum_{i \in [k]} X_i \prod_{l \in [k] \setminus \{i\}} \frac{z - \beta_l}{\beta_i - \beta_l}$, where $\deg u \le k - 1$.
Then every $f(X_i) = f(u(\beta_i)) = (f \circ u)(\beta_i)$ is an evaluation of the polynomial $f \circ u$ at $\beta_i$.
If the master obtains the composition $f \circ u$, it can obtain every $f(X_i)$.
Goal: The master wishes to obtain the polynomial $f \circ u$.
Intuition:
- Encoding is performed by evaluating $u$ at points $\alpha_1, \ldots, \alpha_N$, where $N > k$ provides redundancy.
- Nodes apply $f$ on an evaluation of $u$ and obtain an evaluation of $f \circ u$.
- The master receives some potentially noisy evaluations, and finds $f \circ u$.
- The master evaluates $f \circ u$ at $\beta_1, \ldots, \beta_k$ to obtain $f(X_1), \ldots, f(X_k)$.
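To make the composition step concrete, below is a small self-contained sketch over a prime field (scalar "datasets", $f(x) = x^2$, and all parameter values are illustrative choices, not from the lecture): it interpolates $u$ through the data, composes with $f$, and checks that $(f \circ u)(\beta_i) = f(X_i)$ and that $\deg(f \circ u) = (k - 1)\deg f$.

```python
# Minimal check of the composition idea over a prime field F_p.
p = 97
k = 3
X = [12, 55, 7]                  # toy scalar "datasets" X_1, ..., X_k in F_p
beta = [1, 2, 3]                 # interpolation points

def poly_mul(a, b):              # polynomial multiplication mod p (coefficients, low degree first)
    out = [0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] = (out[i + j] + ai * bj) % p
    return out

def poly_add(a, b):
    n = max(len(a), len(b))
    return [((a[i] if i < len(a) else 0) + (b[i] if i < len(b) else 0)) % p for i in range(n)]

def poly_eval(a, x):
    return sum(c * pow(x, i, p) for i, c in enumerate(a)) % p

def lagrange_poly(points, values):
    """Coefficients of the unique degree <= len(points)-1 polynomial through (points, values)."""
    u = [0]
    for i, (bi, vi) in enumerate(zip(points, values)):
        term = [vi]
        for l, bl in enumerate(points):
            if l != i:
                inv = pow((bi - bl) % p, p - 2, p)             # 1/(beta_i - beta_l) mod p
                term = poly_mul(term, [(-bl * inv) % p, inv])  # times (z - beta_l)/(beta_i - beta_l)
        u = poly_add(u, term)
    return u

u = lagrange_poly(beta, X)        # deg u = k - 1 = 2
h = poly_mul(u, u)                # f(x) = x^2, so f∘u = u * u, degree 2(k-1) = 4
assert all(poly_eval(h, b) == (x * x) % p for b, x in zip(beta, X))
print("deg(f∘u) =", len(h) - 1)   # prints 4
```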
Encoding for Lagrange coded computing
Need a polynomial $u$ such that:
- $u(\beta_i) = X_i$ for every $i \in [k]$.
Having obtained such a $u$, we let $\tilde{X}_j = u(\alpha_j)$ for every $j \in [N]$.
- $\tilde{X}_j$ is given to worker node $j$.
- Want node $j$ to compute $f(\tilde{X}_j) = (f \circ u)(\alpha_j)$ for every $j \in [N]$.
Tool: Lagrange interpolation.
- $u(z) = \sum_{i \in [k]} X_i \cdot \prod_{l \in [k] \setminus \{i\}} \frac{z - \beta_l}{\beta_i - \beta_l}$.
- $\deg u \le k - 1$, and $u(\beta_i) = X_i$ for every $i \in [k]$.
- Equivalently, $u(z) = \sum_{i \in [k]} X_i \cdot \ell_i(z)$, where the $\ell_i$ are the Lagrange basis polynomials.
Let $\ell_i(z) = \prod_{l \in [k] \setminus \{i\}} \frac{z - \beta_l}{\beta_i - \beta_l}$.
- $\deg \ell_i = k - 1$.
- $\ell_i(\beta_j) = 1$ if $i = j$, and $\ell_i(\beta_j) = 0$ otherwise, for every $i, j \in [k]$.
Let $M \in \mathbb{F}_q^{k \times N}$ be the matrix with $M_{i,j} = \ell_i(\alpha_j)$.
Every $\tilde{X}_j = u(\alpha_j) = \sum_{i \in [k]} X_i \ell_i(\alpha_j)$ is a linear combination of $X_1, \ldots, X_k$; in matrix form, $(\tilde{X}_1, \ldots, \tilde{X}_N) = (X_1, \ldots, X_k) \cdot M$.
This matrix $M$ is called a Lagrange matrix with respect to
- $\beta_1, \ldots, \beta_k$ (interpolation points, rows), and
- $\alpha_1, \ldots, \alpha_N$ (evaluation points, columns).
Basically, a modification of Reed-Solomon code.
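As a sketch of this matrix view (assuming scalar datasets over a small prime field; the field size, points, and variable names are illustrative choices):

```python
# Build the Lagrange matrix M with M[i][j] = l_i(alpha_j) and encode X_tilde = X @ M (mod p).
import numpy as np

p = 97
k, N = 3, 7
beta  = [1, 2, 3]                      # interpolation points (rows)
alpha = [10, 11, 12, 13, 14, 15, 16]   # evaluation points (columns)
X = np.array([12, 55, 7])              # toy scalar datasets

def ell(i, z):
    """Lagrange basis polynomial l_i evaluated at z, modulo p."""
    val = 1
    for l in range(k):
        if l != i:
            val = val * (z - beta[l]) * pow(beta[i] - beta[l], p - 2, p) % p
    return val

M = np.array([[ell(i, a) for a in alpha] for i in range(k)])   # k x N Lagrange matrix
X_tilde = (X @ M) % p                                          # coded share of node j is X_tilde[j]

# Sanity check: evaluating at an interpolation point recovers the data,
# since l_i(beta_j) is 1 if i == j and 0 otherwise.
assert all(sum(X[i] * ell(i, b) for i in range(k)) % p == x for b, x in zip(beta, X))
print(X_tilde)
```

Each column of $M$ corresponds to one worker node.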
Decoding for Lagrange coded computing
Say the system has $s$ stragglers (erasures) and $a$ adversaries (errors).
The master receives the computation results $\{f(\tilde{X}_j)\}_{j \in J}$ from a set $J \subseteq [N]$ of at least $N - s$ nodes.
- By design, these are evaluations of $f \circ u$ at the points $\{\alpha_j\}_{j \in J}$.
- At most $a$ of these evaluations are noisy (erroneous).
- $\deg(f \circ u) \le \deg f \cdot \deg u \le (k - 1)\deg f$.
Which process enables us to interpolate a polynomial from noisy evaluations?
Reed-Solomon (RS) decoding.
Fact: Reed-Solomon decoding succeeds if and only if the number of erasures plus twice the number of errors is at most $N - K$ (for an $[N, K]$ RS code).
Imagine $f \circ u$ as the “message” in a Reed-Solomon code: $K = \deg(f \circ u) + 1 \le (k - 1)\deg f + 1$.
- Interpolating $f \circ u$ is possible if and only if $s + 2a \le N - (k - 1)\deg f - 1$.
Once the master interpolates $f \circ u$, it evaluates it at $\beta_1, \ldots, \beta_k$.
- The evaluations $(f \circ u)(\beta_i) = f(X_i)$ provide the desired results.
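A minimal decoding sketch for the erasure-only case ($a = 0$); correcting errors as well requires a full Reed-Solomon decoder (e.g., Berlekamp-Welch), which is omitted here. Parameters and names are toy choices:

```python
# With s stragglers and no adversaries, the master holds N - s >= deg(f∘u) + 1 clean
# evaluations of h = f∘u, so it can evaluate h at any point via Lagrange interpolation.
p = 97
k, N, s = 3, 7, 2
beta  = [1, 2, 3]
alpha = [10, 11, 12, 13, 14, 15, 16]
X = [12, 55, 7]

def u(z):                        # the Lagrange polynomial through (beta_i, X_i)
    total = 0
    for i in range(k):
        term = X[i]
        for l in range(k):
            if l != i:
                term = term * (z - beta[l]) * pow(beta[i] - beta[l], p - 2, p) % p
        total = (total + term) % p
    return total

f = lambda x: x * x % p                                # a toy degree-2 polynomial
answers = {j: f(u(alpha[j])) for j in range(N)}        # what the workers compute
survivors = dict(list(answers.items())[: N - s])       # the last s workers straggle

def interpolate_at(z, points):
    """Evaluate the unique low-degree polynomial through `points` at z (mod p)."""
    total = 0
    for j, yj in points.items():
        term = yj
        for l in points:
            if l != j:
                term = term * (z - alpha[l]) * pow(alpha[j] - alpha[l], p - 2, p) % p
        total = (total + term) % p
    return total

# deg(f∘u) = 2(k-1) = 4, and we kept N - s = 5 evaluations, so interpolation succeeds.
assert all(interpolate_at(b, survivors) == f(x) for b, x in zip(beta, X))
print("recovered:", [interpolate_at(b, survivors) for b in beta])
```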
Theorem of Lagrange coded computing
Lagrange coded computing enables computing $f(X_1), \ldots, f(X_k)$ for any polynomial $f$ in the presence of at most $s$ stragglers and at most $a$ adversaries if $N \ge (k - 1)\deg f + s + 2a + 1$.
The number of responses required to interpolate the result does not grow with $N$ (the number of worker nodes).
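The condition can be packaged as a simple feasibility check (a helper of my own, not from the lecture; the parameter $t$ anticipates the privacy extension below):

```python
def lcc_feasible(N, k, deg_f, s, a, t=0):
    """True if N workers suffice for k datasets, a degree-deg_f polynomial,
    s stragglers, a adversaries, and t colluding nodes."""
    return N >= (k + t - 1) * deg_f + s + 2 * a + 1

print(lcc_feasible(N=12, k=4, deg_f=2, s=1, a=2))   # 12 >= 3*2 + 1 + 4 + 1 = 12 -> True
```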
Privacy for Lagrange coded computing
Currently, any size-$k$ group of colluding nodes reveals the entire dataset.
Q: Can an individual node learn anything about $X_1, \ldots, X_k$?
A: Yes, since $\tilde{X}_j$ is a linear combination of $X_1, \ldots, X_k$ (partial knowledge: a linear combination of the private data).
Can we provide perfect privacy given that at most $t$ nodes collude?
- That is, $I(X; \tilde{X}_T) = 0$ for every $T \subseteq [N]$ of size at most $t$, where
- $X = (X_1, \ldots, X_k)$, and
- $\tilde{X}_T = (\tilde{X}_j)_{j \in T}$.
Solution: A slight change of the encoding in LCC.
This only applies to finite fields (there is no perfect privacy over $\mathbb{R}$, since no uniform distribution can be defined on $\mathbb{R}$).
The master chooses
- $t$ keys $Z_1, \ldots, Z_t$ uniformly at random ($Z_j$ uniform over the domain of the $X_i$'s, independently for all $j \in [t]$), and
- additional interpolation points $\beta_{k+1}, \ldots, \beta_{k+t}$ (distinct from $\beta_1, \ldots, \beta_k$).
Find the Lagrange polynomial $u$ such that
- $u(\beta_i) = X_i$ for $i \in [k]$, and
- $u(\beta_{k+j}) = Z_j$ for $j \in [t]$.
Lagrange interpolation:
- $u(z) = \sum_{i \in [k]} X_i \ell_i(z) + \sum_{j \in [t]} Z_j \ell_{k+j}(z)$, where $\ell_i(z) = \prod_{l \in [k+t] \setminus \{i\}} \frac{z - \beta_l}{\beta_i - \beta_l}$.
For analysis, we denote $(\tilde{X}_1, \ldots, \tilde{X}_N) = X \cdot A + Z \cdot B$, where $X = (X_1, \ldots, X_k)$, $Z = (Z_1, \ldots, Z_t)$, and $A \in \mathbb{F}_q^{k \times N}$, $B \in \mathbb{F}_q^{t \times N}$ are the top $k$ rows and bottom $t$ rows of the Lagrange matrix with respect to $\beta_1, \ldots, \beta_{k+t}$ and $\alpha_1, \ldots, \alpha_N$.
The proof of privacy is almost the same as for the ramp (secret sharing) scheme.
Proof
We have $(\tilde{X}_1, \ldots, \tilde{X}_N) = X A + Z B$.
Without loss of generality, $T = \{1, \ldots, t\}$ is the colluding set.
The nodes in $T$ hold $\tilde{X}_T = X A_T + Z B_T$.
- $A_T$, $B_T$ contain the first $t$ columns of $A$, $B$, respectively.
Note that $B$ generates an MDS code, and hence $B_T$ is a $t \times t$ invertible matrix.
Since $Z$ is chosen uniformly at random, so is $Z B_T$, i.e., it acts as a one-time pad.
Decoding works the same as before; we only need $N \ge (k + t - 1)\deg f + s + 2a + 1$ (since now $\deg u \le k + t - 1$) to make the interpolation work.
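Below is a sketch of the private encoding for $t = 1$, together with an empirical illustration of the one-time-pad argument: for a fixed dataset, the coded share of a single node is uniform over $\mathbb{F}_p$ when the key $Z_1$ is uniform. All parameter values are illustrative:

```python
# Private LCC encoding with t = 1 random key; empirically check that one node's share
# is (essentially) uniformly distributed over F_p when the key varies uniformly.
import collections, random

p = 97
k, t = 3, 1
beta = [1, 2, 3, 4]            # k + t interpolation points (the last one is for the key)
X = [12, 55, 7]                # fixed datasets
alpha_1 = 10                   # evaluation point of the node we observe

def encode_share(data, z):
    """u(z), where u interpolates the datasets and the key at beta_1, ..., beta_{k+t}."""
    total = 0
    for i in range(k + t):
        term = data[i]
        for l in range(k + t):
            if l != i:
                term = term * (z - beta[l]) * pow(beta[i] - beta[l], p - 2, p) % p
        total = (total + term) % p
    return total

counts = collections.Counter(
    encode_share(X + [random.randrange(p)], alpha_1) for _ in range(20_000)
)
print(len(counts), "distinct values out of", p)       # all p values appear ...
print(max(counts.values()) / min(counts.values()))    # ... roughly equally often
```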
Conclusion
- Theorem: Lagrange Coded Computing is resilient against $s$ stragglers, $a$ adversaries, and $t$ colluding nodes if $N \ge (k + t - 1)\deg f + s + 2a + 1$.
- Privacy is costlier than straggler and adversary resilience: each colluding node increases the threshold by $\deg f$ (the $t \deg f$ term), whereas each straggler or adversary increases it only linearly (by 1 or 2).
- Caveat: Requires finite field arithmetic!
- Some follow-up works analyzed information leakage over the reals
Side note for Blockchain
Blockchain: A decentralized system for trust management.
Blockchain maintains a chain of blocks.
- A block contains a set of transactions.
- Transaction = value transfer between clients.
- The chain is replicated on each node.
Periodically, a new block is proposed and appended to each local chain.
- The block must not contain invalid transactions.
- Nodes must agree on the proposed block.
Existing systems:
- All nodes perform the same set of tasks.
- Every node must receive every block.
Performance does not scale with the number of nodes.
Improving performance of blockchain
The performance of blockchain is inherently limited by its design.
- All nodes perform the same set of tasks.
- Every node must receive every block.
Idea: Combine blockchain with distributed computing.
- Node tasks should complement each other.
Sharding (notion from databases):
- Nodes are partitioned into groups of equal size.
- Each group maintains a local chain.
- More nodes, more groups, more transactions can be processed.
- Better performance.
Security Problem
Biggest problem in blockchains: Adversarial (Byzantine) nodes.
- Malicious actors wish to include invalid transactions.
Solution in traditional blockchains: Consensus mechanisms.
- Algorithms for decentralized agreement.
- Tolerates up to (roughly) $N/3$ Byzantine nodes.
Problem: Consensus conflicts with sharding.
- Traditional consensus mechanisms tolerate up to (roughly) $N/3$ Byzantine nodes.
- If we partition the $N$ nodes into $k$ groups, each group can tolerate only about $N/(3k)$ node failures.
- Down from $N/3$ in non-sharded systems.
Goal: Solve the consensus problem in sharded systems.
Tool: Coded computing.
Problem formulation
At each epoch of a sharded blockchain system, we have
- $k$ local chains $Y_1, \ldots, Y_k$ (one per shard).
- $k$ new blocks $B_1, \ldots, B_k$ (one per shard).
- A polynomial verification function $f$, which validates block $B_i$ against chain $Y_i$.
Examples of verification functions $f$:
- A balance-check function.
- More commonly, a (polynomial) hash function. Used to:
- Verify the sender’s public key.
- Verify the ownership of the transferred funds.
Need: Apply a polynomial function to $k$ datasets.
Lagrange coded computing!
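For concreteness, here is one standard example of a polynomial hash; it is linear (degree 1) in the block entries and therefore compatible with LCC. This is only an illustration of the idea; the modulus, evaluation point, and block layout are arbitrary choices, not the verification function from the lecture:

```python
# A toy polynomial hash of a block: h(block) = sum_i block[i] * z^i (mod p),
# i.e., a degree-1 polynomial in the block entries.
p = 2**31 - 1          # a Mersenne prime
z = 123456789          # fixed, publicly known evaluation point

def poly_hash(block):
    """Polynomial (rolling) hash of the block entries, linear in the block."""
    h, zi = 0, 1
    for entry in block:
        h = (h + entry * zi) % p
        zi = zi * z % p
    return h

print(poly_hash([42, 17, 99]))
```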
Blockchain with Lagrange coded computing
At each epoch:
- A leader is elected (using a secure election mechanism).
- The leader receives the new blocks $B_1, \ldots, B_k$.
- The leader disperses the encoded blocks $\tilde{B}_1, \ldots, \tilde{B}_N$ to the $N$ nodes.
- Needs secure information dispersal mechanisms.
Every node $j$:
- Locally stores a coded chain $\tilde{Y}_j$ (encoded using LCC).
- Receives $\tilde{B}_j$.
- Computes $f(\tilde{B}_j, \tilde{Y}_j)$ and sends it to the leader.
The leader decodes to get $f(B_i, Y_i)$ for every $i \in [k]$ and securely disperses the verification results to the nodes.
Node $j$ appends the coded block $\tilde{B}_j$ to its coded chain $\tilde{Y}_j$ (zeroing out invalid transactions).
Guarantees security if $N \ge (k - 1)\deg f + 2a + 1$.
- $a$ adversaries, degree-$\deg f$ verification polynomial.
Sharding without sharding:
- Computations are done on (coded) partial chains/blocks.
- Good performance!
- Since blocks/chains are coded, they are “dispersed” among many nodes.
- Security problem in sharding solved!
- Since the encoding is done (securely) through a leader, no need to send every block to all nodes.
- Reduced communication! (main bottleneck).
Novelties:
- The first decentralized verification system with total communication less than the size of the blocks times the number of nodes.
- Coded consensus – Reach consensus on coded data.