Skip to Content
CSE5313CSE5313 Coding and information theory for data science (Lecture 14)

CSE5313 Coding and information theory for data science (Lecture 14)

The repair problem

Main challenge:

  • Locality (number of contacted servers)
  • Bandwidth (number of bits transferred)

From last lecture we build optimal Local Recoverable Codes (LRCs) for storage systems.

Let C=[n,k]q\mathcal{C} = [n, k]_q which is rr-LRC, with minimum distance dd.

  • Bound 1: knrr+1\frac{k}{n}\leq \frac{r}{r+1}.

  • Bound 2: dnkkr+2d\leq n-k-\frac{k}{r}+2.

  • Optimal LRC:

  • Let A={α1,,αn}\mathcal{A} = \{\alpha_1, \ldots, \alpha_n\}, partition A\mathcal{A} to Ai\mathcal{A}_i for i=1i=1 to nr+1\frac{n}{r+1}.

  • gFq[x]g\in \mathbb{F}_q[x] is good if deg(g)=r+1\deg(g) = r+1 and gg is constant on all Ai\mathcal{A}_i‘s.

  • C={fa(αi)}i=1naFqk}\mathcal{C} = \{f_a(\alpha_i)\}_{i=1}^{n}|a\in \mathbb{F}_q^k\}, where

    • fa(x)=i=0r1fa,i(x)xif_a(x) = \sum_{i=0}^{r-1} f_{a,i}(x)\cdot x^i,
    • fa,i(x)=j=0k/r1ai,jg(x)jf_{a,i}(x) = \sum_{j=0}^{k/r-1} a_{i,j}\cdot g(x)^j, where gg is a good polynomial.
    • gg is a “good” polynomial.
  • dimC=k\dim \mathcal{C} = k and d=nkkr+2d = n-k-\frac{k}{r}+2.

Minimizing the repair bandwidth

Goal: understand repair bandwidth.

  • What is the minimum repair bandwidth?
  • Is repair bandwidth in trade-off with other parameters?
  • Tool: The information flow graph.

Spoiler alert:

  • Tradeoff: Storage Repair and bandwidth.
  • Codes which achieve an optimal tradeoff.

Information flow graph

We can model the repair problem as a directed graph.

  • Source: System admin.
  • Sink: Data collector.
  • Nodes: Storage servers.
    • Nodes leave/crash
    • Newcomer replaces them \to new nodes.
  • Edges: Represents transmission of information. (Number of Fq\mathbb{F}_q elements is weight.)

Main observation:

  • kk elements from Fq\mathbb{F}_q must “flow” from the source (system admin) to the sink (data collector).
  • Any cut (U,U)(U,\overline{U}) which separates source from sink must have capacity at least kk.

Roadmap:

Information flow graph \to Minimum cut analysis \to Bound on file size \to Storage/bandwidth tradeoff.

Basic definitions for information flow graph

Warning

This is not the same as definitions in linear codes. kk is not the dimension of the code and dd is not the minimum distance of the code for general cases.

Parameters:

  • nn is the number of nodes in the initial system (before any node leaves/crashes).
  • kk is the number of nodes required to reconstruct the file kk.
  • dd is the number of nodes required to repair a failed node.
  • α\alpha is the storage at each node.
  • β\beta is the edge capacity for repair.
  • BB is the file size.

Goal: Find the trade off between n,k,d,α,β,Bn,k,d,\alpha,\beta,B using min-cut analysis of the information flow graph.

Initial system:

We denote the system admin as SS

Sever as 1,2,,n1,2,\ldots,n, each with edge capacity α\alpha.

For each new server:

We have two nodes inin and outout, the edge weight is α\alpha.

Connect to dd previous nodes outout‘s with edge capacity β\beta.

Data collector:connects to kk arbitrary nodes with each edge capacity α\alpha.

Observe that:

  • File size BB.
  • Any cut separating SS form DCDC must have capacity at least BB.
  • Otherwise, two different files are indistinguishable to DCDC.

Bound on bandwidth

Claim: mincuti=1k1min{(di)β,α}mincut\geq \sum_{i=1}^{k-1}\min\{(d-i)\beta, \alpha\}.

Intuition: Let (U,U)(U, \overline{U}) be a cut separating SS form DCDC.

  • The DC contacts newest kk nodes, say n1,n2,,nkn_1,n_2,\ldots,n_k.
  • The cut must decide if to cross α\alpha edges or β\beta edges.
  • At east did-i edges to go to UU.

Proof

Let (U,U)(U, \overline{U}) be a cut separating SS form DCDC, assuming SUS\in U and DCUDC\in \overline{U}.

Every directed acyclic graph has topological sort.

Let xout1x_{out}^1 be the first outout node in U\overline{U}. There are two cases:

  • xin1Ux_{in}^1\in U. Then α\alpha edges must be crossed.
  • xin1Ux_{in}^1\in \overline{U}. Then all dd incoming edges to xin1x_{in}^1 must be crossed. (Otherwise, there exists an earlier outout node xoutjx_{out}^j with xinjUx_{in}^j\in U, contradicting the topological sort.)

So xout1x_{out}^1 contributes at least min{dβ,α}\min\{d\beta, \alpha\} to the cut capacity.

Let xout2x_{out}^2 be the second outout node in U\overline{U}. There are two cases:

  • xin2Ux_{in}^2\in U. Then α\alpha edges must be crossed.
  • xin2Ux_{in}^2\in \overline{U}. Then at least d1d-1 incoming edges (11 edge may come from xout1x_{out}^1) to xin2x_{in}^2 must be crossed. (Otherwise, there exists an earlier outout node xoutjx_{out}^j with xinjUx_{in}^j\in U, contradicting the topological sort.)

So xout2x_{out}^2 contributes at least min{(d1)β,α}\min\{(d-1)\beta, \alpha\} to the cut capacity.

By repeating this process, we can show that the minimum cut capacity is at least i=1k1min{(di)β,α}\sum_{i=1}^{k-1}\min\{(d-i)\beta, \alpha\}.

Storage/bandwidth tradeoff

Claim: There exists an information graph with mincut=i=1k1min{(di)β,α}mincut = \sum_{i=1}^{k-1}\min\{(d-i)\beta, \alpha\}.

Homework: Build this graph as follows:

  • Construct the initial graph with nn nodes and the system admin.
  • Add n+kn+k nodes, each node connects to the most recent dd nodes.
  • Find the minimum cut capacity.

Corollary: Bi=1k1min{(di)β,α}B\leq \sum_{i=1}^{k-1}\min\{(d-i)\beta, \alpha\}.

Definition of regenerate code

A code which attains B=i=1k1min{(di)β,α}B=\sum_{i=1}^{k-1}\min\{(d-i)\beta, \alpha\} is called a regenerate code.

Goal: Find tradeoff between storage α\alpha to repair bandwidth dβd\beta.

Let γ=dβ\gamma = d\beta, then Bi=0k1min{(1i/d)γ,α}B \leq \sum_{i=0}^{k-1}\min\{(1-i/d)\gamma, \alpha\}.

Tool: Fix γ\gamma and dd, and minimize for α\alpha (not shown).

Result: The storage/bandwidth tradeoff.

  • Each point on/above the line is feasible.
  • Points on the line = regenerating codes.
  • One endpoint: Minimum Bandwidth Regenerating (MBR) codes.
  • another endpoint: Minimum Storage Regenerating (MSR) codes

Storage/bandwidth tradeoff

For Minimum Storage Regenerating (MSR) codes, we have α=Bk\alpha = \frac{B}{k}, β=Bk(dk+1)\beta = \frac{B}{k(d-k+1)}

For Minimum Bandwidth Regenerating (MBR) codes, we have α=dBkdk(k1)2\alpha = \frac{dB}{kd-\frac{k(k-1)}{2}}, β=Bkdk(k1)2\beta = \frac{B}{kd-\frac{k(k-1)}{2}}

Notes:

  • In MSR α=B/k\alpha=B/k, Data collector contacts kk nodes and downloads B/kB/k from each to reconstruct the file of size BB, that is optimal.
  • In MBR β=B/(kdk(k1)2)\beta=B/(kd-\frac{k(k-1)}{2}), new comer download exactly what it stores, which is the same as replication. This has much smaller storage overhead in the replication.

Regenerating codes, Magic #1:

  • MBR: Same repair-bandwidth as replication (α\alpha), at lower storage costs.
  • MSR: Same reconstruction-bandwidth (B/kB/k) as replication, at lower storage costs.

Regenerating codes, Magic #2:

  • In MSR: γ=dβ=dBk(dk+1)\gamma = d\beta = \frac{dB}{k(d-k+1)}, α=Bk\alpha = \frac{B}{k}
  • In MBR: γ=dβ=dBkdk(k1)2\gamma = d\beta = \frac{dB}{kd-\frac{k(k-1)}{2}}, α=dBkdk(k1)2\alpha = \frac{dB}{kd-\frac{k(k-1)}{2}}
  • Both decreasing functions of dd.
  • \Rightarrow Less repair-bandwidth by contacting more nodes, minimized at d=n1d = n - 1.

Constructing Minimum bandwidth regenerating (MBR) codes from Maximum distance separable (MDS) codes

Observation: For MBR code with parameters n,k,dn, k, d and β=1\beta = 1, one can construct MBR with parameters n,k,dn, k, d and any β\beta.

Next: Construct MBR for [n,k,d=n1][n, k, d = n - 1] and β=1\beta = 1.

In any MBR: α,β=dBkdk(k1)2,Bkdk(k1)2\alpha, \beta = \frac{dB}{kd-\frac{k(k-1)}{2}}, \frac{B}{kd-\frac{k(k-1)}{2}}

Specifically:

  • Storage α=dβ=d=n1\alpha = d\beta = d = n - 1.
  • File size B=kd(k2)β=kd(k2)B = kd - \binom{k}{2}\beta = kd - \binom{k}{2}

Take an [(n2),B][\binom{n}{2}, B] MDS code (e.g., Reed-Solomon).

Need qn2q\geq \frac{n}{2}.

Consider a complete graph KnK_n on nn nodes.

  • (n2)\binom{n}{2} edges.
  • Place each codeword symbol on a distinct edge.
  • Storage server ii stores all codeword symbols adjacent with node ii.
    • α=n1\alpha = n - 1.

Repairing on MBR codes

New comer contacts each node jij\neq i;

And downloads the symbol on edge (i,j)(i,j).

We get α=n1=dβ\alpha=n-1=d\beta which is optimal.

Reconstruction on MBR codes

We use [(n2),B]q[\binom{n}{2}, B]_q MDS code. So any BB symbols suffice to reconstruct the file.

Any tt nodes have (t2)\binom{t}{2} edges between them, and (n1)t2(t2)(n-1)t-2\binom{t}{2} edges to other nodes.

Overall (n1)t(t2)(n-1)t-\binom{t}{2}. For t=kt=k, we get kd(k2)=Bkd-\binom{k}{2}=B.

Constructing Minimum bandwidth regenerating (MBR) codes from Product-Matrix MBR codes

Recall: File size in MBR B=kd(k2)=(k+12)+k(dk)B=kd-\binom{k}{2}=\binom{k+1}{2}+k(d-k).

Step 1: Arrange the B=(k+12)+k(dk)B=\binom{k+1}{2}+k(d-k) symbols in a matrix MM follows:

M=(STT0)Fqd×dM=\begin{pmatrix} S & T\\ T^\top & 0 \end{pmatrix}\in \mathbb{F}_q^{d\times d}
  • SS is a k×kk\times k symmetric matrix. contains (k+12)\binom{k+1}{2} symbols.
  • TT is a k×(dk)k\times (d-k) matrix. contains k(dk)k(d-k) symbols.

So there are BB elements overall.

Step 2: Construct the encoding matrix C=(Ψ,Δ)Fqn×dC=(\Psi,\Delta)\in \mathbb{F}_q^{n\times d}

ΨFqn×k\Psi\in \mathbb{F}_q^{n\times k} such that

  • Any kk rows are linearly independent.
  • Example: Vandermonde matrix.

ΔFqn×(dk)\Delta\in \mathbb{F}_q^{n\times (d-k)} such that

  • Any dd rows of CC are linearly independent.
  • Example: Complete Ψ\Psi to a full n×dn\times d Vandermonde matrix.

Step 3: Encoding of the data MFqd×dM\in \mathbb{F}_q^{d\times d} using the encoding matrix CFqn×dC\in \mathbb{F}_q^{n\times d}.

  • Multiply MM by CC.
  • Store the ii the row of CMCM in the node ii.
  • Note CM=(Ψ,Δ)M=(ΨS+ΔT,ΨT)CM=(\Psi,\Delta)M=(\Psi S+\Delta T, \Psi T)

Repairing on Product-Matrix MBR codes

Assume node ii storing ciMc_iM is lost.

Repair from (any) nodes H={h1,,hd}H = \{h_1, \ldots, h_d\}.

  • Node hjh_j stores chjMc_{h_j}M.

Newcomer contacts each hjh_j: “My name is ii, and I’m lost.”

Node hjh_j sends chjMcic_{h_j}M c_i^\top (inner product).

Newcomer assembles CHMciC_H Mc_i^\top.

CHCH invertible by construction!

  • Recover MciMc_i^\top.

  • Recover ci\topMc_i^\topM (MM is symmetric)

Reconstruction on Product-Matrix MBR codes

Data Collector (DC) contacts (any) nodes D={d1,,dk}D = \{d_1, \ldots, d_k\}.

  • Node djd_j stores cdjMc_{d_j}M.

Downloads cdjMc_{d_j}M from node djd_j.

DC assembles CDMC_D M.

  • Recall CM=(ΨS,Δ)M=(ΨS+ΔT,ΨT)CM=(\Psi S,\Delta)M=(\Psi S+\Delta T, \Psi T)
  • CDM=(ΨDS,ΔD)M=(ΨDS+ΔDT,ΨDT)C_D M=(\Psi_D S,\Delta_D)M=(\Psi_D S+\Delta_D T, \Psi_D T)

ΨD\Psi_D invertible by construction.

  • DC computes ΨD1CDM=(S+ΨD1ΔD,T)\Psi_D^{-1}C_DM = (S+\Psi_D^{-1}\Delta_D^\top, T)
  • DC obtains TT.
  • Subtracts ΨD1ΔDT\Psi_D^{-1}\Delta_D T^\top from S+ΨD1ΔDTS+\Psi_D^{-1}\Delta_D T^\top to obtain SS.

Fill an example here please.

Last updated on