Home > Erasure Coding > Erasure Coding for Big Data

Erasure Coding for Big Data

The Clay code is optimal in $4$ respects as depicted here.
Figure 14: The Clay code is optimal in $4$ respects as depicted here.
Figure 15
Figure 15: The Clay code is depicted here in the form of a datacube. Each vertex of the datacube appears in the figure as a small cylinder, and is associated to an index of the form $(x,y,z)=(x,y,(z_0,z_1))$ where $\{x,y,z_0,z_1\}$ are all either $0$ or $1$. Image taken from [25].
  1. 1. File Size One should be able to store $B=8$ symbols from $\mathbb{F}_q$ in redundant fashion within this datacube,

  2. 2. Data-collection property The entire data file of $8$ symbols is to be recovered by connecting to any $2$ nodes,

  3. 3. Node-repair property The repair of a failed node should be accomplished by downloading just $2$ symbols from each of the $d=3$ remaining nodes.

In the following, we will describe how encoding and node repair take place in a Clay code. We refer the reader to [23] for an explanation as to how data collection is accomplished in the Clay code.

3.1 Indexing of Symbols Within the Clay Code

As noted earlier, we will view the Clay code as containing a total of $16$ $\mathbb{F}_q$ symbols, each placed at a distinct vertex of a datacube of size $(2\times 2 \times 4)$. In all of the figures that we show, the vertex appears as a small cylinder. We will index each code symbol by the triple

(x,y,z) & = & (x,y,(z_0,z_1)),

where $\{x,y,z_0,z_1\}$ all belong to $\{0,1\}$, as shown in Figure 15. The figure on the left identifies each plane with a value of $z=(z_0,z_1)$, whereas the figure on the right identifies the $(x,y)$ coordinates of each vertex within the example plane $z=(0,1)$.

3.2 Actual and Virtual Datacubes

For the purposes of describing the encoding and repair properties of the Clay code, it is convenient to introduce a second datacube of identical dimensions. We will refer to the datacube representing the Clay code itself as the actual datacube and the second datacube just introduced here, as the virtual datacube. Figure 16 shows the two datacubes, with the virtual datacube on the left appearing in blue and the actual datacube appearing on the right in red. The acronyms PFT and PRT appearing in the figure correspond respectively to the expansions pairwise forward transform and pairwise reverse transform. These terms will shortly be explained.

Figure 16
Figure 16: The virtual (left) and actual (right) datacubes associated with a Clay code. Image taken from [25].

Across the two datacubes, the symbols corresponding to the red dots in the same location are identical. Within each of the datacubes, the symbols associated with vertices that are not colored red, are paired. Example pairings are identified in Figure 16 in yellow and connected by dashed lines. Thus the symbols $\{U,U^{*}\}$ on the left are paired. So are the symbols $\{C,C^{*}\}$ on the right. The indices of the paired symbols are given in general, by:

(x,y,(z_0,z_1)) \Longleftrightarrow (z_y,y,(u_0,u_1))\\
\text{ where $(u_y=x \text{ and } u_i=z_i, i \neq y$).}

Thus, the two paired symbols share the same $y$ coordinate and the corresponding vertices lie in the same $y$-section of the datacube. The relationship between the pair $\{U,U^{*}\}$ on the left and the pair $\{C,C^{*}\}$ on the right is given by the PFT and PRT as shown below:

\underbrace{\left[ \begin{array}{c} C \\ C^{*} \end{array} \right] \ = \ A \left[ \begin{array}{c} U \\ U^{*} \end{array} \right]}_{\text{PFT}}, & \ \ \ & \underbrace{\left[ \begin{array}{c} U \\ U^{*} \end{array} \right] \ = \ A^{-1} \left[ \begin{array}{c} C \\ C^{*} \end{array} \right]}_{\text{PRT}},

where $A$ is a nonsingular matrix. The matrix $A$ is required to possess the following additional property: given any two elements in the set $\{U,U^{*},C,C^{*}\}$, the remaining two elements can be derived from them. This is equivalent to saying that the matrix

B & = & \left[ \begin{array}{c} I_2 \\ A \end{array} \right],

should have the property that any two rows of $B$ are linearly independent. Here $I_2$ denotes the $(2 \times 2)$ identity matrix. From this, it follows that given the contents of one datacube, the contents of the other datacube can be fully recovered. The symbols belonging to the virtual datacube possess an important property: the symbols in any given horizontal plane, corresponding to a fixed value of indexing parameter $z$, form an $[4,2]$ MDS code. This naturally, imposes a constraint on the contents of the actual datacube, and is the only constraint placed (in indirect fashion), on the contents of the actual datacube. We will show how this can be used to encode and carry out node repair.

3.3 Encoding

Encoding is carried out as described in Figure 17. The four rectangles appearing in the figure represent a top aerial view of the actual (in red) and virtual (in blue) datacubes.

Figure 17
Figure 17: Illustrating the $3$-step procedure for encoding of the Clay code.
Image taken from [25].

Encoding is carried out as per the steps given below (see Figure 17).

  • Step 1: The columns of the actual datacube corresponding to nodes having $(x,y)$ coordinates $(0,0)$, $(1,0)$ are filled with the $4+4=8$ message symbols.

  • Step 2: The PRT is then used to compute the contents of the corresponding nodes in the virtual datacube. This is possible since the two paired symbols always belong to the same $y$-section.

  • Step 3: In the virtual datacube, we know through Step 2, the values of the $8$ symbols belonging to the datacube and corresponding to nodes associated to vertices $(x,y)=(0,0),(1,0)$. The fact that the four symbols in each plane of the virtual datacube (i.e., the $4$ symbols corresponding to a fixed value of $z$ coordinate) form a $[4,2]$ MDS code, allows the remaining two symbols in that plane and having coordinates $(x,y) \in \{(0,1),(1,1)\}$ to be determined. Since this procedure can be carried out for each of the $4$ planes, at the end of this step, the entire contents of the virtual datacube have been determined.

  • In the last and final step, we use the PFT to determine the contents of the actual datacube and corresponding to nodes having vertices $(x,y) \in \{(0,1),(1,1)\}$. This concludes the encoding process.

3.4 Node Repair

Node repair is accomplished by carrying out the $3$-step procedure described below (see Figure 18). Let us assume without loss of generality that node associated to $(x_0,y_0)=(1,0)$ has failed.

  • Step 1: We focus on the planes associated to $z=(z_0,z_1)$ where $z_{y_0}=x_0$, i.e., $z_{0}=1$. There are $2$ such planes and we will refer to these as the repair planes. Thus in the present example, the repair planes are the planes corresponding to $z_0=1$, namely the planes $z=(1,0)$ and $z=(1,1)$. Using the PRT and the known contents of the actual datacube associated to vertex set $(0,1),(1,1)$, the contents of the virtual datacube and associated to the same vertex set $(0,1),(1,1)$ in the repair planes can be determined.

  • Step 2: The MDS code binding the $4$ symbols in the repair planes of the virtual datacube is used to decode the remaining symbols in the repair planes.

  • Step 3: the symbols in the repair planes and associated to the red dots are the same in the virtual and actual datacubes. Thus we have recovered $2$ of the lost symbols in the failed node (of the actual datacube), namely the symbols associated to vertex sets:

    ((x,y)=(1,0),z=(1,0)) \\ \text{ and } ((x,y)=(1,0),z=(1,1)) .

    We also have access to the symbol pairs $U,C$ associated to vertex sets

    ((x,y)=(0,0),z=(1,0))\\ \text{ and } ((x,y)=(0,0),z=(1,1)) .

    Using these, the corresponding symbols $C^*$ can be determined and these are precisely the remaining two symbols in the failed node, namely the symbols associated to vertex sets:

    ((x,y)=(1,0),z=(0,0))\\ \text{ and } ((x,y)=(1,0),z=(0,1)) .

    This concludes the repair process.

Figure 18
Figure 18: Illustrating how node repair is accomplished in the Clay code.
Image taken from [25].

3.5 Systems Evaluation and Contributions to Ceph

In a joint collaborative effort involving the University of Maryland, NetApp and the Indian Institute of Science, Clay codes have been implemented in an open-source distributed storage system called Ceph [26] and evaluated over an AWS (Amazon Web Services) cluster. This effort can be found described in [25]. It is planned to have the Clay code made available as an erasure code plugin in the upcoming Nautilus release [27] of Ceph. An earlier contribution by the authors to Ceph involved enabling vector-code support in Ceph by introducing the notion of sub-chunk and then enabling Clay code as an erasure-code plugin. The current implementation leaves open the choice of scalar MDS building block, i.e., the code can be realized through any available MDS implementation within Ceph, such as the Jerasure, Intel Storage Acceleration (ISA) and Shingled Erasure Code (SHEC) plugins.

4 Locally Recoverable Codes

As mentioned in Section 1.3, locally recoverable codes (LRCs), introduced in [10], are aimed at keeping to a low level, the repair degree. A linear code is systematic if the $k$ message symbols are explicitly present among the $n$ code symbols. An $(n,k,r)$ LRC $\mathcal{C}$ over a field $\mathbb{F}_q$ is a systematic $[n,k]$ linear block code having the property that every message symbol $c_t$, $t \in [k]$ can be recovered by computing a linear combination of the form

c_t & = & \sum_{j \in S_t} a_j c_j , \ \ a_j \in \mathbb{F}_q \tag{4}\label{eq:lrc},

involving at most $r$ other code symbols $c_j, j \in S_t$. Thus the set $S_t$ in the equation above has size at most $r.$ The minimum distance of an $(n,k,r)$ LRC must satisfy the bound

d_{\min} & \leq & (n-k+1) – \left( \left \lceil \frac{k}{r} \right \rceil -1 \right).

An LRC whose minimum distance satisfies the above bound with equality is said to be optimal. The class of pyramid codes [28] are an example of a class of optimal LRCs.

4.1 The Windows Azure LRC

Figure 19 shows the $(n=18,k=14,r=7)$ LRC employed in conjunction with Windows Azure [29] and which is related in structure, to the pyramid code. The dotted boxes indicate a collection of symbols that satisfy an overall parity check. This code has minimum distance $4$ which is the same as that of the $[n=9,k=6]$ RS code appearing in Figure 20.

In terms of reliability, the codes are comparable as they both have the same minimum distance $d_{\min}=4$. In terms of repair degree, the two codes are again comparable, having respective repair degrees of $7$ (Azure LRC) and $6$ (RS). The big difference is in the storage overhead, which stands at $\frac{18}{14}=1.29$ in the case of the Azure LRC and $\frac{9}{6}=1.5$ in the case of the $[9,6]$ RS code. This difference has reportedly saved Microsoft millions of dollars [30].

Figure 19
Figure 19: The LRC that is employed in Windows Azure.
Figure 20
Figure 20: An RS code having the same minimum distance as the Windows Azure LRC.

An LRC in which every code symbol can be recovered from a linear combination of at most $r$ other code symbols is called an all-symbol LRC. A construction for optimal all-symbol LRCs can be found in [31]. The codes in the construction may be regarded as subcodes of RS codes. An example is shown in Figure 21. As was noted in Section 1.1, code symbols in an RS code may be regarded as values of a polynomial associated with the message symbols. The construction depicted in Figure 21, is one in which code symbols are obtained by evaluating a subclass of polynomials. This subclass of polynomials has the property that given any code symbol corresponding to the evaluation $f(P_a)$, there exist two other code symbols $f(P_b),f(P_c)$ such that the three values lie on straight line and hence satisfy an equation of the form

u_af(P_a)+ u_bf(P_b)+ u_cf(P_c) & = & 0.

Figure 23
Figure 21: Illustrating the construction of an all-symbol, optimal LRC.
Figure 24
Figure 22: Illustrating on the left, a code with locality, in which each code symbol is protected by a $[4,3,2]$ local code and each local code is contained in a $[24,14,7]$ global code. In the hierarchical-locality code on the right, each local code is a part of one of the $[12,8,3]$ middle codes, which are in turn, contained in a $[24,14,6]$ global code. Image on right is
taken from [32].

Pages ( 4 of 6 ): « Previous123 4 56Next »

Leave a Comment:

Your email address will not be published. Required fields are marked *