Prof. Vijay Kumar was conferred the ACCS-CDAC Foundation award for 2018, along with Prof. V. Kamakoti of Indian Institute of Technology, Madras. Prof. Vijay Kumar is recognized for his seminal contributions to error-correcting codes and signal design for wireless communication which has been adopted into WCDMA Cellular Standards.
In today’s data hungry world, Prof. Vijay Kumar’s work, which appears deeply theoretical, is a very important contribution towards solving practical problems faced by the industry. Prof. Vijay Kumar did his BTech from IIT Kharagpur, and MTech from Kanpur before leaving for US to pursue his PhD at the University of Southern California in 1979. He joined the same department as faculty in 1983. His association at USC with Dr. Irving Reed of the Reed-Solomon Codes fame gave a thrust to his deep-set interest in pursuing a career in coding theory. Prof. Vijay Kumar joined the Department of Electrical Communication Engineering at the Indian Institute of Science (IISc) in 2003.
Prof. Vijay Kumar’s work started turning heads in industry and academia when he started delving into erasure coding as applied to computing and storage. CLAY (short for coupled-layer) codes are erasure codes designed to bring about significant savings in terms of network bandwidth and disk IO when a failed node/OSD/rack is being repaired. These codes, belonging to Minimum Storage Regenerating (MSR) class of code, offer a number of properties including low storage overhead and optimal disk I/O among others and make recovery of lost data from a storage node faster, accurate and with less expensive. Prof. Vijay Kumar and his team of researchers are applying regenerative erasure coding to new interesting domains.
Here’s an excerpt from an interview by Prashanth Hebbar from ACC Journal .
Your team made seminal contributions to Clay code by showing that any MDS code can be used to build the Clay code. Can you explain what led to the important work?
Our work on Clay code was started soon after a discussion on the limitations of Reed-Solomon codes with a visiting Berkeley professor. Two of our bright students from my department came up with the first instructions to what would be a Clay Code that works with MDS (Maximum Distance Separable) code making it a regenerating code instead of a replication code.
The Clay code, as the very name suggest, is constructed by MDS code in multiple layers and performing pair-wise coupling across layers. A Clay code can be built from Reed-Solomon code or any MDS codes. For example, take a block of storage which is encoded using Reed-Solomon code, we can pick up 4 layers and start coupling the layers by replacing every pair with a different pair to make a Clay code. We use Galois Field computations to do these operations. Among the three teams which contributed to this work, the fact that you can use any MDS code was shown by our team. What it means is that, if someone gives us a [20, 16] code, where the block length is 20 and dimension is 16 (which means, there are 16 information symbols and four pieces of redundancy making it 20), we take that and build the Clay code using MDS code by applying either the Galois Field operations or the more simpler XOR operations if required.
How can practitioners use an implementation of your work and how accessible is it?
We have implemented Clay codes and have made it available as open-source under LGPL license. Clay code is available as a plug-in for the open source unified distributed storage software, CEPH. CEPH traditionally supported scalar erasure codes like Reed-Solomon code. We have modified the CEPH to support Clay code, which is a vector code. Our work is now part of the master code base of CEPH.
You have also worked on Local Reconstruction Codes (LRC) and have extended that work in an interesting direction, Hierarchical codes, can you give us a context to this problem domain?
Today, we have built several classes of codes and with different properties. The regenerating codes is one direction, another direction is what we call the locally repairable codes. We are making a lot of important contribution to this branch of erasure coding. This branch was started by a team in Microsoft consisting of very smart computer scientists, Sergey Yekhanin, Parikshit Gopalan and others. They came up with this scheme and called it Codes with Locality but now it is called Local Reconstruction Codes (LRC). Microsoft used it in the Windows Azure Systems. Windows Azure data is protected by this code, a [18, 14] code (18 total node, 14 data chunks). That has reportedly saved Microsoft anywhere between millions to hundreds of millions of dollars. Because they were able to replace an existing RS code which had 50% [storage] overhead as against the LRC which only has 29% [storage] overhead. When you spread this gain over huge amount of data, it translates into huge cost savings.
We have made some interesting continuations to the LRC. For example, we had a situation where within a larger erasure code, there were smaller erasure codes built in. Earlier, it was done using RS codes which allowed recovering the data from a localized code. The question of what happens in the case of hierarchical nodes, how much of the network should be disturbed became important. We looked at it and came up with a code which allowed one to say, if you do not have this machine then you can look at the next level and go on through the hierarchy until you find the node which enables you to reconstruct lost data. You do not have to touch machines that do not directly belong to the hierarchy. For the same reason, it is called the Hierarchical code and we contributed that code to the literature [on LRC]. We brought the notion of Hierarchical code and build the code. Later, we realized that there existed a notion of Hierarchical code but the team which worked on it had no claims to what the fundamental limits are, the bounds beyond which you cannot do better.