Virginia Tech® home

Daniel Valvo

Instructor
Daniel Valvo profile picture
McBryde Room 414

460 McBryde Hall, Virginia Tech
225 Stanger Street
Blacksburg, VA 24061-1026

Throughout my tenure as a graduate researcher, I have had multiple areas of interest, all with a common thread: linear exact repair schemes. The study of these schemes and their applications have led me to gain results and expertise all over the map, from Reed-Solomon code repair, to distributed storage systems, to secure distributed matrix multiplication. In this statement I would like to share with you the several developments I have helped make in these areas over the last 4 years, as well as the very promising future directions I see my research heading.

Linear exact repair schemes are a subtopic of coding theory, which has been my overall field of study since 2019. Coding theory is of course the study of codes, which are mathematical objects capable of effectively transmitting and storing information even in the presence of errors or erasures. Certain codes, known as evaluation codes, are able to repair erasures in transmitted or stored data by interpolating polynomials over finite fields. Essentially, the data in question is in the form of a vector, whose entries are the evaluations of a polynomial over certain points. If an entry is erased or corrupted, a user can use a subset of the other entries to interpolate what the polynomial must have been, and evaluate it at the missing point to reproduce the erased data. Through the most popular instance of such codes, Reed-Solomon codes, evaluation codes have been used in innumerable applications throughout the tech world, from the encoding of DVDs, to the transmission of images from the Voyager space probe, to modern day distributed storage networks.

However, despite their success there is an inefficiency in the traditional way of repairing evaluation codes. It seems wasteful that to repair a single missing entry the user must recreate the whole polynomial. This is where linear exact repair schemes have gained attention. Linear exact repair schemes are an alternative way to repair an erased entry in an evaluation codeword without ever recreating the full polynomial. In a variety of settings this significantly improves the efficiency of repairing the erased data. Guruswarmi and Wooters first showed these schemes were a viable improvement on traditional repair schemes for Reed-Solomon codes in their 2016 paper on the subject, setting off an explosion of research in the area. Importantly, however, as these schemes do not require the user to interpolate polynomials, they can be applied in settings where erasure repair was traditionally very tedious or non-existent.

Settings such as multivariate evaluation codes, which have been previously sidelined due to their lack of efficient and effective repair schemes. Multivariate evaluation codes deserve special mention as they are a well of untapped potential. They can do everything univariate polynomials can do and more, as the degrees of freedom gained from the additional variables can be used to create highly specialized and optimized codes. As they have been relatively unstudied in the past, my collaborators and I have already found a plethora of situations in which they are more efficient than the traditionally used codes, and we have many more on the horizon.

Starting with my master’s thesis on the subject, I have been researching the application of multivariate evaluation codes to distributed storage systems for 4 years. A distributed storage system is a system in which a single data file is stored over multiple storage nodes. The data must be distributed to these nodes in such a way that if a single storage node fails, the information in the remaining nodes may be used to recover that lost data and restore the failed node. Models like these have direct applications to places like server farms, cloud storage systems, and really anywhere that vast amounts of important data are stored. This includes organizations such as Amazon, Google, Microsoft, and more. In their 2016 paper, Guruswarmi and Wooters studied the application of linear exact repair schemes to recovering a single lost node in a distributed storage system based off of a Reed-Solomon code. From this inspiration, my research has focused mainly on developing distributed storage systems based off of multivariate evaluation codes, as well as developing univariate and multivariate based repair schemes capable of repairing several nodes simultaneously.

Starting in 2021 and continuing to the present I have also opened up a second area of research concerning secure distributed matrix multiplication schemes (SDMM schemes). These schemes have gained popularity in the last decades as several cutting edge technologies from optimization problems to machine learning applications require the fast and efficient multiplication of thousands upon millions of matrices. The basic idea of an SDMM scheme is to perform matrix multiplication in parallel, distributing the computation across several helper servers. In order to provide security, it is important that the scheme is designed such that no T helper servers can collude to obtain any information about A or B. I was originally introduced to SDMM schemes as they relate to linear exact repair schemes, however, I have become fascinated with researching SDMM schemes in their own right.