Minimal MapReduce algorithms
Refereed conference paper presented and published in conference proceedings

Other information
Abstract: MapReduce has become a dominant parallel computing paradigm for big data, i.e., colossal datasets at the scale of terabytes or higher. Ideally, a MapReduce system should achieve a high degree of load balancing among the participating machines, and minimize the space usage, CPU and I/O time, and network transfer at each machine. Although these principles have guided the development of MapReduce algorithms, limited emphasis has been placed on enforcing strict constraints on all of these metrics simultaneously. This paper presents the notion of a minimal algorithm, that is, an algorithm that guarantees the best parallelization in multiple aspects at the same time, up to a small constant factor. We show the existence of elegant minimal algorithms for a set of fundamental database problems, and demonstrate their excellent performance with extensive experiments. Copyright © 2013 ACM.
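The phrase "up to a small constant factor" can be made concrete with a short check. The sketch below is illustrative only and is not the authors' definition or code: it assumes that the ideal per-machine share is m = n/t for n input objects spread over t machines, and it checks just two of the metrics named in the abstract (space usage and network transfer) against a caller-chosen constant c. The names `MachineStats` and `is_balanced_within_constant` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class MachineStats:
    """Hypothetical per-machine usage from one simulated run, counted in objects."""
    space: int    # objects stored at the machine
    network: int  # objects sent plus received by the machine


def is_balanced_within_constant(stats: list[MachineStats], n: int, c: float) -> bool:
    """Return True if every machine stays within c * (n / t) for space and network.

    Assumption (not from the paper): m = n / t is the ideal share per machine,
    and c is the tolerated constant factor, chosen by the caller.
    Rounds and CPU time are deliberately not modeled here.
    """
    t = len(stats)
    if t == 0 or n < 0 or c <= 0:
        raise ValueError("need at least one machine, n >= 0 and c > 0")
    m = n / t  # ideal per-machine share
    return all(s.space <= c * m and s.network <= c * m for s in stats)


# Usage: n = 1000 objects on t = 4 machines, so m = 250 and the cap is c * m = 500.
balanced = [MachineStats(260, 250), MachineStats(240, 255),
            MachineStats(250, 245), MachineStats(250, 250)]
skewed = [MachineStats(850, 800), MachineStats(50, 60),
          MachineStats(50, 70), MachineStats(50, 70)]
print(is_balanced_within_constant(balanced, n=1000, c=2))  # True
print(is_balanced_within_constant(skewed, n=1000, c=2))    # False
```

The skewed run fails because one machine holds 850 objects, exceeding the cap of 500 even though the total matches the balanced run; this is the kind of per-machine guarantee, rather than an average, that the abstract's notion of minimality emphasizes.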
All Author(s) List: Tao Y., Lin W., Xiao X.
Name of Conference: 2013 ACM SIGMOD Conference on Management of Data, SIGMOD 2013
Start Date of Conference: 22/06/2013
End Date of Conference: 27/06/2013
Place of Conference: New York, NY
Country/Region of Conference: United States of America
Detailed description: Organized by ACM
Pages: 529-540
Languages: English (United Kingdom)
Keywords: Big data, MapReduce, Minimal algorithm
KeywordsBig data, MapReduce, Minimal algorithm

Last updated on 2020-14-10 at 01:58