Toward Fine-Grained, Unsupervised, Scalable Performance Diagnosis for Production Cloud Computing Systems
Publication in refereed journal

Times Cited
Web of Science: 38 (as at 14/09/2020)

Other information
Abstract: Performance diagnosis is labor intensive in production cloud computing systems. Such systems typically face many real-world challenges that existing diagnosis techniques for distributed systems cannot effectively solve. An efficient, unsupervised diagnosis tool for locating fine-grained performance anomalies is still lacking in production cloud computing systems. This paper proposes CloudDiag to bridge this gap. Combining a statistical technique and a fast matrix recovery algorithm, CloudDiag can efficiently pinpoint fine-grained causes of performance problems without requiring any domain-specific knowledge of the target system. CloudDiag has been applied in a practical production cloud computing system to diagnose performance problems. We demonstrate the effectiveness of CloudDiag in three real-world case studies.
All Author(s) List: Mi HB, Wang HM, Zhou YF, Lyu MRT, Cai H
Journal name: IEEE Transactions on Parallel and Distributed Systems
Volume Number: 24
Issue Number: 6
Pages: 1245 - 1255
Languages: English-United Kingdom
Keywords: Cloud computing; performance diagnosis; request tracing
Web of Science Subject Categories: Computer Science; Computer Science, Theory & Methods; Engineering; Engineering, Electrical & Electronic

Last updated on 2020-09-15 at 03:26