Lightweight, high-resolution monitoring for troubleshooting production systems

Author(s): Bhatia, Sapan; Kumar, Abhishek; Fiuczynski, Marc E; Peterson, Larry

To refer to this page use: http://arks.princeton.edu/ark:/88435/pr13v74
Full metadata record
DC Field | Value | Language
dc.contributor.author | Bhatia, Sapan | -
dc.contributor.author | Kumar, Abhishek | -
dc.contributor.author | Fiuczynski, Marc E | -
dc.contributor.author | Peterson, Larry | -
dc.date.accessioned | 2021-10-08T19:49:25Z | -
dc.date.available | 2021-10-08T19:49:25Z | -
dc.date.issued | 2008 | en_US
dc.identifier.citation | Bhatia, Sapan, Abhishek Kumar, Marc E. Fiuczynski, and Larry Peterson. "Lightweight, high-resolution monitoring for troubleshooting production systems." In Proceedings of the 8th USENIX conference on Operating systems design and implementation (2008): pp. 103-116. | en_US
dc.identifier.uri | https://static.usenix.org/events/osdi08/tech/full_papers/bhatia/bhatia.pdf | -
dc.identifier.uri | http://arks.princeton.edu/ark:/88435/pr13v74 | -
dc.description.abstract | Production systems are commonly plagued by intermittent problems that are difficult to diagnose. This paper describes a new diagnostic tool, called Chopstix, that continuously collects profiles of low-level OS events (e.g., scheduling, L2 cache misses, CPU utilization, I/O operations, page allocation, locking) at the granularity of executables, procedures and instructions. Chopstix then reconstructs these events offline for analysis. We have used Chopstix to diagnose several elusive problems in a large-scale production system, thereby reducing these intermittent problems to reproducible bugs that can be debugged using standard techniques. The key to Chopstix is an approximate data collection strategy that incurs very low overhead. An evaluation shows Chopstix requires under 1% of the CPU, under 256KB of RAM, and under 16MB of disk space per day to collect a rich set of system-wide data. | en_US
dc.format.extent | 103 - 116 | en_US
dc.language.iso | en_US | en_US
dc.relation.ispartof | Proceedings of the 8th USENIX conference on Operating systems design and implementation | en_US
dc.rights | Final published version. Article is made available in OAR by the publisher's permission or policy. | en_US
dc.title | Lightweight, high-resolution monitoring for troubleshooting production systems | en_US
dc.type | Conference Article | en_US
pu.type.symplectic | http://www.symplectic.co.uk/publications/atom-terms/1.0/conference-proceeding | en_US

Files in This Item:
File | Description | Size | Format
LightweightHighResMonitoring.pdf | | 226.06 kB | Adobe PDF


Items in OAR@Princeton are protected by copyright, with all rights reserved, unless otherwise indicated.