Runtime asynchronous fault tolerance via speculation

Author(s): Zhang, Yun; Ghosh, Soumyadeep; Huang, Jialu; Lee, Jae W; Mahlke, Scott A; August, David I

To refer to this page use: http://arks.princeton.edu/ark:/88435/pr1dr6p
Full metadata record
DC Field | Value | Language
dc.contributor.author | Zhang, Yun | -
dc.contributor.author | Ghosh, Soumyadeep | -
dc.contributor.author | Huang, Jialu | -
dc.contributor.author | Lee, Jae W | -
dc.contributor.author | Mahlke, Scott A | -
dc.contributor.author | August, David I | -
dc.date.accessioned | 2021-10-08T19:45:21Z | -
dc.date.available | 2021-10-08T19:45:21Z | -
dc.date.issued | 2012 | en_US
dc.identifier.citation | Zhang, Yun, Soumyadeep Ghosh, Jialu Huang, Jae W. Lee, Scott A. Mahlke, and David I. August. "Runtime asynchronous fault tolerance via speculation." Proceedings of the Tenth International Symposium on Code Generation and Optimization (2012): pp. 145-154. doi:10.1145/2259016.2259035 | en_US
dc.identifier.issn | 2164-2397 | -
dc.identifier.uri | https://liberty.princeton.edu/Publications/cgo12_raft.pdf | -
dc.identifier.uri | http://arks.princeton.edu/ark:/88435/pr1dr6p | -
dc.description.abstract | Transient faults are emerging as a critical reliability concern in modern microprocessors. Redundant hardware solutions are commonly deployed to detect transient faults, but they are less flexible and cost-effective than software solutions. However, software solutions are rendered impractical because of high performance overheads. To address this problem, this paper presents Runtime Asynchronous Fault Tolerance via Speculation (RAFT), the fastest transient fault detection technique known to date. Serving as a layer between the application and the underlying platform, RAFT automatically generates two symmetric program instances from a program binary. It detects transient faults in a non-invasive way and exploits high-confidence value speculation to achieve low runtime overhead. Evaluation on a commodity multicore system demonstrates that RAFT delivers a geomean performance overhead of 2.83% on a set of 30 SPEC CPU benchmarks and STAMP benchmarks. Compared with existing transient fault detection techniques, RAFT exhibits the best performance and fault coverage, without requiring any change to the hardware or the software applications. | en_US
dc.format.extent | 145 - 154 | en_US
dc.language.iso | en_US | en_US
dc.relation.ispartof | Proceedings of the Tenth International Symposium on Code Generation and Optimization | en_US
dc.rights | Author's manuscript | en_US
dc.title | Runtime asynchronous fault tolerance via speculation | en_US
dc.type | Conference Article | en_US
dc.identifier.doi | 10.1145/2259016.2259035 | -
pu.type.symplectic | http://www.symplectic.co.uk/publications/atom-terms/1.0/conference-proceeding | en_US

Files in This Item:
File | Description | Size | Format
RuntimeAsynchronousFaultToleranceSpeculation.pdf | - | 303.12 kB | Adobe PDF


Items in OAR@Princeton are protected by copyright, with all rights reserved, unless otherwise indicated.