Encore: low-cost, fine-grained transient fault recovery

Author(s): Feng, Shuguang; Gupta, Shantanu; Ansari, Amin; Mahlke, Scott A; August, David I

To refer to this page use: http://arks.princeton.edu/ark:/88435/pr1bn6v
Full metadata record
DC Field | Value | Language
dc.contributor.author | Feng, Shuguang | -
dc.contributor.author | Gupta, Shantanu | -
dc.contributor.author | Ansari, Amin | -
dc.contributor.author | Mahlke, Scott A | -
dc.contributor.author | August, David I | -
dc.date.accessioned | 2021-10-08T19:45:45Z | -
dc.date.available | 2021-10-08T19:45:45Z | -
dc.date.issued | 2011-12 | en_US
dc.identifier.citation | Feng, Shuguang, Shantanu Gupta, Amin Ansari, Scott A. Mahlke, and David I. August. "Encore: low-cost, fine-grained transient fault recovery." Proceedings of the 44th Annual IEEE/ACM International Symposium on Microarchitecture (2011): pp. 398-409. doi:10.1145/2155620.2155667 | en_US
dc.identifier.uri | http://cccp.eecs.umich.edu/papers/sfeng-encore11.pdf | -
dc.identifier.uri | http://arks.princeton.edu/ark:/88435/pr1bn6v | -
dc.description.abstract | To meet an insatiable consumer demand for greater performance at less power, silicon technology has scaled to unprecedented dimensions. However, the pursuit of faster processors and longer battery life has come at the cost of reliability. Given the rise of processor reliability as a first-order design constraint, there has been a growing interest in low-cost, non-intrusive techniques for transient fault detection. Many of these recent proposals have counted on the availability of hardware recovery mechanisms. Although common in aggressive out-of-order cores, hardware support for speculative rollback and recovery is less common in lower-end commodity processors. This paper presents Encore, a software-based fault recovery mechanism tailored for these lower-cost systems that lack native hardware support for speculative rollback recovery. Encore combines program analysis, profile data, and simple code transformations to create statistically idempotent code regions that can recover from faults at very little cost. Using this software-only, compiler-based approach, Encore provides the ability to recover from transient faults without specialized hardware or the costs of traditional, full-system checkpointing solutions. Experimental results show that Encore, with just 14% of runtime overhead, can safely recover, on average, from 97% of transient faults when coupled with existing detection schemes. | en_US
dc.format.extent | 398 - 409 | en_US
dc.language.iso | en_US | en_US
dc.relation.ispartof | Proceedings of the 44th Annual IEEE/ACM International Symposium on Microarchitecture | en_US
dc.rights | Author's manuscript | en_US
dc.title | Encore: low-cost, fine-grained transient fault recovery | en_US
dc.type | Conference Article | en_US
dc.identifier.doi | 10.1145/2155620.2155667 | -
pu.type.symplectic | http://www.symplectic.co.uk/publications/atom-terms/1.0/conference-proceeding | en_US

Files in This Item:
File | Description | Size | Format
EncoreLowCostFineGrainedFaultRecovery.pdf | | 1.31 MB | Adobe PDF


Items in OAR@Princeton are protected by copyright, with all rights reserved, unless otherwise indicated.