
Who's Afraid of Uncorrectable Bit Errors? Online Recovery of Flash Errors with Distributed Redundancy

Author(s): Tai, Amy; Kryczka, Andrew; Kanaujia, Shobhit O; Jamieson, Kyle; Freedman, Michael J; et al

To refer to this page use: http://arks.princeton.edu/ark:/88435/pr17z7n
Full metadata record
DC Field | Value | Language
dc.contributor.author | Tai, Amy | -
dc.contributor.author | Kryczka, Andrew | -
dc.contributor.author | Kanaujia, Shobhit O | -
dc.contributor.author | Jamieson, Kyle | -
dc.contributor.author | Freedman, Michael J | -
dc.contributor.author | Cidon, Asaf | -
dc.date.accessioned | 2021-10-08T19:50:22Z | -
dc.date.available | 2021-10-08T19:50:22Z | -
dc.date.issued | 2019 | en_US
dc.identifier.citation | Tai, Amy, Andrew Kryczka, Shobhit O. Kanaujia, Kyle Jamieson, Michael J. Freedman, and Asaf Cidon. "Who's afraid of uncorrectable bit errors? online recovery of flash errors with distributed redundancy." In USENIX Annual Technical Conference (2019): pp. 977-992. | en_US
dc.identifier.uri | https://www.usenix.org/system/files/atc19-tai.pdf | -
dc.identifier.uri | http://arks.princeton.edu/ark:/88435/pr17z7n | -
dc.description.abstract | Due to its high performance and decreasing cost per bit, flash storage is the main storage medium in datacenters for hot data. However, flash endurance is a perpetual problem, and due to technology trends, subsequent generations of flash devices exhibit progressively shorter lifetimes before they experience uncorrectable bit errors. In this paper, we propose addressing the flash lifetime problem by allowing devices to expose higher bit error rates. We present DIRECT, a set of techniques that harnesses distributed-level redundancy to enable the adoption of new generations of denser and less reliable flash storage technologies. DIRECT does so by using an end-to-end approach to increase the reliability of distributed storage systems. We implemented DIRECT on two real-world storage systems: ZippyDB, a distributed key-value store in production at Facebook and backed by RocksDB, and HDFS, a distributed file system. When tested on production traces at Facebook, DIRECT reduces application-visible error rates in ZippyDB by more than 100x and recovery time by more than 10,000x. DIRECT also allows HDFS to tolerate a 10,000-100,000x higher bit error rate without experiencing application-visible errors. By significantly increasing the availability and durability of distributed storage systems in the face of bit errors, DIRECT helps extend flash lifetimes. | en_US
dc.format.extent | 977 - 992 | en_US
dc.language.iso | en_US | en_US
dc.relation.ispartof | USENIX Annual Technical Conference | en_US
dc.rights | Final published version. This is an open access article. | en_US
dc.title | Who's Afraid of Uncorrectable Bit Errors? Online Recovery of Flash Errors with Distributed Redundancy | en_US
dc.type | Conference Article | en_US
pu.type.symplectic | http://www.symplectic.co.uk/publications/atom-terms/1.0/conference-proceeding | en_US

Files in This Item:
File | Description | Size | Format
BitErrors.pdf | | 625.15 kB | Adobe PDF


Items in OAR@Princeton are protected by copyright, with all rights reserved, unless otherwise indicated.