Riffle: optimized shuffle service for large-scale data analytics

Zhang, Haoyu; Cho, Brian; Seyfe, Ergin; Ching, Avery; Freedman, Michael J

To refer to this page use: http://arks.princeton.edu/ark:/88435/pr1hz60

Full metadata record

DC Field	Value	Language
dc.contributor.author	Zhang, Haoyu	-
dc.contributor.author	Cho, Brian	-
dc.contributor.author	Seyfe, Ergin	-
dc.contributor.author	Ching, Avery	-
dc.contributor.author	Freedman, Michael J	-
dc.date.accessioned	2021-10-08T19:46:22Z	-
dc.date.available	2021-10-08T19:46:22Z	-
dc.date.issued	2018-04	en_US
dc.identifier.citation	Zhang, Haoyu, Brian Cho, Ergin Seyfe, Avery Ching, and Michael J. Freedman. "Riffle: optimized shuffle service for large-scale data analytics." In Proceedings of the Thirteenth EuroSys Conference (2018): pp. 1-15. doi:10.1145/3190508.3190534	en_US
dc.identifier.uri	http://arks.princeton.edu/ark:/88435/pr1hz60	-
dc.description.abstract	The rapidly growing size of data and complexity of analytics present new challenges for large-scale data processing systems. Modern systems keep data partitions in memory for pipelined operators, and persist data across stages with wide dependencies on disks for fault tolerance. While processing can often scale well by splitting jobs into smaller tasks for better parallelism, all-to-all data transfers (called shuffle operations) become the scaling bottleneck when running many small tasks in multi-stage data analytics jobs. Our key observation is that this bottleneck is due to the superlinear increase in disk I/O operations as data volume increases. We present Riffle, an optimized shuffle service for big-data analytics frameworks that significantly improves I/O efficiency and scales to process petabytes of data. To do so, Riffle efficiently merges fragmented intermediate shuffle files into larger block files, and thus converts small, random disk I/O requests into large, sequential ones. Riffle further improves performance and fault tolerance by mixing both merged and unmerged block files to minimize merge operation overhead. Using Riffle, Facebook production jobs on Spark clusters with over 1,000 executors experience up to a 10x reduction in the number of shuffle I/O requests and a 40% improvement in the end-to-end job completion time.	en_US
dc.format.extent	1 - 15	en_US
dc.language.iso	en_US	en_US
dc.relation.ispartof	Proceedings of the Thirteenth EuroSys Conference	en_US
dc.rights	Final published version. This is an open access article.	en_US
dc.title	Riffle: optimized shuffle service for large-scale data analytics	en_US
dc.type	Conference Article	en_US
dc.identifier.doi	10.1145/3190508.3190534	-
pu.type.symplectic	http://www.symplectic.co.uk/publications/atom-terms/1.0/conference-proceeding	en_US

Files in This Item:

File	Description	Size	Format
RiffleOptimShuffleServiceLargeScaleDataAnalytics.pdf		441.26 kB	Adobe PDF
