|Abstract:||Decoupling techniques have been proposed to reduce the amount of memory latency exposed to high-performance accelerators as they fetch data. Although decoupled access-execute (DAE) and more recent decoupled data supply approaches offer promising single-threaded performance improvements, little work has considered how to extend them into parallel scenarios. This article explores the opportunities and challenges of designing parallel, high-performance, resource-efficient decoupled data supply systems. We propose Mercury, a parallel decoupled data supply system that utilizes thread-level parallelism for high-throughput data supply with good portability attributes. Additionally, we introduce some microarchitectural improvements for data supply units to efficiently handle long-latency indirect loads.|
|Citation:||Ham, Tae Jun, Juan L. Aragón, and Margaret Martonosi. "Efficient Data Supply for Parallel Heterogeneous Architectures." ACM Transactions on Architecture and Code Optimization (TACO) 16, no. 2 (2019): 9:1-9:23. doi:10.1145/3310332|
|Pages:||9:1 - 9:23|
|Type of Material:||Journal Article|
|Journal/Proceeding Title:||ACM Transactions on Architecture and Code Optimization|
|Version:||Final published version. This is an open access article.|