Low power general purpose loop acceleration for NDP applications

Modern processor architectures face a throughput scaling problem as the performance bottleneck shifts from the core pipeline to the data transfer operations between the dynamic random access memory (DRAM) and the processor chip. To address such issue researchers have proposed the near-data processing (NDP) paradigm in which the instruction execution is moved to the DRAM die thus, lowering the data movement between the processor and the DRAM. Previous NDP works focus on specific application types and thus the general purpose application execution paradigm is neglected. In this work we propose an NDP methodology for low power general purpose loop acceleration. For this reason we design and implement a hardware loop accelerator from the ground up to improve the throughput and lower the power consumption of general purpose loops. We adopt a novel loop scheduling approach which enables the loop accelerator to take advantage of the dataflow parallelism of the executing loop and we implement our design on the logic layer of a hybrid memory cube (HMC) DRAM. Post-layout simulations demonstrate an average speedup factor of 20.5x when executing kernels from various scientific fields while the energy consumption is reduced by a factor of 9.3x over the host CPU execution. © 2020 ACM.

URI

http://hdl.handle.net/11615/80272

Colecciones

Δημοσιεύσεις σε περιοδικά, συνέδρια, κεφάλαια βιβλίων κλπ. [19735]