Extracting coarse-grained pipelined parallelism out of sequential applications for parallel processor arrays
Date
2009
Abstract
We present development and runtime support for building application specific data processing pipelines out of sequential code, and for executing them on a general purpose platform that features a reconfigurable Parallel Processor Array (PPA). Our approach is to let the programmer annotate the source of the application to indicate the desired pipeline stages and associated data flow, with little code restructuring. A pre-processor is then used to transform the annotated program into different code segments according to the indicated pipeline structure, generate the corresponding executable code, and produce a bundled application package containing all executables and deployment information for the target platform. There are special mechanisms for setting up the application-specific pipeline structure on the PPA and achieving integrated execution in the context of a general-purpose operating system, enabling the pipelined application to access the usual system peripherals and run concurrently with other conventional programs. To verify our approach, we have built a prototype system using soft processor arrays on an embedded FPGA platform, and transformed a well-known application into a pipelined version that executes successfully on our prototype. © 2009 Springer Berlin Heidelberg.
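To illustrate the general idea of turning a sequential loop into a coarse-grained pipeline, the following is a minimal sketch only. It does not reproduce the annotation syntax, pre-processor, or PPA deployment mechanisms of the paper, none of which are given in the abstract. Threads and bounded queues stand in for processors and their links, and every name here (`run_stage`, `run_pipeline`, `scale`, `offset`, `square`) is a hypothetical chosen for the example.

```python
import queue
import threading

SENTINEL = object()  # end-of-stream marker passed from stage to stage


def run_stage(func, inbox, outbox):
    """Apply func to each item from inbox and forward the result to outbox."""
    while True:
        item = inbox.get()
        if item is SENTINEL:
            outbox.put(SENTINEL)  # propagate shutdown to the next stage
            return
        outbox.put(func(item))


def run_pipeline(stage_funcs, inputs, capacity=4):
    """Run the stages concurrently, one thread per stage, in the given order."""
    # One bounded queue in front of each stage, plus one for the final output.
    queues = [queue.Queue(maxsize=capacity) for _ in range(len(stage_funcs) + 1)]

    def feed():
        for item in inputs:
            queues[0].put(item)
        queues[0].put(SENTINEL)

    # The feeder runs in its own thread so the caller can drain the output
    # while input is still being produced (avoids blocking on full queues).
    threads = [threading.Thread(target=feed)]
    for i, func in enumerate(stage_funcs):
        threads.append(
            threading.Thread(target=run_stage, args=(func, queues[i], queues[i + 1]))
        )
    for t in threads:
        t.start()

    results = []
    while True:
        item = queues[-1].get()
        if item is SENTINEL:
            break
        results.append(item)
    for t in threads:
        t.join()
    return results


# Hypothetical stage bodies: in sequential code these would be three
# consecutive steps inside one loop iteration.
def scale(x):
    return x * 2


def offset(x):
    return x + 1


def square(x):
    return x * x


if __name__ == "__main__":
    data = list(range(10))
    sequential = [square(offset(scale(x))) for x in data]
    pipelined = run_pipeline([scale, offset, square], data)
    print(pipelined == sequential)
```

Because each stage is a single worker reading from a FIFO queue, results come out in input order, so the pipelined version can be checked directly against the sequential loop.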
Related documents
Showing documents with similar title, author, creator, and subject.
Supporting multitasking of pipelined computations on embedded parallel processor arrays
Syrivelis, D.; Lalis, S. (2009) This paper presents software support that enables seamless task restructuring and load balancing of pipelined applications at runtime, making it possible to dynamically pick the stages that will be executed as separate ...
Instruction-Flow-Based Timing Analysis in Pipelined Processors
Tziouvaras A., Dimitriou G., Dossis M., Stamoulis G. (2019) Microprocessor design utilizes timing analysis in order to establish the maximal operation clock speed of the circuit. In static timing analysis, clock frequency is set in accord with the worst-case delay in the circuit ...
Performance and power simulation of a functional-unit-network processor with Simplescalar and Wattch
Kalaitzidis K., Dimitriou G., Stamoulis G., Dossis M. (2015) Loop acceleration is a means to enhance performance of a single- or multiple-issue microprocessor core. A new edge-like processor architecture incorporates a loop accelerator directly in the out-of-order back end of the ...