Parallelised Multithreaded Applications on a 4-core Field Programmable Gate Array (FPGA) Architecture
Abstract
Background: The challenges of real-time multithreading, particularly the efficiency of multithreaded applications running concurrently on multiple cores, have grown significantly with the rise of IoT, cloud and edge computing applications. The continuing increase in core counts raises further research issues concerning the efficiency of such multicore systems and their applications, and further research is still required. Multicore systems can achieve higher performance by running multiple multithreaded applications in parallel; however, efficiently parallelising multiple threads across many cores is not an easy task. Field Programmable Gate Arrays (FPGAs) are a preferred technology for the rapid design of, and experimentation with, such architectures, based primarily on softcore processors.
Objectives: The purpose of this research is to investigate the efficiency of running multithreaded applications in parallel and concurrently on a 4-core FPGA multicore architecture.
Methods: A 4-core FPGA architecture is implemented with Nios II/f soft processors on a Cyclone IV series chip, with real-time Linux operating system (OS) support. A multithreaded application with specific compute-intensive tasks is developed in C and used to obtain measurements of specific efficiency metrics under different core configurations.
Results: The reliability of the proposed 4-core FPGA architecture is validated against 4-core and 2-core development platforms, namely the Raspberry Pi 4 and BeagleBone AI single-board computers, respectively. The results are analysed and evaluated against performance metrics including execution time, response time, speedup, and core usage. The experimental tests demonstrate the validity and efficiency of using FPGAs for experimentation with multithreaded applications.
Conclusion: The obtained results show that the proposed FPGA architecture performs well in terms of both timing and efficiency metrics. Execution times are about 50% lower, and the average speedup of 21% is fairly close to the 33% of the Raspberry Pi 4 and higher than that of the BeagleBone AI (10%). The proposed measurement approach and evaluation methodology could benefit the design and development of real-time systems utilising operating systems with real-time support in emerging areas, such as embedded devices in real-time control.
© 2022 Bentham Science Publishers.
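The abstract describes a multithreaded C application with compute-intensive tasks, timed under different core configurations. The following is a minimal sketch of that kind of measurement, not the authors' code. It assumes POSIX threads and `clock_gettime`, and every name in it (`run_workload`, `sum_squares_worker`, `speedup`, `efficiency`, `MAX_THREADS`) is hypothetical. The speedup here is the conventional ratio of single-thread to multi-thread time; the abstract does not state how its percentage figures are defined, so they may follow a different formulation.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <assert.h>
#include <pthread.h>
#include <stdint.h>
#include <time.h>

#define MAX_THREADS 4 /* assumption: one worker per core on a 4-core target */

/* Half-open index range [begin, end) handled by one worker thread. */
typedef struct {
    uint64_t begin;
    uint64_t end;
    uint64_t partial; /* result written by the worker */
} slice_t;

/* Illustrative compute-intensive task: sum of squares over the slice. */
static void *sum_squares_worker(void *arg) {
    slice_t *s = (slice_t *)arg;
    uint64_t acc = 0;
    for (uint64_t i = s->begin; i < s->end; ++i) {
        acc += i * i;
    }
    s->partial = acc;
    return NULL;
}

/* Computes the sum of i*i for i in [0, n) using n_threads workers and
 * stores the wall-clock time in seconds in *elapsed_s. */
uint64_t run_workload(uint64_t n, int n_threads, double *elapsed_s) {
    assert(n_threads >= 1 && n_threads <= MAX_THREADS);
    pthread_t tid[MAX_THREADS];
    slice_t slice[MAX_THREADS];
    struct timespec t0, t1;
    uint64_t chunk = n / (uint64_t)n_threads;
    int started = 0;

    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (int t = 0; t < n_threads; ++t) {
        slice[t].begin = (uint64_t)t * chunk;
        /* The last worker also takes the remainder of the range. */
        slice[t].end = (t == n_threads - 1) ? n : (uint64_t)(t + 1) * chunk;
        slice[t].partial = 0;
        if (pthread_create(&tid[t], NULL, sum_squares_worker, &slice[t]) != 0) {
            /* Fall back to running this slice in the calling thread. */
            sum_squares_worker(&slice[t]);
            tid[t] = pthread_self();
        } else {
            started |= 1 << t;
        }
    }
    uint64_t total = 0;
    for (int t = 0; t < n_threads; ++t) {
        if (started & (1 << t)) {
            pthread_join(tid[t], NULL);
        }
        total += slice[t].partial;
    }
    clock_gettime(CLOCK_MONOTONIC, &t1);

    *elapsed_s = (double)(t1.tv_sec - t0.tv_sec) +
                 (double)(t1.tv_nsec - t0.tv_nsec) / 1e9;
    return total;
}

/* Conventional speedup: single-thread time divided by multi-thread time. */
double speedup(double t_serial, double t_parallel) {
    return t_serial / t_parallel;
}

/* Parallel efficiency: speedup divided by the number of cores used. */
double efficiency(double speedup_ratio, int n_cores) {
    return speedup_ratio / (double)n_cores;
}
```

A caller would run `run_workload` once with one thread and once with four, then pass the two elapsed times to `speedup`. On a real-time Linux target, thread-to-core pinning and scheduling policy would also affect the numbers; the sketch leaves both at their defaults.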