%0 Journal Article %T Transparent Runtime Migration of Loop-Based Traces of Processor Instructions to Reconfigurable Processing Units %A João Bispo %A Nuno Paulino %A João M. P. Cardoso %A João Canas Ferreira %J International Journal of Reconfigurable Computing %D 2013 %I Hindawi Publishing Corporation %R 10.1155/2013/340316 %X The ability to map instructions running in a microprocessor to a reconfigurable processing unit (RPU), acting as a coprocessor, enables the runtime acceleration of applications and ensures code and possibly performance portability. In this work, we focus on the mapping of loop-based instruction traces (called Megablocks) to RPUs. The proposed approach considers offline partitioning and mapping stages without ignoring their future runtime applicability. We present a toolchain that automatically extracts specific trace-based loops, called Megablocks, from MicroBlaze instruction traces and generates an RPU for executing those loops. Our hardware infrastructure is able to move loop execution from the microprocessor to the RPU transparently, at runtime, and without changing the executable binaries. The toolchain and the system are fully operational. Three FPGA implementations of the system, differing in the hardware interfaces used, were tested and evaluated with a set of 15 application kernels. Speedups ranging from 1.26 to 3.69 were achieved for the best alternative using a MicroBlaze processor with local memory. 1. Introduction The performance of an embedded application running on a general-purpose processor (GPP) can be enhanced by moving the computationally intensive parts to specialized hardware units and/or to Reconfigurable Processing Units (RPUs) acting as acceleration coprocessors of the GPP [1, 2]. This is a common practice in embedded systems. However, doing so, manually or automatically, usually implies a hardware/software partitioning step over the input source code [3]. 
This step is static, requires the source code of the application, and does not promote code and performance portability, as the hardware/software components are obtained for a specific target architecture. Dynamic partitioning and mapping of computations (hereafter simply referred to as dynamic partitioning) [4–6] is a promising technique able to move computations from a GPP to the coprocessor in a transparent and flexible way, and may become an important contribution to future reconfigurable embedded computing systems. In this paper, we present a system which can automatically map loops, detected by running a MicroBlaze executable binary, to an RPU. We focus on a special kind of trace-based loop, named Megablock [7], and transform Megablocks into graph representations which are then used to generate Megablock-tailored RPUs. Megablocks are repeating patterns of elementary units of the trace (e.g., basic blocks) in the instruction stream of the program being executed. The RPU %U http://www.hindawi.com/journals/ijrc/2013/340316/