%0 Journal Article
%T Performance Analysis Techniques for Multi-Soft-Core and Many-Soft-Core Systems
%A David Castells-Rufas
%A Eduard Fernandez-Alonso
%A Jordi Carrabina
%J International Journal of Reconfigurable Computing
%D 2012
%I Hindawi Publishing Corporation
%R 10.1155/2012/736347
%X Multi-soft-core systems are a viable and interesting solution for embedded systems that need a particular tradeoff between performance, flexibility, and development speed. As growing device capacity allows it, many-soft-core systems are also expected to become relevant to future embedded systems. As a consequence, parallel programming methods and tools will necessarily be embraced as part of the full system development process. Performance analysis is an important part of the development process for parallel applications. It is usually mandatory when a desired performance level must be reached or when the system must be verified to meet real-time constraints. One of the usual techniques used by the HPC community is the postmortem analysis of application traces. However, this technique is not easily transferred to FPGA-based embedded systems because of the resource limitations of these platforms. We propose several techniques, together with hardware architectural support, to generate traces on FPGA-based multiprocessor systems and to use them to optimize the performance of the running applications.
1. Introduction
Emerging reconfigurable hardware devices allow the design of complex embedded systems that combine soft-core processors with a mix of other IP cores. Reduced NRE costs compared with ASICs are a typical reason to choose FPGAs as the implementation platform for some applications [1]. In addition, the continuous increase in capacity and the flexibility offered by reconfigurable hardware are important reasons to select FPGAs in order to achieve a short time-to-market and a long time-in-market. The current strong demand for FPGAs lets us speculate about the systems that upcoming devices will host.
We predict that future devices will host hundreds of soft-core processors, giving rise to many-soft-core systems. This speculative claim is based on the observation of several trends. First, integration capacity has grown dramatically over the last decades following Moore's law, and top-of-the-line devices can already host more than 100 simple soft-core processors (see Figure 1).
Figure 1: Evolution of the integration capacity of Altera devices in terms of the theoretical number of soft-core processors they might host.
Second, vendors are evolving their EDA tools to make it easier to integrate a large number of processors. For instance, Altera recently introduced QSys [2], which can greatly simplify the design of tiled NoC-based MPSoCs on an FPGA. Third, parallel programming is going mainstream on almost every computing platform.
%U http://www.hindawi.com/journals/ijrc/2012/736347/