Performance Analysis Techniques for Multi-Soft-Core and Many-Soft-Core Systems

DOI: 10.1155/2012/736347

Abstract:

Multi-soft-core systems are a viable and interesting solution for embedded systems that need a particular tradeoff between performance, flexibility, and development speed. As device capacity grows, many-soft-core systems are also expected to become relevant for future embedded systems. As a consequence, parallel programming methods and tools will necessarily become part of the full system development process. Performance analysis is an important part of the development process for parallel applications; it is usually mandatory when a target performance must be reached or when the system must be verified to meet real-time constraints. One of the usual techniques in the HPC community is the postmortem analysis of application traces. However, this technique is not easily transferred to FPGA-based embedded systems because of the resource limitations of these platforms. We propose several techniques, together with some hardware architectural support, to generate traces on FPGA-based multiprocessor systems and use them to optimize the performance of the running applications.

1. Introduction

Emerging reconfigurable hardware devices allow the design of complex embedded systems that combine soft-core processors with a mix of other IP cores. Lower NRE costs compared with ASICs are a typical reason to choose FPGAs as the implementation platform for some applications [1]. However, the continuous increase in capacity and the flexibility offered by reconfigurable hardware are also important reasons to select FPGAs in order to achieve good time-to-market and time-in-market. The current strong demand for FPGAs lets us speculate about the systems that upcoming devices will host. We predict that future devices will host hundreds of soft-core processors, giving birth to many-soft-core systems. This speculative claim is based on several observed trends.
First, integration capacity has grown dramatically over the last decades following Moore's law, and top-of-the-line devices can already host more than 100 simple soft-core processors (see Figure 1).

Figure 1: Evolution in the integration capacity of Altera devices with respect to the theoretical number of soft-core processors they might host.

Second, companies are evolving EDA tools to make it easier to integrate a large number of processors. For instance, Altera recently introduced Qsys [2], which can greatly simplify the design of tiled NoC-based MPSoCs on an FPGA. Third, parallel programming is going mainstream on almost every computing platform.
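The postmortem analysis of application traces mentioned in the abstract can be illustrated with a small sketch. It assumes a hypothetical trace format (`TraceEvent`, holding a cycle timestamp, a core id, an enter/exit kind, and a region name) and a timer shared by all cores; neither this format nor the helper `region_times` comes from the paper, and the hardware support the authors propose for capturing events on the FPGA is not modeled. The pass computes, per core, the inclusive cycles spent in each instrumented region, the kind of figure a postmortem tool can use to expose load imbalance between cores.

```python
from collections import defaultdict
from dataclasses import dataclass


# Hypothetical trace record; the paper does not define this layout.
@dataclass(frozen=True)
class TraceEvent:
    timestamp: int  # cycle count from a timer assumed to be shared by all cores
    core: int       # soft-core processor that emitted the event
    kind: str       # "enter" or "exit"
    region: str     # name of the instrumented code region


def region_times(events):
    """Postmortem pass: inclusive cycles spent per (core, region).

    Events of each core are assumed to be recorded in program order;
    sorted() is stable, so ties on the same timestamp keep that order.
    """
    totals = defaultdict(int)
    stacks = defaultdict(list)  # per-core stack of open (region, start) pairs
    for ev in sorted(events, key=lambda e: e.timestamp):
        if ev.kind == "enter":
            stacks[ev.core].append((ev.region, ev.timestamp))
        elif ev.kind == "exit":
            if not stacks[ev.core]:
                raise ValueError(f"exit without enter on core {ev.core}")
            region, start = stacks[ev.core].pop()
            if region != ev.region:
                raise ValueError(f"mismatched exit {ev.region!r} on core {ev.core}")
            totals[(ev.core, region)] += ev.timestamp - start
        else:
            raise ValueError(f"unknown event kind {ev.kind!r}")
    return dict(totals)


# Usage: two cores run the same region; core 0 takes 100 cycles, core 1 takes 30.
trace = [
    TraceEvent(0, 0, "enter", "compute"),
    TraceEvent(10, 1, "enter", "compute"),
    TraceEvent(40, 1, "exit", "compute"),
    TraceEvent(100, 0, "exit", "compute"),
]
times = region_times(trace)
```

A per-core stack is used so that nested regions are handled; the times are inclusive, meaning an outer region also counts the cycles of the regions it calls.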

References

[1]  P. H. W. Leong, “Recent trends in FPGA architectures and applications,” in Proceedings of the 4th IEEE International Symposium on Electronic Design, Test and Applications (DELTA '08), pp. 137–141, January 2008.
[2]  Altera, Qsys System Integration Tool, http://www.altera.com/products/software/quartus-ii/subscription-edition/qsys/qts-qsys.html.
[3]  P. E. McKenney and D. Sarma, “Hard real-time response,” US Patent 7748003, 2010, http://www.google.com/patents/US7748003.
[4]  J. Curreri, S. Koehler, A. George, B. Holland, and R. Garcia, “Performance analysis framework for high-level language applications in reconfigurable computing,” ACM Transactions on Reconfigurable Technology and Systems (TRETS), vol. 3, article 5, 2010.
[5]  S. Koehler, G. Stitt, and A. D. George, “Platform-aware bottleneck detection for reconfigurable computing applications,” ACM Transactions on Reconfigurable Technology and Systems (TRETS), vol. 4, no. 3, article 30, 2011.
[6]  T. S. J. Sun and R. D. Leu, “Software performance analysis using hardware analyzer,” US Patent 5903759, 1999, http://ip.com/patfam/xx/25351944.
[7]  Altera Corporation, “Design Debugging Using the SignalTap II Embedded Logic Analyzer,” 2007, http://www.altera.com/literature/hb/qts/qts_qii53009.pdf.
[8]  Altera Corporation, “Profiling Nios II Systems,” 2005, http://www.altera.com/literature/an/an391.pdf.
[9]  J. G. Tong and M. A. S. Khalid, “A comparison of profiling tools for FPGA-based embedded systems,” in Proceedings of the Canadian Conference on Electrical and Computer Engineering (CCECE '07), pp. 1687–1690, April 2007.
[10]  L. Shannon and P. Chow, “Using reconfigurability to achieve real-time profiling for hardware/software codesign,” in Proceedings of the 12th ACM/SIGDA International Symposium on Field-Programmable Gate Arrays (FPGA '04), pp. 190–199, February 2004.
[11]  D. Castells-Rufas, J. Joven, S. Risueño et al., “MPSoC performance analysis with virtual prototyping platforms,” in Proceedings of the 39th International Conference on Parallel Processing Workshops (ICPPW '10), pp. 154–160, September 2010.
[12]  E. Fernandez-Alonso, D. Castells-Rufas, S. Risueño, J. Carrabina, and J. Joven, “A NoC-based multi-soft-core with 16 cores,” in Proceedings of the IEEE International Conference on Electronics, Circuits, and Systems (ICECS '10), pp. 259–262, December 2010.
[13]  A. Knüpfer, H. Brunst, J. Doleschal et al., “The Vampir performance analysis tool-set,” in Tools for High Performance Computing, Springer, 2008.
[14]  H. Hübert, B. Stabernack, and K. I. Wels, “Performance and memory profiling for embedded system design,” in Proceedings of the IEEE 2nd International Symposium on Industrial Embedded Systems (SIES '07), pp. 94–101, July 2007.
[15]  M. Montón, A. Portero, M. Moreno, B. Martínez, and J. Carrabina, “Mixed SW/SystemC SoC emulation framework,” in Proceedings of the IEEE International Symposium on Industrial Electronics (ISIE '07), pp. 2338–2341, June 2007.
[16]  H. Posadas, F. Herrera, P. Sánchez, E. Villar, and F. Blasco, “System-level performance analysis in SystemC,” in Proceedings of the Design, Automation and Test in Europe Conference and Exhibition (DATE '04), vol. 1, pp. 378–383, February 2004.
[17]  H. Posadas, S. Real, and E. Villar, “M3-SCoPE: performance modeling of multi-processor embedded systems for fast design space exploration,” in Multi-Objective Design Space Exploration of Multiprocessor SoC Architectures: The Multicube Approach, vol. 19, Springer, 2011.
[18]  I. Böhm, B. Franke, and N. Topham, “Cycle-accurate performance modelling in an ultra-fast just-in-time dynamic binary translation instruction set simulator,” in Proceedings of the 10th International Conference on Embedded Computer Systems: Architectures, Modeling and Simulation (IC-SAMOS '10), pp. 1–10, July 2010.
[19]  A. Knüpfer, R. Brendel, H. Brunst, H. Mix, and W. E. Nagel, “Introducing the open trace format (OTF),” in Proceedings of the 6th International Conference, Part II, Computational Science (ICCS '06), V. N. Alexandrov, G. D. van Albada, P. M. A. Sloot, and J. Dongarra, Eds., vol. 3992, pp. 526–533, Springer, Reading, UK, May 2006.
[20]  J. Balart, A. Duran, M. Gonzàlez, X. Martorell, E. Ayguadé, and J. Labarta, “Nanos Mercurium: a research compiler for OpenMP,” in Proceedings of the European Workshop on OpenMP, vol. 8, 2004.
