On the Feasibility and Limitations of Just-in-Time Instruction Set Extension for FPGA-Based Reconfigurable Processors

DOI: 10.1155/2012/418315


Abstract:

Reconfigurable instruction set processors make it possible to tailor the instruction set of a CPU to a particular application. While this customization process could be performed at runtime in order to adapt the CPU to the currently executing workload, this use case has hardly been investigated. In this paper, we study the feasibility of moving the customization process to runtime and evaluate the relation between the expected speedups and the associated overheads. To this end, we present a tool flow that is tailored to the requirements of this just-in-time ASIP specialization scenario. We evaluate our methods by targeting our previously introduced Woolcano reconfigurable ASIP architecture for a set of applications from the SPEC2006, SPEC2000, MiBench, and SciMark2 benchmark suites. Our results show that just-in-time ASIP specialization is promising for embedded computing applications, where average speedups of 5x can be achieved by spending 50 minutes on custom instruction identification and hardware generation. These overheads are compensated if the applications execute for more than 2 hours. For the scientific computing benchmarks, the achievable speedup is only 1.2x, which requires significant execution times on the order of days to amortize the overheads.

1. Introduction

Instruction set extension (ISE) is a frequently used approach for tailoring a CPU architecture to a particular application or domain [1]. The result of this customization process is an application-specific instruction set processor (ASIP) that augments a base CPU with custom instructions to increase performance and energy efficiency. Once designed, the ASIP's instruction set is typically fixed and turned into a hardwired silicon implementation. Alternatively, a reconfigurable ASIP architecture can implement the custom instructions in reconfigurable logic.
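The amortization argument in the abstract can be illustrated with a simple break-even calculation. The sketch below is not the authors' evaluation method; it assumes a simplified model in which an application with unaccelerated runtime T takes T / speedup after customization, plus a one-time overhead, so the break-even point satisfies T = T / speedup + overhead. The function name `break_even_runtime` is hypothetical. With the abstract's average values for the embedded benchmarks (5x, 50 minutes), this model gives roughly 62.5 minutes rather than the reported 2 hours; the reported figure presumably stems from per-application data rather than from the averages.

```python
def break_even_runtime(overhead: float, speedup: float) -> float:
    """Unaccelerated runtime at which a one-time overhead is recovered.

    Assumed simplified model (not taken from the paper):
        T = T / speedup + overhead  =>  T = overhead / (1 - 1 / speedup)
    The result is in the same time unit as `overhead`.
    """
    if speedup <= 1.0:
        raise ValueError("overhead is never amortized without a speedup > 1")
    return overhead / (1.0 - 1.0 / speedup)


# Embedded benchmarks: average speedup 5x, 50 minutes of overhead.
embedded_minutes = break_even_runtime(overhead=50.0, speedup=5.0)

# Scientific benchmarks: speedup 1.2x. The abstract gives no overhead for
# this case, so a unit overhead is used to obtain the multiplier only.
scientific_factor = break_even_runtime(overhead=1.0, speedup=1.2)

print(f"embedded break-even: {embedded_minutes:.1f} minutes")
print(f"scientific break-even: {scientific_factor:.1f} x overhead")
```

Under this model, a speedup of only 1.2x requires a runtime of about six times the overhead before customization pays off, which is consistent with the abstract's observation that small speedups need very long execution times to amortize.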
Such reconfigurable ASIPs have been proposed in academic research [2–6], and there exist a few commercially available CPU architectures that allow for customizing the instruction set, for example, the Xilinx Virtex-4/5 FX FPGAs or the Stretch S5 processor [7]. Although adapting the instruction set at runtime is technically feasible and offers a promising way to build adaptive computer systems that optimize themselves according to the needs of the workload actually being executed [8], this idea has hardly been explored. A number of obstacles make the exploitation of just-in-time (JIT) ISE challenging: (1) there are only very few commercially available

References

[1]  P. Ienne and R. Leupers, Customizable Embedded Processors: Design Technologies and Applications, Morgan Kaufmann, San Francisco, Calif, USA, 2006.
[2]  M. J. Wirthlin and B. L. Hutchings, “Dynamic instruction set computer,” in Proceedings of the 3rd IEEE Symposium on FPGAs for Custom Computing Machines, (FCCM '95), pp. 99–107, IEEE Computer Society, April 1995.
[3]  R. Razdan and M. D. Smith, “A high-performance microarchitecture with hardware-programmable functional units,” in Proceedings of the 27th International Symposium on Microarchitecture, (MICRO '94), pp. 172–180, ACM, New York, NY, USA, November 1994.
[4]  Z. A. Ye, A. Moshovos, S. Hauck, and P. Banerjee, “Chimaera: a high-performance architecture with a tightly-coupled reconfigurable functional unit,” in Proceedings of the 27th Annual International Symposium on Computer Architecture, (ISCA '00), pp. 225–235, ACM, June 2000.
[5]  P. M. Athanas and H. F. Silverman, “Processor reconfiguration through instruction-set metamorphosis,” Computer, vol. 26, no. 3, pp. 11–18, 1993.
[6]  M. Grad and C. Plessl, “Woolcano: an architecture and tool flow for dynamic instruction set extension on Xilinx Virtex-4 FX,” in Proceedings of the 9th International Conference on Engineering of Reconfigurable Systems and Algorithms (ERSA '09), pp. 319–322, CSREA Press, Monte Carlo Resort, Nev, USA, July 2009.
[7]  J. M. Arnold, “S5: the architecture and development flow of a software configurable processor,” in Proceedings of the International Conference on Field Programmable Technology, (ICFPT '05), pp. 121–128, IEEE Computer Society, Kent Ridge Guild House, Singapore, December 2005.
[8]  S. Borkar, “Design challenges of technology scaling,” IEEE Micro, vol. 19, no. 4, pp. 23–29, 1999.
[9]  M. Grad and C. Plessl, “An open source circuit library with benchmarking facilities,” in Proceedings of the 10th International Conference on Engineering of Reconfigurable Systems and Algorithms (ERSA '10), T. P. Plaks, D. Andrews, R. F. DeMara et al., Eds., pp. 144–150, CSREA Press, Las Vegas, Nev, USA, July 2010.
[10]  M. Grad and C. Plessl, “Pruning the design space for just-in-time processor customization,” in Proceedings of the International Conference on ReConFigurable Computing and FPGAs (ReConFig '10), pp. 67–72, IEEE Computer Society, Cancun, Mexico, December 2010.
[11]  M. Grad and C. Plessl, “Just-in-time instruction set extension—feasibility and limitations for an FPGA-based reconfigurable ASIP architecture,” in Proceedings of the 18th Reconfigurable Architectures Workshop, (RAW '11), pp. 278–285, IEEE Computer Society, May 2011.
[12]  M. Wazlowski, L. Agarwal, T. Lee et al., “PRISM-II compiler and architecture,” in Proceedings of the IEEE Symposium on Field-Programmable Custom Computing Machines, (FCCM '93), pp. 9–16, IEEE Computer Society, April 1993.
[13]  C. Galuzzi and K. Bertels, “The instruction-set extension problem: a survey,” in Proceedings of the International Conference on Architecture of Computing Systems, (ARCS '08), Lecture Notes in Computer Science, no. 4943, pp. 209–220, Springer/Kluwer Academic, Dresden, Germany, February 2008.
[14]  R. J. Hookway and M. A. Herdeg, “DIGITAL FX!32: combining emulation and binary translation,” Digital Technical Journal, vol. 9, no. 1, pp. 3–12, 1997.
[15]  V. Bala, E. Duesterwald, and S. Banerjia, “Transparent dynamic optimization,” Tech. Rep. HPL-1999-78, HP Laboratories Cambridge, 1999.
[16]  K. Ebcioglu and E. R. Altman, “DAISY: dynamic compilation for 100% architectural compatibility,” in Proceedings of the 24th Annual International Symposium on Computer Architecture, pp. 26–37, New York, NY, USA, June 1997.
[17]  F. Vahid, G. Stitt, and R. Lysecky, “Warp processing: dynamic translation of binaries to FPGA circuits,” Computer, vol. 41, no. 7, pp. 40–46, 2008.
[18]  A. C. S. Beck and L. Carro, “Dynamic reconfiguration with binary translation: breaking the ILP barrier with software compatibility,” in Proceedings of the 42nd Design Automation Conference, (DAC '05), pp. 732–737, New York, NY, USA, June 2005.
[19]  L. Pozzi, K. Atasu, and P. Ienne, “Exact and approximate algorithms for the extension of embedded processor instruction sets,” IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, vol. 25, no. 7, pp. 1209–1229, 2006.
[20]  P. Yu and T. Mitra, “Scalable custom instructions identification for instruction-set extensible processors,” in Proceedings of the International Conference on Compilers, Architecture, and Synthesis for Embedded Systems, (CASES '04), pp. 69–78, Washington, DC, USA, September 2004.
[21]  C. Alippi, W. Fornaciari, L. Pozzi, and M. Sami, “A DAG-based design approach for reconfigurable VLIW processors,” in Proceedings of the Design, Automation and Test in Europe Conference, (DATE '99), pp. 778–779, ACM, Munich, Germany, January 1999.
[22]  J. Gong, D. D. Gajski, and S. Narayan, “Software estimation from executable specifications,” Journal of Computer Software Engineering, vol. 2, pp. 239–258, 1994.
[23]  The PowerPC 405™ Core, IBM, 1998.
[24]  A. Ray, T. Srikanthan, and W. Jigang, “Practical techniques for performance estimation of processors,” in Proceedings of the International Workshop on System-on-Chip for Real-Time Applications, (IWSOC '05), pp. 308–311, IEEE Computer Society, Washington, DC, USA, 2005.
[25]  B. So, P. C. Diniz, and M. W. Hall, “Using estimates from behavioral synthesis tools in compiler-directed design space exploration,” in Proceedings of the 40th Design Automation Conference, pp. 514–519, New York, NY, USA, June 2003.
[26]  Floating-Point Operator v5.0, Xilinx.
[27]  N. Maheshwari and S. S. Sapatnekar, Timing Analysis and Optimization of Sequential Circuits, Springer/Kluwer Academic Publishers, Norwell, Mass, USA, 1999.
[28]  R. Meeuws, Y. Yankova, K. Bertels, G. Gaydadjiev, and S. Vassiliadis, “A quantitative prediction model for hardware/software partitioning,” in Proceedings of the International Conference on Field Programmable Logic and Applications, (FPL '07), pp. 735–739, Amsterdam, The Netherlands, August 2007.
[29]  P. Bonzini and L. Pozzi, “Polynomial-time subgraph enumeration for automated instruction set extension,” in Proceedings of the Design, Automation and Test in Europe Conference and Exhibition, pp. 1331–1336, Nice, France, April 2007.
[30]  E. Bergeron, M. Feeley, and J. P. David, “Hardware JIT compilation for off-the-shelf dynamically reconfigurable FPGAs,” in Proceedings of the Joint European Conferences on Theory and Practice of Software 17th International Conference on Compiler Construction (CC/ETAPS’08), pp. 178–192, Springer-Verlag, Berlin, Heidelberg, 2008.
