%0 Journal Article
%T A Heuristic Scheduler for Port-Constrained Floating-Point Pipelines
%A Zheming Jin
%A Jason D. Bakos
%J International Journal of Reconfigurable Computing
%D 2013
%I Hindawi Publishing Corporation
%R 10.1155/2013/849545
%X We describe a heuristic scheduling approach for optimizing floating-point pipelines subject to input port constraints. The objective of our technique is to maximize functional unit reuse while minimizing the following performance metrics in the generated circuit: (1) maximum multiplexer fanin, (2) datapath fanout, (3) number of multiplexers, and (4) number of registers. For a set of systems biology markup language (SBML) benchmark expressions, we compare the resource usages given by our method to those given by a branch-and-bound enumeration of all valid schedules. Compared with the enumeration results, our heuristic requires on average 33.4% less multiplexer bits and 32.9% less register bits than the worse case, while only requiring 14% more multiplexer bits and 4.5% more register bits than the optimal case. We also compare our results against those given by the state-of-art high-level synthesis tool Xilinx AutoESL. For the most complex of our benchmark expressions, our synthesis technique requires 20% less FPGA slices than AutoESL. 1. Introduction Over the past twenty years there has been a wide range of research in the field of high-level synthesis for pipelined datapaths [1每3]. High-level synthesis methods can often be guided by user-specified constraints that result in throughput and area tradeoffs for the generated circuits. For example, designers may wish to generate circuits with as few resources as possible at the expense of increased multiplexer fan-in that generally results in lower maximum clock frequency. On the other hand, designers may want the generated circuits to achieve maximum clock speed without regard to the resultant resource requirement. While most HLS tools consider these types of constraints, few incorporate input port-constraints when optimizing resource usage. In other words, many HLS tools do not allow the designer to explicitly constrain the synthesis process for memory bandwidth. As an example, consider the following arithmetic expression, the phylogenetic likelihood function (PLF), which is widely used in likelihood-based phylogenetic tools and is shown in [4] Written in another way, the PLF for a single symbol can be computed using the following line of high-level code: This expression requires sixteen floating-point inputs. To achieve maximum throughput when implemented as a pipeline, all sixteen inputs must be read and fed into the pipeline in each clock cycle. Since there are no dependencies in this expression, this would allow one output value to be produced every clock cycle after the pipeline fills. However, this
%U http://www.hindawi.com/journals/ijrc/2013/849545/