Abstract
A common approach to enhancing processor performance is to increase the number of function units that
operate concurrently. This development can be observed in all recent general-purpose superscalar processors and in VLIW
(very long instruction word) processors used for more dedicated application domains, such as multimedia.
This paper analyzes the data path complexity of ILP (instruction-level parallelism) processors, in particular VLIWs, and shows that they may soon
hit a complexity wall: their complexity gets out of control when scaling to very high performance. Several methods for
reducing this complexity are investigated. Essentially, these methods trade hardware complexity for software complexity, i.e.,
they perform as much as possible at compile time. Combining these methods results in a new architecture, called the
transport triggered architecture, or TTA. The concept of transport triggering is outlined together with its characteristics.
It will be shown that applying this concept yields a number of hardware advantages and introduces a
number of new scheduling optimizations. Together, these substantially reduce the ILP complexity bottleneck, as
demonstrated by a number of experiments.
Original language | English |
---|---|
Pages (from-to) | 949-973 |
Number of pages | 25 |
Journal | Journal of Systems Architecture |
Volume | 45 |
Issue number | 12/13 |
Publication status | Published - 1999 |