Hardware-Software Codesign for Decoupled Execution
- Reference number
- SM17-0064
- Start and end dates
- 180101-191231
- Amount granted
- 748 815 SEK
- Administrative organization
- Uppsala University
- Research area
- Information, Communication and Systems Technology
Summary
Modern computers are limited by memory latency (the memory wall) and by energy consumption (the power wall). More and more, we find that the answer to these problems lies in acceleration, i.e., extracting a part of a program and running it on specialized hardware that best matches its characteristics. This can yield significant energy efficiency at very high performance. I have pioneered an approach for decoupling access (difficult code dealing with all data transfer and control flow) from execution (computation dataflow) to best match the dynamic energy management capabilities of modern processors. The key observation is that this decoupling can be applied for transparent acceleration of general-purpose code by allocating the access part and the execute part to differently specialized cores: memory operations require vastly different acceleration from compute. The promise of this approach is that both the “memory wall” and the “power wall” can be attacked with one “stone”: customized acceleration of decoupled access and execute. The goal of this proposal is to give Stefanos Kaxiras the opportunity to be paired up with a leading company in processor design, ARM Ltd., in order to explore, under real-world assumptions, the possibilities of acceleration using access-execute decoupling. A joint project is expected to greatly benefit both parties, as ARM gains from cutting-edge ideas for program execution and Kaxiras gains from exposure to real-world constraints.
Popular science description
Energy-efficiency in computer systems continues to be the grand challenge of our day. More and more we find that the answer to this difficult problem is in specializing the hardware to better match the characteristics of the programs. In general terms this is referred to as workload acceleration. The reason why this leads to better energy-efficiency and higher performance is that hardware that is specialized to execute just a specific function sheds all inefficiencies of a general-purpose machine that must be able to execute any conceivable software. The key challenge is making acceleration easily available to general-purpose software. This project will take an approach developed at Uppsala University that enables better matching of programs to specialized hardware and apply it in an industrial setting. The approach, called decoupled access-execute, transforms programs and separates them into two different phases: accessing memory and processing. These two phases can then be independently matched to different accelerators. The purpose of the project is to assess this approach and to determine whether it is relevant as it stands, or needs to be modified with the help of hardware assistance, to suit real-world workloads and constraints. In particular, the project will apply this approach to the processors and accelerators (starting from the vector accelerators but then moving on to more custom accelerators) of ARM, the leader in energy-efficient processors. The implications of the project are expected to affect multiple industries that extensively use ARM processor technology, such as telecommunications, automotive computer systems, and Internet of Things (IoT) companies, since acceleration is emerging as the leading solution to both the need for better performance and the need for higher energy-efficiency.
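The separation into an access phase and an execute phase can be pictured with a small sketch in C. It is only an illustration under stated assumptions, not the project's actual transformation: the function names, the chunk size DAE_CHUNK, and the example computation (a sum of squares over indirectly indexed data) are all hypothetical, and the access phase is approximated with the GCC/Clang builtin __builtin_prefetch, which is merely a hint to the hardware. Both phases run on the same core here; placing them on differently specialized cores or accelerators is outside what this sketch attempts.

```c
#include <stddef.h>

/* Hypothetical chunk size: number of elements handled by each
   access/execute pair. A real system would tune this to the cache. */
#define DAE_CHUNK 256

/* Access phase: only touches the data the execute phase will need.
   __builtin_prefetch is a GCC/Clang builtin and acts as a hint only. */
void access_phase(const double *a, const size_t *idx,
                  size_t begin, size_t end)
{
    for (size_t i = begin; i < end; ++i)
        __builtin_prefetch(&a[idx[i]], 0 /* read */, 3 /* high locality */);
}

/* Execute phase: pure computation over data that is hopefully cached. */
double execute_phase(const double *a, const size_t *idx,
                     size_t begin, size_t end)
{
    double sum = 0.0;
    for (size_t i = begin; i < end; ++i)
        sum += a[idx[i]] * a[idx[i]];
    return sum;
}

/* Original coupled loop, kept for comparison: memory access and
   computation are interleaved in every iteration. */
double coupled_sum_squares(const double *a, const size_t *idx, size_t n)
{
    double sum = 0.0;
    for (size_t i = 0; i < n; ++i)
        sum += a[idx[i]] * a[idx[i]];
    return sum;
}

/* Decoupled version: each chunk is first accessed, then executed. */
double dae_sum_squares(const double *a, const size_t *idx, size_t n)
{
    double sum = 0.0;
    for (size_t begin = 0; begin < n; begin += DAE_CHUNK) {
        size_t end = (n - begin > DAE_CHUNK) ? begin + DAE_CHUNK : n;
        access_phase(a, idx, begin, end);
        sum += execute_phase(a, idx, begin, end);
    }
    return sum;
}
```

Calling dae_sum_squares(a, idx, n) computes the same quantity as coupled_sum_squares(a, idx, n); the only difference is that each chunk's memory accesses are issued in a separate loop before its computation. Because partial sums are accumulated per chunk, floating-point rounding can differ slightly from the coupled loop once an input spans more than one chunk.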
The experience gained in this project will be invaluable for the University, both for future research and, equally importantly, for the education of a new generation of computer engineers who will be well versed in cutting-edge approaches to system design.