A major concern for current and future on-chip systems is the thermal problem: dissipated electrical energy leads to high chip temperatures. Short-term effects may include transient malfunction or even irreversible damage, whereas long-term effects may include degraded functionality (e.g., increased signal propagation delays) or irreversible damage due to, for example, electromigration. The problem worsens with the advent of 3D architectures because the thermal energy dissipated per unit of surface area increases. The goal of this proposal is to address dependability problems in 3D-stacked many-core architectures that result from thermal effects. We tackle the problem with a combined system-level and architecture-level approach. A hierarchical agent-based thermal management system initiates proactive task migration onto cooler processing resources, while a communication virtualization layer dynamically adapts and protects connectivity between (migrated) tasks and external I/Os. Several characteristic features are necessary for the proposed approach to be feasible: i) scalability, so that the approach is applicable to future many-core systems with hundreds or even thousands of cores; ii) real-time capabilities, to meet embedded application requirements; iii) a virtual communication layer with service guarantees, to abstract the underlying physical on-chip interconnect structure (i.e., the network-on-chip, NoC); iv) run-time adaptability, to adapt the management and communication subsystems to the characteristics of the thermal events; v) architecture agnosticism, so that the concepts can be deployed on a number of architectures; and vi) inherent robustness of the techniques. In the first two-year phase of the project, an emulation on a prototype is planned, whereas the second and third phases include a refined emulation and a stacked 3D chip prototype, respectively.
This proposal makes contributions to the following columns of the SPP 1500:
- Dependable Hardware Architectures
- Operation, Observation and Adaptation