Tutorial Outline: As we aggressively scale down to nanoscale technology nodes, embedded systems are becoming increasingly susceptible to a host of errors that affect both functional and non-functional aspects of their behavior. This poses significant challenges for the design of dependable embedded systems. The tutorial covers two important drivers that degrade the dependability of embedded systems, temperature and process variation, and discusses possible mechanisms to ameliorate their negative impacts.
In the first part of the tutorial, we cover the basics of on-chip reliability from a thermal point of view, i.e., how temperature is related to reliability. We introduce failure mechanisms such as negative-bias temperature instability (NBTI) and electromigration, and make the case that many reliability-related mechanisms are accelerated by temperature. Since temperature is so important, we also discuss the challenges of estimating and measuring on-chip temperature, along with solutions to them. We conclude the first part of the tutorial by outlining how to control on-chip temperature through mechanisms such as load balancing.
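To illustrate what "accelerated by temperature" can mean quantitatively, the following is a minimal sketch based on the widely used Arrhenius model, in which the rate of a thermally activated mechanism scales with exp(-Ea / (k T)). The function name, the example temperatures, and the activation energy of 0.7 eV are assumptions chosen for illustration; they are not taken from the tutorial, and real activation energies depend on the mechanism and the technology.

```python
import math

# Boltzmann constant in electronvolts per kelvin.
BOLTZMANN_EV_PER_K = 8.617333262e-5


def arrhenius_acceleration(t_ref_c, t_hot_c, activation_energy_ev):
    """Return how much faster a thermally activated mechanism proceeds
    at t_hot_c than at t_ref_c (both in degrees Celsius).

    Hypothetical helper: assumes a single Arrhenius-type mechanism with
    a constant activation energy, which is a simplification.
    """
    t_ref_k = t_ref_c + 273.15
    t_hot_k = t_hot_c + 273.15
    exponent = (activation_energy_ev / BOLTZMANN_EV_PER_K) * (
        1.0 / t_ref_k - 1.0 / t_hot_k
    )
    return math.exp(exponent)


# Example with assumed values: compare 85 C against a 60 C reference.
factor = arrhenius_acceleration(60.0, 85.0, 0.7)
print(f"Acceleration factor: {factor:.2f}")
```

Under these assumed values, a 25 degree rise shortens the expected time to failure several-fold, which is why temperature estimation and control matter for reliability.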
In the second part of the tutorial, we review the causes of process variability and how it manifests as errors in the design. We present classical approaches to hiding such variability, including various forms of guard banding, overdesign, and redundancy. We then describe a number of strategies at successively higher levels of abstraction (circuit, microarchitecture, compiler, operating system, and software application) to monitor, detect, adapt to, and exploit the exposed variability. Finally, we describe new approaches that attempt to eliminate guard banding at the software and architectural levels, that opportunistically adjust to variability, and that proactively conform to deliberately underdesigned hardware with relaxed design and manufacturing constraints.
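As a rough illustration of what timing guard banding costs, the sketch below assumes that the critical-path delay across manufactured chips follows a Gaussian distribution; this distribution, the function names, and the numbers are assumptions made for the example, not results from the tutorial. The clock period is set to the nominal delay plus a margin of k standard deviations, and the fraction of chips that meet timing follows from the normal cumulative distribution function.

```python
import math


def guard_banded_period(nominal_delay_ns, sigma_ns, k_sigma):
    """Clock period with a guard band of k_sigma standard deviations.

    Hypothetical helper for illustration only.
    """
    return nominal_delay_ns + k_sigma * sigma_ns


def timing_yield(nominal_delay_ns, sigma_ns, clock_period_ns):
    """Fraction of chips whose critical-path delay fits in the clock
    period, assuming a Gaussian delay distribution (an assumption)."""
    z = (clock_period_ns - nominal_delay_ns) / sigma_ns
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


# Assumed example: 1.0 ns nominal delay, 0.05 ns standard deviation.
for k in (0, 1, 2, 3):
    period = guard_banded_period(1.0, 0.05, k)
    print(f"k={k}: period={period:.2f} ns, yield={timing_yield(1.0, 0.05, period):.4f}")
```

In this simplified model every chip pays the slower clock so that the rare slow chips still work, which is the overhead that variability-aware approaches try to recover.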