Multi-Perspective Embedded Systems Resilience

Workshop on Design for Reliability, 8th International Conference on High-Performance and Embedded Architectures and Compilers (HiPEAC)


Berlin, Germany


January 21, 2013

Links: Design for Reliability Workshop

Andreas Herkersdorf - Technische Universität München
Chair for Integrated Systems

Multi-Perspective Embedded Systems Resilience

CMOS technology will see continued capacity growth, enabling the integration of dozens of additional processor cores, megabytes of memory, and diverse function accelerators on a single chip. On the downside of this economically driven evolution, transistors with feature sizes approaching tens of nanometers are increasingly vulnerable to numerous forms of environmental and manufacturing variability. Statistical bit flips in memory, logic, and registers will originate from ionizing radiation (soft errors), and transient signal timing violations may result from temperature or supply voltage fluctuations. Traditional fault tolerance concepts based on information, time, and spatial redundancy are challenged and hence need augmentation in order to cope with these exposures in a cost-efficient and sustainable manner.

This talk advocates tackling System on Chip (SoC) dependability issues with a balanced blend of interrelated countermeasures at different levels of hardware/software system abstraction, from the technology-related device level up to the macro-architecture and OS/application software layers. Approaching reliability challenges with skills and methods from the entire SoC design flow, rather than focusing on a single discipline (e.g., design) only, opens additional opportunities for reliability improvements. The complexity of today's SoCs and the variety of fault sources practically prevent exhaustive verification at design time. We will show how advanced self-adaptation and self-organization techniques, which allow a partial migration of traditional design-time tasks into runtime, can beneficially combine reliability, power, and performance management.

Speaker's Bio:

Andreas Herkersdorf is a professor in the Department of Electrical Engineering and Information Technology and also adjunct to the Department of Informatics at Technische Universität München (TUM). He received the Dipl.-Ing. degree from TUM in 1987 and the Dr. degree from ETH Zurich, Switzerland, in 1991, both in electrical engineering. Between 1988 and 2003, he held technical and management positions at the IBM Research Laboratory in Rüschlikon, Switzerland. Since 2003, Dr. Herkersdorf has been director of the Chair for Integrated Systems at TUM. He is a senior member of the IEEE, a member of the DFG (German Research Foundation) Review Board, and serves as editor for Springer and Elsevier journals on design automation and communications electronics. His research interests include application-specific multi-processor architectures, IP network processing, Network on Chip, system-level SoC modeling and design space exploration methods, and self-adaptive fault-tolerant computing.