A reliability analysis approach to assist the design of aggressively scaled reconfigurable architectures

Pereira, Mônica Magalhães

dc.contributor.advisor	Carro, Luigi	pt_BR
dc.contributor.author	Pereira, Mônica Magalhães	pt_BR
dc.date.accessioned	2012-05-22T01:35:11Z	pt_BR
dc.date.issued	2012	pt_BR
dc.identifier.uri	http://hdl.handle.net/10183/49068	pt_BR
dc.description.abstract	As computer systems are built with aggressively scaled and unreliable technologies, some implementations rely on function specialization with reconfigurable computing to increase performance by exploiting parallelism, with possible energy gains. However, the use of reconfigurable devices in general purpose computing also brings extra reliability challenges at the system level. Solutions to cope with that are generally accompanied with the addition of excessive area, performance and power overheads to the overall system. These overheads could be reduced if a more extensive analysis was performed to evaluate the best fault tolerance strategy to balance the tradeoff between reliability and the mentioned aspects. In this context, this work present a comprehensive analysis of architectural design that includes the use of reliability modeling and takes into consideration aspects such as area, performance, and power. The analysis aims to assist the design of reliability-aware reconfigurable architectures by giving some indications about what kind of redundancy should be used in order to increase reliability. In the proposed analysis, we show that communication among functional units is critical to the overall reliability of reconfigurable architectures. Therefore, where most of the reliability investments should be made. Moreover, the analysis also demonstrate that there is a threshold in the amount of redundancy that can be added in order to increase reliability. This limit is determined by the fact that adding redundancy increases area overhead. This overhead influences reliability until overcomes the reliability gains. Therefore, even disregarding area cost, the gains in reliability will cease or even decrease. To provide a more extended evaluation, a fault tolerance approach was proposed to cope with permanent faults. The LOwER-FaT strategy is a mechanism embedded in a run-time reconfiguration mechanism that automatically selects the fault-free resources without adding extra time overhead to the configuration generation mechanism. The fault-tolerant strategy takes advantage of the on-line transparent configuration generation mechanism to transparently avoid faulty functional units and interconnects. Moreover, the strategy does not require the addition of spare resources. All the resources are used to accelerate execution, and only in case of fault, a resource is replaced by a working one, with a performance penalty caused by the reduction in the amount of resources. In spite of that, experimental results showed a mean performance degradation of 14% on overall performance under 20% fault rate. Moreover, reliability results indicated gains of around six orders of magnitude when the fault tolerance strategy was in place.	en
dc.format.mimetype	application/pdf	pt_BR
dc.language.iso	eng	pt_BR
dc.rights	Open Access	en
dc.subject	Tolerancia : Falhas	pt_BR
dc.subject	Reconfigurable architectures	en
dc.subject	Microeletrônica	pt_BR
dc.subject	Fault tolerance	en
dc.subject	Cmos	pt_BR
dc.subject	Reliability analysis	en
dc.subject	Scaling	en
dc.title	A reliability analysis approach to assist the design of aggressively scaled reconfigurable architectures	pt_BR
dc.type	Tese	pt_BR
dc.identifier.nrb	000835126	pt_BR
dc.degree.grantor	Universidade Federal do Rio Grande do Sul	pt_BR
dc.degree.department	Instituto de Informática	pt_BR
dc.degree.program	Programa de Pós-Graduação em Computação	pt_BR
dc.degree.local	Porto Alegre, BR-RS	pt_BR
dc.degree.date	2012	pt_BR
dc.degree.level	doutorado	pt_BR

Nome:: 000835126.pdf
Tamanho:: 3.495Mb
Formato:: PDF
Descrição:: Texto completo (inglês)

Visualizar/abrir

Este item está licenciado na Creative Commons License

Ciências Exatas e da Terra (5098)

Computação (1754)

Mostrar registro simples