Mostrar registro simples

dc.contributor.advisorBampi, Sergiopt_BR
dc.contributor.authorPaim, Guilherme Pereirapt_BR
dc.date.accessioned2021-05-28T04:27:04Zpt_BR
dc.date.issued2021pt_BR
dc.identifier.urihttp://hdl.handle.net/10183/221703pt_BR
dc.description.abstractSince the end of transistor scaling in 2-D appeared on the horizon, innovative circuit design paradigms have been on the rise to go beyond the well-established and ultraconservative exact computing. Many compute-intensive applications – such as video processing – exhibit an intrinsic error resilience and do not necessarily require perfect accuracy in their numerical operations. Approximate computing (AxC) is emerging as a design alternative to improve the performance and energy-efficiency requirements for many applications by trading its intrinsic error tolerance with algorithm and circuit efficiency. Exact computing also imposes a worst-case timing to the conventional design of hardware accelerators to ensure reliability, leading to an efficiency loss. Conversely, the timing-speculative (TS) hardware design paradigm allows increasing the frequency or decreasing the voltage beyond the limits determined by static timing analysis (STA), thereby narrowing pessimistic safety margins that conventional design methods implement to prevent hardware timing errors. Timing errors should be evaluated by an accurate gate-level simulation, but a significant gap remains: How these timing errors propagate from the underlying hardware all the way up to the entire algorithm behavior, where they just may degrade the performance and quality of service of the application at stake? This thesis tackles this issue by developing and demonstrating a cross-layer framework capable of performing investigations of both AxC (i.e., from approximate arithmetic operators, approximate synthesis, gate-level pruning) and TS hardware design (i.e., from voltage over-scaling, frequency over-clocking, temperature rising, and device aging). The cross-layer framework can simulate both timing errors and logic errors at the gate-level by crossing them dynamically, linking the hardware result with the algorithm-level, and vice versa during the evolution of the application’s runtime. Existing frameworks perform investigations of AxC and TS techniques at circuit-level (i.e., at the output of the accelerator) agnostic to the ultimate impact at the application level (i.e., where the impact is truly manifested), leading to less optimization. Unlike state of the art, the framework proposed offers a holistic approach to assessing the tradeoff of AxC and TS techniques at the application-level. This framework maximizes energy efficiency and performance by identifying the maximum approximation levels at the application level to fulfill the required good enough quality. This thesis evaluates the framework with an 8-way SAD (Sum of Absolute Differences) hardware accelerator operating into an HEVC encoder as a case study. Application-level results showed that the SAD based on the approximate adders achieve savings of up to 45% of energy/operation with an increase of only 1.9% in BD-BR. On the other hand, VOS (Voltage Over-Scaling) applied to the SAD generates savings of up to 16.5% in energy/operation with around 6% of increase in BD-BR. The framework also reveals that the boost of about 6.96% (at 50°) to 17.41% (at 75° with 10- Y aging) in the maximum clock frequency achieved with TS hardware design is totally lost by the processing overhead from 8.06% to 46.96% when choosing an unreliable algorithm to the blocking match algorithm (BMA). We also show that the overhead can be avoided by adopting a reliable BMA. This thesis also shows approximate DTT (Discrete Tchebichef Transform) hardware proposals by exploring a transform matrix approximation, truncation and pruning. The results show that the approximate DTT hardware proposal increases the maximum frequency up to 64%, minimizes the circuit area in up to 43.6%, and saves up to 65.4% in power dissipation. The DTT proposal mapped for FPGA shows an increase of up to 58.9% on the maximum frequency and savings of about 28.7% and 32.2% on slices and dynamic power, respectively compared with stat.en
dc.format.mimetypeapplication/pdfpt_BR
dc.language.isoengpt_BR
dc.rightsOpen Accessen
dc.subjectComputação aproximativapt_BR
dc.subjectTiming speculativeen
dc.subjectProcessamento de vídeospt_BR
dc.subjectHigh performanceen
dc.subjectEnergy efficienten
dc.subjectFPGApt_BR
dc.subjectVoltage Over scalingen
dc.subjectMicroeletrônicapt_BR
dc.subjectTemperature risingen
dc.subjectAceleradores de hardwarept_BR
dc.subjectDevice agingen
dc.titleApproximate and timing-speculative hardware design for high-performance and energy-efficient video processingpt_BR
dc.typeTesept_BR
dc.contributor.advisor-coCosta, Eduardo Antonio Cesar dapt_BR
dc.identifier.nrb001125900pt_BR
dc.degree.grantorUniversidade Federal do Rio Grande do Sulpt_BR
dc.degree.departmentInstituto de Informáticapt_BR
dc.degree.programPrograma de Pós-Graduação em Microeletrônicapt_BR
dc.degree.localPorto Alegre, BR-RSpt_BR
dc.degree.date2021pt_BR
dc.degree.leveldoutoradopt_BR


Thumbnail
   

Este item está licenciado na Creative Commons License

Mostrar registro simples