Show simple item record

dc.contributor.author: Souza Junior, Paulo Ricardo Rodrigues de [pt_BR]
dc.contributor.author: Matteussi, Kassiano José [pt_BR]
dc.contributor.author: Veith, Alexandre da Silva [pt_BR]
dc.contributor.author: Zanchetta, Breno Fanchiotti [pt_BR]
dc.contributor.author: Leithardt, Valderi Reis Quietinho [pt_BR]
dc.contributor.author: Murciego, Álvaro Lozano [pt_BR]
dc.contributor.author: Freitas, Edison Pignaton de [pt_BR]
dc.contributor.author: Anjos, Julio Cesar Santos dos [pt_BR]
dc.contributor.author: Geyer, Claudio Fernando Resin [pt_BR]
dc.date.accessioned: 2023-04-07T03:26:39Z [pt_BR]
dc.date.issued: 2020 [pt_BR]
dc.identifier.issn: 2169-3536 [pt_BR]
dc.identifier.uri: http://hdl.handle.net/10183/256835 [pt_BR]
dc.description.abstract: The rapid growth of stream applications in financial markets, health care, education, social media, and sensor networks represents a remarkable milestone for data processing and analytics in recent years, leading to new challenges in handling Big Data in real time. Traditionally, a single cloud infrastructure often hosts the deployment of Stream Processing applications because it offers extensive and adaptive virtual computing resources. Hence, data sources send data from distant and diverse locations to the cloud infrastructure, increasing application latency. The cloud infrastructure may be geographically distributed, and it must run a set of frameworks to handle communication. These frameworks often comprise a Message Queue System and a Stream Processing Framework. The frameworks exploit Multi-Cloud deployments, placing each service in a different cloud and communicating via high-latency network links. This creates challenges in meeting real-time application requirements because the data streams have different and unpredictable latencies, forcing cloud providers' communication systems to adjust continually to environment changes. Previous works explore static micro-batches, demonstrating their potential to overcome communication issues. This paper introduces BurstFlow, a tool for enhancing communication between data sources located at the edges of the Internet and Big Data Stream Processing applications located in cloud infrastructures. BurstFlow introduces a strategy for adjusting micro-batch sizes dynamically according to the time required for communication and computation. BurstFlow also presents an adaptive data partition policy for distributing incoming streams across available machines by considering memory and CPU capacities. The experiments use a real-world multi-cloud deployment, showing that BurstFlow can reduce execution time by up to 77% compared to state-of-the-art solutions, while improving CPU efficiency by up to 49%. [en]
dc.format.mimetype: application/pdf [pt_BR]
dc.language.iso: eng [pt_BR]
dc.relation.ispartof: IEEE Access. [Piscataway, NJ]. Vol. 8 (2020), p. 219124-219136 [pt_BR]
dc.rights: Open Access [en]
dc.subject: Processamento de dados [pt_BR]
dc.subject: Stream processing applications [en]
dc.subject: Big data [pt_BR]
dc.subject: Multi cloud [en]
dc.subject: Micro-batches [en]
dc.subject: Computação em nuvem [pt_BR]
dc.subject: Data partition [en]
dc.title: Boosting big data streaming applications in clouds with burstFlow [pt_BR]
dc.type: Artigo de periódico [pt_BR]
dc.identifier.nrb: 001135243 [pt_BR]
dc.type.origin: Estrangeiro [pt_BR]
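The abstract above describes two mechanisms: sizing micro-batches dynamically from the measured communication and computation time, and partitioning incoming streams by machine memory and CPU capacity. The following is a minimal illustrative sketch of the first idea only, assuming a simple latency-budget heuristic; every name, constant, and the toy workload are assumptions made here for illustration, not BurstFlow's actual algorithm or API.

import time
from collections import deque

# Illustrative sketch only: a latency-budget heuristic for dynamic micro-batch
# sizing in the spirit of the abstract above. All names, constants, and the toy
# workload are assumptions, not BurstFlow's actual implementation.

TARGET_LATENCY_S = 0.5          # assumed per-batch budget (communication + computation)
MIN_BATCH, MAX_BATCH = 100, 100_000

def next_batch_size(current_size, comm_time, comp_time):
    """Scale the micro-batch so the measured batch latency approaches the budget."""
    total = comm_time + comp_time
    if total <= 0:
        return current_size
    scaled = int(current_size * TARGET_LATENCY_S / total)
    return max(MIN_BATCH, min(MAX_BATCH, scaled))

def stream_with_dynamic_batches(records, send, process, initial_size=1_000):
    """Buffer records into micro-batches whose size adapts to measured timings."""
    size, batch = initial_size, deque()
    for record in records:
        batch.append(record)
        if len(batch) >= size:
            t0 = time.monotonic()
            send(list(batch))        # communication to the cloud-side framework
            t1 = time.monotonic()
            process(list(batch))     # local stand-in for the remote computation
            t2 = time.monotonic()
            size = next_batch_size(size, t1 - t0, t2 - t1)
            batch.clear()
    if batch:
        send(list(batch))
        process(list(batch))

if __name__ == "__main__":
    # Toy run with no-op send/process callbacks, just to show the control flow.
    stream_with_dynamic_batches(range(10_000), send=lambda b: None, process=lambda b: None)

The design intent, as stated in the abstract, is that larger batches amortize network overhead when communication dominates, while smaller batches keep computation from falling behind; the heuristic above simply rescales the batch toward a fixed latency target on each round.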


Files in this item


This item is licensed under a Creative Commons License
