Comparison of machine learning techniques to handle imbalanced COVID-19 CBC datasets

Dorn, Márcio; Grisci, Bruno Iochins; Narloch, Pedro Henrique; Feltes, Bruno César; Ávila, Eduardo Muller; Kahmann, Alessandro; Alho, Clarice Sampaio

dc.contributor.author	Dorn, Márcio	pt_BR
dc.contributor.author	Grisci, Bruno Iochins	pt_BR
dc.contributor.author	Narloch, Pedro Henrique	pt_BR
dc.contributor.author	Feltes, Bruno César	pt_BR
dc.contributor.author	Ávila, Eduardo Muller	pt_BR
dc.contributor.author	Kahmann, Alessandro	pt_BR
dc.contributor.author	Alho, Clarice Sampaio	pt_BR
dc.date.accessioned	2023-04-07T03:26:40Z	pt_BR
dc.date.issued	2021	pt_BR
dc.identifier.issn	2376-5992	pt_BR
dc.identifier.uri	http://hdl.handle.net/10183/256836	pt_BR
dc.description.abstract	The Coronavirus pandemic caused by the novel SARS-CoV-2 has significantly impacted human health and the economy, especially in countries with limited financial resources for medical testing and treatment, such as Brazil, the third most affected country by the pandemic. In this scenario, machine learning techniques have been heavily employed to analyze different types of medical data and aid decision making, offering a low-cost alternative. Due to the urgency of fighting the pandemic, a large number of works apply machine learning approaches to clinical data, including complete blood count (CBC) tests, which are among the most widely available medical tests. In this work, we review the most employed machine learning classifiers for CBC data, together with popular sampling methods to deal with class imbalance. Additionally, we describe and critically analyze three publicly available Brazilian COVID-19 CBC datasets and evaluate the performance of eight classifiers and five sampling techniques on the selected datasets. Our work provides a panorama of which classifiers and sampling methods provide the best results for different relevant metrics and discusses their impact on future analyses. The metrics and algorithms are introduced in a way that aids newcomers to the field. Finally, the panorama discussed here can significantly benefit the comparison of results from new ML algorithms.	en
dc.format.mimetype	application/pdf	pt_BR
dc.language.iso	eng	pt_BR
dc.relation.ispartof	PeerJ Computer Science. New York. Vol. 7 (Sept. 2021), p. 670-704	pt_BR
dc.rights	Open Access	en
dc.subject	Machine learning	en
dc.subject	Data mining	en
dc.subject	Imbalanced datasets	en
dc.subject	COVID-19	pt_BR
dc.subject	Covid, Hemogram	en
dc.title	Comparison of machine learning techniques to handle imbalanced COVID-19 CBC datasets	pt_BR
dc.type	Journal article	en
dc.identifier.nrb	001138423	pt_BR
dc.type.origin	Foreign	en

Files in this item

Name: 001138423.pdf
Size: 12.47 MB
Format: PDF
Description: Full text (English)

This item is licensed under a Creative Commons License.