Uncovering discrimination generated by different machine learning methods using data visualization

Machine learning (ML) methods have become common in many applications that affect our daily lives. However, decisions based on their results are not always carefully monitored, which creates problems such as implicit discrimination. Discriminatory or biased results are frequent, and although some works address this problem, few use visualizations to convey information about it. This work presents FindDiscrimination, a visual analysis tool designed to help machine learning experts study the behavior and results of different machine learning models with regard to discrimination. The main contribution of this work is a novel combination of well-known visualizations, namely Parallel Sets, the Clustered Bar Chart, and the Nested Donut Chart, which together produce efficient, simple, and intuitive visual representations and support a comprehensible analysis tailored to the problem and the dataset. Each visualization provides fundamental insights for such an analysis. The Parallel Sets technique helps to distinguish subsets of data with identical characteristics; the Clustered Bar Chart shows the performance of the ML models used and facilitates their comparison; and the Nested Donut Chart allows the comparison of subsets that share the same characteristics but were classified differently by the ML models. We carried out a user study with 16 participants, using two publicly available datasets, to assess the comprehensibility of the FindDiscrimination tool and to evaluate its efficiency and user satisfaction. A questionnaire guided the subjects by proposing tasks and asking questions about the datasets being analyzed. The questionnaire ended with the System Usability Scale (SUS) items. Results suggested that our tool has high comprehensibility, as most participants correctly answered the questions designed to assess that aspect. Moreover, SUS scores indicated high usability. ...

Institution

Universidade Federal do Rio Grande do Sul. Instituto de Informática. Programa de Pós-Graduação em Computação.

This item is licensed under a Creative Commons License.