A study on offensive video detection

Web users around the world produce and publish large volumes of data of various types, such as text, images, and videos. To maintain a friendly and respectful environment, the platforms on which this content is published usually prohibit offensive content and rely on moderators to filter posts. However, manual moderation cannot keep up with the volume of publications. Offensive material can be identified automatically using machine learning, but this requires an annotated dataset. Although datasets for offensive text detection are available, there are no such datasets for videos. Moreover, most published datasets contain English data, leaving Portuguese and other languages underrepresented. In this work, we investigate the problem of offensive video detection. We assemble, describe, and publish a dataset of videos in Portuguese. We also run experiments with machine learning classifiers commonly used in offensive language detection and report our findings using multiple evaluation metrics. We found that word embeddings gave better results with Deep Learning classifiers, whereas n-grams outperformed word embeddings with Classic algorithms. Among the Classic algorithms, Random Forest and Naive Bayes performed best across most of the features. Among the Deep Learning algorithms, the W-CNN architecture employed in our study gave the best results for most of the feature sets. Among the Transfer Learning models, BERT was the best classifier for most of the feature sets. In the ensemble experiments, Naive Bayes, Random Forest, M-CNN, and M-LSTM achieved the best results, both with all features and with feature ablation. Ensembling improved the results for some categories of algorithms and feature representations. Feature ablation experiments also helped to identify the contribution of each feature to the ensemble results, improving the results in some cases. Overall, Deep Learning algorithms achieved the best results, followed by Classic and Transfer Learning algorithms. ...
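
The ensemble and feature ablation experiments mentioned in the abstract can be pictured with a minimal sketch. It is not the method used in the study: the abstract does not say how the ensemble combines its members, so the sketch assumes soft voting (averaging per-feature scores), and the feature names (`title`, `description`, `transcript`), the scores, and the labels are all hypothetical toy values.

```python
# Hypothetical toy data: one classifier per feature, each giving a score in
# [0, 1] that a video is offensive. None of these values come from the study.
scores_by_feature = {
    "title":       [0.9, 0.2, 0.4, 0.3],
    "description": [0.8, 0.6, 0.3, 0.1],
    "transcript":  [0.7, 0.1, 0.9, 0.6],
}
true_labels = [1, 0, 1, 0]  # 1 = offensive, 0 = not offensive


def ensemble_predict(scores, features, threshold=0.5):
    """Soft voting: average the scores of the chosen features per video."""
    columns = [scores[name] for name in features]
    predictions = []
    for video_scores in zip(*columns):
        mean_score = sum(video_scores) / len(video_scores)
        predictions.append(1 if mean_score >= threshold else 0)
    return predictions


def accuracy(expected, predicted):
    """Fraction of videos whose predicted label matches the expected one."""
    hits = sum(e == p for e, p in zip(expected, predicted))
    return hits / len(expected)


def ablation_report(scores, expected):
    """Accuracy with all features, then with each feature left out in turn."""
    all_features = list(scores)
    report = {"all": accuracy(expected, ensemble_predict(scores, all_features))}
    for left_out in all_features:
        kept = [name for name in all_features if name != left_out]
        predicted = ensemble_predict(scores, kept)
        report["without " + left_out] = accuracy(expected, predicted)
    return report


report = ablation_report(scores_by_feature, true_labels)
for setting, value in report.items():
    print(f"{setting}: {value:.2f}")
```

On this toy data, accuracy drops only when `transcript` is left out, which is the kind of signal an ablation experiment uses to judge how much each feature contributes to the ensemble.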

Institution

Universidade Federal do Rio Grande do Sul. Instituto de Informática. Programa de Pós-Graduação em Computação.

Collections

Exact and Earth Sciences (5143)

Computation (1766)

This item is licensed under a Creative Commons License