Analysis and improvements of multi objective reinforcement learning algorithms based on pareto dominating policies

Silva, Giovani da

dc.contributor.advisor	Tavares, Anderson Rocha	pt_BR
dc.contributor.author	Silva, Giovani da	pt_BR
dc.date.accessioned	2024-02-16T05:00:52Z	pt_BR
dc.date.issued	2023	pt_BR
dc.identifier.uri	http://hdl.handle.net/10183/272023	pt_BR
dc.description.abstract	In today’s complex optimization landscape, challenges often transcend the pursuit of a solitary objective, instead requiring the simultaneous consideration of multiple, sometimes conflicting, goals. This complexity has given rise to the field of Multi-Objective Optimization. In recent years, researchers have begun to integrate these multi-objective approaches into Reinforcement Learning, leading to the emergence of Multi-Objective Reinforcement Learning (MORL). The field is gaining traction, especially due to the capabilities of model-free Reinforcement Learning algorithms. Essentially, these model-free MORL algorithms strive to balance multiple, often conflicting, objectives without requiring prior knowledge of the environment. This thesis provides an in-depth analysis of model-free MORL algorithms anchored in Pareto Dominating Policies (PDP), focusing on two key algorithms: Pareto Q-Learning (PQL) and Pareto Deep Q-Networks (PDQN). These algorithms were selected for their model-free characteristics and their resemblance to well-known reinforcement learning algorithms such as Q-Learning and Deep Q-Networks. This study features from-scratch implementations of both the PQL and PDQN algorithms. It evaluates the performance of PQL in the Deep Sea Treasure environment and assesses PDQN in both the Deep Sea Treasure environment and a simulated urban traffic setting. This research identifies common challenges, such as the generation of non-optimal policies and the difficulties associated with managing large state spaces. Our findings reveal that applying the PDQN algorithm in real-world scenarios, such as Gym City Flow (ZHANG et al., 2019), has led to no improvements, thereby demonstrating its inefficacy. To address these challenges, this work proposes enhancements to the PDQN algorithm and introduces a new MORL technique based on Pareto Dominating Actions.
Preliminary tests indicate that this innovative approach shows promise in enhancing the effectiveness of MORL algorithms. The primary contribution of this work is its examination of the current state of MORL algorithms based on Pareto Dominating Policies: it discusses their architecture, challenges, and possible improvements, and tests their effectiveness in multi-objective scenarios. In doing so, we aim to shed light on their inherent limitations. In light of these limitations, we propose enhancements to the PDQN algorithm through an innovative approach that could form the basis of effective MORL algorithms in the future. This work serves as both a critical review of existing methodologies and a forward-looking exploration of the future landscape of MORL centered on Pareto Optimality.	en
dc.format.mimetype	application/pdf	pt_BR
dc.language.iso	eng	pt_BR
dc.rights	Open Access	en
dc.subject	Reinforcement learning	pt_BR
dc.subject	Multi-objective	en
dc.subject	Artificial intelligence	pt_BR
dc.subject	Pareto Dominating Policies	en
dc.subject	Networks	pt_BR
dc.subject	Pareto Deep Q-networks	en
dc.title	Analysis and improvements of multi objective reinforcement learning algorithms based on pareto dominating policies	pt_BR
dc.type	Undergraduate thesis	pt_BR
dc.identifier.nrb	001195959	pt_BR
dc.degree.grantor	Universidade Federal do Rio Grande do Sul	pt_BR
dc.degree.department	Instituto de Informática	pt_BR
dc.degree.local	Porto Alegre, BR-RS	pt_BR
dc.degree.date	2023	pt_BR
dc.degree.graduation	Computer Science: Emphasis in Computer Engineering: Bachelor's degree	pt_BR
dc.degree.level	undergraduate	pt_BR

Name:: 001195959.pdf
Size:: 1.877Mb
Format:: PDF
Description:: Full text (English)

This item is licensed under a Creative Commons License
