Analysis and improvements of multi-objective reinforcement learning algorithms based on Pareto dominating policies
dc.contributor.advisor | Tavares, Anderson Rocha | pt_BR |
dc.contributor.author | Silva, Giovani da | pt_BR |
dc.date.accessioned | 2024-02-16T05:00:52Z | pt_BR |
dc.date.issued | 2023 | pt_BR |
dc.identifier.uri | http://hdl.handle.net/10183/272023 | pt_BR |
dc.description.abstract | In today’s complex optimization landscape, challenges often transcend the pursuit of a single objective, instead requiring the simultaneous consideration of multiple, sometimes conflicting, goals. This complexity has given rise to the field of Multi-Objective Optimization. In recent years, researchers have begun to integrate these multi-objective approaches into Reinforcement Learning, leading to the emergence of Multi-Objective Reinforcement Learning (MORL). The field is gaining traction, especially due to the capabilities of model-free Reinforcement Learning algorithms. Essentially, these model-free MORL algorithms strive to balance multiple, often conflicting, objectives without requiring prior knowledge of the environment. This thesis provides an in-depth analysis of model-free MORL algorithms anchored in Pareto Dominating Policies (PDP), focusing on two key algorithms: Pareto Q-Learning (PQL) and Pareto Deep Q-Networks (PDQN). These algorithms were selected for their model-free characteristics and their resemblance to well-known reinforcement learning algorithms such as Q-Learning and Deep Q-Networks. This study features from-scratch implementations of both the PQL and PDQN algorithms. It evaluates the performance of PQL in the Deep Sea Treasure environment and assesses PDQN in both the Deep Sea Treasure environment and a simulated urban traffic setting. This research identifies common challenges, such as the generation of non-optimal policies and the difficulties associated with managing large state spaces. Our findings reveal that applying the PDQN algorithm to real-world scenarios, such as Gym City Flow (ZHANG et al., 2019), led to no improvements, demonstrating its inefficacy. To address these challenges, this work proposes enhancements to the PDQN algorithm and introduces a new MORL technique based on Pareto Dominating Actions. 
Preliminary tests indicate that this innovative approach shows promise in enhancing the effectiveness of MORL algorithms. The primary contributions of this work lie in its examination of the current state of MORL algorithms based on Pareto Dominating Policies: discussing their architecture, their challenges, and their possible improvements, while also testing their effectiveness in multi-objective (MO) scenarios. In doing so, we aim to shed light on their inherent limitations. In light of these limitations, we propose enhancements to the PDQN algorithm through an approach that has the potential to establish an effective basis for MORL algorithms in the future. This work serves as both a critical review of existing methodologies and a forward-looking exploration of the future landscape of MORL centered on Pareto Optimality. | en |
dc.format.mimetype | application/pdf | pt_BR |
dc.language.iso | eng | pt_BR |
dc.rights | Open Access | en |
dc.subject | Reinforcement learning | pt_BR |
dc.subject | Multi-objective | en |
dc.subject | Artificial intelligence | pt_BR |
dc.subject | Pareto Dominating Policies | en |
dc.subject | Networks | pt_BR |
dc.subject | Pareto Deep Q-networks | en |
dc.title | Analysis and improvements of multi-objective reinforcement learning algorithms based on Pareto dominating policies | pt_BR |
dc.type | Undergraduate thesis | pt_BR |
dc.identifier.nrb | 001195959 | pt_BR |
dc.degree.grantor | Universidade Federal do Rio Grande do Sul | pt_BR |
dc.degree.department | Instituto de Informática | pt_BR |
dc.degree.local | Porto Alegre, BR-RS | pt_BR |
dc.degree.date | 2023 | pt_BR |
dc.degree.graduation | Computer Science: Emphasis in Computer Engineering: Bachelor's degree | pt_BR |
dc.degree.level | undergraduate | pt_BR |
This item is licensed under a Creative Commons License
Undergraduate Theses (TCC) in Computer Science (1024)