Show simple item record
Multi-objective reinforcement learning methods for action selection : dealing with multiple objectives and non-stationarity
dc.contributor.advisor | Bazzan, Ana Lucia Cetertich | pt_BR |
dc.contributor.author | Anquise, Candy Alexandra Huanca | pt_BR |
dc.date.accessioned | 2021-11-17T04:24:22Z | pt_BR |
dc.date.issued | 2021 | pt_BR |
dc.identifier.uri | http://hdl.handle.net/10183/231836 | pt_BR |
dc.description.abstract | Multi-objective decision-making entails planning based on a model in order to find the best policy for such problems. When this model is unknown, learning through interaction provides the means to act in the environment. Multi-objective decision-making in a multi-agent system poses many unsolved challenges. Among them, multiple objectives and non-stationarity, caused by simultaneous learners, have so far been addressed only separately. In this work, algorithms that address both issues by combining strengths from different methods are proposed and applied to a route choice scenario formulated as a multi-armed bandit problem; the focus is therefore on action selection. In the route choice problem, drivers must select a route while aiming to minimize both their travel time and toll. The proposed algorithms take and combine important aspects from works that tackle only one of the issues, non-stationarity or multiple objectives, making it possible to handle both problems together. The methods drawn from these works are a set of Upper Confidence Bound (UCB) algorithms and the Pareto Q-learning (PQL) algorithm. The UCB-based algorithms are Pareto UCB1 (PUCB1), discounted UCB (DUCB), and sliding-window UCB (SWUCB). PUCB1 deals with multiple objectives, while DUCB and SWUCB address non-stationarity in different ways. PUCB1 was extended to include characteristics from DUCB and SWUCB. As PQL is a state-based method that handles more than one objective, it was modified to tackle a problem focused on action selection. Results from a comparison in a route choice scenario show that the proposed algorithms deal with non-stationarity and multiple objectives, with the discount-factor variant proving to be the best approach. Advantages, limitations and differences of these algorithms are discussed. | en |
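The abstract centers on UCB-style action selection under non-stationarity. Below is a minimal, hypothetical Python sketch of a discounted-UCB (DUCB) selection rule, the kind of building block the abstract says was combined with Pareto UCB1; the class name, parameters, and the scalar-reward simplification are assumptions for illustration, not the thesis implementation. In the multi-objective route-choice setting, the reward would be a vector (travel time, toll) and arm comparison would use Pareto dominance instead of a single index.

```python
import math

class DiscountedUCB:
    """Hypothetical sketch of discounted UCB for a non-stationary bandit:
    older observations are geometrically down-weighted so value estimates
    can track changes in the reward distribution."""

    def __init__(self, n_arms, gamma=0.95, c=2.0):
        self.gamma = gamma                 # discount applied to past data each step
        self.c = c                         # exploration constant
        self.counts = [0.0] * n_arms       # discounted pull counts per arm
        self.sums = [0.0] * n_arms         # discounted reward sums per arm

    def select(self):
        total = sum(self.counts)
        best_arm, best_val = None, float("-inf")
        for arm, (n, s) in enumerate(zip(self.counts, self.sums)):
            if n == 0:
                return arm                 # play every arm at least once
            value = s / n + self.c * math.sqrt(math.log(total) / n)
            if value > best_val:
                best_arm, best_val = arm, value
        return best_arm

    def update(self, arm, reward):
        # Discount every arm's statistics, then record the new observation.
        for a in range(len(self.counts)):
            self.counts[a] *= self.gamma
            self.sums[a] *= self.gamma
        self.counts[arm] += 1.0
        self.sums[arm] += reward
```

A route-choice agent would call select() to pick a route, observe its cost, and call update(); the sliding-window variant (SWUCB) mentioned in the abstract instead keeps only the most recent observations per arm rather than discounting all of them.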
dc.format.mimetype | application/pdf | pt_BR |
dc.language.iso | eng | pt_BR |
dc.rights | Open Access | en |
dc.subject | Multi-objective | en |
dc.subject | Multiagent systems | pt_BR |
dc.subject | Learning | pt_BR |
dc.subject | Decision-making | en |
dc.subject | Multi-objective route choice | en |
dc.subject | Reinforcement learning | en |
dc.title | Multi-objective reinforcement learning methods for action selection : dealing with multiple objectives and non-stationarity | pt_BR |
dc.type | Master's thesis | pt_BR |
dc.identifier.nrb | 001133526 | pt_BR |
dc.degree.grantor | Universidade Federal do Rio Grande do Sul | pt_BR |
dc.degree.department | Instituto de Informática | pt_BR |
dc.degree.program | Programa de Pós-Graduação em Computação | pt_BR |
dc.degree.local | Porto Alegre, BR-RS | pt_BR |
dc.degree.date | 2021 | pt_BR |
dc.degree.level | master's | pt_BR |
Files in this item
This item is licensed under a Creative Commons License