Jacobi, Otavio Flores (2021) [Undergraduate thesis]
Training Reinforcement Learning agents that learn both the value function and the environment model can be very time-consuming. One of the main reasons is that these agents learn by actions one step at ...