Abstract:
This paper presents Recurrent Policy Gradients, a model-free
reinforcement learning (RL) method creating limited-memory stochastic
policies for partially observable Markov decision problems (POMDPs)
that require long-term memories of past observations. The approach
involves approximating a policy gradient for a Recurrent Neural Network
(RNN) by backpropagating return-weighted characteristic eligibilities
through time. Using a Long Short-Term Memory architecture, we
are able to outperform other RL methods on two important benchmark
tasks. Furthermore, we show promising results on a complex car driving simulation task.
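The estimator the abstract describes, a REINFORCE-style policy gradient whose characteristic eligibilities are backpropagated through time, can be illustrated on a toy recall problem. The sketch below is an assumption-laden miniature, not the paper's method: it uses a single recurrent tanh unit rather than an LSTM, a hand-built two-step cue-recall POMDP, and a constant reward baseline; all names (`rollout`, `eligibility`, the parameter layout) are invented for illustration.

```python
import math, random

random.seed(0)

# Toy POMDP (illustrative, not from the paper): the agent observes a cue
# bit only at t = 0 and must repeat it after T blank steps to earn reward 1.
T = 2

def rollout(params):
    w_x, w_h, w_o, b_o = params
    cue = random.randint(0, 1)
    h, xs, hs = 0.0, [], []
    for t in range(T):
        x = float(cue) if t == 0 else 0.0   # observation vanishes after t = 0
        h = math.tanh(w_x * x + w_h * h)    # single recurrent tanh unit
        xs.append(x); hs.append(h)
    p = 1.0 / (1.0 + math.exp(-(w_o * h + b_o)))  # P(action = 1)
    a = 1 if random.random() < p else 0
    r = 1.0 if a == cue else 0.0
    return cue, xs, hs, p, a, r

def eligibility(params, xs, hs, p, a):
    # Characteristic eligibility d log pi(a | history) / d params, computed
    # by backpropagating through time over the stored hidden states.
    w_x, w_h, w_o, b_o = params
    dlogit = a - p                      # Bernoulli/sigmoid log-likelihood grad
    g_o, g_b = dlogit * hs[-1], dlogit
    dh = dlogit * w_o                   # gradient entering the last hidden state
    g_x = g_h = 0.0
    for t in reversed(range(T)):
        dpre = dh * (1.0 - hs[t] ** 2)  # through tanh
        g_x += dpre * xs[t]
        g_h += dpre * (hs[t - 1] if t > 0 else 0.0)
        dh = dpre * w_h                 # pass the gradient back to h_{t-1}
    return [g_x, g_h, g_o, g_b]

params, alpha, baseline = [1.0, 1.0, 1.0, 0.0], 0.3, 0.5
for _ in range(5000):
    cue, xs, hs, p, a, r = rollout(params)
    g = eligibility(params, xs, hs, p, a)
    for i in range(4):                  # return-weighted eligibility ascent
        params[i] += alpha * (r - baseline) * g[i]

acc = sum(rollout(params)[5] for _ in range(500)) / 500
print("recall accuracy:", acc)
```

Because the cue is unavailable after the first step, the reward signal can only improve the policy through the eligibilities flowing back through the recurrent weight, which is the mechanism the paper scales up with LSTM.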