Solving Deep Memory POMDPs with Recurrent Policy Gradients

Wierstra, D; Förster, A; Peters, J; Schmidhuber, J

doi:10.1007/978-3-540-74690-4_71

Conference Paper

External Resource

https://link.springer.com/content/pdf/10.1007%2F978-3-540-74690-4_71.pdf
(Publisher version)

Citation

Wierstra, D., Förster, A., Peters, J., & Schmidhuber, J. (2007). Solving Deep Memory POMDPs with Recurrent Policy Gradients. In J. Marques de Sá, L. Alexandre, W. Duch, & D. Mandic (Eds.), Artificial Neural Networks – ICANN 2007: 7th International Conference, Porto, Portugal, September 9-13, 2007 (pp. 697-706). Berlin, Germany: Springer.

Cite as: https://hdl.handle.net/11858/00-001M-0000-0013-CBF9-E

Abstract

This paper presents Recurrent Policy Gradients, a model-free reinforcement learning (RL) method creating limited-memory stochastic
policies for partially observable Markov decision problems (POMDPs)
that require long-term memories of past observations. The approach
involves approximating a policy gradient for a Recurrent Neural Network
(RNN) by backpropagating return-weighted characteristic eligibilities
through time. Using a Long Short-Term Memory architecture, we
are able to outperform other RL methods on two important benchmark
tasks. Furthermore, we show promising results on a complex car driving simulation task.
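
The core idea in the abstract, backpropagating return-weighted eligibilities through time, can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes a single-unit tanh recurrent network in place of the Long Short-Term Memory architecture, a binary action sampled from a sigmoid output, and no baseline. The names `forward`, `weighted_log_likelihood`, and `rpg_gradient`, and the parameter layout `theta`, are hypothetical and chosen for illustration only.

```python
import math

# theta = [w_x, w_h, b, v, c] (hypothetical layout for a one-unit RNN policy):
#   h_t = tanh(w_x * x_t + w_h * h_{t-1} + b),  P(a_t = 1) = sigmoid(v * h_t + c)

def forward(theta, xs):
    """Run the RNN over observations xs; return hidden states and P(a_t = 1)."""
    w_x, w_h, b, v, c = theta
    h, hs, ps = 0.0, [], []
    for x in xs:
        h = math.tanh(w_x * x + w_h * h + b)
        hs.append(h)
        ps.append(1.0 / (1.0 + math.exp(-(v * h + c))))
    return hs, ps

def weighted_log_likelihood(theta, xs, actions, returns):
    """Sum over t of R_t * log pi(a_t | x_1..x_t); its gradient is the estimate below."""
    _, ps = forward(theta, xs)
    return sum(r * math.log(p if a == 1 else 1.0 - p)
               for p, a, r in zip(ps, actions, returns))

def rpg_gradient(theta, xs, actions, returns):
    """Backpropagate return-weighted eligibilities through time for one episode."""
    w_x, w_h, b, v, c = theta
    hs, ps = forward(theta, xs)
    g = [0.0] * 5          # gradients in the same order as theta
    dh_next = 0.0          # gradient arriving at h_t from later time steps
    for t in reversed(range(len(xs))):
        # d log pi(a_t) / d logit_t = a_t - p_t, weighted by the return R_t
        dlogit = (actions[t] - ps[t]) * returns[t]
        g[3] += dlogit * hs[t]
        g[4] += dlogit
        dh = dlogit * v + dh_next
        dpre = dh * (1.0 - hs[t] ** 2)      # back through tanh
        h_prev = hs[t - 1] if t > 0 else 0.0
        g[0] += dpre * xs[t]
        g[1] += dpre * h_prev
        g[2] += dpre
        dh_next = dpre * w_h                # pass gradient to the previous step
    return g
```

In a training loop built on this sketch, actions would be sampled from the probabilities returned by `forward`, the returns would come from the environment, and `theta` would be moved a small step along the gradient averaged over several episodes.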