
Released

Conference Paper

Solving Deep Memory POMDPs with Recurrent Policy Gradients

MPS-Authors
http://pubman.mpdl.mpg.de/cone/persons/resource/persons84135

Förster, A., Peters, J.
Department Empirical Inference, Max Planck Institute for Biological Cybernetics, Max Planck Society;
Department Empirical Inference, Max Planck Institute for Intelligent Systems, Max Planck Society;

Citation

Wierstra, D., Förster, A., Peters, J., & Schmidhuber, J. (2007). Solving Deep Memory POMDPs with Recurrent Policy Gradients. Artificial Neural Networks: ICANN 2007, 697-706.


Cite as: http://hdl.handle.net/11858/00-001M-0000-0013-CBF9-E
Abstract
This paper presents Recurrent Policy Gradients, a model-free reinforcement learning (RL) method creating limited-memory stochastic policies for partially observable Markov decision problems (POMDPs) that require long-term memories of past observations. The approach involves approximating a policy gradient for a Recurrent Neural Network (RNN) by backpropagating return-weighted characteristic eligibilities through time. Using a “Long Short-Term Memory” architecture, we are able to outperform other RL methods on two important benchmark tasks. Furthermore, we show promising results on a complex car driving simulation task.
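
The core idea in the abstract, adjusting a recurrent policy's parameters along a return-weighted log-likelihood gradient backpropagated through time, can be illustrated with a short sketch. The following is a minimal REINFORCE-style recurrent policy gradient in PyTorch; it is not the paper's exact estimator (which uses an LSTM architecture and variance-reduction baselines), and the ToyMemoryEnv class, its horizon, and all hyperparameters are hypothetical stand-ins for an episodic POMDP interface.

import torch
import torch.nn as nn

class ToyMemoryEnv:
    """Hypothetical deep-memory POMDP: a cue (0 or 1) is shown only at t=0;
    the agent is rewarded at the final step for emitting the remembered cue."""
    def __init__(self, horizon=10):
        self.horizon = horizon
    def reset(self):
        self.t = 0
        self.cue = torch.randint(0, 2, (1,)).item()
        return torch.tensor([float(self.cue), 1.0])  # cue value + "cue visible" flag
    def step(self, action):
        self.t += 1
        done = self.t >= self.horizon
        reward = float(action == self.cue) if done else 0.0
        return torch.tensor([0.0, 0.0]), reward, done  # blank observations after t=0

class RecurrentPolicy(nn.Module):
    """Stochastic policy with a recurrent core, so action probabilities can
    depend on the whole observation history (a limited-memory POMDP policy)."""
    def __init__(self, obs_dim=2, hidden=32, n_actions=2):
        super().__init__()
        self.lstm = nn.LSTM(obs_dim, hidden, batch_first=True)
        self.head = nn.Linear(hidden, n_actions)
    def forward(self, obs, state):
        out, state = self.lstm(obs.view(1, 1, -1), state)
        return self.head(out.squeeze()), state

env, policy = ToyMemoryEnv(), RecurrentPolicy()
opt = torch.optim.Adam(policy.parameters(), lr=1e-2)

for episode in range(2000):
    obs, state, log_probs, rewards, done = env.reset(), None, [], [], False
    while not done:
        logits, state = policy(obs, state)
        dist = torch.distributions.Categorical(logits=logits)
        action = dist.sample()
        log_probs.append(dist.log_prob(action))  # characteristic eligibility at step t
        obs, reward, done = env.step(action.item())
        rewards.append(reward)
    # Return-weighted eligibilities: minimize -sum_t G_t * log pi(a_t | h_t),
    # where G_t is the return from step t onward.
    returns = torch.tensor(rewards).flip(0).cumsum(0).flip(0)
    loss = -(torch.stack(log_probs) * returns).sum()
    opt.zero_grad()
    loss.backward()  # backpropagation through time carries the gradient through the LSTM memory
    opt.step()

Because the recurrent state is never detached within an episode, the single backward() call propagates the return-weighted eligibilities through every time step, which is what lets the gradient reach the parameters responsible for storing the cue at t=0.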