Gaussian Process Dynamic Programming

Deisenroth, MP; Rasmussen, CE; Peters, J

doi:10.1016/j.neucom.2008.12.019

Journal Article

MPS-Authors

Rasmussen, CE
Department Empirical Inference, Max Planck Institute for Biological Cybernetics, Max Planck Society;
Max Planck Institute for Biological Cybernetics, Max Planck Society;

Peters, J
Department Empirical Inference, Max Planck Institute for Biological Cybernetics, Max Planck Society;
Max Planck Institute for Biological Cybernetics, Max Planck Society;

External Resource

https://www.sciencedirect.com/science/article/pii/S0925231209000162
(Publisher version)

Citation

Deisenroth, M., Rasmussen, C., & Peters, J. (2009). Gaussian Process Dynamic Programming. Neurocomputing, 72(7-9), 1508-1524. doi:10.1016/j.neucom.2008.12.019.

Cite as: https://hdl.handle.net/11858/00-001M-0000-0013-C589-B

Abstract

Reinforcement learning (RL) and optimal control of systems with continuous states and actions require approximation techniques in most interesting cases. In this article, we introduce Gaussian process dynamic programming (GPDP), an approximate value-function-based RL algorithm. We consider both a classic optimal control problem, where problem-specific prior knowledge is available, and a classic RL problem, where only very general priors can be used. For the classic optimal control problem, GPDP models the unknown value functions with Gaussian processes and generalizes dynamic programming to continuous-valued states and actions. For the RL problem, GPDP starts from a given initial state and explores the state space using Bayesian active learning. To design a fast learner, the available data must be used efficiently. Hence, we propose to learn probabilistic models of the a priori unknown transition dynamics and the value functions on the fly. In both cases, we successfully apply the resulting continuous-valued controllers to the under-actuated pendulum swing-up and analyze the performance of the suggested algorithms. It turns out that GPDP uses data very efficiently and can be applied to problems where classic dynamic programming would be cumbersome.
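The core idea of the optimal-control variant (a GP model of the value function inside backward dynamic programming) can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes known deterministic dynamics, a hypothetical one-dimensional toy system instead of the pendulum, a finite horizon without discounting, a fixed grid of support states, and a hand-rolled GP with fixed hyperparameters. Where the paper generalizes to continuous-valued actions, this sketch simply picks the best action from a discrete candidate set. The names `SimpleGP`, `gpdp_sketch`, and the toy problem functions are all hypothetical. The RL variant (learned GP dynamics, Bayesian active learning) is not covered.

```python
import numpy as np


def rbf_kernel(A, B, lengthscale=0.5, signal_var=1.0):
    """Squared-exponential kernel between the rows of A (n x d) and B (m x d)."""
    sq = np.sum(A**2, axis=1)[:, None] + np.sum(B**2, axis=1)[None, :] - 2.0 * A @ B.T
    return signal_var * np.exp(-0.5 * np.maximum(sq, 0.0) / lengthscale**2)


class SimpleGP:
    """Hypothetical minimal GP regressor with fixed, hand-picked hyperparameters.

    Assumption: no marginal-likelihood training; only the posterior mean is used.
    """

    def __init__(self, X, y, noise_var=1e-4, lengthscale=0.5, signal_var=1.0):
        self.X = X
        self.kern = dict(lengthscale=lengthscale, signal_var=signal_var)
        self.y_mean = y.mean()  # constant prior mean = empirical mean of targets
        K = rbf_kernel(X, X, **self.kern) + noise_var * np.eye(len(X))
        L = np.linalg.cholesky(K)
        # alpha = K^{-1} (y - mean), computed via the Cholesky factor
        self.alpha = np.linalg.solve(L.T, np.linalg.solve(L, y - self.y_mean))

    def predict_mean(self, Xs):
        return self.y_mean + rbf_kernel(Xs, self.X, **self.kern) @ self.alpha


def gpdp_sketch(support_x, actions, dynamics, reward, terminal_value, horizon):
    """Backward DP in which V_{k+1} is a GP fitted to values at the support states."""
    X = support_x[:, None]
    V = terminal_value(support_x)              # V_N at the support states
    policy = []
    for _ in range(horizon):                   # k = N-1, ..., 0
        gp_V = SimpleGP(X, V)                  # GP model of V_{k+1}
        Q = np.empty((len(support_x), len(actions)))
        for j, u in enumerate(actions):
            x_next = dynamics(support_x, u)    # known, deterministic dynamics (assumed)
            # Successor states off the grid are evaluated with the GP mean.
            Q[:, j] = reward(support_x, u) + gp_V.predict_mean(x_next[:, None])
        best = np.argmax(Q, axis=1)            # greedy over the candidate action set
        V = Q[np.arange(len(support_x)), best]
        policy.insert(0, actions[best])        # policy[k][i]: action at support_x[i]
    return V, policy


# Hypothetical toy problem: drive a 1-D state toward the origin.
def dynamics(x, u):
    return np.clip(x + 0.5 * u, -2.0, 2.0)     # clipping keeps successors on the grid's range


def reward(x, u):
    return -x**2 - 0.1 * u**2


def terminal_value(x):
    return -x**2


support_x = np.linspace(-2.0, 2.0, 21)
actions = np.linspace(-1.0, 1.0, 21)
V0, policy = gpdp_sketch(support_x, actions, dynamics, reward, terminal_value, horizon=10)
print(policy[0][0], policy[0][-1])  # first-step actions at x = -2 and x = +2
```

The clipping in `dynamics` is a design choice for the sketch: far from the support states a GP mean reverts to its prior mean, which could make leaving the grid look spuriously attractive.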