Approximate Dynamic Programming with Gaussian Processes

Deisenroth, MP; Peters, J; Rasmussen, CE

doi:10.1109/ACC.2008.4587201

Conference Paper

MPS-Authors

Peters, J
Department Empirical Inference, Max Planck Institute for Biological Cybernetics, Max Planck Society;
Max Planck Institute for Biological Cybernetics, Max Planck Society;

External Resource

https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=4587201
(Publisher version)

Citation

Deisenroth, M., Peters, J., & Rasmussen, C. (2008). Approximate Dynamic Programming with Gaussian Processes. In 2008 American Control Conference (pp. 4480-4485). Piscataway, NJ, USA: IEEE Service Center.

Cite as: https://hdl.handle.net/11858/00-001M-0000-0013-C8EF-B

Abstract

In general, it is difficult to determine an optimal closed-loop policy in nonlinear control problems with continuous-valued state and control domains. Hence, approximations are often inevitable. The standard method of discretizing states and controls suffers from the curse of dimensionality and strongly depends on the chosen temporal sampling rate. In this paper, we introduce Gaussian process dynamic programming (GPDP) and determine an approximate globally optimal closed-loop policy. In GPDP, value functions in the Bellman recursion of the dynamic programming algorithm are modeled using Gaussian processes. GPDP returns an optimal state feedback for a finite set of states. Based on these outcomes, we learn a possibly discontinuous closed-loop policy on the entire state space by switching between two independently trained Gaussian processes. A binary classifier selects one Gaussian process to predict the optimal control signal. We show that GPDP is able to yield an almost optimal solution to an LQ problem using few sample points. Moreover, we successfully apply GPDP to the underpowered pendulum swing-up, a complex nonlinear control problem.
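
Illustrative sketch (not taken from the paper): the idea of modeling value functions in the Bellman recursion with Gaussian processes can be shown on a scalar LQ problem. Everything below is an assumption made for illustration only: the names `GP1D`, `gpdp_lq` and `riccati_gain`, the dynamics x' = a*x + b*u, the cost q*x^2 + r*u^2, the fixed kernel hyperparameters, the grid of candidate controls, and the clipping of successor states to the region covered by the support states. The switching policy with a binary classifier mentioned in the abstract is not covered.

```python
import numpy as np

def se_kernel(a, b, ell=1.0):
    """Squared-exponential kernel between 1-D input arrays a and b."""
    d = a[:, None] - b[None, :]
    return np.exp(-0.5 * (d / ell) ** 2)

class GP1D:
    """Zero-mean GP regressor with fixed (assumed, not learned) hyperparameters."""
    def __init__(self, x, y, ell=1.0, noise=1e-6):
        self.x, self.ell = x, ell
        K = se_kernel(x, x, ell) + noise * np.eye(len(x))
        self.alpha = np.linalg.solve(K, y)  # weights of the posterior mean

    def mean(self, xs):
        """Posterior mean at the query points xs."""
        return se_kernel(xs, self.x, self.ell) @ self.alpha

def gpdp_lq(a=1.0, b=1.0, q=1.0, r=1.0, horizon=10):
    """Backward Bellman recursion with a GP model of each value function."""
    X = np.linspace(-3.0, 3.0, 21)  # finite set of support states
    U = np.linspace(-2.0, 2.0, 81)  # candidate controls (grid is a simplification)
    V = GP1D(X, q * X**2)           # terminal value function, assumed q*x^2
    for _ in range(horizon):
        # Successor state for every (state, control) pair, clipped so the
        # zero-mean GP is never queried outside the region it was trained on.
        nxt = np.clip(a * X[:, None] + b * U[None, :], X[0], X[-1])
        Q = (q * X[:, None] ** 2 + r * U[None, :] ** 2
             + V.mean(nxt.ravel()).reshape(nxt.shape))
        best = Q.argmin(axis=1)
        policy = U[best]                          # control at each support state
        V = GP1D(X, Q[np.arange(len(X)), best])   # refit GP to the new values
    return X, policy, V

def riccati_gain(a=1.0, b=1.0, q=1.0, r=1.0, horizon=10):
    """Exact finite-horizon LQ feedback gain, used as a reference (u = -k*x)."""
    p = q
    for _ in range(horizon):
        k = a * b * p / (r + b * b * p)
        p = q + a * a * p - a * b * p * k
    return k
```

A possible use is to compare the controls at the support states with the exact linear feedback, for example `X, policy, V = gpdp_lq()` followed by `np.max(np.abs(policy + riccati_gain() * X))`. The clipping step is a design choice of this sketch: a zero-mean GP reverts toward zero away from its training inputs, which would otherwise make leaving the modeled region look artificially cheap.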