Model-Based Reinforcement Learning with Continuous States and Actions

Deisenroth, MP; Rasmussen, CE; Peters, J

Conference Paper

MPS-Authors

Rasmussen, CE
Department Empirical Inference, Max Planck Institute for Biological Cybernetics, Max Planck Society;
Max Planck Institute for Biological Cybernetics, Max Planck Society;

Peters, J
Department Empirical Inference, Max Planck Institute for Biological Cybernetics, Max Planck Society;
Max Planck Institute for Biological Cybernetics, Max Planck Society;

External Resource

https://www.elen.ucl.ac.be/Proceedings/esann/esannpdf/es2008-8.pdf
(Publisher version)

Citation

Deisenroth, M., Rasmussen, C., & Peters, J. (2008). Model-Based Reinforcement Learning with Continuous States and Actions. In M. Verleysen (Ed.), Advances in computational intelligence and learning: 16th European Symposium on Artificial Neural Networks (pp. 19-24). Evere, Belgium: d-side.

Cite as: https://hdl.handle.net/11858/00-001M-0000-0013-C9E1-0

Abstract

Finding an optimal policy in a reinforcement learning (RL) framework with continuous state and action spaces is challenging. Approximate solutions are often inevitable. GPDP is an approximate dynamic programming algorithm based on Gaussian process (GP) models for the value functions. In this paper, we extend GPDP to the case of unknown transition dynamics. After building a GP model for the transition dynamics, we apply GPDP to this model and determine a continuous-valued policy in the entire state space. We apply the resulting controller to the underpowered pendulum swing-up task. Moreover, we compare our results on this RL task to a nearly optimal discrete DP solution in a fully known environment.
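
The step "building a GP model for the transition dynamics" can be illustrated with a minimal sketch. This is not the authors' implementation and does not cover GPDP itself. It assumes a simple damped pendulum simulator with made-up parameters (`pendulum_step`), a squared-exponential kernel, and fixed, hand-picked hyperparameters, whereas a full treatment would learn them from data. The GP maps a (state, action) pair to the change in state over one time step.

```python
import numpy as np


def pendulum_step(state, torque, dt=0.05, gravity=9.81, length=1.0,
                  mass=1.0, friction=0.05):
    """One explicit-Euler step of a damped pendulum (hypothetical parameters)."""
    angle, velocity = state
    accel = (torque - friction * velocity
             - mass * gravity * length * np.sin(angle)) / (mass * length**2)
    return np.array([angle + dt * velocity, velocity + dt * accel])


def se_kernel(a, b, lengthscales, signal_var):
    """Squared-exponential kernel between the rows of a and the rows of b."""
    diff = (a[:, None, :] - b[None, :, :]) / lengthscales
    return signal_var * np.exp(-0.5 * np.sum(diff**2, axis=-1))


class GPDynamics:
    """GP regression from (state, action) to state change, fixed hyperparameters."""

    def __init__(self, inputs, targets, lengthscales, signal_var=1.0, noise_var=1e-4):
        self.inputs = inputs
        self.lengthscales = lengthscales
        self.signal_var = signal_var
        gram = se_kernel(inputs, inputs, lengthscales, signal_var)
        gram += noise_var * np.eye(len(inputs))
        self.chol = np.linalg.cholesky(gram)
        # alpha = (K + noise_var * I)^{-1} targets, via two solves with the factor
        self.alpha = np.linalg.solve(self.chol.T, np.linalg.solve(self.chol, targets))

    def predict(self, query):
        """Posterior mean of the state change and latent variance at each query row."""
        k_star = se_kernel(query, self.inputs, self.lengthscales, self.signal_var)
        mean = k_star @ self.alpha
        v = np.linalg.solve(self.chol, k_star.T)
        var = np.maximum(self.signal_var - np.sum(v**2, axis=0), 0.0)
        return mean, var


# Collect random transitions from the simulator (stands in for real experience).
rng = np.random.default_rng(0)
n = 200
states = np.column_stack([rng.uniform(-np.pi, np.pi, n), rng.uniform(-7.0, 7.0, n)])
torques = rng.uniform(-5.0, 5.0, n)
inputs = np.column_stack([states, torques])
targets = np.array([pendulum_step(s, u) - s for s, u in zip(states, torques)])

model = GPDynamics(inputs, targets, lengthscales=np.array([1.0, 3.0, 3.0]))

# Predict the successor of a new state-action pair: current state plus predicted change.
query_state = np.array([0.5, -1.0])
query_torque = 2.0
delta_mean, delta_var = model.predict(np.array([[*query_state, query_torque]]))
predicted_next_state = query_state + delta_mean[0]
```

Predicting state differences rather than absolute successor states is a common design choice, because differences vary less across the state space than the states themselves. The predictive variance grows away from the collected transitions, which indicates where the learned model should not be trusted.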