Gaussian Process Dynamic Programming

Deisenroth, M., Rasmussen, C., & Peters, J. (2009). Gaussian Process Dynamic Programming. Neurocomputing, 72(7-9), 1508-1524. doi:10.1016/j.neucom.2008.12.019.

Basic data

Item permalink: http://hdl.handle.net/11858/00-001M-0000-0013-C589-B
Version permalink: http://hdl.handle.net/11858/00-001M-0000-0013-C58A-9
Genre: Journal article

Creators

Creators:
Deisenroth, MP [1], Author
Rasmussen, CE [1], Author
Peters, J [1, 2], Author
Affiliations:
[1] Department Empirical Inference, Max Planck Institute for Biological Cybernetics, Max Planck Society, escidoc:1497795
[2] Dept. Empirical Inference, Max Planck Institute for Intelligent Systems, Max Planck Society, escidoc:1497647

Content

Keywords: -
Abstract: Reinforcement learning (RL) and optimal control of systems with continuous states and actions require approximation techniques in most interesting cases. In this article, we introduce Gaussian process dynamic programming (GPDP), an approximate value-function based RL algorithm. We consider both a classic optimal control problem, where problem-specific prior knowledge is available, and a classic RL problem, where only very general priors can be used. For the classic optimal control problem, GPDP models the unknown value functions with Gaussian processes and generalizes dynamic programming to continuous-valued states and actions. For the RL problem, GPDP starts from a given initial state and explores the state space using Bayesian active learning. To design a fast learner, available data has to be used efficiently. Hence, we propose to learn probabilistic models of the a priori unknown transition dynamics and the value functions on the fly. In both cases, we successfully apply the resulting continuous-valued controllers to the under-actuated pendulum swing up and analyze the performances of the suggested algorithms. It turns out that GPDP uses data very efficiently and can be applied to problems where classic dynamic programming would be cumbersome.
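The abstract describes GPDP only at a high level. The following minimal sketch illustrates the core idea of the optimal-control variant: backward dynamic programming over a finite set of support states, where each value function is represented by the posterior mean of a Gaussian process so that it can be evaluated at arbitrary continuous successor states. It is not the authors' implementation. It assumes a hypothetical 1-D toy system with known deterministic dynamics and quadratic reward (not the pendulum from the article), a squared-exponential kernel with fixed, hand-picked hyperparameters, and a search over a finite grid of candidate actions in place of optimization over continuous actions. The names `se_kernel`, `GPMean`, `dynamics`, `reward` and `gpdp_sketch` are illustrative only.

```python
import numpy as np


def se_kernel(a, b, length_scale=0.5, signal_var=1.0):
    """Squared-exponential covariance between 1-D point sets a (n,) and b (m,)."""
    d = a[:, None] - b[None, :]
    return signal_var * np.exp(-0.5 * (d / length_scale) ** 2)


class GPMean:
    """Posterior mean of a zero-mean GP with fixed (not learned) hyperparameters."""

    def __init__(self, x_train, y_train, noise_var=1e-4):
        self.x_train = x_train
        K = se_kernel(x_train, x_train) + noise_var * np.eye(len(x_train))
        # alpha = (K + noise_var * I)^{-1} y, reused for every prediction
        self.alpha = np.linalg.solve(K, y_train)

    def __call__(self, x):
        return se_kernel(np.atleast_1d(x), self.x_train) @ self.alpha


def dynamics(x, u, dt=0.1):
    """Hypothetical toy system: x' = x + dt * u, kept inside [-2, 2]."""
    return np.clip(x + dt * u, -2.0, 2.0)


def reward(x, u):
    """Hypothetical reward penalizing distance from 0 and control effort."""
    return -(x ** 2) - 0.1 * u ** 2


def gpdp_sketch(support, actions, horizon=20, gamma=0.95):
    """Backward DP on support states; each V_k is represented by a GP mean."""
    v_next = GPMean(support, np.zeros_like(support))  # terminal value V_N = 0
    X, U = np.meshgrid(support, actions, indexing="ij")  # (n_states, n_actions)
    X_next = dynamics(X, U)
    policy = np.zeros_like(support)
    for _ in range(horizon):
        # Q(x_i, u_j) = r(x_i, u_j) + gamma * V_{k+1}(f(x_i, u_j)); the GP lets us
        # evaluate V_{k+1} at successor states that are not support points.
        Q = reward(X, U) + gamma * v_next(X_next.ravel()).reshape(X.shape)
        policy = actions[Q.argmax(axis=1)]  # greedy action per support state
        v_next = GPMean(support, Q.max(axis=1))  # fit GP to the new values V_k
    # After the loop: v_next approximates V_0, policy is the first-stage policy
    return v_next, policy


if __name__ == "__main__":
    states = np.linspace(-2.0, 2.0, 21)
    V0, pi0 = gpdp_sketch(states, np.linspace(-1.0, 1.0, 11))
    print(np.round(pi0, 1))
```

In this toy setting one would expect the resulting policy to push the state toward 0, saturating at u = -1 at x = 2 and u = +1 at x = -2. The exact intermediate actions depend on the hand-picked kernel hyperparameters and the action grid.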

Details

Language(s): -
Date: 2009-03
Publication status: Published in print
Pages: -
Place, publisher, edition: -
Table of contents: -
Review method: -
Degree: -

Source 1

Title: Neurocomputing
Source genre: Journal
Creators: -
Affiliations: -
Place, publisher, edition: -
Pages: -
Volume / Issue: 72 (7-9)
Article number: -
Start / end page: 1508 - 1524
Identifier: -