Model-Based Reinforcement Learning with Continuous States and Actions

Deisenroth, MP; Rasmussen, CE; Peters, J

Deisenroth, M., Rasmussen, C., & Peters, J. (2008). Model-Based Reinforcement Learning with Continuous States and Actions. In M. Verleysen (Ed.), Advances in computational intelligence and learning: 16th European Symposium on Artificial Neural Networks (pp. 19-24). Evere, Belgium: d-side.

Item status: Released

Basic data

Record permalink: https://hdl.handle.net/11858/00-001M-0000-0013-C9E1-0
Version permalink: https://hdl.handle.net/21.11116/0000-0003-7FD9-B

Genre: Conference paper

Files

ESANN-2008-Deisenroth.pdf (any full text), 300 KB

File permalink:
https://hdl.handle.net/21.11116/0000-0003-7FDA-A

Name:
ESANN-2008-Deisenroth.pdf

Description:
-

OA status:

Visibility:
Public

MIME type / checksum:
application/pdf / [MD5]

Copyright date:
-

Copyright Info:
-

License:
-

External references

External reference:
https://www.elen.ucl.ac.be/Proceedings/esann/esannpdf/es2008-8.pdf (publisher version), open access status unknown

Description:
-

OA status:

Creators

Creators:
Deisenroth, MP, Author
Rasmussen, CE^{1, 2}, Author
Peters, J^{1, 2}, Author

Affiliations:
1 Department Empirical Inference, Max Planck Institute for Biological Cybernetics, Max Planck Society, ou_1497795
2 Max Planck Institute for Biological Cybernetics, Max Planck Society, Spemannstrasse 38, 72076 Tübingen, DE, ou_1497794

Content

Keywords: -

Abstract: Finding an optimal policy in a reinforcement learning (RL) framework with continuous state and action spaces is challenging. Approximate solutions are often inevitable. GPDP is an approximate dynamic programming algorithm based on Gaussian process (GP) models for the value functions. In this paper, we extend GPDP to the case of unknown transition dynamics. After building a GP model for the transition dynamics, we apply GPDP to this model and determine a continuous-valued policy in the entire state space. We apply the resulting controller to the underpowered pendulum swing-up. Moreover, we compare our results on this RL task to a nearly optimal discrete DP solution in a fully known environment.
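
The first step named in the abstract, building a GP model for unknown transition dynamics, can be illustrated with a minimal NumPy sketch. This is not the authors' implementation, and it does not reproduce GPDP or the policy computation. Everything specific here is an assumption made for illustration: the pendulum constants, the sampling ranges, the choice to regress state changes rather than next states, the squared-exponential kernel, and the fixed hyperparameters (which a real application would normally learn from data). The names `pendulum_step`, `sample_transitions`, `se_kernel`, and `GPDynamics` are hypothetical.

```python
import numpy as np


def pendulum_step(state, torque, dt=0.05, g=9.81, length=1.0, mass=1.0, friction=0.05):
    """One semi-implicit Euler step of a damped pendulum (illustrative constants)."""
    angle, velocity = state
    accel = (torque - friction * velocity - mass * g * length * np.sin(angle)) / (mass * length**2)
    new_velocity = velocity + dt * accel
    return np.array([angle + dt * new_velocity, new_velocity])


def sample_transitions(n, rng):
    """Draw random (angle, velocity, torque) inputs and the resulting state changes."""
    inputs = rng.uniform([-np.pi, -6.0, -5.0], [np.pi, 6.0, 5.0], size=(n, 3))
    deltas = np.array([pendulum_step(x[:2], x[2]) - x[:2] for x in inputs])
    return inputs, deltas


def se_kernel(a, b, lengthscales, signal_var):
    """Squared-exponential kernel with one length scale per input dimension."""
    diff = (a[:, None, :] - b[None, :, :]) / lengthscales
    return signal_var * np.exp(-0.5 * np.sum(diff**2, axis=-1))


class GPDynamics:
    """Zero-mean GP regression on state changes; hyperparameters are fixed, not learned."""

    def __init__(self, lengthscales=(1.0, 4.0, 4.0), signal_var=1.0, noise_var=1e-4):
        self.lengthscales = np.asarray(lengthscales, dtype=float)
        self.signal_var = signal_var
        self.noise_var = noise_var

    def fit(self, inputs, targets):
        self.inputs = inputs
        gram = se_kernel(inputs, inputs, self.lengthscales, self.signal_var)
        gram += self.noise_var * np.eye(len(inputs))
        self.chol = np.linalg.cholesky(gram)
        # alpha = gram^{-1} targets, obtained from the Cholesky factor
        # (a general solver is used on the triangular systems for simplicity).
        self.alpha = np.linalg.solve(self.chol.T, np.linalg.solve(self.chol, targets))
        return self

    def predict(self, queries):
        """Return the posterior mean of the state change and the latent variance."""
        k_star = se_kernel(queries, self.inputs, self.lengthscales, self.signal_var)
        mean = k_star @ self.alpha
        v = np.linalg.solve(self.chol, k_star.T)
        var = np.clip(self.signal_var - np.sum(v**2, axis=0), 0.0, None)
        return mean, var


rng = np.random.default_rng(0)
train_x, train_y = sample_transitions(300, rng)
model = GPDynamics().fit(train_x, train_y)

test_x, test_y = sample_transitions(100, rng)
pred_mean, pred_var = model.predict(test_x)
print("mean absolute error on held-out transitions:", np.abs(pred_mean - test_y).mean())
```

In this sketch the predictive variance grows away from the sampled transitions, which is the property that makes a GP attractive as a dynamics model: a planner can tell where the learned model is unreliable. How the paper itself uses the GP model inside GPDP is described in the full text, not here.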

Details

Language(s):

Date: Published: 2008-04

Publication status: Published

Pages: -

Place, publisher, edition: -

Table of contents: -

Review type: -

Identifiers: URI: http://www.dice.ucl.ac.be/esann/index.php?pg=pgm
BibTeX citekey: 4977

Degree type: -

Event

Title: 16th European Symposium on Artificial Neural Networks (ESANN 2008)

Venue: Bruges, Belgium

Start/end date: 2008-04-23 - 2008-04-25

Source 1

Title: Advances in computational intelligence and learning: 16th European Symposium on Artificial Neural Networks

Source genre: Conference proceedings

Creators:
Verleysen, M, Editor

Affiliations:
-

Place, publisher, edition: Evere, Belgium: d-side

Pages: -
Volume / Issue: -
Article number: -
Start / End page: 19 - 24
Identifier: -
