
Released

Conference Paper

Hierarchical Relative Entropy Policy Search

MPG Authors

Peters, J.
Dept. Empirical Inference, Max Planck Institute for Intelligent Systems, Max Planck Society;

External Resources
Full texts (restricted access)
There are currently no full texts released for your IP range.
Full texts (freely accessible)
No freely accessible full texts are available in PuRe
Supplementary material (freely accessible)
No freely accessible supplementary materials are available
Citation

Daniel, C., Neumann, G., & Peters, J. (2012). Hierarchical Relative Entropy Policy Search. In N. Lawrence, & M. Girolami (Eds.), Artificial Intelligence and Statistics, 21-23 April 2012, La Palma, Canary Islands (pp. 273-281). Madison, WI, USA: International Machine Learning Society.


Citation link: https://hdl.handle.net/11858/00-001M-0000-0013-B7E8-8
Abstract
Many real-world problems are inherently hierarchically structured. The use of this structure in an agent's policy may well be the key to improved scalability and higher performance. However, such hierarchical structures cannot be exploited by current policy search algorithms. We will concentrate on a basic, but highly relevant hierarchy - the 'mixed option' policy. Here, a gating network first decides which of the options to execute and, subsequently, the option-policy determines the action.

In this paper, we reformulate learning a hierarchical policy as a latent variable estimation problem and subsequently extend the Relative Entropy Policy Search (REPS) to the latent variable case. We show that our Hierarchical REPS can learn versatile solutions while also showing an increased performance in terms of learning speed and quality of the found policy in comparison to the non-hierarchical approach.
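
The 'mixed option' hierarchy described in the abstract factors the policy as pi(a|s) = sum_o pi(o|s) pi(a|s,o): a gating distribution first selects an option o, and the selected option-policy then produces the action a. The following is a minimal Python sketch of sampling from such a policy, assuming a softmax gating network over linear scores and linear-Gaussian option-policies; these parameterizations are illustrative assumptions and do not reproduce the learning procedure (HiREPS) from the paper.

import numpy as np

rng = np.random.default_rng(0)

# Toy dimensions (hypothetical, not taken from the paper).
n_options, state_dim, action_dim = 3, 4, 2

# Gating network: softmax over one linear score per option.
W_gate = rng.normal(size=(n_options, state_dim))
# Option-policies: one linear-Gaussian controller per option.
W_opt = rng.normal(size=(n_options, action_dim, state_dim))
sigma = 0.1

def sample_action(state):
    """Sample from the mixed-option policy:
    o ~ pi(o|s) via the gating network, then a ~ pi(a|s,o)."""
    scores = W_gate @ state
    gate_probs = np.exp(scores - scores.max())
    gate_probs /= gate_probs.sum()                   # pi(o|s)
    o = rng.choice(n_options, p=gate_probs)
    mean = W_opt[o] @ state
    a = mean + sigma * rng.normal(size=action_dim)   # pi(a|s,o)
    return o, a

state = rng.normal(size=state_dim)
option, action = sample_action(state)
print(option, action)

Because the executed option o is not observed in the reward signal, learning such a policy can be posed as the latent variable estimation problem the abstract refers to, with o as the latent variable.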