de.mpg.escidoc.pubman.appbase.FacesBean
Deutsch
 
Hilfe Wegweiser Impressum Kontakt Einloggen
  DetailsucheBrowse

Datensatz

DATENSATZ AKTIONENEXPORT

Freigegeben

Zeitschriftenartikel

Reward-Weighted Regression with Sample Reuse for Direct Policy Search in Reinforcement Learning

MPG-Autoren
http://pubman.mpdl.mpg.de/cone/persons/resource/persons83950

Hachiya,  H
Department Empirical Inference, Max Planck Institute for Biological Cybernetics, Max Planck Society;

http://pubman.mpdl.mpg.de/cone/persons/resource/persons84135

Peters,  J
Department Empirical Inference, Max Planck Institute for Biological Cybernetics, Max Planck Society;
Dept. Empirical Inference, Max Planck Institute for Intelligent Systems, Max Planck Society;

Externe Ressourcen
Es sind keine Externen Ressourcen verfügbar
Volltexte (frei zugänglich)
Es sind keine frei zugänglichen Volltexte verfügbar
Ergänzendes Material (frei zugänglich)
Es sind keine frei zugänglichen Ergänzenden Materialien verfügbar
Zitation

Hachiya, H., Peters, J., & Sugiyama, M. (2011). Reward-Weighted Regression with Sample Reuse for Direct Policy Search in Reinforcement Learning. Neural Computation, 23(11), 2798-2832. doi:10.1162/NECO_a_00199.


Zitierlink: http://hdl.handle.net/11858/00-001M-0000-0013-B90C-2
Zusammenfassung
Direct policy search is a promising reinforcement learning framework, in particular for controlling continuous, high-dimensional systems. Policy search often requires a large number of samples for obtaining a stable policy update estimator, and this is prohibitive when the sampling cost is expensive. In this letter, we extend an expectation-maximization-based policy search method so that previously collected samples can be efficiently reused. The usefulness of the proposed method, reward-weighted regression with sample reuse (R), is demonstrated through robot learning experiments.