de.mpg.escidoc.pubman.appbase.FacesBean
Deutsch
 
Hilfe Wegweiser Impressum Kontakt Einloggen
  DetailsucheBrowse

Datensatz

DATENSATZ AKTIONENEXPORT

Freigegeben

Konferenzbeitrag

Fitted Q-iteration by Advantage Weighted Regression

MPG-Autoren
http://pubman.mpdl.mpg.de/cone/persons/resource/persons84135

Peters,  J
Department Empirical Inference, Max Planck Institute for Biological Cybernetics, Max Planck Society;
Dept. Empirical Inference, Max Planck Institute for Intelligent Systems, Max Planck Society;

Externe Ressourcen
Es sind keine Externen Ressourcen verfügbar
Volltexte (frei zugänglich)
Es sind keine frei zugänglichen Volltexte verfügbar
Ergänzendes Material (frei zugänglich)
Es sind keine frei zugänglichen Ergänzenden Materialien verfügbar
Zitation

Neumann, G., & Peters, J. (2009). Fitted Q-iteration by Advantage Weighted Regression. Advances in neural information processing systems 21: 22nd Annual Conference on Neural Information Processing Systems 2008, 1177-1184.


Zitierlink: http://hdl.handle.net/11858/00-001M-0000-0013-C47D-F
Zusammenfassung
Recently, fitted Q-iteration (FQI) based methods have become more popular due to their increased sample efficiency, a more stable learning process and the higher quality of the resulting policy. However, these methods remain hard to use for continuous action spaces which frequently occur in real-world tasks, e.g., in robotics and other technical applications. The greedy action selection commonly used for the policy improvement step is particularly problematic as it is expensive for continuous actions, can cause an unstable learning process, introduces an optimization bias and results in highly non-smooth policies unsuitable for real-world systems. In this paper, we show that by using a soft-greedy action selection the policy improvement step used in FQI can be simplified to an inexpensive advantage-weighted regression. With this result, we are able to derive a new, computationally efficient FQI algorithm which can even deal with high dimensional action spaces.