Efficient Sample Reuse in EM-Based Policy Search

Hachiya, H; Peters, J; Sugiyama, M

doi:10.1007/978-3-642-04180-8_48

Conference Paper

MPS-Authors

Peters, J
Department Empirical Inference, Max Planck Institute for Biological Cybernetics, Max Planck Society

External Resource

https://link.springer.com/content/pdf/10.1007%2F978-3-642-04180-8_48.pdf
(Publisher version)

Citation

Hachiya, H., Peters, J., & Sugiyama, M. (2009). Efficient Sample Reuse in EM-Based Policy Search. In W. Buntine, M. Grobelnik, D. Mladenic, & J. Shawe-Taylor (Eds.), Machine Learning and Knowledge Discovery in Databases: European Conference, ECML PKDD 2009, Bled, Slovenia, September 7-11, 2009 (pp. 469-484). Berlin, Germany: Springer.

Cite as: https://hdl.handle.net/11858/00-001M-0000-0013-C307-1

Abstract

Direct policy search is a promising reinforcement learning framework, particularly for control of continuous, high-dimensional systems such as anthropomorphic robots. Because of its high flexibility, policy search often requires a large number of samples to obtain a stable estimator of the policy update. This is prohibitive when sampling is expensive. In this paper, we extend an EM-based policy search method so that previously collected samples can be reused efficiently. The usefulness of the proposed method, called Reward-weighted Regression with sample Reuse, is demonstrated through a robot learning experiment.
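
Illustrative sketch (not taken from the paper): the abstract gives no formulas, so the following Python snippet only shows the general idea of reusing old samples in a reward-weighted regression update through importance weighting. Everything in it is an assumption made for illustration: the one-dimensional linear-Gaussian policy a ~ N(theta * s, std^2), the function names gaussian_pdf and rwr_update, the sample layout, and the flattening exponent that tempers the importance weights. Rewards are assumed to be non-negative, as reward-weighted regression requires.

```python
import math


def gaussian_pdf(x, mean, std):
    """Density of a 1-D Gaussian N(mean, std^2) evaluated at x."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def rwr_update(samples, theta_current, std, flattening=1.0):
    """One reward-weighted regression step with importance-weighted sample reuse.

    Hypothetical setup: the policy is a ~ N(theta * s, std^2) with scalar
    state s and action a. Each sample is a tuple
    (state, action, reward, theta_behavior), where theta_behavior is the
    policy parameter that was in effect when the sample was collected.

    flattening is an assumed knob in [0, 1]: 1.0 uses full importance
    weights, 0.0 ignores them and treats all samples as on-policy.
    """
    numerator = 0.0
    denominator = 0.0
    for state, action, reward, theta_behavior in samples:
        # Importance weight: likelihood of the action under the current
        # policy relative to the policy that actually generated it.
        weight = (gaussian_pdf(action, theta_current * state, std)
                  / gaussian_pdf(action, theta_behavior * state, std))
        weight = weight ** flattening
        # Weighted least squares of action on state, with each sample
        # weighted by (importance weight * reward).
        numerator += weight * reward * state * action
        denominator += weight * reward * state * state
    if denominator <= 0.0:
        raise ValueError("no sample carries positive weight")
    return numerator / denominator
```

For example, calling rwr_update(old_samples + new_samples, theta_current, std) would fold previously collected samples into the current update, with each old sample down-weighted according to how unlikely its action is under the current policy. Lowering the flattening exponent trades lower variance for more bias; how such a value should be chosen is not stated in the abstract.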