Peters, J.: Department Empirical Inference, Max Planck Institute for Biological Cybernetics, Max Planck Society
https://www.mitpressjournals.org/doi/10.1162/NECO_a_00199 (publisher version)
Hachiya, H., Peters, J., & Sugiyama, M. (2011). Reward-Weighted Regression with Sample Reuse for Direct Policy Search in Reinforcement Learning. Neural Computation, 23(11), 2798-2832. doi:10.1162/NECO_a_00199.