Peters, J., Department Empirical Inference, Max Planck Institute for Biological Cybernetics, Max Planck Society
https://www.mitpressjournals.org/doi/pdf/10.1162/neco.2009.12-08-922 (Publisher version)
Morimura, T., Uchibe, E., Yoshimoto, J., Peters, J., & Doya, K. (2010). Derivatives of Logarithmic Stationary Distributions for Policy Gradient Reinforcement Learning. Neural Computation, 22(2), 342-376. doi:10.1162/neco.2009.12-08-922.