
Released

Conference Paper

A Non-Parametric Approach to Dynamic Programming

MPG Authors
/persons/resource/persons84027

Kroemer, O
Dept. Empirical Inference, Max Planck Institute for Intelligent Systems, Max Planck Society;

/persons/resource/persons84135

Peters, J
Dept. Empirical Inference, Max Planck Institute for Intelligent Systems, Max Planck Society;

Full Texts (restricted access)
No full texts are currently released for your IP range.
Full Texts (freely accessible)
There are no freely accessible full texts available in PuRe.
Supplementary Material (freely accessible)
There is no freely accessible supplementary material available.
Citation

Kroemer, O., & Peters, J. (2012). A Non-Parametric Approach to Dynamic Programming. In J. Shawe-Taylor (Ed.), Advances in Neural Information Processing Systems 24 (pp. 1719-1727). Red Hook, NY, USA: Curran.


Citation link: https://hdl.handle.net/11858/00-001M-0000-0013-B872-6
Abstract
In this paper, we consider the problem of policy evaluation for continuous-state systems. We present a non-parametric approach to policy evaluation, which uses kernel density estimation to represent the system. The true form of the value function for this model can be determined, and can be computed using Galerkin's method. Furthermore, we present a unified view of several well-known policy evaluation methods. In particular, we show that the same Galerkin method can be used to derive Least-Squares Temporal Difference learning, Kernelized Temporal Difference learning, and a discrete-state Dynamic Programming solution, as well as our proposed method. In a numerical evaluation of these algorithms, the proposed approach performed better than the other methods.
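To make the connection concrete, here is a minimal sketch of one of the methods the abstract says falls out of the Galerkin view: Least-Squares Temporal Difference (LSTD) learning with Gaussian kernel features. This is not the paper's proposed kernel-density algorithm; the problem setup (a hypothetical 1-D random walk with a reward near the right boundary), the feature centers, and the bandwidth are all illustrative assumptions.

```python
import numpy as np

def gaussian_features(states, centers, bandwidth):
    # Gaussian kernel features: phi_i(s) = exp(-(s - c_i)^2 / (2 h^2))
    d = states[:, None] - centers[None, :]
    return np.exp(-0.5 * (d / bandwidth) ** 2)

def lstd(phi, phi_next, rewards, gamma=0.95, reg=1e-6):
    # LSTD solves A w = b with A = Phi^T (Phi - gamma * Phi'), b = Phi^T r,
    # i.e. the Galerkin condition that the Bellman residual is orthogonal
    # to the span of the features. A small ridge term keeps A invertible.
    A = phi.T @ (phi - gamma * phi_next)
    b = phi.T @ rewards
    return np.linalg.solve(A + reg * np.eye(A.shape[0]), b)

# Hypothetical example: 1-D bounded random walk, reward when the
# next state lands above 0.9.
rng = np.random.default_rng(0)
s = rng.uniform(0.0, 1.0, size=500)
s_next = np.clip(s + rng.normal(0.0, 0.1, size=500), 0.0, 1.0)
r = (s_next > 0.9).astype(float)

centers = np.linspace(0.0, 1.0, 15)   # assumed feature placement
h = 0.1                               # assumed bandwidth
w = lstd(gaussian_features(s, centers, h),
         gaussian_features(s_next, centers, h), r)

# Approximate value function: V(s) = phi(s)^T w
v = gaussian_features(np.array([0.1, 0.95]), centers, h) @ w
```

Under this setup, states near the rewarded boundary should receive a noticeably higher value estimate than states far from it, which is the qualitative behaviour any of the surveyed policy evaluation methods should reproduce.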