de.mpg.escidoc.pubman.appbase.FacesBean
Deutsch
 
Hilfe Wegweiser Impressum Kontakt Einloggen
  DetailsucheBrowse

Datensatz

DATENSATZ AKTIONENEXPORT

Freigegeben

Forschungspapier

Acquiring Target Stacking Skills by Goal-Parameterized Deep Reinforcement Learning

MPG-Autoren
http://pubman.mpdl.mpg.de/cone/persons/resource/persons85396

Li,  Wenbin
Computer Vision and Multimodal Computing, MPI for Informatics, Max Planck Society;

http://pubman.mpdl.mpg.de/cone/persons/resource/persons44451

Fritz,  Mario
Computer Vision and Multimodal Computing, MPI for Informatics, Max Planck Society;

Externe Ressourcen
Es sind keine Externen Ressourcen verfügbar
Volltexte (frei zugänglich)

arXiv:1711.00267.pdf
(Preprint), 445KB

Ergänzendes Material (frei zugänglich)
Es sind keine frei zugänglichen Ergänzenden Materialien verfügbar
Zitation

Li, W., Bohg, J., & Fritz, M. (2017). Acquiring Target Stacking Skills by Goal-Parameterized Deep Reinforcement Learning. Retrieved from http://arxiv.org/abs/1711.00267.


Zitierlink: http://hdl.handle.net/21.11116/0000-0000-4345-7
Zusammenfassung
Understanding physical phenomena is a key component of human intelligence and enables physical interaction with previously unseen environments. In this paper, we study how an artificial agent can autonomously acquire this intuition through interaction with the environment. We created a synthetic block stacking environment with physics simulation in which the agent can learn a policy end-to-end through trial and error. Thereby, we bypass to explicitly model physical knowledge within the policy. We are specifically interested in tasks that require the agent to reach a given goal state that may be different for every new trial. To this end, we propose a deep reinforcement learning framework that learns policies which are parametrized by a goal. We validated the model on a toy example navigating in a grid world with different target positions and in a block stacking task with different target structures of the final tower. In contrast to prior work, our policies show better generalization across different goals.