
Released

Journal Article

Imitation and Reinforcement Learning

MPG Authors
http://pubman.mpdl.mpg.de/cone/persons/resource/persons84021

Kober, J.
Department Empirical Inference, Max Planck Institute for Biological Cybernetics, Max Planck Society;

http://pubman.mpdl.mpg.de/cone/persons/resource/persons84135

Peters, J.
Department Empirical Inference, Max Planck Institute for Biological Cybernetics, Max Planck Society;
Dept. Empirical Inference, Max Planck Institute for Intelligent Systems, Max Planck Society;

External Resources
No external resources are available
Full texts (freely accessible)
No freely accessible full texts are available
Supplementary material (freely accessible)
No freely accessible supplementary materials are available
Citation

Kober, J., & Peters, J. (2010). Imitation and Reinforcement Learning. IEEE Robotics and Automation Magazine, 17(2), 55-62. doi:10.1109/MRA.2010.936952.


Citation link: http://hdl.handle.net/11858/00-001M-0000-0013-BF76-2
Abstract
In this article, we present both novel learning algorithms and experiments using dynamical system motor primitives (MPs). We describe this MP representation in a way that makes it straightforward to reproduce. We review an appropriate imitation learning method, locally weighted regression, and show how it can be used both for initializing reinforcement learning (RL) tasks and for modifying the start-up phase of a rhythmic task. We also present the RL algorithm currently best suited to this framework, PoWER. Using these methods, we demonstrate two complex motor tasks, ball-in-a-cup and ball paddling, learned on a real, physical Barrett WAM. The ball-paddling application is of particular interest because it requires a combination of rhythmic and discrete dynamical system MPs during the start-up phase to achieve the task.
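
To make the two ingredients named in the abstract more concrete, the following is a minimal sketch (not the authors' code) of a discrete dynamical-system motor primitive whose trajectory is shaped by learned basis weights, combined with a simplified reward-weighted exploration update in the spirit of PoWER. All gains, basis settings, the toy demonstration, and the reward function are illustrative assumptions; the paper's actual PoWER algorithm uses state-dependent exploration and importance weighting, and the imitation-learning initialization via locally weighted regression is omitted here for brevity.

import numpy as np

def rollout(w, y0=0.0, g=1.0, tau=1.0, dt=0.01, alpha=25.0, beta=6.25, n_basis=10):
    """Integrate one discrete dynamical-system motor primitive for basis weights w."""
    centers = np.exp(-alpha / 2.0 * np.linspace(0.0, 1.0, n_basis))  # basis centres in phase x
    widths = 1.0 / (np.gradient(centers) ** 2 + 1e-6)
    y, yd, x = y0, 0.0, 1.0
    traj = []
    for _ in range(int(round(tau / dt))):
        psi = np.exp(-widths * (x - centers) ** 2)
        f = (psi @ w) / (psi.sum() + 1e-10) * x * (g - y0)    # learned forcing term
        ydd = alpha * (beta * (g - y) - yd) + f               # transformation system
        yd += ydd * dt / tau
        y += yd * dt / tau
        x += -alpha / 3.0 * x * dt / tau                      # canonical system (phase variable)
        traj.append(y)
    return np.array(traj)

def power_style_update(w, sigma, reward_fn, n_rollouts=20):
    """One reward-weighted update of the weights, in the spirit of PoWER (simplified)."""
    eps = [np.random.randn(len(w)) * sigma for _ in range(n_rollouts)]
    R = np.array([reward_fn(rollout(w + e)) for e in eps])
    R = R - R.min() + 1e-10                                   # keep the weighting non-negative
    return w + sum(r * e for r, e in zip(R, eps)) / R.sum()

if __name__ == "__main__":
    # Toy "demonstration": a smooth reach from 0 to the goal g = 1.
    target = 0.5 * (1.0 - np.cos(np.linspace(0.0, np.pi, 100)))
    reward = lambda y: float(np.exp(-np.mean((y - target) ** 2)))
    w = np.zeros(10)                                          # imitation-learning initialization omitted
    for _ in range(50):
        w = power_style_update(w, sigma=5.0, reward_fn=reward)
    print("final reward:", reward(rollout(w)))

In the paper's setting, the weights would first be fitted to a human demonstration by locally weighted regression, so that the reward-weighted updates only refine an already reasonable movement rather than search from scratch as in this toy example.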