  Active Exploration for Robot Parameter Selection in Episodic Reinforcement Learning

Kroemer, O., & Peters, J. (2011). Active Exploration for Robot Parameter Selection in Episodic Reinforcement Learning. In IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning (ADPRL 2011) (pp. 25-31). Piscataway, NJ, USA: IEEE.


Basic Information

Genre: Conference Paper

Files

Related URLs

Creators

Creators:
Kroemer, O. (1), Author
Peters, J. (1, 2), Author
Affiliations:
(1) Department Empirical Inference, Max Planck Institute for Biological Cybernetics, Max Planck Society, ou_1497795
(2) Dept. Empirical Inference, Max Planck Institute for Intelligent Systems, Max Planck Society, ou_1497647

Content

Keywords: -
Abstract: As the complexity of robots and other autonomous systems increases, it becomes more important that these systems can adapt and optimize their settings actively. However, such optimization is rarely trivial. Sampling from the system is often expensive in terms of time and other costs, and excessive sampling should therefore be avoided. The parameter space is also usually continuous and multi-dimensional. Given the inherent exploration-exploitation dilemma of the problem, we propose treating it as an episodic reinforcement learning problem. In this reinforcement learning framework, the policy is defined by the system's parameters and the rewards are given by the system's performance. The rewards accumulate during each episode of a task. In this paper, we present a method for efficiently sampling and optimizing in continuous multi-dimensional spaces. The approach is based on Gaussian process regression, which can represent continuous non-linear mappings from parameters to system performance. We employ an upper confidence bound policy, which explicitly manages the trade-off between exploration and exploitation. Unlike many other policies for this kind of problem, we do not rely on a discretization of the action space. The presented method was evaluated on a real robot. The robot had to learn grasping parameters in order to adapt its grasping execution to different objects. The proposed method was also tested on a more general gain tuning problem. The results of the experiments show that the presented method can quickly determine suitable parameters and is applicable to real online learning applications.
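The sampling scheme the abstract describes (a Gaussian process regression model of parameter-to-performance, queried through an upper-confidence-bound policy) can be sketched roughly as follows. This is an illustrative reconstruction, not the authors' implementation: for simplicity it maximizes the UCB over randomly drawn candidate parameter vectors, whereas the paper explicitly avoids discretizing the action space. The kernel choice and the `length_scale`, `noise`, and `kappa` values are assumptions.

```python
import numpy as np

def rbf_kernel(A, B, length_scale=0.5):
    """Squared-exponential (RBF) kernel matrix between rows of A and B."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-0.5 * d2 / length_scale ** 2)

def gp_posterior(X, y, Xq, noise=1e-3):
    """GP regression: posterior mean and std of performance at query points Xq,
    given observed parameter vectors X with episodic rewards y."""
    K = rbf_kernel(X, X) + noise * np.eye(len(X))
    Ks = rbf_kernel(Xq, X)
    alpha = np.linalg.solve(K, y)
    mean = Ks @ alpha
    # Diagonal of the posterior covariance under a unit-variance prior.
    var = 1.0 - np.einsum("ij,ji->i", Ks, np.linalg.solve(K, Ks.T))
    return mean, np.sqrt(np.maximum(var, 1e-12))

def ucb_select(X, y, candidates, kappa=2.0):
    """Choose the candidate parameter vector maximizing mean + kappa * std,
    trading off exploitation (mean) against exploration (std)."""
    mean, std = gp_posterior(X, y, candidates)
    return candidates[np.argmax(mean + kappa * std)]
```

In an episodic loop, each episode would execute the selected parameters on the system, record the accumulated reward, append the pair to `(X, y)`, and re-run `ucb_select`; larger `kappa` biases the policy toward unexplored regions of parameter space.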

Details

Language: -
Date: 2011-04
Publication Status: Published
Pages: -
Publishing info: -
Table of Contents: -
Review Method: -
Identifiers (DOI, ISBN, etc.): ISBN: 978-1-4244-9887-1
URI: http://www.ieee-ssci.org/2011/adprl-2011
DOI: 10.1109/ADPRL.2011.5967378
BibTex Citekey: 7050
Degree: -

Event

Title: IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning (ADPRL 2011)
Place of Event: Paris, France
Start-/End Date: -

Legal Case

Project information


Source 1

Title: IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning (ADPRL 2011)
Genre: Proceedings
Creators:
Affiliations:
Publisher, Place: Piscataway, NJ, USA : IEEE
Pages: -
Volume / Issue: -
Sequence Number: -
Start / End Page: 25 - 31
Identifiers (ISBN, ISSN, DOI, etc.): -