Balancing Safety and Exploitability in Opponent Modeling

Wang, Z.; Boularias, A.; Mülling, K.; Peters, J.; Burgard, W.; Roth, D.

Item details

A newer version of this item is available:
https://pure.mpg.de/pubman/item/item_1788096_2

Wang, Z., Boularias, A., Mülling, K., & Peters, J. (2011). Balancing Safety and Exploitability in Opponent Modeling. In Twenty-Fifth AAAI Conference on Artificial Intelligence (AAAI 2011) (pp. 1515-1520). Menlo Park, CA, USA: AAAI Press.

Item is released

Basic information

Item permalink: https://hdl.handle.net/11858/00-001M-0000-0013-BAD0-D
Version permalink: https://hdl.handle.net/11858/00-001M-0000-0013-BAD1-B

Genre: Conference paper

Creators

Creators:
Wang, Z.¹, Author
Boularias, A.¹, Author
Mülling, K.¹, Author
Peters, J.¹,², Author
Burgard, W., Editor
Roth, D., Editor

Affiliations:
¹ Department Empirical Inference, Max Planck Institute for Biological Cybernetics, Max Planck Society, ou_1497795
² Dept. Empirical Inference, Max Planck Institute for Intelligent Systems, Max Planck Society, ou_1497647

Content

Keywords: -

Abstract: Opponent modeling is a critical mechanism in repeated games. It allows a player to adapt its strategy in order to better respond to the presumed preferences of its opponents. We introduce a new modeling technique that adaptively balances exploitability and risk reduction. An opponent’s strategy is modeled with a set of possible strategies that contains the actual strategy with high probability. The algorithm is safe, as the expected payoff is above the minimax payoff with high probability, and it can exploit the opponents’ preferences once sufficient observations have been obtained. We apply the algorithm to normal-form games and to stochastic games with a finite number of stages. The performance of the proposed approach is first demonstrated on repeated rock-paper-scissors games. Subsequently, the approach is evaluated in a human-robot table-tennis setting where the robot player learns to prepare to return a served ball. By modeling the human players, the robot chooses a forehand, backhand or middle preparation pose before they serve. The learned strategies can exploit the opponent’s preferences, leading to a higher rate of successful returns.
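
The idea in the abstract (a set of candidate opponent strategies, and a response that is safe against every member of the set) can be illustrated on rock-paper-scissors. The following is a minimal sketch under stated assumptions, not the paper's algorithm: the set is assumed to be an L1 ball around the empirical action frequencies, its radius is taken from a standard concentration bound for empirical distributions, and the maximin strategy is found by a coarse grid search. All function names and parameters (`l1_radius`, `worst_case_payoff`, `safe_strategy`, `delta`, `steps`) are hypothetical.

```python
import math
from itertools import product

# Our payoff in rock-paper-scissors: PAYOFF[our_action][opponent_action].
# Actions are indexed 0 = rock, 1 = paper, 2 = scissors.
PAYOFF = [[0, -1, 1],
          [1, 0, -1],
          [-1, 1, 0]]


def l1_radius(n_obs, n_actions=3, delta=0.05):
    """Assumed radius eps: the true strategy lies within L1 distance eps of
    the empirical one with probability at least 1 - delta."""
    if n_obs == 0:
        return 2.0  # no data: the ball covers the whole simplex
    eps = math.sqrt(2.0 * math.log((2 ** n_actions - 2) / delta) / n_obs)
    return min(2.0, eps)


def worst_case_payoff(x, p_hat, eps):
    """Minimum expected payoff of mixed strategy x over all opponent
    strategies within L1 distance eps of p_hat."""
    k = len(p_hat)
    # Expected payoff of x against each pure opponent action.
    v = [sum(x[i] * PAYOFF[i][j] for i in range(k)) for j in range(k)]
    q = list(p_hat)
    worst = min(range(k), key=lambda j: v[j])
    # Shift up to eps / 2 probability mass onto the action that hurts us most.
    budget = min(eps / 2.0, 1.0 - q[worst])
    q[worst] += budget
    # Take the same mass away from the actions that are best for us.
    for j in sorted(range(k), key=lambda j: v[j], reverse=True):
        if j == worst:
            continue
        take = min(q[j], budget)
        q[j] -= take
        budget -= take
        if budget <= 0:
            break
    return sum(v[j] * q[j] for j in range(k))


def safe_strategy(counts, delta=0.05, steps=30):
    """Grid search (three actions only) for the mixed strategy with the
    best worst-case payoff, given observed opponent action counts."""
    n = sum(counts)
    k = len(counts)
    p_hat = [c / n for c in counts] if n else [1.0 / k] * k
    eps = l1_radius(n, k, delta)
    best_x, best_val = None, -math.inf
    for a, b in product(range(steps + 1), repeat=2):
        if a + b > steps:
            continue
        x = [a / steps, b / steps, (steps - a - b) / steps]
        val = worst_case_payoff(x, p_hat, eps)
        if val > best_val:
            best_x, best_val = x, val
    return best_x, best_val


if __name__ == "__main__":
    print(safe_strategy([0, 0, 0]))    # no observations yet
    print(safe_strategy([100, 0, 0]))  # opponent has always played rock
```

With no observations the ball covers the whole simplex, so the sketch falls back to the uniform minimax strategy; as observations accumulate the ball shrinks and the chosen strategy moves toward a best response to the observed frequencies. A linear program could replace the grid search for games with more actions.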

Details

Language:

Dates: Published: 2011-08

Publication status: Published

Pages: -

Publishing info: -

Table of contents: -

Peer review: -

Identifiers (DOI, ISBN, etc.): ISBN: 978-1-577-35507-6
URI: http://www.aaai.org/Conferences/AAAI/aaai11.php
BibTeX citekey: WangBMP2011

Degree: -

Legal case

Project information

Source 1

Title: Twenty-Fifth AAAI Conference on Artificial Intelligence (AAAI 2011)

Source genre: Proceedings

Creators/Editors:

Affiliations:

Publisher, place: Menlo Park, CA, USA: AAAI Press

Pages: -
Volume / Issue: -
Sequence number: -
Start / End page: 1515 - 1520
Identifiers (ISBN, ISSN, DOI, etc.): -
