Balancing Safety and Exploitability in Opponent Modeling

Wang, Z; Boularias, A; Mülling, K; Peters, J; Burgard, W; Roth, D

Please note that a newer version of this record is available:
https://pure.mpg.de/pubman/item/item_1788096_2

Wang, Z., Boularias, A., Mülling, K., & Peters, J. (2011). Balancing Safety and Exploitability in Opponent Modeling. In Twenty-Fifth AAAI Conference on Artificial Intelligence (AAAI 2011) (pp. 1515-1520). Menlo Park, CA, USA: AAAI Press.

Item is Released

Basic Data

Record permalink: https://hdl.handle.net/11858/00-001M-0000-0013-BAD0-D
Version permalink: https://hdl.handle.net/11858/00-001M-0000-0013-BAD1-B

Genre: Conference Paper

Creators:
Wang, Z¹, Author
Boularias, A¹, Author
Mülling, K¹, Author
Peters, J¹,², Author
Burgard, W, Editor
Roth, D, Editor

Affiliations:
¹ Department Empirical Inference, Max Planck Institute for Biological Cybernetics, Max Planck Society, ou_1497795
² Dept. Empirical Inference, Max Planck Institute for Intelligent Systems, Max Planck Society, ou_1497647

Content

Keywords: -

Abstract: Opponent modeling is a critical mechanism in repeated games. It allows a player to adapt its strategy in order to better respond to the presumed preferences of its opponents. We introduce a new modeling technique that adaptively balances exploitability and risk reduction. An opponent's strategy is modeled with a set of possible strategies that contains the actual strategy with high probability. The algorithm is safe in that its expected payoff is above the minimax payoff with high probability, and it can exploit the opponents' preferences once sufficient observations have been obtained. We apply the technique to normal-form games and to stochastic games with a finite number of stages. The performance of the proposed approach is first demonstrated on repeated rock-paper-scissors games. Subsequently, the approach is evaluated in a human-robot table-tennis setting where the robot player learns to prepare to return a served ball. By modeling the human players, the robot chooses a forehand, backhand or middle preparation pose before they serve. The learned strategies can exploit the opponent's preferences, leading to a higher rate of successful returns.
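To illustrate the safety/exploitability trade-off described in the abstract, here is a minimal Python sketch for rock-paper-scissors. It is not the authors' algorithm. It is a generic robust best response under stated assumptions. The set of possible opponent strategies is taken to be an L1 ball around the empirical action frequencies. The ball's radius comes from a Weissman-style concentration bound, which is an assumption here. Our mixed strategy is chosen by grid search to maximize the worst-case expected payoff over that set. The names l1_radius, worst_case_value, simplex_grid and robust_strategy, and the parameters delta and steps, are all hypothetical.

```python
import itertools
import math

# Our payoff in rock-paper-scissors: PAYOFF[i][j] is the row player's payoff
# when we play action i and the opponent plays action j (0=rock, 1=paper, 2=scissors).
PAYOFF = [
    [0, -1, 1],
    [1, 0, -1],
    [-1, 1, 0],
]


def l1_radius(n_obs, n_actions, delta):
    """Radius of an L1 ball around the empirical distribution that, by an
    assumed Weissman-style bound, contains the true opponent strategy with
    probability at least 1 - delta."""
    if n_obs == 0:
        return 2.0  # L1 diameter of the simplex: the ball covers every strategy
    return math.sqrt(2.0 * math.log((2 ** n_actions - 2) / delta) / n_obs)


def worst_case_value(coeffs, q_hat, radius):
    """Minimize sum_j coeffs[j] * q[j] over opponent strategies q in the simplex
    with ||q - q_hat||_1 <= radius. The minimizer moves up to radius / 2 mass
    from the highest-coefficient actions onto the lowest-coefficient action."""
    q = list(q_hat)
    k = len(coeffs)
    lowest = min(range(k), key=lambda a: coeffs[a])
    moved = min(radius / 2.0, 1.0 - q[lowest])
    q[lowest] += moved
    remaining = moved
    for j in sorted(range(k), key=lambda a: coeffs[a], reverse=True):
        if remaining <= 0:
            break
        if j == lowest:
            continue
        take = min(remaining, q[j])
        q[j] -= take
        remaining -= take
    return sum(c * x for c, x in zip(coeffs, q))


def simplex_grid(n_actions, steps):
    """Yield every mixed strategy whose probabilities are multiples of 1 / steps."""
    for head in itertools.product(range(steps + 1), repeat=n_actions - 1):
        if sum(head) <= steps:
            yield [h / steps for h in head] + [(steps - sum(head)) / steps]


def robust_strategy(counts, delta=0.05, steps=30):
    """Return (strategy, value): the grid strategy whose worst-case expected
    payoff over the confidence set of opponent strategies is largest."""
    k = len(counts)
    n = sum(counts)
    q_hat = [c / n for c in counts] if n else [1.0 / k] * k
    radius = l1_radius(n, k, delta)
    best_p, best_val = None, -math.inf
    for p in simplex_grid(k, steps):
        # Expected payoff against each opponent action j under our mix p.
        coeffs = [sum(p[i] * PAYOFF[i][j] for i in range(k)) for j in range(k)]
        val = worst_case_value(coeffs, q_hat, radius)
        if val > best_val:
            best_p, best_val = p, val
    return best_p, best_val


if __name__ == "__main__":
    print(robust_strategy([0, 0, 0]))    # no data: the uniform (minimax) strategy
    print(robust_strategy([200, 0, 0]))  # opponent has only played rock: pure paper
```

With no observations, the set covers the whole simplex and the sketch falls back to uniform play. As the observation counts grow, the set shrinks and the chosen strategy moves toward a best response. The uniform strategy is always on the grid and scores exactly 0, so the reported worst-case value never drops below 0, the minimax value of rock-paper-scissors. That guarantee holds only when the true strategy lies in the set, which happens with probability at least 1 - delta under the assumed bound.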

Details

Language(s):

Date: Published: 2011-08

Publication status: Published

Pages: -

Place, publisher, edition: -

Table of contents: -

Review method: -

Identifiers: ISBN: 978-1-577-35507-6
URI: http://www.aaai.org/Conferences/AAAI/aaai11.php
BibTeX citekey: WangBMP2011

Degree: -

Event

Title: Twenty-Fifth AAAI Conference on Artificial Intelligence (AAAI 2011)

Place of event: San Francisco, CA, USA

Start/end date: -

Source 1

Title: Twenty-Fifth AAAI Conference on Artificial Intelligence (AAAI 2011)

Source genre: Proceedings

Creators:

Affiliations:

Place, publisher, edition: Menlo Park, CA, USA: AAAI Press

Pages: -
Volume / issue: -
Article number: -
Start / end page: 1515-1520
Identifier: -
