de.mpg.escidoc.pubman.appbase.FacesBean
English
 
Help Guide Disclaimer Contact us Login
  Advanced SearchBrowse

Item

ITEM ACTIONSEXPORT

Released

Conference Paper

Coherent Inference on Optimal Play in Game Trees

MPS-Authors
http://pubman.mpdl.mpg.de/cone/persons/resource/persons84387

Hennig,  P
Department Empirical Inference, Max Planck Institute for Biological Cybernetics, Max Planck Society;

Locator
There are no locators available
Fulltext (public)
There are no public fulltexts available
Supplementary Material (public)
There is no public supplementary material available
Citation

Hennig, P., Stern, D., & Graepel, T. (2010). Coherent Inference on Optimal Play in Game Trees. In JMLR Workshop and Conference Proceedings Volume 9: AISTATS 2010 (pp. 326-333). Cambridge, MA, USA: JMLR.


Cite as: http://hdl.handle.net/11858/00-001M-0000-0013-C03C-1
Abstract
Round-based games are an instance of discrete planning problems. Some of the best contemporary game tree search algorithms use random roll-outs as data. Relying on a good policy, they learn on-policy values by propagating information upwards in the tree, but not between sibling nodes. Here, we present a generative model and a corresponding approximate message passing scheme for inference on the optimal, off-policy value of nodes in smooth AND/OR trees, given random roll-outs. The crucial insight is that the distribution of values in game trees is not completely arbitrary. We define a generative model of the on-policy values using a latent score for each state, representing the value under the random roll-out policy. Inference on the values under the optimal policy separates into an inductive, pre-data step and a deductive, post-data part. Both can be solved approximately with Expectation Propagation, allowing off-policy value inference for any node in the (exponentially big) tree in linear time.