Deutsch
 
Hilfe Datenschutzhinweis Impressum
  DetailsucheBrowse

Datensatz

DATENSATZ AKTIONENEXPORT

Freigegeben

Zeitschriftenartikel

Generalized Thompson sampling for sequential decision-making and causal inference

MPG-Autoren
/persons/resource/persons84121

Ortega,  PA
Research Group Sensorimotor Learning and Decision-making, Max Planck Institute for Intelligent Systems, Max Planck Society;
Research Group Sensorimotor Learning and Decision-Making, Max Planck Institute for Biological Cybernetics, Max Planck Society;

/persons/resource/persons83827

Braun,  DA
Research Group Sensorimotor Learning and Decision-Making, Max Planck Institute for Biological Cybernetics, Max Planck Society;
Research Group Sensorimotor Learning and Decision-making, Max Planck Institute for Intelligent Systems, Max Planck Society;

Externe Ressourcen
Es sind keine externen Ressourcen hinterlegt
Volltexte (beschränkter Zugriff)
Für Ihren IP-Bereich sind aktuell keine Volltexte freigegeben.
Volltexte (frei zugänglich)
Es sind keine frei zugänglichen Volltexte in PuRe verfügbar
Ergänzendes Material (frei zugänglich)
Es sind keine frei zugänglichen Ergänzenden Materialien verfügbar
Zitation

Ortega, P., & Braun, D. (2014). Generalized Thompson sampling for sequential decision-making and causal inference. Complex Adaptive Systems Modeling, 2(2), 1-23. doi:10.1186/2194-3206-2-2.


Zitierlink: https://hdl.handle.net/11858/00-001M-0000-0027-804C-8
Zusammenfassung
Purpose Sampling an action according to the probability that the action is believed to be the optimal one is sometimes called Thompson sampling. Methods Although mostly applied to bandit problems, Thompson sampling can also be used to solve sequential adaptive control problems, when the optimal policy is known for each possible environment. The predictive distribution over actions can then be constructed by a Bayesian superposition of the policies weighted by their posterior probability of being optimal. Results Here we discuss two important features of this approach. First, we show in how far such generalized Thompson sampling can be regarded as an optimal strategy under limited information processing capabilities that constrain the sampling complexity of the decision-making process. Second, we show how such Thompson sampling can be extended to solve causal inference problems when interacting with an environment in a sequential fashion. Conclusion In summary, our results suggest that Thompson sampling might not merely be a useful heuristic, but a principled method to address problems of adaptive sequential decision-making and causal inference.