Item Details

Released

Report

The Robot Routing Problem for Collecting Aggregate Stochastic Rewards

MPS-Authors

Dimitrova,  Rayna
Group R. Majumdar, Max Planck Institute for Software Systems, Max Planck Society;

Gavran,  Ivan
Group R. Majumdar, Max Planck Institute for Software Systems, Max Planck Society;

Majumdar,  Rupak
Group R. Majumdar, Max Planck Institute for Software Systems, Max Planck Society;

Prabhu,  Vinayak
Group R. Majumdar, Max Planck Institute for Software Systems, Max Planck Society;

Soudjani,  Sadegh
Group R. Majumdar, Max Planck Institute for Software Systems, Max Planck Society;

External Resource
There are no locators available
Fulltext (public)

arXiv:1704.05303.pdf
(Preprint), 741KB

Supplementary Material (public)
There is no public supplementary material available
Citation

Dimitrova, R., Gavran, I., Majumdar, R., Prabhu, V., & Soudjani, S. (2017). The Robot Routing Problem for Collecting Aggregate Stochastic Rewards. Retrieved from http://arxiv.org/abs/1704.05303.


Cite as: https://hdl.handle.net/21.11116/0000-0000-ED2C-5
Abstract
We propose a new model for formalizing reward collection problems on graphs with dynamically generated rewards which may appear and disappear based on a stochastic model. The *robot routing problem* is modeled as a graph whose nodes are stochastic processes generating potential rewards over discrete time. The rewards are generated according to the stochastic process, but at each step, an existing reward disappears with a given probability. The edges in the graph encode the (unit-distance) paths between the rewards' locations. On visiting a node, the robot collects the accumulated reward at the node at that time, but traveling between the nodes takes time. The optimization question asks to compute an optimal (or epsilon-optimal) path that maximizes the expected collected rewards. We consider the finite- and infinite-horizon robot routing problems. For the finite horizon, the goal is to maximize the total expected reward, while for the infinite horizon we consider limit-average objectives. We study the computational and strategy complexity of these problems, establish NP lower bounds, and show that optimal strategies require memory in general. We also provide an algorithm for computing epsilon-optimal infinite paths for arbitrary epsilon > 0.
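The abstract describes the model only informally; the sketch below is a minimal Monte Carlo estimator, written in Python, of the quantity being optimized in the finite-horizon case: the expected total reward collected along a fixed path. It is not the authors' algorithm, and the reward dynamics are simplified illustrative assumptions (each node generates a unit reward per step with probability appear_prob, and any uncollected reward vanishes with probability vanish_prob per step); the names estimate_path_reward, appear_prob, reward_value, and vanish_prob are invented here and do not come from the paper.

import random

def estimate_path_reward(nodes, path, horizon, appear_prob, reward_value,
                         vanish_prob, trials=10000, seed=0):
    # Monte Carlo estimate of the expected total reward collected by a robot
    # that follows `path` (a sequence of nodes joined by unit-distance edges)
    # for `horizon` steps.  Each node generates a reward of `reward_value`
    # with probability `appear_prob` per step (a simple Bernoulli process
    # standing in for the paper's general stochastic model), and any
    # uncollected reward vanishes with probability `vanish_prob` per step.
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        pending = {v: 0.0 for v in nodes}   # uncollected reward at each node
        collected = 0.0
        for t in range(horizon):
            for v in nodes:
                if pending[v] > 0.0 and rng.random() < vanish_prob:
                    pending[v] = 0.0             # the accumulated reward disappears
                if rng.random() < appear_prob:
                    pending[v] += reward_value   # a new reward is generated
            here = path[min(t, len(path) - 1)]   # robot's position at time t
            collected += pending[here]           # collect everything waiting here
            pending[here] = 0.0
        total += collected
    return total / trials

if __name__ == "__main__":
    # Example: a robot circling a 4-node ring graph for 8 steps.
    nodes = [0, 1, 2, 3]
    path = [0, 1, 2, 3, 0, 1, 2, 3]
    print(estimate_path_reward(nodes, path, horizon=8,
                               appear_prob=0.5, reward_value=1.0,
                               vanish_prob=0.2))

The planning problems studied in the paper search over paths (and, for the infinite horizon, limit-average objectives) rather than merely evaluating a fixed one; this sketch only makes the finite-horizon objective concrete.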