Recent years have shown growing interest in using multi-robot systems for a wide range of applications, such as logistics, agriculture, transportation, and space exploration, where the robot tasks often have a cyclic, long-horizon nature. These scenarios require long-term planning, the robots must cope with unpredictable environments, and reliability and efficiency are major concerns. This thesis addresses some of the underlying limitations of carrying out such tasks in these environments.
We present a novel modeling formalism, generalized stochastic Petri nets with rewards (GSPNRs), which captures action selection, action execution, uncertainty in action durations and outcomes, and resources such as counters and battery levels. It also enables modeling cooperation between different types of robots and defining team-level objectives in the form of reward functions. This representation mitigates state-space explosion by representing robots that share the same characteristics as tokens in the GSPNR model.
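To make the token abstraction concrete, the sketch below (illustrative only, not the thesis implementation; the robot count, place names, and sizes are hypothetical) contrasts the number of joint states obtained when identical robots are tracked individually with the number of markings obtained when robots are tracked only as token counts per place.

# Illustrative sketch of the counting abstraction behind GSPNR markings.
from math import comb
from collections import Counter

NUM_ROBOTS = 6      # hypothetical team of identical robots
NUM_PLACES = 4      # hypothetical GSPN places (locations / activities)

# Per-robot joint state: one entry per ordered assignment of robots to places.
per_robot_states = NUM_PLACES ** NUM_ROBOTS

# GSPNR marking: only the number of tokens (robots) in each place matters,
# i.e. the number of multisets of size NUM_ROBOTS over NUM_PLACES places.
marking_states = comb(NUM_ROBOTS + NUM_PLACES - 1, NUM_PLACES - 1)

print(per_robot_states, marking_states)   # 4096 vs 84

# A marking is simply a count of tokens per place:
marking = Counter({"inspect": 3, "charge": 1, "navigate": 2})
assert sum(marking.values()) == NUM_ROBOTS

In this hypothetical example, tracking six identical robots individually over four places yields 4096 joint states, whereas counting tokens per place yields only 84 markings; the gap widens rapidly with team size.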
We introduce a novel exact method for policy synthesis over a GSPNR formulation. The method optimizes the long-run average reward criterion while accounting for the uncertainty present in the model. We provide empirical evidence of its performance in a forest fire monitoring scenario and compare it against discounted reward optimization methods and a carefully hand-crafted policy. The results show that, in the long run, our method outperforms both alternatives. They also expose the limitations of the method, especially its inability to find solutions for models with large state-spaces.
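For reference, one common way to write the long-run average reward of a policy $\pi$ in a continuous-time model is given below; the precise definition used in the thesis (for instance, the treatment of rate versus impulse rewards) may differ.

\[
  \rho^{\pi} \;=\; \lim_{T \to \infty} \frac{1}{T}\,
  \mathbb{E}^{\pi}\!\left[ \int_{0}^{T} r(M_t)\,\mathrm{d}t \right]
\]

where $M_t$ denotes the GSPNR marking at time $t$ and $r$ the reward rate. Optimizing this criterion favors steady-state behavior over the whole execution horizon, rather than discounting future rewards.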
To address this issue, we propose an approximate solution method that generalizes accurately to unseen parts of the state-space. We introduce an actor-critic deep reinforcement learning algorithm that learns over a GSPNR formulation. This is an on-policy method that learns the policy directly through policy gradient updates that account for action duration while optimizing the long-run average reward. We evaluate this method in a simulation of a solar farm inspection scenario. The results show that our method outperforms proximal policy optimization methods and handles models with state-spaces five orders of magnitude larger than those exact methods can solve. They also show that the method is robust to losing or adding robots during execution, without further retraining.
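As a rough illustration of a duration-aware, average-reward actor-critic update (a minimal sketch under stated assumptions, not the thesis algorithm; the network architectures, dimensions, and update details below are hypothetical), the key point is the differential TD error, which charges the running average-reward estimate for the full sojourn time of each action rather than a unit step.

import torch
import torch.nn as nn

STATE_DIM, NUM_ACTIONS = 8, 4          # hypothetical sizes

actor = nn.Sequential(nn.Linear(STATE_DIM, 64), nn.Tanh(),
                      nn.Linear(64, NUM_ACTIONS))
critic = nn.Sequential(nn.Linear(STATE_DIM, 64), nn.Tanh(),
                       nn.Linear(64, 1))
optimizer = torch.optim.Adam(
    list(actor.parameters()) + list(critic.parameters()), lr=3e-4)
avg_reward = 0.0                        # running estimate of the gain

def update(state, action, reward, duration, next_state, eta=0.01):
    # state/next_state: float tensors of shape [STATE_DIM];
    # action: scalar long tensor; reward, duration: floats.
    global avg_reward
    log_prob = torch.distributions.Categorical(logits=actor(state)).log_prob(action)
    v = critic(state).squeeze()
    v_next = critic(next_state).squeeze().detach()

    # Differential (average-reward) TD error: the gain estimate is charged
    # for the whole sojourn time `duration` of the chosen action.
    td_error = reward - avg_reward * duration + v_next - v

    loss = -log_prob * td_error.detach() + 0.5 * td_error.pow(2)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    avg_reward += eta * float(td_error.detach())   # track the gain online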
Finally, we provide a toolbox that implements GSPNRs and provides methods to design models, extract policies, and execute those policies on the ROS middleware. The toolbox automates model design through the replication of model templates and enables automatic model generation from topological maps, without requiring an understanding of the GSPNR formalism. We showcase the toolbox methods by solving a solar farm inspection problem simulated in Gazebo.
In this scenario, smaller robots perform inspection while a larger robot acts as a mobile battery charger. We demonstrate how to capture multiple battery discharge levels while mitigating state-space explosion, and how to represent heterogeneous multi-robot systems while modeling cooperation between them.
Keywords: Multi-Robot Systems, Multi-Robot Coordination, Planning Under Uncertainty, Generalized Stochastic Petri Nets with Rewards, Long-Run Average Reward Optimization.