Reasoning and making decisions under uncertainty arise in numerous applications, ranging from standard process control (chemical process control) to robotics (designing robots that optimally explore an unknown environment), sensor networks and tracking (positioning sensors so as to optimize the information received) and finance (option pricing). Such problems are also closely related to experimental design (clinical trials) and active learning (intelligently exploring massive multimedia databases for information retrieval). Despite the growing interest of all these communities in such problems, there is currently no generic computational algorithm available for complex stochastic models. Closed-form or exact solutions exist only for very simple models, such as the linear Gaussian case, and for discrete-world problems. To deal with continuous spaces, researchers have proposed numerous approximations. Unfortunately, most of them still rely on Gaussian approximations or have been developed for very specific applications.
The objective of this project is to develop modern Monte Carlo methods to solve discrete-time stochastic optimal control problems, in both the fully and partially observed cases, for nonlinear and/or non-Gaussian models. The project aims to combine modern computational tools actively developed in statistics (Markov chain Monte Carlo, and sequential Monte Carlo, also known as particle filters) with ideas developed in automatic control, operations research (gradient estimation) and reinforcement learning (value function and policy parameterization). The focus is on developing generic algorithms that can be applied in different domains and that provide a unified framework for this type of problem.
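As an illustration of the sequential Monte Carlo tools mentioned above, the following is a minimal sketch of a bootstrap particle filter, assuming a hypothetical scalar model x_t = a x_{t-1} + v_t, y_t = x_t + w_t with Gaussian noises. The model, the function names (simulate, bootstrap_particle_filter) and all parameter values are illustrative assumptions and are not taken from the project; the Gaussian model is used only to keep the sketch short, since the same three steps (propagate, weight, resample) apply unchanged to nonlinear or non-Gaussian models.

```python
import math
import random


def simulate(T, a=0.9, sigma_x=1.0, sigma_y=0.5, seed=1):
    """Draw a state trajectory and noisy observations from the toy model."""
    rng = random.Random(seed)
    x, states, obs = 0.0, [], []
    for _ in range(T):
        x = a * x + rng.gauss(0.0, sigma_x)        # state transition
        states.append(x)
        obs.append(x + rng.gauss(0.0, sigma_y))    # noisy observation
    return states, obs


def bootstrap_particle_filter(observations, n_particles=500, a=0.9,
                              sigma_x=1.0, sigma_y=0.5, seed=0):
    """Return the estimated filtering mean E[x_t | y_1..y_t] for each t."""
    rng = random.Random(seed)
    particles = [rng.gauss(0.0, 1.0) for _ in range(n_particles)]
    means = []
    for y in observations:
        # 1. Propagate each particle through the state transition.
        particles = [a * x + rng.gauss(0.0, sigma_x) for x in particles]
        # 2. Weight by the observation likelihood (in log space for stability).
        log_w = [-0.5 * ((y - x) / sigma_y) ** 2 for x in particles]
        m = max(log_w)
        w = [math.exp(lw - m) for lw in log_w]
        total = sum(w)
        w = [wi / total for wi in w]
        means.append(sum(wi * x for wi, x in zip(w, particles)))
        # 3. Multinomial resampling to counter weight degeneracy.
        particles = rng.choices(particles, weights=w, k=n_particles)
    return means


if __name__ == "__main__":
    states, obs = simulate(50)
    estimates = bootstrap_particle_filter(obs)
    for s, e in list(zip(states, estimates))[:5]:
        print(f"true state {s:+.3f}   filtered mean {e:+.3f}")
```

Multinomial resampling at every step is the simplest choice and is used here only for brevity; lower-variance resampling schemes exist and would normally be preferred.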
We expect the development of these algorithms to have a substantial impact on the wide range of applications mentioned above. This impact could be as large as that of particle filtering techniques on optimal filtering.
The project is divided into four tasks. Tasks 1, 2 and 3 focus on the development of the methodology and algorithms, and Task 4 applies the algorithms to several robotic case studies. In the first task, we will develop new algorithms to improve the efficiency and reduce the variance of Monte Carlo approaches for open-loop control problems (i.e. model predictive control). The fully and partially observed cases will be studied through the derivation of low-variance gradient estimates and Monte Carlo sampling algorithms (Markov chain Monte Carlo and sequential Monte Carlo). The second and third tasks deal with the more complex closed-loop case, in the fully and partially observed settings respectively. The main idea to be developed in these tasks is to cast the control problem as a simulation problem. The main challenge here is to derive efficient sampling methods for the complex, non-standard distributions that appear. This will be achieved using the latest advances in Markov chain Monte Carlo and a recently proposed sampling scheme called sequential Monte Carlo samplers. While the previous algorithms will be evaluated on general problems from different domains, the fourth task will transfer some of the algorithms proposed in the first three tasks to the real robotic platforms used at the System and Robotics Institute (ISR) in Lisbon. The algorithms will control a humanoid robot so that it localizes, intercepts and grasps uncertain moving objects, using the vision and sound information provided by its on-board cameras and microphones.
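To make the notion of a Monte Carlo gradient estimate for open-loop control concrete, the following is a minimal sketch, assuming a hypothetical scalar system x_{t+1} = x_t + u_t + sigma * eps_t with a quadratic tracking cost. It uses a pathwise (sample-path derivative) estimator with common random numbers, which is one standard variance reduction device and not necessarily the estimator the project will develop. The model, the cost, the function names and all parameter values are illustrative assumptions.

```python
import random


def mc_cost_and_gradient(u, x0, target, lam, sigma, noise):
    """Monte Carlo estimate of the expected cost and of its gradient in u.

    Cost of one path: sum_t (x_t - target)^2 + lam * sum_t u_t^2, where
    x_{t+1} = x_t + u_t + sigma * eps_t. `noise` holds pre-drawn eps
    sequences, so the same random numbers are reused at every call.
    """
    T, n = len(u), len(noise)
    cost = 0.0
    grad = [0.0] * T
    for eps in noise:
        x = x0
        resid = []
        for t in range(T):
            x = x + u[t] + sigma * eps[t]
            resid.append(x - target)           # residual after applying u[t]
        cost += sum(r * r for r in resid) + lam * sum(v * v for v in u)
        # u[k] shifts every later state by one unit, so its derivative is
        # the suffix sum of 2 * resid from index k onwards.
        tail = 0.0
        for k in range(T - 1, -1, -1):
            tail += 2.0 * resid[k]
            grad[k] += tail + 2.0 * lam * u[k]
    return cost / n, [g / n for g in grad]


def optimize_open_loop(T=5, x0=0.0, target=1.0, lam=0.1, sigma=0.2,
                       n_samples=200, steps=500, lr=0.02, seed=0):
    """Plain gradient descent on the sampled cost over the control sequence."""
    rng = random.Random(seed)
    noise = [[rng.gauss(0.0, 1.0) for _ in range(T)] for _ in range(n_samples)]
    u = [0.0] * T
    for _ in range(steps):
        _, grad = mc_cost_and_gradient(u, x0, target, lam, sigma, noise)
        u = [ui - lr * gi for ui, gi in zip(u, grad)]
    return u, noise


if __name__ == "__main__":
    u_opt, noise = optimize_open_loop()
    cost, _ = mc_cost_and_gradient(u_opt, 0.0, 1.0, 0.1, 0.2, noise)
    print("controls:", [round(v, 3) for v in u_opt])
    print("estimated cost:", round(cost, 4))
```

Fixing the noise draws across iterations turns the sampled cost into a deterministic function of the controls, which is why plain gradient descent suffices in this toy case. For models where sample paths are not differentiable in the controls, a score-function (likelihood-ratio) estimator would be the usual alternative, typically at the price of higher variance.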
The project focuses on the methodological developments required to solve these problems. Hence, the main results of the project are the generic algorithms addressing each of the problems of Tasks 1, 2 and 3. Nevertheless, the algorithms will be thoroughly evaluated using simulated and real data, and their convergence properties will be studied. The case studies envisaged by the project include robotics and tracking scenarios (moving sound source localization, bearings-only tracking, object interception), finance applications (portfolio management) and standard control problems. A special effort will be made in Task 4 to validate the algorithms on real robotics problems and to integrate them within the projects developed at ISR. To ease the dissemination of the results, the project will also provide open-source code for the new algorithms.