Acronym MODI
Funding Reference FCT - PTDC/EEA-ACR/72201/2006
Dates 2007-10|2010-09

Motivated by applications in fields that range from robotics to virtual reality, the automatic generation of a 3D description of the real-world environment has received the attention of a large number of researchers. Naturally, the use of expensive range sensors, i.e., sensors that provide explicit information about the 3D structure of the environment in front of them, and/or accurately calibrated video cameras, has led to successful results. However, in many cases only uncalibrated video images are available, due to either obvious economic reasons or the specific nature of the applications, e.g., modern content-based representations for digital video. Inferring 3D content from 2D images has been one of the overall goals of the Computer Vision research field. In this project, we will take a further step toward that goal.

Although the quest for the automatic understanding of 3D scenes has been around since the early days of Computer Vision, only recently have tools such as modern large-scale optimization techniques and statistical model-based methods come onto the scene. In this context, we will address three main research topics: the correspondence problem, the analysis of non-rigid scenes, and featureless methods for 3D analysis.

In a general scenario, when inferring 3D content from a set of 2D images (obtained either by moving a single camera or by using a set of cameras), a key issue is the correspondence problem, i.e., the problem of determining which feature points in the different 2D images correspond to the same 3D point. This problem is usually solved locally, matching each feature independently of the others, which leads to inaccurate results. In contrast, we will use global constraints and develop non-convex large-scale optimization techniques to compute the globally optimal solution to the complete set of correspondences in a set of images.
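
The difference between local and global matching can be illustrated with a toy sketch. This is not the project's method: it assumes two images with the same number of 2D feature points, uses squared distance as a stand-in matching cost, and replaces large-scale optimization with a brute-force search over all one-to-one assignments, which is only feasible for a handful of points. The function names and the sample coordinates are hypothetical.

```python
from itertools import permutations


def sq_dist(p, q):
    """Squared Euclidean distance between two 2D points."""
    return (p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2


def local_matches(pts_a, pts_b):
    """Match each point of image A to its nearest point in image B.

    Every point is matched independently, so two points of A may
    end up assigned to the same point of B.
    """
    return [
        min(range(len(pts_b)), key=lambda j: sq_dist(p, pts_b[j]))
        for p in pts_a
    ]


def global_matches(pts_a, pts_b):
    """Find the one-to-one assignment with the lowest total cost.

    Brute force over all permutations (illustration only); assumes
    len(pts_a) == len(pts_b).
    """
    def total_cost(perm):
        return sum(sq_dist(p, pts_b[j]) for p, j in zip(pts_a, perm))

    return list(min(permutations(range(len(pts_b))), key=total_cost))


# Hypothetical feature points: image B is image A shifted to the right.
pts_a = [(0.0, 0.0), (1.0, 0.0)]
pts_b = [(0.9, 0.0), (2.0, 0.0)]

print(local_matches(pts_a, pts_b))   # both points of A pick the same point of B
print(global_matches(pts_a, pts_b))  # one-to-one constraint resolves the conflict
```

In this example the independent nearest-neighbour rule assigns both points of the first image to the same point of the second, whereas enforcing a one-to-one assignment over the whole set recovers a consistent matching.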

Most approaches to the automatic inference of 3D content hinge on the underlying assumption of scene rigidity. As a result, these approaches cannot deal with time-varying object shapes, which severely limits their application, since, for instance, most biological shapes are intrinsically deformable (skin, organs) or articulated (bones). In this project, we seek to generalize the rigidity assumption and come up with optimization techniques able to deal with both computing correspondences between 2D images and inferring 3D content in a non-rigid world.

The research line outlined above, like the majority of current methods, is based on an intermediate step that computes local features, e.g., image points. This intermediate step, in general computationally expensive, is often seen as the bottleneck of current solutions to the problem of inferring 3D models from 2D images. In contrast, featureless methods, i.e., methods that directly process the whole image data, without computing inter-image correspondences of pointwise features, have succeeded in more constrained scenarios. In this project, we will use statistical modelling techniques to develop new featureless methods that provide partial descriptions of the 3D world. These descriptions will also enable innovative research lines that combine featureless methods with feature-based ones.

Research Groups Signal and Image Processing Group (SIPG)
ISR/IST Responsible
Pedro Aguiar
Alessio Del Bue
João Paulo Costeira