Statistical pattern recognition of visual information in digital images is experiencing a boom in scientific discoveries and technological applications. The ultimate goal of this field is to make computers understand a scene captured with a digital camera: given a still picture, the computer should automatically identify what is present (image annotation and context identification) and estimate the 3-D pose and segmentation of the visual objects. Solving this problem amounts to reverse engineering the process by which an image is formed. This involves an analysis step that estimates a 3-D model that may have generated the scene, followed by verification of that model against the image. The problem is essentially ill-posed because several different models (i.e., different interpretations) can lead to similar pictures. Therefore, the computer has to decide on the most likely model among several ambiguous candidates, using image features, statistical models of visual objects, and relations between visual objects to constrain the complex search space of scene interpretations. This application proposes a novel methodology for solving this problem, based on a principled probabilistic model that combines hierarchical context classification, visual class recognition, 2-D segmentation, and 3-D pose recovery from 2-D images. The project is relevant to both the scientific community and industry. For industry, the technologies developed in this project can improve the accuracy of image search and annotation systems, such as Google Images, Yahoo Images, Theseus, and Quaero. For the scientific community, the 3-D model abstraction will allow for the recognition of new visual classes with consistent shape information and varying appearance. Moreover, the use of multi-level hierarchical models can lead to efficient search methods in very large databases and a more effective visual context abstraction.