2-1/2 D Interpretation of Single Intensity Images

One of the main goals of computer vision is to extract spatial and geometric information about the external world seen in an image. The extracted information may be as simple as identifying empty space for navigation or obstacle avoidance, or it may serve more demanding tasks such as manipulation or object recognition.

In an intensity image, the light received by the camera is a function of several factors, such as the shape and reflective properties of the surfaces, the number, type, and relative direction of the light sources, and the viewing direction. These factors are highly interrelated, which makes direct recovery of the 3D shape of the imaged objects difficult. Extracting a 3D description from an image therefore involves an inference process.
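To make the ambiguity concrete, here is a minimal sketch that assumes a Lambertian (perfectly diffuse) surface lit by a single distant light source. This reflectance model is an illustrative assumption and is not taken from the text above; it also ignores the viewing direction, which a more general model would include. The function name `lambertian_intensity` and the chosen values are hypothetical.

```python
import math

def lambertian_intensity(albedo, normal, light_dir):
    """Intensity of a Lambertian patch: albedo * max(0, n . l).

    Assumed model for illustration only; real surfaces and lighting
    are usually more complicated.
    """
    def unit(v):
        length = math.sqrt(sum(c * c for c in v))
        return tuple(c / length for c in v)

    n, l = unit(normal), unit(light_dir)
    cos_theta = sum(a * b for a, b in zip(n, l))
    return albedo * max(0.0, cos_theta)

light = (0.0, 0.0, 1.0)

# A bright surface (albedo 1.0) tilted 60 degrees away from the light ...
tilted = (math.sin(math.radians(60)), 0.0, math.cos(math.radians(60)))
i_tilted_bright = lambertian_intensity(1.0, tilted, light)

# ... and a darker surface (albedo 0.5) facing the light head on.
i_frontal_dark = lambertian_intensity(0.5, (0.0, 0.0, 1.0), light)

# Both patches yield (numerically) the same intensity.
same = math.isclose(i_tilted_bright, i_frontal_dark, rel_tol=1e-9)
```

Two different combinations of surface orientation and reflectance produce the same pixel value, so a single intensity measurement cannot determine shape on its own. Additional constraints have to be brought in to choose among the candidate interpretations.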

One approach to extracting 3D information from images is quantitative shape recovery from cues such as stereo, shading, and texture. Another approach is the extraction of qualitative 3D information. An early example of such a process is extracting line drawings from images, with possible 3D interpretations attached. This representation is attractive because it is boundary based and terse, and boundaries have been shown to be useful in human vision [1]. The difficulty is that most line labeling algorithms need perfect line drawings to work properly, and such drawings are hard to extract reliably from images.

Visual perceptual organization and 3D interpretation do not consist of a single cohesive process, but rather of a diverse collection of processes working towards the goal of interpreting images. One way to improve the reliability of qualitative or quantitative 3D inference is therefore to integrate multiple vision modules. These modules would include (a) low level modules, (b) perceptual organization modules, and (c) 3D interpretation modules. Integrated processes can use each other's constraints and expertise to disambiguate and complete the difficult parts of the 3D interpretation process.

The POLL (Perceptual Organization and Line Labeling) system is an example implementation of such an integrated 3D interpretation process [2]. POLL is implemented as a blackboard system in which multiple modules at multiple levels of abstraction cooperate to obtain a 3D line labeling from a single intensity image. Defects in the line drawing extracted by the low level modules are diagnosed and repaired using the higher level interpretive constraints coming from a 3D line labeling module.
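The general blackboard idea can be sketched in a few lines. The code below is a toy illustration only and is not POLL's actual design: the class `Blackboard`, the three knowledge sources, the rule that an endpoint touched by a single segment counts as a defect, and the triangle data are all hypothetical choices made to show bottom-up posting combined with a top-down repair request.

```python
from collections import Counter

class Blackboard:
    """Shared store that all modules read from and write to."""
    def __init__(self):
        self.segments = set()    # low level: segments as frozensets of endpoint names
        self.junctions = set()   # middle level: endpoints shared by 2+ segments
        self.requests = set()    # top-down: segments the interpreter asks for
        self.consistent = False  # high level: drawing judged complete

def low_level(bb):
    """Post initial segments, then satisfy any top-down requests."""
    if not bb.segments:
        # Toy detector output: triangle A-B-C with edge A-C missed.
        bb.segments |= {frozenset("AB"), frozenset("BC")}
        return True
    if bb.requests:
        bb.segments |= bb.requests  # pretend a focused re-examination found them
        bb.requests.clear()
        return True
    return False

def organize(bb):
    """Group segments into junctions (endpoints used by 2+ segments)."""
    counts = Counter(p for seg in bb.segments for p in seg)
    junctions = {p for p, c in counts.items() if c >= 2}
    if junctions != bb.junctions:
        bb.junctions = junctions
        return True
    return False

def interpret(bb):
    """Diagnose dangling endpoints and request the segment that closes them."""
    endpoints = {p for seg in bb.segments for p in seg}
    dangling = sorted(endpoints - bb.junctions)
    if len(dangling) == 2 and not bb.requests:
        bb.requests.add(frozenset(dangling))
        return True
    if not dangling and not bb.consistent:
        bb.consistent = True
        return True
    return False

def run(bb, sources, max_cycles=20):
    """Each cycle, let the first source that can contribute do so."""
    for _ in range(max_cycles):
        if not any(source(bb) for source in sources):
            break  # no module has anything to add
    return bb

bb = run(Blackboard(), [low_level, organize, interpret])
```

In this toy run the interpreter notices that endpoints A and C each belong to only one segment, posts a request for the segment between them, and the low level module fills it in, after which the drawing is marked consistent. The modules never call each other directly; all communication goes through the shared blackboard, which is the property that lets higher level constraints repair lower level output.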

The following is an example result of the POLL system: an input image and the labeled line drawing generated from it by the POLL system.

[Figure: input image]

[Figure: labeled line drawing generated by the POLL system]

References

  1. I. Biederman, ``Human Image Understanding: Recent Research and a Theory,'' Computer Vision, Graphics, and Image Processing, vol. 32, pp. 29-73, 1985.
  2. Related publications by Mihran Tuceryan
  3. D. Trytten and M. Tuceryan, ``The Construction of Labeled Line Drawings from Intensity Images,'' Pattern Recognition, vol. 28, no. 2, pp. 171-198, 1995.