I've now moved to a Research Associate position at University College London.
Please visit my new site here (this one is no longer updated).
Why is Vision Challenging?
The human visual system has little difficulty recovering structure and interpreting the scenes around us, which might lead us to assume that the task should be a relatively simple one for a computer. In fact, this was the assumption, or perhaps the optimism, of early computer vision researchers, who estimated time frames on the order of months for visual reconstruction tasks. A fully automated visual system has yet to be produced in the intervening years; however, we have gained a much greater understanding of the problems involved and can now explain how complex and challenging the task actually is.
Figure 1 portrays the principal parameters that combine to make up the 2D image captured, at an instant in time, by a photograph. Here we have neglected the evolution of time and thus made an assumption of rigidity (nothing moves or deforms), which already greatly simplifies the problem. The probability theorist Jaynes remarked that "seeing is inference from incomplete information" [Jaynes 2003], a statement that cuts to the heart of the problem. The combination of all of these complicated factors, followed by projection onto a single 2D image plane, results in a huge loss of information. Presented with the image alone, it is no longer possible to measure any of the aforementioned parameters directly; instead we are left with ambiguities, which at best may place constraints on some of these factors.
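The information lost under projection can be seen in a few lines of code. The sketch below assumes a simple pinhole camera with focal length f = 1 (an illustrative choice, not a model from the text): every 3D point along a viewing ray maps to the same 2D image coordinate, so depth cannot be recovered from a single image measurement alone.

```python
def project(x, y, z, f=1.0):
    """Pinhole projection of a 3D point onto the image plane z = f."""
    return (f * x / z, f * y / z)

# Two different 3D points on the same viewing ray...
p_near = (1.0, 2.0, 4.0)
p_far = (2.0, 4.0, 8.0)  # the same direction, twice as far from the camera

# ...project to the identical image location:
print(project(*p_near))  # (0.25, 0.5)
print(project(*p_far))   # (0.25, 0.5)
```

Any point of the form (t, 2t, 4t) lands on that same pixel, which is exactly the depth ambiguity the projection step introduces.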
The loss of information during image formation is best illustrated by an example. If we consider a red region of an image, we may place constraints on the possible lighting and texture/colour of the observed object, but we cannot determine directly whether a white object is being illuminated by a red light or a red object by a white light. The challenge of recovering 3D shape is then to infer an estimate of the geometry of the object in question such that it would generate the observed images under the same remaining parameters of viewpoint, lighting, material, texture and occlusion.
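The red-light/red-object ambiguity can be made concrete with a toy image-formation model. Assuming a simple per-channel multiplicative model (observed colour = reflectance × illumination, an illustrative simplification), two very different scene explanations produce the same observed pixel:

```python
def observe(reflectance, illumination):
    """Per-channel (RGB) product of surface reflectance and light colour."""
    return tuple(r * l for r, l in zip(reflectance, illumination))

white_object_red_light = observe((1.0, 1.0, 1.0), (0.8, 0.1, 0.1))
red_object_white_light = observe((0.8, 0.1, 0.1), (1.0, 1.0, 1.0))

print(white_object_red_light)  # (0.8, 0.1, 0.1)
print(red_object_white_light)  # (0.8, 0.1, 0.1) -- indistinguishable
```

The forward model maps both explanations to one observation, so no amount of inspection of the pixel itself can separate them.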
The study of computer vision attempts to invert the imaging process: by studying the constraints on the interplay of the factors that make up an image, it generates (either explicitly or within a learning framework) cues and priors which may be combined with image measurements to resolve the information loss and thus the ambiguities. For example, a general scene has very high dimensionality in terms of the freedom in geometry and reflectance, so there will be insufficient information to estimate these values without strong priors on both the object shape and the surface reflectance. The study of different scene representations and the establishment of techniques which enforce different assumptions lies at the heart of 3D computer vision research.
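How a prior resolves an ambiguity can be sketched for the colour example above. Among candidate (reflectance, illumination) explanations consistent with an observed pixel, we pick the one that maximises a prior; the "lights tend to be neutral white" prior used here is purely illustrative, not a specific published method:

```python
def likelihood_consistent(reflectance, illumination, observed, tol=1e-9):
    """True if this (reflectance, light) pair reproduces the observed pixel."""
    return all(abs(r * l - o) <= tol
               for r, l, o in zip(reflectance, illumination, observed))

def whiteness_prior(illumination):
    """Higher (less negative) when the light colour is closer to neutral."""
    mean = sum(illumination) / len(illumination)
    return -sum((c - mean) ** 2 for c in illumination)

observed = (0.8, 0.1, 0.1)
candidates = [
    ((1.0, 1.0, 1.0), (0.8, 0.1, 0.1)),  # white object under red light
    ((0.8, 0.1, 0.1), (1.0, 1.0, 1.0)),  # red object under white light
]

# Keep only explanations consistent with the image, then apply the prior:
best = max((c for c in candidates
            if likelihood_consistent(*c, observed)),
           key=lambda c: whiteness_prior(c[1]))
print(best)  # selects the red-object / white-light explanation
```

Both candidates explain the data equally well; only the prior breaks the tie, which is the role priors play in the estimation frameworks described above.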
[Jaynes 2003] E. T. Jaynes. Probability Theory: The Logic of Science. Cambridge University Press, 2003.