|
Automatic face recognition has long been established as one of
the most active research areas in computer vision. In spite of the
large number of developed algorithms, its real-world performance
has been, to say the least, disappointing. Even in very controlled
imaging conditions, such as those used for passport photographs, the
error rate is as high as 10%, while in less controlled environments
the performance degrades even further. The goal of this work is to use video to make face recognition more robust by resolving some of the inherent ambiguities (shape, texture, illumination, etc.) of single-shot recognition. Indeed, many practical applications provide more than a single image of a face. In surveillance, for example, a face can be tracked to provide a temporal sequence of images. In access control, the user may be assumed to be cooperative and can hence be instructed to move in front of a fixed camera. Video offers a number of technical advantages over the single-shot setup: person-specific dynamics can be learnt, and more effective face representations (e.g. super-resolution images or a 3D face model) can be obtained. |
|
In most cases humans are at the centre of interest in video.
The aim of this research is to retrieve, and rank by confidence, shots
based on the presence of specific persons. Possible applications include:

- DVD browsing: Current DVD technology allows users to quickly jump to a chosen part of a film using an on-screen index. However, the available locations are predefined. Our technology could allow the user to rapidly browse scenes by formulating contextual queries.
- Content-based web search: Many web search engines have very popular image search features (e.g. http://www.google.co.uk/imghp). Currently, search is based on the keywords that appear in picture filenames or in the surrounding web page content. Focusing on the content of the images themselves can make retrieval much more accurate.

Our approach consists of computing a numerical value, a distance, expressing the degree of belief that two face images belong to the same person. A low distance, ideally zero, signifies that the images are of the same person, whilst a large one signifies that they are of different people. The method involves computing a series of transformations of the original image, each aimed at removing the effects of a particular extrinsic imaging factor. The end result is a signature image of a person, which depends mainly on the person's identity (and expression) and can be readily classified. |
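The retrieval step described above, ranking shots by how closely their faces match a query, can be illustrated with a minimal sketch. It assumes that each signature image has already been computed and flattened into a list of numbers, and it uses plain Euclidean distance as a stand-in. The actual distance, the function names `signature_distance` and `rank_shots`, and the toy data are all illustrative assumptions rather than the method used in this work.

```python
import math


def signature_distance(a, b):
    """Euclidean distance between two equal-length signature vectors.

    Assumption: a stand-in for the distance described in the text.
    """
    if len(a) != len(b):
        raise ValueError("signatures must have the same length")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def rank_shots(query, shots):
    """Return (shot_id, distance) pairs, best match first.

    `shots` maps a shot id to the list of face signatures found in it.
    A shot is scored by its closest face to the query.
    """
    scored = []
    for shot_id, signatures in shots.items():
        if not signatures:
            continue  # no face detected in this shot
        best = min(signature_distance(query, s) for s in signatures)
        scored.append((shot_id, best))
    # Low distance means high confidence, so sort ascending.
    return sorted(scored, key=lambda pair: pair[1])


# Hypothetical toy data: signatures flattened to 3-element vectors.
query = [0.2, 0.5, 0.9]
shots = {
    "shot_a": [[0.2, 0.5, 0.9], [0.9, 0.1, 0.0]],
    "shot_b": [[0.8, 0.1, 0.1]],
    "shot_c": [[0.25, 0.5, 0.85]],
}
ranking = rank_shots(query, shots)
for shot_id, distance in ranking:
    print(shot_id, round(distance, 3))
```

Scoring a shot by its closest face is one simple design choice. Averaging over all faces in a tracked sequence would be an equally plausible alternative.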
|
|
Variations in head pose and illumination are some of the most
practically challenging aspects of face recognition. The effects
of changing pose are usually less problematic and can oftentimes
be overcome by acquiring data over a period of time, e.g. by tracking
a face.
In contrast to pose, illumination changes are much more difficult to
deal with: the illumination setup in
which a face is imaged usually cannot be controlled, its
physics is difficult to model accurately, and training data containing
typical appearance variability is usually not available. Thermal spectrum imagery is useful in this regard as it is virtually insensitive to illumination changes. On the other hand, it lacks much of the individual, discriminating facial detail contained in visual images. In this sense, the two modalities complement each other. The key idea is that robustness to extreme illumination changes can be achieved by fusing the two modalities. |
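The text above states that the visual and thermal modalities are fused but does not say how. As one possible illustration, the sketch below performs score-level fusion with a fixed weighted sum. The weighting scheme, the function names `fuse_scores` and `identify`, and the example scores are assumptions made for illustration, not the fusion rule used in this work.

```python
def fuse_scores(visual, thermal, visual_weight=0.5):
    """Weighted sum of two similarity scores, each assumed to lie in [0, 1]."""
    if not 0.0 <= visual_weight <= 1.0:
        raise ValueError("visual_weight must lie in [0, 1]")
    return visual_weight * visual + (1.0 - visual_weight) * thermal


def identify(visual_scores, thermal_scores, visual_weight=0.5):
    """Return the identity with the highest fused score, plus all fused scores."""
    fused = {
        person: fuse_scores(visual_scores[person], thermal_scores[person], visual_weight)
        for person in visual_scores
    }
    return max(fused, key=fused.get), fused


# Hypothetical similarity scores for a probe imaged under harsh lighting:
# the visual matcher is misled, the thermal matcher is not.
visual_scores = {"alice": 0.40, "bob": 0.55}
thermal_scores = {"alice": 0.80, "bob": 0.30}

visual_only, _ = identify(visual_scores, thermal_scores, visual_weight=1.0)
fused_best, fused = identify(visual_scores, thermal_scores, visual_weight=0.5)
print(visual_only, fused_best)
```

In this toy case the visual scores alone favour the wrong identity, while the fused scores favour the one supported by the illumination-insensitive thermal data. A fixed weight is the simplest choice. Adapting the weight to the estimated lighting conditions would be a natural refinement.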
|
|
Incremental learning of Gaussian mixture models (GMMs) is a
surprisingly difficult task. One of the main challenges of this
problem is model complexity selection, which must be
dynamic by the very nature of the incremental learning framework.
Intuitively, if the only information available at any time is
the current GMM estimate, a single novel point never
carries enough information to cause an increase in the number of
components. We define and consider a special, but particularly common and useful, class of Gaussian mixtures: temporally coherent GMMs. Previous approaches universally assume that new data comes in blocks, each representable by a GMM. In contrast, our method also performs well in the important case when novel data points arrive one by one, while requiring little additional memory. The key concept is that of the "historical GMM", which is the oldest GMM fit of the same complexity as the current one. |
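To make the one-point-at-a-time setting concrete, the sketch below folds a single new observation into a one-dimensional GMM of fixed complexity, using a standard running update of each component's effective count, mean and variance. It is not the method described above: it performs no complexity selection and does not maintain a historical GMM. The names `Component` and `update` are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class Component:
    count: float  # effective number of points absorbed so far
    mean: float
    var: float


def gaussian_pdf(x, mean, var):
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def update(components, x):
    """Fold one new point into the mixture in place; return the responsibilities."""
    total = sum(c.count for c in components)
    # Mixing weight of a component is its share of the effective counts.
    likelihoods = [c.count / total * gaussian_pdf(x, c.mean, c.var) for c in components]
    norm = sum(likelihoods)
    if norm == 0.0:
        # The point is numerically far from every component: share it equally.
        resp = [1.0 / len(components)] * len(components)
    else:
        resp = [lik / norm for lik in likelihoods]
    for c, r in zip(components, resp):
        old_mean = c.mean
        c.count += r
        step = r / c.count
        c.mean += step * (x - old_mean)
        # Weighted running-variance update (Welford-style).
        c.var += step * ((x - old_mean) * (x - c.mean) - c.var)
    return resp


# Hypothetical two-component mixture and one novel point.
gmm = [Component(10.0, 0.0, 1.0), Component(10.0, 5.0, 1.0)]
resp = update(gmm, 4.6)
print(resp, gmm)
```

Note that the number of components is unchanged by the update. This is the limitation described above: with only the current estimate available, a single point merely nudges the existing components, so deciding when to add a component requires additional information.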
|
|
Segmentation of medical images is a very common and useful procedure. It can be
directly used for volumetric estimates, or as a preprocessing step before
further analysis of data, such as morphological abnormality recognition.
Hence, there is a great demand for fast and accurate segmentation. However, despite continuous efforts, the problems inherent in this task mean that fully automatic segmentation has not yet been achieved. Imaging methods such as magnetic resonance imaging (MRI) or ultrasound typically produce low-contrast or noisy images, sometimes with anisotropic distortions, while scanned tissues exhibit extraordinary variability in density and shape. Our work employs statistical and inference techniques to learn on the fly from minimal user feedback, rapidly producing robust 3D segmentation results. |