Research Overview


Face recognition using appearance manifolds


Automatic face recognition has long been one of the most active research areas in computer vision. Despite the large number of algorithms developed, real-world performance has been, to say the least, disappointing. Even in very controlled imaging conditions, such as those used for passport photographs, the error rate is as high as 10%, while in less controlled environments performance degrades even further.

The goal of this work is to use video to achieve greater robustness of face recognition by resolving some of the inherent ambiguities (shape, texture, illumination etc.) of single-shot recognition.

Indeed, in many practical applications more than a single image of a face is available. In surveillance, for example, a face can be tracked to provide a temporal sequence of images. In access control, the user may be assumed to be cooperative and can hence be instructed to move in front of a fixed camera. This matters because video offers a number of technical advantages over the single-shot recognition setup: person-specific dynamics can be learnt, or more effective face representations (e.g. super-resolution images or a 3D face model) can be obtained.

Selected publications:

  • O. Arandjelović and R. Cipolla. Face Recognition from Video using the Generic Shape-Illumination Manifold. In Proc. European Conference on Computer Vision, Vol. 4, pages 27-40, 2006. [LINK]

  • O. Arandjelović and R. Cipolla. A Pose-Wise Linear Illumination Manifold Model for Face Recognition using Video. (under review) [LINK]


  • Face recognition-based video retrieval and organization


    In most cases humans are at the centre of interest in video. The aim of this research is to retrieve shots in which specific persons appear, and to rank them by confidence. Possible applications include:

    DVD browsing: Current DVD technology allows users to quickly jump to the chosen part of a film using an on-screen index. However, the available locations are predefined. Our technology could allow the user to rapidly browse scenes by formulating contextual queries.

    Content-based web search: Many web search engines have very popular image search features (e.g. http://www.google.co.uk/imghp). Currently, search is performed based on the keywords that appear in picture filenames or in the surrounding web page content. Focusing on the content of the images themselves could make retrieval much more accurate.

    Our approach consists of computing a numerical value, a distance, expressing the degree of belief that two face images belong to the same person. A low distance, ideally zero, signifies that the images are of the same person, whilst a large one signifies that they are of different people.

    The method involves computing a series of transformations of the original image, each aimed at removing the effects of a particular extrinsic imaging factor. The end result is a signature image of a person, which depends mainly on the person's identity (and expression) and can be readily classified.
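
    As a rough illustration of distance-based retrieval, the sketch below ranks a set of signature images by their distance to a query signature. The function name rank_by_distance, the optional threshold parameter and the use of plain Euclidean distance are assumptions made for this example only; the text above does not specify which distance is computed between signatures.

```python
import numpy as np


def rank_by_distance(query, gallery, threshold=None):
    """Rank gallery signatures by distance to the query (smallest first).

    Euclidean distance is an assumption of this sketch, used as a stand-in
    for whatever distance is computed between signature images.
    Returns a list of (gallery_index, distance) pairs.
    """
    query = np.asarray(query, dtype=float).ravel()
    distances = np.array(
        [np.linalg.norm(np.asarray(g, dtype=float).ravel() - query) for g in gallery]
    )
    # A stable sort keeps the original order among equal distances.
    order = np.argsort(distances, kind="stable")
    ranked = [(int(i), float(distances[i])) for i in order]
    if threshold is not None:
        # Keep only candidates close enough to be considered a match.
        ranked = [(i, d) for i, d in ranked if d <= threshold]
    return ranked
```

    A low-ranked index then corresponds to a shot more likely to contain the queried person, and the hypothetical threshold controls how many shots are returned.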

    Selected relevant publications:

  • O. Arandjelović and A. Zisserman. Automatic Face Recognition for Film Character Retrieval in Feature-Length Films. In Proc. IEEE Conference on Computer Vision and Pattern Recognition, Vol. 1, pages 860-867, 2005. [LINK]


  • Multi-modal face biometrics


    Variations in head pose and illumination are some of the most practically challenging aspects of face recognition. The effects of changing pose are usually less problematic and can often be overcome by acquiring data over a period of time, e.g. by tracking a face. Illumination changes, in contrast, are much more difficult to deal with: the illumination setup in which a face is imaged can in most cases not be controlled, its physics is difficult to model accurately, and training data containing typical appearance variability is usually not available.

    Thermal spectrum imagery is useful in this regard as it is virtually insensitive to illumination changes. On the other hand, it lacks much of the individual, discriminating facial detail contained in visual images. In this sense, the two modalities can be seen as complementing each other. The key idea is that robustness to extreme illumination changes can be achieved by fusing the two modalities.
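
    One simple way to picture the fusion of two modalities is at the score level, as in the sketch below. This is an illustration under stated assumptions, not the method of the publication listed below: the weighted sum, the names fuse_scores and identify, and the visual_weight parameter are all hypothetical, and scores are assumed to be similarities where higher means a better match.

```python
def fuse_scores(visual_score, thermal_score, visual_weight=0.5):
    """Combine two similarity scores with a weighted sum (assumed fusion rule)."""
    if not 0.0 <= visual_weight <= 1.0:
        raise ValueError("visual_weight must lie in [0, 1]")
    return visual_weight * visual_score + (1.0 - visual_weight) * thermal_score


def identify(candidates, visual_weight=0.5):
    """Return the identity with the highest fused score.

    candidates maps an identity label to a (visual_score, thermal_score) pair.
    """
    return max(
        candidates,
        key=lambda name: fuse_scores(*candidates[name], visual_weight=visual_weight),
    )
```

    Lowering visual_weight shifts trust towards the thermal score, which is the natural direction when illumination is extreme.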

    Selected relevant publications:

  • O. Arandjelović, R. I. Hammoud and R. Cipolla. Thermal and Reflectance Based Personal Identification Methodology in Challenging Variable Illuminations. (under review) [LINK]


  • Statistics and machine learning


    Incremental learning of Gaussian mixture models (GMMs) is a surprisingly difficult task. One of the main challenges is model complexity selection, which by the very nature of the incremental learning framework has to be dynamic. Intuitively, if the only information available at any time is the current GMM estimate, a single novel point never carries enough information to cause an increase in the number of components.

    We define and consider a special but particularly common and useful class of Gaussian mixtures: temporally coherent GMMs. Previous approaches universally assume that new data arrives in blocks, each representable by a GMM. Exploiting temporal coherence instead allows our method to perform well in the important case when novel data points arrive one by one, while requiring little additional memory. The key concept is that of the "Historical GMM", which is the oldest GMM fit of the same complexity as the current one.
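
    To make the one-point-at-a-time setting concrete, the sketch below updates the mean and covariance of a single Gaussian as points arrive, without storing past data. It is only the simplest building block and deliberately does not implement the Historical GMM or any change in the number of components; the class name RunningGaussian is hypothetical and the update is the standard Welford-style recursion.

```python
import numpy as np


class RunningGaussian:
    """Mean and covariance of one Gaussian, updated one point at a time."""

    def __init__(self, dim):
        self.n = 0                              # number of points seen so far
        self.mean = np.zeros(dim)
        self._scatter = np.zeros((dim, dim))    # sum of outer products about the mean

    def update(self, x):
        """Fold a single new point into the estimate (Welford-style update)."""
        x = np.asarray(x, dtype=float)
        self.n += 1
        delta = x - self.mean                   # deviation from the old mean
        self.mean = self.mean + delta / self.n
        self._scatter += np.outer(delta, x - self.mean)

    @property
    def covariance(self):
        """Unbiased sample covariance; needs at least two points."""
        if self.n < 2:
            raise ValueError("need at least two points")
        return self._scatter / (self.n - 1)
```

    Each update refines the parameters of the existing component, but nothing in it can signal that a second component is needed, which is the difficulty described above.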

    Selected relevant publications:

  • O. Arandjelović and R. Cipolla. Incremental Learning of Temporally-Coherent Gaussian Mixture Models. In Proc. IAPR British Machine Vision Conference, Vol. 2, pages 759-768, 2005. [LINK]


  • Semi-supervised 3D segmentation of MRI images


    Segmentation of medical images is a very common and useful procedure. It can be directly used for volumetric estimates, or as a preprocessing step before further analysis of data, such as morphological abnormality recognition. Hence, there is a great demand for fast and accurate segmentation.

    However, despite continuous efforts, problems inherent in the task mean that fully automatic segmentation has not yet been achieved. Imaging methods such as magnetic resonance imaging (MRI) or ultrasound typically produce low-contrast or noisy images, sometimes with anisotropic distortions, while scanned tissues exhibit extraordinary variability in density and shape. Our work employs statistical and inference techniques to learn on the fly from minimal user feedback, rapidly producing robust 3D segmentation results.
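
    As a rough picture of how a few user clicks can drive a segmentation, the sketch below shows the generic 2D live-wire idea: the boundary between two user-chosen points is taken to be the cheapest path through a per-pixel cost grid, found with Dijkstra's algorithm. This is a textbook illustration rather than the method of this work; the function name live_wire_path, the 4-connected neighbourhood and the assumption of non-negative costs that are low on likely boundaries are all choices made for the example.

```python
import heapq


def live_wire_path(cost, start, goal):
    """Cheapest 4-connected path from start to goal on a grid of pixel costs.

    cost is a list of rows of non-negative numbers; start and goal are
    (row, col) tuples. The cost of a path is the sum of the costs of the
    pixels entered along it.
    """
    rows, cols = len(cost), len(cost[0])
    best = {start: 0.0}      # cheapest known cost to reach each pixel
    parent = {}              # back-pointers for path reconstruction
    heap = [(0.0, start)]
    while heap:
        dist, node = heapq.heappop(heap)
        if node == goal:
            break
        if dist > best[node]:
            continue         # stale heap entry
        r, c = node
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if 0 <= nr < rows and 0 <= nc < cols:
                new_dist = dist + cost[nr][nc]
                if new_dist < best.get((nr, nc), float("inf")):
                    best[(nr, nc)] = new_dist
                    parent[(nr, nc)] = node
                    heapq.heappush(heap, (new_dist, (nr, nc)))
    # Walk the back-pointers from goal to start, then reverse.
    path = [goal]
    while path[-1] != start:
        path.append(parent[path[-1]])
    return path[::-1]
```

    In an interactive tool the cost grid would typically be derived from image gradients, and each additional click adds one more path segment to the contour.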

    Selected relevant publications:

  • O. Arandjelović. Live-Wire 3D Medical Images Segmentation. BA Dissertation, Department of Engineering, University of Oxford, June, 2002. [LINK]