Vision Seminars

Forthcoming events

The seminar programme is continuously evolving, so please check back for updates.

Easter 2005

??/6/2005 (???)
12-1pm LR??
Dr. Andrew Fitzgibbon
University of Oxford
Applied natural image statistics for computer vision
I shall describe recent work on the regularization of certain inverse problems in computer vision using natural image statistics. I shall look first at the problem of multiview stereo reconstruction recast as image-based rendering, and show results which are of significantly higher quality than those attainable without such priors. The second problem I shall address is layer extraction with subpixel accuracy, again a difficult inverse problem, for which good priors/regularizers are essential if a useful solution is to be obtained. I will show examples of how this sort of work applies to the demanding problems of generating realistic cinematic special effects. Finally I shall discuss some of the potential future applications of these and related tools.

Past events

Lent 2005

21/3/2005 (Monday)
3-4pm LR5
Dr Trevor Darrell
MIT Computer Science and AI Lab
Visual Recognition for Perceptive Interfaces
Recent advances in visual recognition have enabled new forms of perceptive human-computer interfaces. In this talk I will present two new vision algorithms for tracking articulated pose and for recognizing objects, and will describe their use in interactive systems. I'll first describe a method for recognition of human pose which combines example-based matching with model-based refinement, and tracks robustly over time by manipulating modes of an observed likelihood function. I'll show applications of pose tracking to gesture recognition in interactive dialog systems, allowing users to gesture at objects in virtual environments or the real world. I'll next present a new method for detection of general objects and categories, where objects are represented as sets of local patches.

We develop an efficient discriminative method for recognition from sets of unordered features, using a new kernel function that is computable in linear time and approximates the true correspondence-based similarity between sets of points. I'll show how this method can be used for object and landmark recognition with mobile devices, allowing users to retrieve relevant information about their current surroundings.
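The linear-time set kernel described above can be illustrated with a small sketch. This is my own illustrative code, not the speaker's implementation: multi-resolution histograms approximate the optimal partial matching between two unordered feature sets, with matches first formed at finer levels weighted more heavily. The use of 1-D feature values and the level count are simplifying assumptions.

```python
import numpy as np

def multires_histograms(points, levels=4, extent=1.0):
    """Bin 1-D feature values into histograms at several resolutions
    (level l has 2**l bins over [0, extent])."""
    return [np.histogram(points, bins=2 ** l, range=(0.0, extent))[0]
            for l in range(levels)]

def match_kernel(pts_x, pts_y, levels=4):
    """Approximate the optimal partial matching between two unordered
    point sets: count the matches newly formed at each level (histogram
    intersection minus the finer level's intersection), weighting matches
    at coarser levels less because they are less precise."""
    hx = multires_histograms(pts_x, levels)
    hy = multires_histograms(pts_y, levels)
    score, prev = 0.0, 0.0
    for l in reversed(range(levels)):            # finest level first
        inter = np.minimum(hx[l], hy[l]).sum()   # matches at this level
        new = inter - prev                       # matches first formed here
        score += new / (2 ** (levels - 1 - l))   # coarser => smaller weight
        prev = inter
    return score
```

Because each histogram is built and intersected in a single pass over the points, the cost is linear in the set size, unlike explicit correspondence search.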

7/3/2005 (Monday)
12-1pm LR7
Dr. Phil McLauchlan
Imagineer Systems
Imagineer: A Computer Vision Startup
Imagineer Systems Ltd was founded four years ago by Allan Jaenicke and myself with the aim of building innovative products based around computer vision technology. We released our first product, Mokey, in September 2001. The main application of Mokey is "wire and rig removal", which involves 2D tracking of backgrounds in order to paint out foreground objects. More recently we have used our core 2D tracking technology to create Monet, a new product designed for "element replacement" tasks. With Monet you can insert images into TV screens, change the background of an image sequence, or replace one logo with another. It brings together a unique set of tools for handling difficult tracking situations, camera distortion, curved surfaces, shadows and highlights, with the aim of making the resulting images look as natural as possible. In my seminar I shall cover some of the history of the company and summarise the algorithms and software we have developed.

21/2/2005 (Monday)
12-1pm LR11
Dr. Richard Bowden
University of Surrey
Body Part Detection and Tracking and its application to Unconstrained Sign Language Recognition
The talk will consist of two complementary parts. First, two approaches to locating and tracking parts of the human body will be presented. Following this, a flexible monocular system capable of recognising sign lexicons far larger than those handled by previous approaches will be described.

Our main interest is not in 3D biometric accuracy, but rather in a representation discriminative enough for visual interaction. The first tracking approach employs background suppression and a manually designed approximation to body shape to detect people in a particle filter framework. Using a mixture model of body configurations, we disambiguate the hands of the subject and predict the likely position of the elbows. In a second approach, the face, torso, legs and hands are detected in cluttered scenes using boosted body part detectors trained using AdaBoost. We present a probabilistic framework for assembling the detected parts into a full 2D human configuration. Body configurations are assembled from the detections using RANSAC, and a joint likelihood is computed for each configuration by combining the pose likelihood, detector likelihood and corresponding skin-model likelihoods. The configuration with the greatest resultant likelihood is chosen to represent the person of interest.
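The assembly step can be conveyed with a toy sketch (my own illustration, not the authors' code): candidate configurations are sampled from per-part detections in RANSAC fashion, each is scored by a joint likelihood combining a pose prior with detector and skin-model likelihoods, and the best-scoring one is kept. All detections, likelihood values and the kinematic prior below are invented for illustration.

```python
import math
import random

# Hypothetical candidate detections per body part:
# (x, y, detector_likelihood, skin_likelihood)
detections = {
    "face":  [(50, 20, 0.9, 0.8), (200, 30, 0.4, 0.3)],
    "torso": [(55, 80, 0.7, 0.5)],
    "hand":  [(30, 60, 0.6, 0.9), (180, 90, 0.5, 0.2)],
}

def pose_likelihood(config):
    """Toy kinematic prior: prefer the face roughly above the torso."""
    (fx, fy, *_), (tx, ty, *_), _ = config   # face, torso, hand
    return math.exp(-abs(fx - tx) / 20.0) if fy < ty else 1e-6

def joint_log_likelihood(config):
    """Combine pose, detector and skin-model likelihoods in log space."""
    log_l = math.log(pose_likelihood(config))
    for (_, _, det_l, skin_l) in config:
        log_l += math.log(det_l) + math.log(skin_l)
    return log_l

def best_configuration(n_samples=200):
    """RANSAC-flavoured search: sample part combinations at random and
    keep the configuration with the greatest joint likelihood."""
    best, best_ll = None, -math.inf
    for _ in range(n_samples):
        config = tuple(random.choice(cands) for cands in detections.values())
        ll = joint_log_likelihood(config)
        if ll > best_ll:
            best, best_ll = config, ll
    return best
```

Working in log space avoids numerical underflow when many small likelihoods are multiplied together.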

The detection and tracking is then extended to sign language recognition. The power of the system is due to four key elements: (i) head and hand detection based upon boosting, which removes the need for temperamental colour segmentation; (ii) a body-centred description of activity, which overcomes issues with camera placement, calibration and user variability; (iii) a two-stage classification in which stage I generates a high-level linguistic description of activity that naturally generalises and hence reduces training; (iv) a stage II classifier bank which does not require HMMs, further reducing training requirements. The outcome is a system capable of running in real time and generating extremely high recognition rates for large lexicons with as little as a single training instance per sign. We demonstrate classification rates as high as 92% for a lexicon of 164 words with extremely low training requirements, outperforming previous approaches in which thousands of training examples are required.

7/2/2005 (Monday)
12-1pm LR11
Professor Graham Finlayson
University of East Anglia
Colour, constancy, invariance, shadows and the chromagenic constraint
Colour constancy is the ability to disambiguate the colour of the prevailing light from the colour of objects in the world. A pink image region might be evidence of a pink surface under a white light, or the converse, and so conventional solutions work by aggregating information. If all RGBs in an image are somewhat reddish then the light is probably reddish (or so the reasoning goes). However, after 50 years of research, estimating the light colour reliably is still one of the main problems facing computer vision and digital photography.
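The "reddish RGBs imply reddish light" reasoning is essentially the classic grey-world assumption, which can be sketched in a few lines (an illustrative baseline, not the speaker's method):

```python
import numpy as np

def grey_world_correct(img):
    """Grey-world colour constancy: assume the scene averages to grey,
    so the mean RGB of the image estimates the illuminant colour, which
    is then divided out of each channel.  `img` is H x W x 3 in [0, 1]."""
    illuminant = img.reshape(-1, 3).mean(axis=0)      # estimated light colour
    balanced = img / illuminant * illuminant.mean()   # rescale each channel
    return illuminant, np.clip(balanced, 0.0, 1.0)
```

The assumption fails on scenes dominated by one colour (a field of grass looks like a green light), which is one reason reliable illuminant estimation remains hard.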

In this talk I will discuss the progress we are making towards solving this problem. I will begin by showing that a restricted colour constancy formulation is relatively easy to solve: it is straightforward to recover a stable grey-scale image that is invariant to changes in light colour. This invariant proves useful for some vision tasks such as indexing or shadow removal. In the second part of the presentation I will show that the colour constancy problem is easier to solve with respect to two images taken with and without a 'chromagenic' coloured filter placed in front of the camera. Moreover, we speculate that the human visual system might exploit the same trick.
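The light-colour-invariant grey-scale image can be sketched roughly as follows. This is a simplified illustration of the general idea only: under a Planckian-light, narrowband-sensor model, a change of light colour shifts a pixel's 2-D log-chromaticity along a fixed (camera-dependent) direction, so projecting onto the orthogonal direction, at an angle `theta` assumed known here, removes the dependence on the light.

```python
import numpy as np

def illuminant_invariant(img, theta):
    """Project 2-D log-chromaticities onto the axis at angle `theta`,
    chosen orthogonal to the camera's illuminant-variation direction.
    Returns a grey-scale image approximately independent of light colour."""
    eps = 1e-6                                   # avoid log(0)
    r = img[..., 0] + eps
    g = img[..., 1] + eps
    b = img[..., 2] + eps
    chi = np.stack([np.log(r / g), np.log(b / g)], axis=-1)
    direction = np.array([np.cos(theta), np.sin(theta)])
    return chi @ direction
```

In practice the invariant direction has to be calibrated per camera; the test below simply simulates a light change along the direction orthogonal to `theta` and checks that the output is unchanged.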

Michaelmas 2004

8/11/2004 (Monday)
12-1pm LR10
Dr. Anton van den Hengel
Adelaide University
A unifying framework for approximated maximum likelihood estimation
Parameter estimation on the basis of image-based measurements is a central problem in computer vision. It arises in the estimation of entities such as the fundamental matrix, homography matrix, and the trifocal tensor, amongst others. Many competing methods, with varied statistical underpinnings, have been developed for particular estimation problems. It emerges that it is possible to rationalise FNS, Meer's HEIV, Kanatani's renormalisation, and Hartley's 8-point method, in terms of their minimisation of a particular cost function. Unifying these methods in this manner provides a basis for their extension to constrained parameter estimation. This talk thus describes a framework within which several methods may be related, facilitating both theoretical and algorithmic comparison.
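For orientation, one common formulation of the approximated maximum likelihood cost that appears in this literature (a sketch from the FNS line of work, not necessarily the exact form used in the talk) sums, over the data points, a Rayleigh-quotient-like ratio in the parameter vector:

```latex
J_{\mathrm{AML}}(\theta) \;=\; \sum_{i=1}^{n}
  \frac{\theta^{\top} A_i\, \theta}{\theta^{\top} B_i\, \theta},
\qquad
A_i = u(x_i)\, u(x_i)^{\top},
\quad
B_i = \partial_x u(x_i)\, \Lambda_i\, \partial_x u(x_i)^{\top},
```

where $u(x_i)$ is the carrier vector built from the $i$-th measurement and $\Lambda_i$ its covariance. Methods such as FNS, HEIV and renormalisation can be viewed as different schemes for (approximately) minimising this cost, while the 8-point method corresponds to dropping the data-dependent denominators.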

21/10/2004 (Thursday)
12-1pm LR10
Dr. Andrew Davison
University of Oxford
Real-Time SLAM with a Single Camera
Recently we have generalised the simultaneous localisation and mapping (SLAM) methodology popular in mobile robotics to demonstrate real-time 3D motion estimation and scene mapping with a generally moving, agile single camera --- turning it into a flexible position sensor, and moving real-time SLAM into the "pure vision" domain where previous emphasis has been on off-line structure from motion methods. This is a particularly challenging SLAM problem, but one with a host of interesting potential applications in fields such as robotics, wearable computing and augmented reality. The presentation will include a live demo, and also present some of our most recent developments in scene surface orientation and interaction.
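The flavour of the underlying estimation can be conveyed with a deliberately minimal sketch (my own toy illustration, not the presented system, which maintains a full 3-D camera state and image-plane measurements): an extended Kalman filter jointly estimates a moving camera and a fixed landmark, here collapsed to one dimension with a linear measurement, so the "extended" machinery reduces to a plain Kalman filter.

```python
import numpy as np

# State: [camera position, landmark position]; landmark initially vague.
x = np.array([0.0, 5.0])
P = np.diag([0.1, 4.0])           # joint covariance
Q = np.diag([0.05, 0.0])          # process noise: camera moves, landmark fixed
R = 0.01                          # measurement noise variance
H = np.array([[-1.0, 1.0]])       # measurement z = landmark - camera

def ekf_step(x, P, u, z):
    # Predict: camera moves by control u; the landmark is static.
    x = x + np.array([u, 0.0])
    P = P + Q
    # Update with the relative measurement z.
    y = z - (x[1] - x[0])         # innovation
    S = H @ P @ H.T + R           # innovation covariance (1x1)
    K = (P @ H.T) / S             # Kalman gain, shape (2, 1)
    x = x + K[:, 0] * y
    P = (np.eye(2) - K @ H) @ P
    return x, P

# Simulate: the true camera walks right; the true landmark sits at 4.0.
true_cam, true_lm = 0.0, 4.0
for _ in range(50):
    true_cam += 0.1
    z = true_lm - true_cam
    x, P = ekf_step(x, P, 0.1, z)
```

The joint covariance is the point: camera and landmark errors become correlated, so each new observation of the landmark also refines the camera estimate, which is what lets a single camera serve as a position sensor.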

If you are interested in giving a seminar presentation, or if you would like more information about the seminar series, please contact
Ben Tordoff.