Thomas Woodley

Research

My work in visual tracking can be broadly divided into three areas: the trade-off between offline and online-trained approaches, the choice of features to use for a particular tracking task, and the creation of a working visual gesture interface.

Offline vs. Online

Offline-trained approaches (for example, Viola and Jones) have a strong appearance representation, but it is typically limited to the range of appearances seen during training, and training takes a long time. Conversely, online-trained approaches (for example, Grabner and Bischof) need no prior training and adapt to appearance changes, but they suffer from drift: continuous adaptation to the current appearance makes no reference to what the object 'should' look like.
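To make the drift mechanism concrete, below is a minimal, self-contained sketch (Python) of a self-learning update loop in the spirit of an online tracker. The 1-D 'appearance' values, the occlusion schedule, and the blending update rule are illustrative assumptions, not the published algorithm:

    import numpy as np

    def self_learning_demo(steps=50, rate=0.3):
        # Toy 1-D world: the object's appearance is 1.0, a nearby
        # occluder's is 0.7. The online template starts correct.
        target, distractor = 1.0, 0.7
        model = target
        rng = np.random.default_rng(0)
        for t in range(steps):
            occluded = 20 <= t < 30          # object hidden for a while
            candidates = [distractor + rng.normal(0, 0.05)]
            if not occluded:
                candidates.append(target + rng.normal(0, 0.05))
            # Pick the candidate that best matches the current model ...
            pick = min(candidates, key=lambda c: abs(c - model))
            # ... and adapt the model toward it, with no reference to what
            # the object 'should' look like. During the occlusion the model
            # locks onto the occluder and never recovers.
            model = (1 - rate) * model + rate * pick
        return model

    print(self_learning_demo())  # ends near 0.7 (the occluder), not 1.0

This is exactly the failure mode shown in the last set of figures below: once the model has adapted to the occluder, the occluder is the best match from then on.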

The figures below show the Grabner and Bischof online learning tracker in action: first tracking a face through pose changes, then being used as a control interface for the classic 'pong' video game. Finally, an example of how the unrestricted adaptivity of the tracker can cause it to drift onto occluding objects.

[Figures: the online tracker following a face through pose changes]

[Figures: the tracker used as a controller for 'pong']

[Figures: a failure case in which the tracker drifts onto an occluding object]

Our first paper fused the attractive qualities of both approaches to create a tracker flexible enough to adapt to gradual appearance changes of an object over time, while still maintaining proximity to a pre-defined appearance model. It did this by determining locally where the current appearance did not match the appearance model, and postponing online learning in those areas. For more details look here.
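As a rough illustration of the mechanism (not the paper's actual formulation), the sketch below compares each local patch of the current appearance against a fixed prior model and postpones the online update wherever they disagree; the patch representation, distance measure, and thresholds are all assumptions:

    import numpy as np

    def constrained_update(current, prior, online, thresh=0.2, rate=0.3):
        # current, prior, online: lists of per-patch feature vectors.
        updated = list(online)
        for i, (cur, pri) in enumerate(zip(current, prior)):
            if np.linalg.norm(cur - pri) <= thresh:
                # This patch still matches the prior model: safe to adapt.
                updated[i] = (1 - rate) * online[i] + rate * cur
            # else: appearance has locally diverged (e.g. an occluder),
            # so learning is postponed here and the occluder is not learnt.
        return updated

    # Middle patch 'occluded': only the outer patches are adapted.
    prior = online = [np.ones(4), np.ones(4), np.ones(4)]
    frame = [np.ones(4) * 0.9, np.zeros(4), np.ones(4) * 1.1]
    print(constrained_update(frame, prior, online))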

Subsequent work has aimed to improve on this by a) increasing the range of appearances captured by the appearance model, and b) moving the creation of the appearance model online, so that there are no fixed priors or time-consuming offline training phases. More information will follow at a later date.

Learning to Track with Multiple Observers

There is a myriad of features available to today's computer vision scientist, and in many tracking papers they seem to be picked almost at random. In this work we view existing tracking algorithms as 'black box' units and learn in a principled manner which trackers, and then which combinations of trackers, work best for a particular tracking scenario. Results are presented for an exhaustive set of component trackers, and for suggested combinations in parallel and cascaded structures. For more details look here.
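The two combination structures can be sketched as follows (Python). The `track` interface returning a box and a confidence, the dummy component, and the median fusion rule are assumptions for illustration, not the configurations evaluated in the paper:

    from statistics import median

    class DummyTracker:
        # Stand-in for a 'black box' component tracker (e.g. an online
        # boosting or template tracker behind the same interface).
        def __init__(self, bias, conf):
            self.bias, self.conf = bias, conf
        def track(self, frame):
            x, y = frame["true_pos"]
            return (x + self.bias, y, 20, 20), self.conf

    def parallel_fusion(trackers, frame):
        # Run every component and fuse by per-coordinate median.
        boxes = [t.track(frame)[0] for t in trackers]
        return tuple(median(b[i] for b in boxes) for i in range(4))

    def cascade(trackers, frame, min_conf=0.5):
        # Try components in order; return the first confident estimate,
        # with the last component's output as a fallback.
        box = None
        for t in trackers:
            box, conf = t.track(frame)
            if conf >= min_conf:
                return box
        return box

    frame = {"true_pos": (100, 50)}
    components = [DummyTracker(-2, 0.9), DummyTracker(0, 0.3), DummyTracker(3, 0.8)]
    print(parallel_fusion(components, frame))  # (100, 50, 20, 20)
    print(cascade(components, frame))          # first confident component's box

The parallel structure trades robustness for running every component each frame, while the cascade runs cheaper or more specific components first and falls back only when they are unsure.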

Vision-based Gesture Interface

We have used the results of the previous work to create a tracking algorithm for live gesture-interface demos, combining offline and online information with multiple image cues. Our most recent work was shown at the IFA trade fair in Berlin and received some press attention. For more details look here.