My work in visual tracking can be broadly divided into three areas: the trade-off between offline and online-trained approaches, the choice of features to use for a particular tracking task, and the creation of a working visual gesture interface.
Offline vs. Online
Offline-trained approaches (for example, Viola and Jones) offer a strong appearance representation, but it is typically limited in the range of appearance it captures, and training takes a long time. Conversely, online-trained approaches (for example, Grabner and Bischof) need no prior training and adapt to appearance changes, but suffer from drift, since continuous adaptation to the current appearance makes no reference to what the object 'should' look like.
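The drift failure mode of a purely online tracker can be shown with a minimal sketch (the blending-update form and the learning rate `alpha` are illustrative assumptions, not any particular tracker's exact update rule):

```python
import numpy as np

def update_template(template, patch, alpha=0.1):
    """Naive online appearance update: blend the current image patch
    into the running template. Because nothing anchors the template to
    a fixed prior, repeated updates on a wrong match (e.g. an occluder)
    pull the template away from the true object appearance."""
    return (1.0 - alpha) * template + alpha * patch
```

If the tracker latches onto an occluding object for even a few dozen frames, the template converges to the occluder's appearance and the original target is lost.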
The figures below show the Grabner and Bischof online learning tracker in action: first tracking a face through pose changes, and then being used as a control interface for the classic 'pong' video game. Finally, an example shows how the unrestricted adaptivity of the tracker can cause it to drift onto occluding objects.
Our first paper fused the attractive qualities of both approaches to create a tracker flexible enough to adapt to gradual appearance changes of an object over time, while still maintaining proximity to a pre-defined appearance model. It did this by determining locally where the current appearance did not match the appearance model, and suspending online learning in those areas. For more details look here.
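The idea of locally gating the online update against a fixed prior can be sketched as follows (the per-pixel granularity, the mismatch measure, and the threshold `tau` are illustrative assumptions, not the paper's exact formulation):

```python
import numpy as np

def gated_update(online_model, prior_model, patch, alpha=0.1, tau=0.5):
    """Locally-gated online learning (illustrative sketch): compare the
    current patch against a fixed prior appearance model, and suspend
    the online update wherever the local mismatch exceeds tau. Regions
    consistent with the prior keep adapting; inconsistent regions
    (e.g. occlusions) leave the online model frozen."""
    mismatch = np.abs(patch - prior_model)   # local disagreement with the prior
    learn = mismatch <= tau                  # update only where consistent
    blended = (1.0 - alpha) * online_model + alpha * patch
    return np.where(learn, blended, online_model)
```

This keeps the adaptivity of an online tracker while tethering it to the pre-defined appearance model, so an occluder cannot overwrite the whole template.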
Subsequent work has aimed to improve on this by a) increasing the range of appearances captured by the appearance model, and b) moving the creation of the appearance model online, so that there are no fixed priors or time-consuming offline training phases. More information will follow at a later date.
Learning to Track with Multiple Observers
There is a myriad of features available to today's computer vision scientist, and many tracking papers appear to pick from them at random. In this work we treat existing tracking algorithms as 'black box' units and learn in a principled manner which trackers, and which combinations of trackers, work best for a particular tracking scenario. Results are presented for an exhaustive set of component trackers, along with suggested combinations in parallel and cascaded structures. For more details look here.
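The two combination structures can be sketched as follows, treating each component tracker as a black box that returns a position estimate and a confidence (the interface, the confidence threshold, and the fusion rule are illustrative assumptions, not the paper's learned combinations):

```python
from typing import Callable, List, Tuple

# A "black box" tracker: given a frame, return an (x, y) estimate
# and a confidence in [0, 1].
Tracker = Callable[[object], Tuple[Tuple[float, float], float]]

def cascade(trackers: List[Tracker], frame, min_conf=0.6):
    """Cascaded combination: try each component tracker in order and
    accept the first estimate whose confidence clears min_conf;
    otherwise fall back to the last tracker's output."""
    est, conf = (0.0, 0.0), 0.0
    for track in trackers:
        est, conf = track(frame)
        if conf >= min_conf:
            break
    return est, conf

def parallel(trackers: List[Tracker], frame):
    """Parallel combination: run all trackers on the frame and fuse
    their position estimates by confidence-weighted averaging."""
    results = [track(frame) for track in trackers]
    total = sum(c for _, c in results) or 1.0
    x = sum(p[0] * c for p, c in results) / total
    y = sum(p[1] * c for p, c in results) / total
    return (x, y)
```

A cascade lets a cheap tracker handle easy frames and only invokes more expensive ones when confidence is low, while the parallel structure trades extra computation for robustness to any single tracker failing.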
Vision-based Gesture Interface
We've used the results from the previous work to create a tracking algorithm for live gesture-interface demos, combining both offline/online information and multiple image cues. Our most recent work was shown at the IFA trade fair in Berlin, and received some press attention. For more details look here.