
Website updated on May 18th, 2013.
Refer to http://fabiogalasso.org for an up-to-date site.

Label Propagation in Video Sequences

Pixelwise labelled video sequences are essential for learning multi-class classifiers for video segmentation or scene recognition. However, hand labelling video sequences frame by frame is tedious and time consuming (30 to 45 minutes per frame), and it is difficult to maintain frame-to-frame label consistency.


We formulate the problem as follows: given a few hand-labelled frames of a video sequence (typically the first and last frames), we aim to provide a pixelwise labelling of all other frames of the sequence, along with the class probabilities.

A sample result of label propagation obtained with our probabilistic model. The proposed methods based on image patches and semantic regions are superior to a naive solution based on optical flow.

Our novel probabilistic model and inference algorithm can be based on pixelwise correspondences obtained with a variety of methods. We have compared, qualitatively and quantitatively, the label propagation results obtained from optical flow estimates with those of more sophisticated approaches based on image patches, as seen in epitomic models, or on the extraction of semantically consistent regions.
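
As a rough illustration of the naive optical-flow baseline mentioned above, the following Python sketch warps per-pixel class probabilities from one frame to the next and blends a forward pass (from the first labelled frame) with a backward pass (from the last). It is not the probabilistic model described here: the function names, the nearest-neighbour warp and the linear blending weights are all assumptions made for the example, and the flow field is assumed to be given.

```python
import numpy as np

def propagate_probabilities(probs, flow):
    """Warp per-pixel class probabilities from a source frame to a target frame.

    probs: (H, W, C) array, class distribution at each pixel of the source frame.
    flow:  (H, W, 2) array; flow[y, x] = (dx, dy) points from pixel (x, y) of the
           target frame to its correspondence in the source frame (assumed given).
    """
    h, w, _ = probs.shape
    ys, xs = np.mgrid[0:h, 0:w]
    # Nearest-neighbour lookup, clamped to the image borders.
    src_x = np.clip(np.rint(xs + flow[..., 0]).astype(int), 0, w - 1)
    src_y = np.clip(np.rint(ys + flow[..., 1]).astype(int), 0, h - 1)
    return probs[src_y, src_x]

def fuse(forward, backward, t, n_frames):
    """Blend forward and backward estimates for frame t (0 .. n_frames - 1).

    Linear weighting by temporal distance is an illustrative choice only.
    """
    alpha = t / (n_frames - 1)
    mixed = (1 - alpha) * forward + alpha * backward
    return mixed / mixed.sum(axis=-1, keepdims=True)
```

Taking the arg-max over the last axis of the fused array gives a hard labelling, while the array itself plays the role of the class probabilities.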

We have used the propagated labels to train a state-of-the-art random forest classifier for video segmentation. Compared with training on fully ground-truthed data, the classification results show a minimal loss in accuracy, which supports the use of the proposed label propagation algorithm.

3D shape reconstruction from homogeneous textures

Shape-From-Texture is the area of Computer Vision which studies the recovery of the shape of a textured object or scene.

New unpublished result. A dry dam and the side view of the reconstructed surface. Normals are recovered locally at each pixel location

Psychological studies show that texture is important in the perception of shape. Moreover, texture is ubiquitous.

However, the task is ill-posed for a machine and some initial assumption is always necessary to identify the characteristic of the texture conveying the shape information.

New unpublished result. A sunflower field and the side view of the reconstructed surface. Normals are recovered locally at each pixel location

We assume the most general form of stochastic homogeneity (the generating process is stationary under translation), which allows us to consider many natural textures.

Given the initial assumptions, Shape-From-Texture is generally seen as two distinct problems: the estimation of the texture distortion, due to the geometry of the scene and the projection from the 3D world to the 2D image; and the 3D shape reconstruction, given by interpreting the measured distortion. We have contributed to both these aspects.

Video sequence showing a deflating balloon, covered with a homogeneously-textured cloth. The reconstructed sequence is fluid and captures the small variations over time, although the normals are estimated locally and each frame is processed independently. Download: [Video_and_reconstruction]

In particular, the distortion information is characterized by the use of local spatial frequencies. A novel method is introduced which combines Fourier analysis and the use of Gabor functions to recover all the main instantaneous frequencies of the texture at each pixel.
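
To give a flavour of how Gabor functions can expose local spatial frequencies, the sketch below filters an image with a small bank of Gabor filters and keeps, at each pixel, the centre frequency with the strongest response. It is a simplification, not the method introduced here: it selects a single frequency from a fixed list of candidates rather than recovering all main instantaneous frequencies, and the function name, the candidate list and the `bandwidth` value are assumptions made for the example.

```python
import numpy as np

def gabor_bank_dominant_frequency(image, candidates, bandwidth=0.02):
    """Return, per pixel, the candidate frequency with the strongest Gabor response.

    image:      (H, W) grey-level array.
    candidates: list of (u, v) centre frequencies in cycles per pixel.
    bandwidth:  std of the Gaussian envelope in the frequency domain (assumed value).
    """
    h, w = image.shape
    fy = np.fft.fftfreq(h)[:, None]
    fx = np.fft.fftfreq(w)[None, :]
    spectrum = np.fft.fft2(image - image.mean())
    responses = []
    for u, v in candidates:
        # In the frequency domain a Gabor filter is a Gaussian centred on (u, v).
        transfer = np.exp(-((fx - u) ** 2 + (fy - v) ** 2) / (2 * bandwidth ** 2))
        # The magnitude of the complex response is the local amplitude.
        responses.append(np.abs(np.fft.ifft2(spectrum * transfer)))
    best = np.argmax(np.stack(responses), axis=0)
    return np.asarray(candidates)[best]  # shape (H, W, 2)
```

A narrower `bandwidth` gives sharper frequency selectivity at the cost of spatial localisation, which is the usual trade-off when choosing Gabor parameters.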

The 3D reconstruction from the multi-scale distortion is then solved at different levels of complexity: simple planar surfaces, general surfaces with the frontal appearance of the texture available, general surfaces where no further assumption is needed, both in orthographic and perspective views.
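
For the case where the frontal appearance of the texture is available, the link between distortion and shape can be illustrated with the standard foreshortening relation under orthographic projection: a surface patch slanted by an angle s is compressed by cos(s) along the tilt direction, so the spatial frequency measured along that direction grows by a factor 1/cos(s). The sketch below applies that relation and converts slant and tilt into a unit normal. It is a textbook simplification, not necessarily the formulation used in this work, and the function and parameter names are hypothetical.

```python
import numpy as np

def slant_from_frequencies(u_frontal, u_image):
    """Slant angle (radians) from the frequency measured along the tilt direction.

    u_frontal: frequency of the texture seen frontally (cycles per pixel).
    u_image:   frequency measured in the image along the tilt direction.
    Assumes orthographic projection, so cos(slant) = u_frontal / u_image.
    """
    ratio = np.asarray(u_frontal, dtype=float) / np.asarray(u_image, dtype=float)
    # Noise can push the ratio outside [0, 1]; clamp before taking arccos.
    return np.arccos(np.clip(ratio, 0.0, 1.0))

def normal_from_slant_tilt(slant, tilt):
    """Unit surface normal for a given slant and tilt, with z along the view axis."""
    return np.stack([np.sin(slant) * np.cos(tilt),
                     np.sin(slant) * np.sin(tilt),
                     np.cos(slant)], axis=-1)
```

Because both functions accept arrays, they can be applied to per-pixel frequency maps to obtain a field of locally estimated normals.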

Accuracy and robustness are demonstrated in all cases. Most importantly, the model applies to complex natural textures.