SegNet is a deep encoder-decoder architecture for multi-class pixelwise segmentation researched and developed by members of the Computer Vision and Robotics Group at the University of Cambridge, UK.
The demo above shows real-time urban road scene segmentation using a trained SegNet. Several unseen images taken from Google Images are provided as motivating examples. You can also search for a street address or upload an image. We will make our best effort to update the demo as more training data becomes available.
SegNet is also effective for indoor scene understanding. The video and images below show examples of the system running on some test sequences.
The architecture consists of a sequence of non-linear processing layers (encoders) and a corresponding set of decoders, followed by a pixelwise classifier. Typically, each encoder consists of one or more convolutional layers with batch normalisation and a ReLU non-linearity, followed by non-overlapping max-pooling and sub-sampling. The sparse encoding produced by the pooling process is upsampled in the decoder using the max-pooling indices recorded in the encoding sequence (see the figure below). This use of max-pooling indices to upsample low-resolution feature maps is a key ingredient of SegNet. It has two important advantages: it retains high-frequency details in the segmented images, and it reduces the total number of trainable parameters in the decoders. The entire architecture can be trained end-to-end using stochastic gradient descent. The raw SegNet predictions tend to be smooth even without CRF-based post-processing.
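The index-based upsampling described above can be illustrated with a small sketch. This is not the SegNet implementation (which is based on Caffe); it is a minimal pure-Python illustration on a single 4x4 feature map, assuming 2x2 non-overlapping windows and even input dimensions. The function names `max_pool_2x2` and `max_unpool_2x2` are hypothetical and chosen only for this example.

```python
def max_pool_2x2(fmap):
    """Non-overlapping 2x2 max pooling over a 2D list of numbers.

    Returns the pooled map and, for each pooled cell, the (row, col)
    position in the input where the maximum was found.
    Assumes the height and width of fmap are even.
    """
    h, w = len(fmap), len(fmap[0])
    pooled, indices = [], []
    for r in range(0, h, 2):
        pooled_row, index_row = [], []
        for c in range(0, w, 2):
            # Collect (value, position) pairs for the 2x2 window.
            window = [(fmap[r + dr][c + dc], (r + dr, c + dc))
                      for dr in range(2) for dc in range(2)]
            value, location = max(window, key=lambda item: item[0])
            pooled_row.append(value)
            index_row.append(location)
        pooled.append(pooled_row)
        indices.append(index_row)
    return pooled, indices


def max_unpool_2x2(pooled, indices, out_h, out_w):
    """Upsample by writing each pooled value back to its stored position.

    All other positions stay zero, so the result is a sparse map.
    """
    out = [[0] * out_w for _ in range(out_h)]
    for pooled_row, index_row in zip(pooled, indices):
        for value, (r, c) in zip(pooled_row, index_row):
            out[r][c] = value
    return out


feature_map = [
    [1, 3, 2, 0],
    [4, 2, 1, 5],
    [0, 1, 7, 2],
    [6, 2, 3, 1],
]

pooled, indices = max_pool_2x2(feature_map)      # encoder side
sparse = max_unpool_2x2(pooled, indices, 4, 4)   # decoder side

for row in sparse:
    print(row)
```

The upsampling step itself has no weights to learn, because it only reuses the stored positions. Each maximum also returns to the exact location it came from, which is one way to see why boundary detail is preserved. In the SegNet papers, the sparse maps produced this way are then passed through trainable convolutional layers in the decoder to produce dense feature maps; that step is omitted here.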
SegNet is described in an article in the journal IEEE Transactions on Pattern Analysis and Machine Intelligence. Detailed descriptions of the Bayesian SegNet and SegNet architectures can be found in these first two papers. Please cite these papers when referring to the SegNet architecture and its details. The third arXiv submission was also submitted as a paper to CVPR '15.
Alex Kendall, Vijay Badrinarayanan and Roberto Cipolla "Bayesian SegNet: Model Uncertainty in Deep Convolutional Encoder-Decoder Architectures for Scene Understanding." arXiv preprint arXiv:1511.02680, 2015.
Vijay Badrinarayanan, Ankur Handa and Roberto Cipolla "SegNet: A Deep Convolutional Encoder-Decoder Architecture for Robust Semantic Pixel-Wise Labelling." arXiv preprint arXiv:1505.07293, 2015.
A software implementation of this project can be found on our GitHub repository. The implementation is based on Caffe, and our modifications to support SegNet are licensed for non-commercial use (license summary).
A detailed tutorial introducing the software and explaining how to train SegNet on the CamVid dataset can be found here.
An example script for computing the BF measure can be downloaded here.