Getting Started with SegNet

This is a tutorial on how to train a SegNet model for multi-class pixel-wise classification. By the end of this tutorial you will be able to take a single colour image, such as the one on the left, and produce a labelled output like the image on the right.

For the tutorial on Bayesian SegNet, see the Bayesian SegNet section further down this page.

[Figure: example input image (left) and labelled output segmentation (right)]

Setting Up Caffe and the Dataset

This implementation of SegNet [1] is built on top of the Caffe deep learning library. The first step is to download the SegNet source code, which can be found on our GitHub repository. Our code to support SegNet is licensed for non-commercial use (see the license summary). To install SegNet, please follow the Caffe installation instructions, and make sure you also compile Caffe's Python wrapper.

Note that this tutorial assumes that you download all files into the folder /SegNet/ on your machine. Please modify the commands where appropriate if you choose to use a different directory.

SegNet learns to predict pixel-wise class labels through supervised learning, so we require a dataset of input images with corresponding ground truth labels. Label images must be single channel, with each pixel labelled with its class index. For this tutorial we are going to use the CamVid dataset [2], which contains 367 training and 233 testing images of road scenes. The dataset was captured around Cambridge, UK, and contains day and dusk scenes. We are going to use an 11-class version with an image size of 360 by 480 pixels. Download this data in the format required for SegNet, along with the rest of the files required for this tutorial, from this GitHub repository.
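
If you would like to sanity-check the data, a label image can be inspected with a few lines of Python. This is a minimal sketch, assuming you have PIL and numpy installed; the filename is a placeholder, so substitute any image from the trainannot/ folder:

from PIL import Image
import numpy as np

# Label images must be single channel, with each pixel holding a small
# integer class index (11 classes for this tutorial).
label = np.array(Image.open('/SegNet/CamVid/trainannot/example.png'))
assert label.ndim == 2, 'label image must be single channel'
print('image size:', label.shape)          # expect (360, 480)
print('classes present:', np.unique(label))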

Your file structure should now look like this:

/SegNet/
	CamVid/
		test/
		testannot/
		train/
		trainannot/
		test.txt
		train.txt
	Models/
		# SegNet and SegNet-Basic model files for training and testing
	Scripts/
		compute_bn_statistics.py
		test_segmentation_camvid.py
	caffe-segnet/
		# caffe implementation

We now need to modify CamVid/train.txt and CamVid/test.txt so that SegNet knows where to find the data. SegNet requires a text file of white-space-separated paths, alternating between each input image (.jpg or .png) and its corresponding label image (.png), e.g.
/path/to/image1.png /another/path/to/label1.png /path/to/image2.png /path/label2.png ...
Please open these two files in a text editor and use the find-and-replace tool to change '/SegNet/...' to the absolute path of your data.
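
If you would rather script this than use a text editor, the following minimal sketch does the same job; NEW_PREFIX is a placeholder for your own directory:

# Rewrite the '/SegNet/' prefix in the dataset file lists.
OLD_PREFIX = '/SegNet/'
NEW_PREFIX = '/absolute/path/to/SegNet/'  # placeholder: your own directory

for name in ('CamVid/train.txt', 'CamVid/test.txt'):
    path = NEW_PREFIX + name
    with open(path) as f:
        text = f.read()
    with open(path, 'w') as f:
        f.write(text.replace(OLD_PREFIX, NEW_PREFIX))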

Training SegNet

The next step is to set up a model for training. You can train with either SegNet or SegNet-Basic (see [1] for details). First, open the model file Models/segnet_train.prototxt and the inference model file Models/segnet_inference.prototxt. You will need to modify the data input source line in all of the model's data layers: replace it with the absolute path to your data file. Depending on your GPU memory, you may also need to modify the batch size in the training model. On a 12GB GPU such as an NVIDIA K40 or Titan X you should be able to use a batch size of 10 for SegNet-Basic or 6 for SegNet. If you have a smaller GPU, make the batch size as large as will fit in memory; however, even a batch size as low as 2 or 3 should still train well. Secondly, open the solver file Models/segnet_solver.prototxt and change two lines: the net path should point to your training prototxt and the snapshot_prefix to the directory where you want training snapshots saved.
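
This edit can also be scripted. Here is a minimal sketch, assuming the data layers store the dataset path in a line of the form source: "..."; adjust the paths to your setup:

import re

prototxt = '/SegNet/Models/segnet_train.prototxt'
with open(prototxt) as f:
    proto = f.read()

# Point every data layer's source at your train.txt.
proto = re.sub(r'source:\s*"[^"]*"',
               'source: "/SegNet/CamVid/train.txt"', proto)

with open(prototxt, 'w') as f:
    f.write(proto)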

Repeat the above steps for the SegNet-Basic model, inference model and solver prototxt files. Then create a folder to store your training weights and solver details:

mkdir /SegNet/Models/Training

We are now ready to train SegNet! Open up a terminal and issue these commands:

/SegNet/caffe-segnet/build/tools/caffe train -gpu 0 -solver /SegNet/Models/segnet_solver.prototxt  # This will begin training SegNet on GPU 0
/SegNet/caffe-segnet/build/tools/caffe train -gpu 0 -solver /SegNet/Models/segnet_basic_solver.prototxt  # This will begin training SegNet-Basic on GPU 0
/SegNet/caffe-segnet/build/tools/caffe train -gpu 0 -solver /SegNet/Models/segnet_solver.prototxt -weights /SegNet/Models/VGG_ILSVRC_16_layers.caffemodel  # This will begin training SegNet on GPU 0 with a pretrained encoder

The third command initialises the encoder weights from the VGG-16 model trained on ImageNet. If you wish to try this, download the VGG_ILSVRC_16_layers.caffemodel weights first.

Training on this small dataset shouldn't take too long. You should see the model converge after about 50-100 epochs, and you should be looking for a training accuracy greater than 90%. Once you are happy that the model has converged, we can test it.
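
One simple way to monitor this is to redirect the training output to a log file (Caffe logs to stderr, so append 2> /SegNet/Models/Training/train.log to the training command) and then pull out the reported accuracies. A minimal sketch, where the log path is an assumption:

import re

# Scan a Caffe training log for reported training accuracy values.
with open('/SegNet/Models/Training/train.log') as f:  # assumed log path
    accs = [float(m) for m in re.findall(r'accuracy = ([0-9.]+)', f.read())]

if accs:
    print('most recent training accuracy: %.3f' % accs[-1])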

Testing SegNet

First, open up the scripts Scripts/compute_bn_statistics.py and Scripts/test_segmentation_camvid.py and change line 10 to point to your SegNet Caffe installation directory.
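
In both scripts, that line simply tells Python where to find the compiled Caffe. It typically resembles the following sketch, though the exact variable name in the scripts may differ:

import sys

caffe_root = '/SegNet/caffe-segnet/'       # your SegNet Caffe installation
sys.path.insert(0, caffe_root + 'python')  # expose the compiled Python wrapper
import caffe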

The Batch Normalisation layers [3] in SegNet shift the input feature maps according to their mean and variance statistics for each mini-batch during training. At test time we must instead use the statistics for the entire dataset. To do this, run the script Scripts/compute_bn_statistics.py using the following commands, making sure you change the training weight file to the one you wish to use.

python /SegNet/Scripts/compute_bn_statistics.py /SegNet/Models/segnet_train.prototxt /SegNet/Models/Training/segnet_iter_10000.caffemodel /SegNet/Models/Inference/  # compute BN statistics for SegNet
python /SegNet/Scripts/compute_bn_statistics.py /SegNet/Models/segnet_basic_train.prototxt /SegNet/Models/Training/segnet_basic_iter_10000.caffemodel /SegNet/Models/Inference/  # compute BN statistics for SegNet-Basic

The script saves the final test weights in the output directory as /SegNet/Models/Inference/test_weights.caffemodel. Please rename them to something more descriptive.
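
For intuition about what this step computes, here is a minimal numpy sketch of the difference between training-time and test-time batch normalisation; gamma and beta stand for the layer's learned scale and shift:

import numpy as np

def bn_train(x_batch, gamma, beta, eps=1e-5):
    # Training: normalise with the statistics of this mini-batch only.
    mu, var = x_batch.mean(axis=0), x_batch.var(axis=0)
    return gamma * (x_batch - mu) / np.sqrt(var + eps) + beta

def bn_test(x, mu_dataset, var_dataset, gamma, beta, eps=1e-5):
    # Testing: normalise with statistics gathered over the whole training
    # set, which is what compute_bn_statistics.py bakes into the weights.
    return gamma * (x - mu_dataset) / np.sqrt(var_dataset + eps) + beta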

Now we can view the output of SegNet! test_segmentation_camvid.py will display the input image, ground truth and segmentation prediction for each test image. Try these commands, changing the weight file to the one you just processed above with the correct inference statistics:

python /SegNet/Scripts/test_segmentation_camvid.py --model /SegNet/Models/segnet_inference.prototxt --weights /SegNet/Models/Inference/test_weights.caffemodel --iter 233  # Test SegNet
python /SegNet/Scripts/test_segmentation_camvid.py --model /SegNet/Models/segnet_basic_inference.prototxt --weights /SegNet/Models/Inference/test_weights.caffemodel --iter 233  # Test SegNet-Basic

Results

The following table shows the performance we achieved with SegNet on the CamVid dataset. If you have followed this tutorial correctly, you should be able to reproduce the first two results. The final result was trained with 3.5K additional labelled images from publicly available datasets; see the paper for further details. The web demo has been trained on further data which is not publicly available, and on an extra class (road marking).

Model                          Global Accuracy   Class Accuracy   Mean I/U
SegNet-Basic                   82.8%             62.3%            46.3%
SegNet (Pretrained Encoder)    88.6%             65.9%            50.2%
SegNet (3.5K dataset)          86.8%             81.3%            69.1%
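
For reference, all three metrics can be computed from a confusion matrix, as in this minimal numpy sketch (conf[i, j] counts pixels of true class i predicted as class j):

import numpy as np

def segmentation_metrics(conf):
    tp = np.diag(conf).astype(float)
    global_acc = tp.sum() / conf.sum()                     # all correct pixels
    class_acc = np.mean(tp / conf.sum(axis=1))             # mean per-class recall
    iou = tp / (conf.sum(axis=1) + conf.sum(axis=0) - tp)  # per-class I/U
    return global_acc, class_acc, np.mean(iou)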

[Figure: four example input images from the CamVid test set]

[Figure: the corresponding SegNet-Basic segmentations]

[Figure: the corresponding SegNet segmentations]

Congratulations, that's it! How does it look? You can also try out our trained model on the SegNet web demo.

Bayesian SegNet

This is a tutorial on Bayesian SegNet [4], a probabilistic extension to SegNet. By the end of this tutorial you will be able to train a model which can take an image like the one on the left, and produce a segmentation (center) and a measure of model uncertainty (right).

[Figure: example input image (left), output segmentation (centre) and model uncertainty (right)]

Bayesian SegNet is an implementation of a Bayesian convolutional neural network which can produce an estimate of model uncertainty for semantic segmentation. It uses Monte Carlo Dropout [5] at test time to generate a posterior distribution of pixel class labels. [4] shows that this gives a significant increase in segmentation accuracy and provides a measure of model uncertainty.
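
In pseudocode, the idea looks like the following minimal numpy sketch; forward_pass_with_dropout is a hypothetical helper standing in for a stochastic forward pass through the network:

import numpy as np

def mc_dropout_segment(image, n_samples=8):
    # Run several forward passes with dropout still enabled, so each pass
    # samples a different set of network weights.
    probs = np.stack([forward_pass_with_dropout(image)  # hypothetical helper
                      for _ in range(n_samples)])       # (samples, C, H, W)
    mean_probs = probs.mean(axis=0)                     # Monte Carlo average
    segmentation = mean_probs.argmax(axis=0)            # per-pixel class label
    uncertainty = probs.var(axis=0).mean(axis=0)        # per-pixel variance
    return segmentation, uncertainty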

Model uncertainty can be used to understand with what confidence we can trust image segmentations and to determine to what degree of specificity we can assign a semantic label. For example, can we say that the label is a truck, or simply a moving vehicle? This can have a strong effect on a robot's behavioural decisions.

This model uncertainty is significantly different to the 'probabilities' obtained from a softmax classifier. The softmax function approximates relative probabilities between the class labels, but does not provide an overall measure of the model's uncertainty. For a more in-depth explanation, check out Yarin's blog post "What My Deep Model Doesn't Know...".
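
A quick numeric illustration of this distinction: scaling a network's output logits makes the softmax arbitrarily more 'confident' without the model knowing anything more about the input. A minimal sketch:

import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

print(softmax(np.array([2.0, 1.0, 0.0])))    # moderately peaked
print(softmax(np.array([20.0, 10.0, 0.0])))  # near-certain, yet the model has
                                             # expressed no epistemic uncertainty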

Training

The Bayesian SegNet model is identical in architecture to SegNet, with Dropout layers introduced after the deepest six encoder and decoder units. To train the model you should follow the same procedure outlined above, this time using Models/bayesian_segnet_train.prototxt and Models/bayesian_segnet_solver.prototxt.

The model will take slightly longer to train because Dropout also acts as a regulariser. The batch normalisation statistics can then be calculated from the trained model as described above. Note that the dropout layers use the weight averaging technique when computing these batch normalisation statistics.

Testing

First, open up the script Scripts/test_bayesian_segnet.py and change line 14 to point to your SegNet Caffe installation directory.

Bayesian SegNet is a stochastic model and uses Monte Carlo dropout sampling to obtain uncertainties over the weights. To test this, we need to prepare a minibatch of samples in which every element is a copy of the same input image (a sketch of this batching idea follows the commands below). To do this, use test_bayesian_segnet.py, which will display the input image, ground truth, segmentation prediction and model uncertainty for each test image. Try these commands, changing the weight file to the one you just processed above with the correct inference statistics:

python /SegNet/Scripts/test_bayesian_segnet.py --model /SegNet/Models/bayesian_segnet_inference.prototxt --weights /SegNet/Models/Inference/test_weights.caffemodel --colours /SegNet/Scripts/camvid11.png --data /SegNet/CamVid/test.txt  # Test Bayesian SegNet
python /SegNet/Scripts/test_bayesian_segnet.py --model /SegNet/Models/bayesian_segnet_basic_inference.prototxt --weights /SegNet/Models/Inference/test_weights.caffemodel --colours /SegNet/Scripts/camvid11.png --data /SegNet/CamVid/test.txt  # Test Bayesian SegNet Basic
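
The batching trick the script relies on is simple to express. A minimal numpy sketch, where the input array and the choice of 8 Monte Carlo samples are illustrative:

import numpy as np

image = np.zeros((3, 360, 480), dtype=np.float32)     # stand-in for a real input
# Fill a minibatch with copies of one image so that a single forward pass
# applies a different dropout mask to each copy.
batch = np.repeat(image[np.newaxis, ...], 8, axis=0)  # shape (8, 3, 360, 480)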

Results

Here are some example qualitative results from the CamVid dataset. It is also possible to view the model uncertainty for individual classes; some examples are shown below.

[Figure: four example input images from the CamVid test set]

[Figure: the corresponding Bayesian SegNet segmentations]

[Figure: the average model uncertainty for each example]

[Figure: model uncertainty for the car class]

[Figure: model uncertainty for the road class]

[Figure: model uncertainty for the building class]

Thank you for your interest in SegNet! I'd love to hear about any exciting results you produce or to answer any further questions, so please get in touch; my contact details are below. To discuss any issues you experience with the tutorial, please open an issue on the GitHub repository.

Alex Kendall, November 2015

References

[1] Badrinarayanan, Vijay, Alex Kendall, and Roberto Cipolla. "SegNet: A Deep Convolutional Encoder-Decoder Architecture for Image Segmentation." arXiv preprint arXiv:1511.00561 (2015).
[2] Brostow, Gabriel J., Julien Fauqueur, and Roberto Cipolla. "Semantic object classes in video: A high-definition ground truth database." Pattern Recognition Letters 30.2 (2009): 88-97.
[3] Ioffe, Sergey, and Christian Szegedy. "Batch normalization: Accelerating deep network training by reducing internal covariate shift." arXiv preprint arXiv:1502.03167 (2015).
[4] Kendall, Alex, Vijay Badrinarayanan, and Roberto Cipolla. "Bayesian SegNet: Model Uncertainty in Deep Convolutional Encoder-Decoder Architectures for Scene Understanding." arXiv preprint arXiv:1511.02680 (2015).
[5] Gal, Yarin, and Zoubin Ghahramani. "Dropout as a Bayesian approximation: Representing model uncertainty in deep learning." arXiv preprint arXiv:1506.02142 (2015).