Setting Up Caffe and the Dataset
This implementation of SegNet is built on top of the Caffe deep learning library. The first step is to download the SegNet source code, which can be found on our GitHub repository here. Our code to support SegNet is licensed for non-commercial use (license summary). To install SegNet, please follow the Caffe installation instructions here. Make sure you also compile Caffe's Python wrapper.
Note that this tutorial assumes that you download all files into the folder
/SegNet/ on your machine. Please modify the commands where appropriate if you choose to use a different directory.
SegNet learns to predict pixel-wise class labels through supervised learning, so we require a dataset of input images with corresponding ground truth labels. Label images must be single channel, with each pixel labelled with its class index. For this tutorial we are going to use the CamVid dataset, which contains 367 training and 233 testing images of road scenes. The dataset was captured around Cambridge, UK, and contains day and dusk scenes. We are going to use an 11-class version with an image size of 360 by 480. Download this data in the format required for SegNet, along with the rest of the files required for this tutorial, from this GitHub repository.
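Because a mislabelled ground truth image silently degrades training, it is worth sanity-checking the label images before going further. Below is a minimal sketch using NumPy; the convention of class indices 0-10 plus a void label of 11 is an assumption based on the 11-class CamVid setup, so check it against your own annotations:

```python
import numpy as np

def check_label_array(labels, num_classes=11, void_label=11):
    """Validate a label image array: single channel, integer class indices."""
    if labels.ndim != 2:
        raise ValueError("label image must be single channel (H x W), got shape %s" % (labels.shape,))
    if not np.issubdtype(labels.dtype, np.integer):
        raise ValueError("labels must be integers, got %s" % labels.dtype)
    # Valid indices are 0..num_classes-1 plus the (assumed) void label
    valid = set(range(num_classes)) | {void_label}
    found = set(np.unique(labels).tolist())
    bad = found - valid
    if bad:
        raise ValueError("unexpected class indices: %s" % sorted(bad))
    return True

# Example: a fake 360x480 label image with classes 0-10
fake = np.random.randint(0, 11, size=(360, 480), dtype=np.uint8)
print(check_label_array(fake))  # True
```

To check a real annotation you would load it with an image library (e.g. Pillow's `Image.open`, converted with `np.asarray`) and pass the resulting array to this function.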
Your file structure should now look like this:
/SegNet/
    CamVid/
        test/
        testannot/
        train/
        trainannot/
        test.txt
        train.txt
    Models/                        # SegNet and SegNet-Basic model files for training and testing
    Scripts/
        compute_bn_statistics.py
        test_segmentation_camvid.py
    caffe-segnet/                  # Caffe implementation
We now need to modify CamVid/train.txt and CamVid/test.txt so that SegNet knows where to find the data. SegNet requires a text file with one white-space separated pair per line: the path to an input image (.jpg or .png) followed by the path to its corresponding label image (.png), e.g.
/path/to/image1.png /another/path/to/label1.png
/path/to/image2.png /path/label2.png
...
Please open up these two files in a text editor and use the find-and-replace tool to change '/SegNet/...' to the absolute path of your data.
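If you would rather script this step than edit the files by hand, a sketch of a prefix rewrite over one of these list files might look as follows (the function name and prefixes are illustrative, not part of the SegNet code):

```python
def rewrite_prefix(list_file, old_prefix, new_prefix):
    """Replace a path prefix on every image/label pair in a SegNet list file."""
    with open(list_file) as f:
        lines = f.read().splitlines()
    fixed = []
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        image, label = line.split()
        # Replace only the leading occurrence of the prefix on each path
        fixed.append(image.replace(old_prefix, new_prefix, 1) + " " +
                     label.replace(old_prefix, new_prefix, 1))
    with open(list_file, "w") as f:
        f.write("\n".join(fixed) + "\n")
```

For example, `rewrite_prefix("/SegNet/CamVid/train.txt", "/SegNet", "/home/user/SegNet")` would point every pair at your own checkout.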
The next step is to set up a model for training. You can train either SegNet or SegNet-Basic (see the SegNet paper for details). First, open the model file Models/segnet_train.prototxt and the inference model file Models/segnet_inference.prototxt. You will need to modify the data input source line in each model's data layers: replace it with the absolute path to your data file. Depending on your GPU memory, you may need to modify the batch size in the training model. On a 12GB GPU such as an NVIDIA K40 or Titan X you should be able to use a batch size of 10 for SegNet-Basic or 6 for SegNet. If you have a smaller GPU, make the batch size as large as will fit; however, even a batch size as low as 2 or 3 should still train well. Secondly, please open the solver file Models/segnet_solver.prototxt and change two lines: the net and snapshot_prefix directories should match the directory of your data.
Repeat the above steps for the SegNet-Basic model, inference model and solver prototxt files. Create a folder to store your training weights and solver details with

mkdir /SegNet/Models/Training
We are now ready to train SegNet! Open up a terminal and issue these commands:
./SegNet/caffe-segnet/build/tools/caffe train -gpu 0 -solver /SegNet/Models/segnet_solver.prototxt # This will begin training SegNet on GPU 0
./SegNet/caffe-segnet/build/tools/caffe train -gpu 0 -solver /SegNet/Models/segnet_basic_solver.prototxt # This will begin training SegNet-Basic on GPU 0
./SegNet/caffe-segnet/build/tools/caffe train -gpu 0 -solver /SegNet/Models/segnet_solver.prototxt -weights /SegNet/Models/VGG_ILSVRC_16_layers.caffemodel # This will begin training SegNet on GPU 0 with a pretrained encoder
The third command initialises the encoder weights from the VGG model trained on ImageNet. If you wish to try this, you can download these weights here.
Training on this small dataset shouldn't take too long. After about 50-100 epochs you should see the model converge; you should be looking for greater than 90% training accuracy. Once you are happy that the model has converged, we can test it.
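One way to monitor convergence is to pull the loss and accuracy values out of the training log. A rough sketch, assuming Caffe's usual 'Iteration N, loss = X' and 'accuracy = Y' log lines (your exact log format may differ):

```python
import re

# Assumed Caffe-style log lines, e.g.
# "solver.cpp:228] Iteration 100, loss = 1.302"
# "solver.cpp:244]     Train net output #0: accuracy = 0.41"
LOSS_RE = re.compile(r"Iteration (\d+), loss = ([\d.]+)")
ACC_RE = re.compile(r"accuracy = ([\d.]+)")

def parse_log(text):
    """Extract (iteration, loss) pairs and accuracy values from a training log."""
    losses = [(int(i), float(l)) for i, l in LOSS_RE.findall(text)]
    accs = [float(a) for a in ACC_RE.findall(text)]
    return losses, accs

sample = """solver.cpp:228] Iteration 100, loss = 1.302
solver.cpp:244]     Train net output #0: accuracy = 0.41
solver.cpp:228] Iteration 200, loss = 0.310
solver.cpp:244]     Train net output #0: accuracy = 0.93"""

losses, accs = parse_log(sample)
print(losses[-1], accs[-1])  # (200, 0.31) 0.93
```

In practice you would redirect the caffe train output to a file and pass its contents to `parse_log`, then plot or tail the accuracy until it passes the 90% mark.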
First, open up the script Scripts/test_segmentation_camvid.py and change line 10 to the directory of your SegNet Caffe installation.
The Batch Normalisation layers in SegNet shift the input feature maps according to their mean and variance statistics for each mini-batch during training. At test time we must instead use the statistics for the entire dataset. To compute these, run the script Scripts/compute_bn_statistics.py using the following commands, making sure you change the training weight file to the one you wish to use.
python /SegNet/Scripts/compute_bn_statistics.py /SegNet/Models/segnet_train.prototxt /SegNet/Models/Training/segnet_iter_10000.caffemodel /SegNet/Models/Inference/ # compute BN statistics for SegNet
python /SegNet/Scripts/compute_bn_statistics.py /SegNet/Models/segnet_basic_train.prototxt /SegNet/Models/Training/segnet_basic_iter_10000.caffemodel /SegNet/Models/Inference/ # compute BN statistics for SegNet-Basic
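Conceptually, what this step does is replace the per-minibatch batch-norm statistics used during training with dataset-wide ones. The sketch below is not the actual compute_bn_statistics.py implementation, just an illustration of pooling features across minibatches and normalising with the pooled mean and variance:

```python
import numpy as np

def aggregate_bn_stats(batches):
    """Pool per-minibatch feature arrays into dataset-wide mean and variance.

    `batches` is a list of 1-D feature arrays, one per minibatch; at test
    time the BN layer normalises with these fixed statistics instead of
    the statistics of the current minibatch.
    """
    all_data = np.concatenate(batches)
    return all_data.mean(), all_data.var()

def bn_forward(x, mean, var, gamma=1.0, beta=0.0, eps=1e-5):
    """Standard batch-norm transform with fixed (inference) statistics."""
    return gamma * (x - mean) / np.sqrt(var + eps) + beta

# Toy data: ten minibatches of features with a shifted mean
rng = np.random.default_rng(0)
batches = [rng.standard_normal(16) + 2.0 for _ in range(10)]
mean, var = aggregate_bn_stats(batches)
y = bn_forward(batches[0], mean, var)  # normalised with dataset statistics
```

The real script runs the trained network over the training set to collect these statistics per BN layer; the arithmetic it then applies at inference is the `bn_forward` transform above.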
The script saves the final test weights in the output directory as /SegNet/Models/Inference/test_weights.caffemodel. Please rename this file to something more descriptive.
Now we can view the output of SegNet! test_segmentation_camvid.py will display the input image, ground truth and segmentation prediction for each test image. Try these commands, changing the weight file to the one you just processed above with the correct inference statistics:
python /SegNet/Scripts/test_segmentation_camvid.py --model /SegNet/Models/segnet_inference.prototxt --weights /SegNet/Models/Inference/test_weights.caffemodel --iter 233 # Test SegNet
python /SegNet/Scripts/test_segmentation_camvid.py --model /SegNet/Models/segnet_basic_inference.prototxt --weights /SegNet/Models/Inference/test_weights.caffemodel --iter 233 # Test SegNet-Basic
The following table shows the performance we achieved with SegNet on the CamVid dataset. If you have followed this tutorial correctly, you should be able to achieve the first two results. The final result was trained on 3.5K additional labelled images from publicly available datasets, see the paper for further details. The webdemo has been trained on further data which is not publicly available, and on an extra class (road marking).
| Model | Global Accuracy | Class Accuracy | Mean I/U |
| --- | --- | --- | --- |
| SegNet (Pretrained Encoder) | 88.6% | 65.9% | 50.2% |
| SegNet (3.5K dataset) | 86.8% | 81.3% | 69.1% |
Congratulations, that's it! How does it look? You can try out our trained model on the SegNet webdemo here.
Bayesian SegNet is an implementation of a Bayesian convolutional neural network which can produce an estimate of model uncertainty for semantic segmentation. It uses Monte Carlo dropout at test time to generate a posterior distribution of pixel class labels. The Bayesian SegNet paper shows that this gives a significant increase in segmentation accuracy and provides a measure of model uncertainty.
Model uncertainty can be used to understand with what confidence we can trust image segmentations and to determine to what degree of specificity we can assign a semantic label. For example, can we say that the label is a truck, or simply a moving vehicle? This can have a strong effect on a robot's behavioural decisions.
This model uncertainty is significantly different to the ‘probabilities’ obtained from a softmax classifier. The softmax function approximates relative probabilities between the class labels, but not an overall measure of the model’s uncertainty. For a more in-depth explanation, check out Yarin's blog post "What My Deep Model Doesn't Know...".
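A toy illustration of the point: simply scaling the logits makes a softmax arbitrarily confident without the model learning anything new, so the softmax maximum is not a trustworthy uncertainty measure:

```python
import numpy as np

def softmax(z):
    """Numerically stable softmax over a 1-D logit vector."""
    e = np.exp(z - z.max())
    return e / e.sum()

logits = np.array([2.0, 1.0, 0.0])
print(softmax(logits).max())       # ~0.665: moderately confident
print(softmax(10 * logits).max())  # ~0.99995: near-certain, same class ranking
```

Both outputs come from the same preference ordering over classes; only the logit scale changed, yet the second looks almost certain.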
The Bayesian SegNet model is identical in architecture to SegNet, with dropout layers introduced after the deepest six encoder and decoder units. To train the model you should follow the same procedure outlined above, except this time using the Bayesian SegNet model and solver files.
The model will take slightly longer to train because Dropout also acts as a regulariser. The batch normalisation statistics can then be calculated from the trained model as described above. Note that the dropout layers use the weight averaging technique when computing these batch normalisation statistics.
First, open up the script Scripts/test_bayesian_segnet.py and change line 14 to the directory of your SegNet Caffe installation.
Bayesian SegNet is a stochastic model and uses Monte Carlo dropout sampling to obtain uncertainties over the weights. To test it, we need to prepare a minibatch of samples where each image in the minibatch is the same image. To do this, use test_bayesian_segnet.py, which will display the input image, ground truth, segmentation prediction and model uncertainty for each test image. Try these commands, changing the weight file to the one you just processed above with the correct inference statistics:
python /SegNet/Scripts/test_bayesian_segnet.py --model /SegNet/Models/bayesian_segnet_inference.prototxt --weights /SegNet/Models/Inference/test_weights.caffemodel --colours /SegNet/Scripts/camvid11.png --data /SegNet/CamVid/test.txt # Test Bayesian SegNet
python /SegNet/Scripts/test_bayesian_segnet.py --model /SegNet/Models/bayesian_segnet_basic_inference.prototxt --weights /SegNet/Models/Inference/test_weights.caffemodel --colours /SegNet/Scripts/camvid11.png --data /SegNet/CamVid/test.txt # Test Bayesian SegNet-Basic
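The Monte Carlo sampling the script performs can be sketched as follows: replicate the image into a batch of identical samples, run forward passes with dropout still active, then take the per-pixel mean as the segmentation prediction and the variance as the uncertainty. The `stochastic_forward` function below is a toy stand-in for the real network, not the SegNet code itself:

```python
import numpy as np

rng = np.random.default_rng(0)

def stochastic_forward(image, drop_prob=0.5):
    """Toy stand-in for a network forward pass with dropout active at test time."""
    mask = rng.random(image.shape) >= drop_prob
    return image * mask / (1.0 - drop_prob)  # inverted-dropout scaling

def mc_dropout_predict(image, num_samples=8):
    """Monte Carlo dropout: average stochastic passes, variance = uncertainty."""
    # A minibatch where every sample is the same input image
    samples = np.stack([stochastic_forward(image) for _ in range(num_samples)])
    mean = samples.mean(axis=0)        # segmentation prediction
    uncertainty = samples.var(axis=0)  # per-pixel model uncertainty
    return mean, uncertainty

image = rng.random((360, 480))
pred, unc = mc_dropout_predict(image)
```

In the real model the stochastic passes come from the dropout layers inside the network, and the mean/variance are taken over the class probabilities for each pixel; more samples give a smoother prediction and a more stable uncertainty map at the cost of inference time.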
Here are some example qualitative results from the CamVid dataset. It is also possible to view the model uncertainty for individual classes - some examples are shown here.
Bayesian SegNet segmentation
Average model uncertainty
Model uncertainty for the car class
Model uncertainty for the road class
Model uncertainty for the building class
Thank you for your interest in SegNet! I'd love to hear about any exciting results you produce, or any further questions you have, so please get in touch. My contact details are below. To discuss any issues you experience with the tutorial, please open an issue on the GitHub repository.
Alex Kendall, November 2015
 Badrinarayanan, Vijay, Alex Kendall, and Roberto Cipolla. "SegNet: A Deep Convolutional Encoder-Decoder Architecture for Image Segmentation." arXiv preprint arXiv:1511.00561 (2015).
 Brostow, Gabriel J., Julien Fauqueur, and Roberto Cipolla. "Semantic object classes in video: A high-definition ground truth database." Pattern Recognition Letters 30.2 (2009): 88-97.
 Ioffe, Sergey, and Christian Szegedy. "Batch normalization: Accelerating deep network training by reducing internal covariate shift." arXiv preprint arXiv:1502.03167 (2015).
 Kendall, Alex, Vijay Badrinarayanan, and Roberto Cipolla. "Bayesian SegNet: Model Uncertainty in Deep Convolutional Encoder-Decoder Architectures for Scene Understanding." arXiv preprint arXiv:1511.02680 (2015).
 Gal, Yarin, and Zoubin Ghahramani. "Dropout as a Bayesian approximation: Representing model uncertainty in deep learning." arXiv preprint arXiv:1506.02142 (2015).