## Getting Started with SegNet

This is a tutorial on how to train a SegNet model for multi-class pixel-wise classification. By the end of this tutorial you will be able to take a single colour image, such as the one on the left, and produce a labelled output like the image on the right.

For the tutorial on Bayesian SegNet, scroll down to the second half of this page.

**Setting Up Caffe and the Dataset**

This implementation of SegNet [1] is built on top of the Caffe deep learning library. The first step is to download the SegNet source code from our GitHub repository. Our code to support SegNet is licensed for non-commercial use (see the license summary). To install SegNet, please follow the Caffe installation instructions, and make sure you also compile Caffe's Python wrapper.

Note that this tutorial assumes that you download all files into the folder `/SegNet/` on your machine. Please modify the commands where appropriate if you choose to use a different directory.

SegNet learns to predict pixel-wise class labels through supervised learning, so we require a dataset of input images with corresponding ground truth labels. Label images must be single channel, with each pixel labelled with its class index. For this tutorial we are going to use the CamVid dataset [2], which contains 367 training and 233 testing images of road scenes. The dataset was captured around Cambridge, UK, and contains day and dusk scenes. We are going to use an 11-class version with an image size of 360 by 480. Download this data in the format required for SegNet, along with the rest of the files required for this tutorial, from this GitHub repository.
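As a quick sanity check on the label format, a helper like the following (hypothetical, not part of the SegNet code; it assumes the CamVid convention of classes 0-10, with 11 used for void pixels) can verify that a label image is single channel and contains only valid class indices:

```python
import numpy as np

def check_label_image(label, num_classes=11, void_label=11):
    """Return True if a label array is usable for SegNet training.

    SegNet expects single-channel label images where each pixel value is
    an integer class index. RGB colour-coded label images will not work.
    """
    if label.ndim != 2:
        return False  # must be (H, W), not (H, W, 3)
    valid = set(range(num_classes)) | {void_label}
    return set(np.unique(label).tolist()) <= valid

# A well-formed 360x480 CamVid-style label image passes the check:
label = np.random.randint(0, 11, size=(360, 480))
```

Load each annotation with a tool that preserves the raw pixel values (e.g. PIL in mode `L`) before checking it, as some image libraries silently convert to RGB.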

Your file structure should now look like this:

```
/SegNet/
    CamVid/
        test/
        testannot/
        train/
        trainannot/
        test.txt
        train.txt
    Models/
        # SegNet and SegNet-Basic model files for training and testing
    Scripts/
        compute_bn_statistics.py
        test_segmentation_camvid.py
    caffe-segnet/
        # caffe implementation
```

We now need to modify `CamVid/train.txt` and `CamVid/test.txt` so that SegNet knows where to find the data. SegNet requires a text file of white-space separated paths, alternating between input images (.jpg or .png) and their corresponding label images (.png), e.g. `/path/to/image1.png /another/path/to/label1.png /path/to/image2.png /path/label2.png ...`

Please open these two files in a text editor and use the find & replace tool to change `/SegNet/...` to the absolute path of your data.
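If you prefer to script the replacement, a minimal sketch might look like the following (the data root `/home/me/SegNet` and the commented-out file paths are just examples for your own setup):

```python
def rewrite_paths(text, new_root):
    """Replace the assumed '/SegNet/' path prefix with your actual data root."""
    return text.replace("/SegNet/", new_root.rstrip("/") + "/")

# Apply to both list files in place, e.g.:
# for name in ("/SegNet/CamVid/train.txt", "/SegNet/CamVid/test.txt"):
#     with open(name) as f:
#         text = f.read()
#     with open(name, "w") as f:
#         f.write(rewrite_paths(text, "/home/me/SegNet"))
```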

**Training SegNet**

The next step is to set up a model for training. You can train with either SegNet or SegNet-Basic (see [1] for details). First, open the model file `Models/segnet_train.prototxt` and the inference model file `Models/segnet_inference.prototxt`. You will need to modify the data input source line in each of the model's data layers, replacing it with the absolute path to your data file. Depending on your GPU memory, you may need to modify the batch size in the training model. On a 12GB GPU such as an NVIDIA K40 or Titan X you should be able to use a batch size of 10 for SegNet-Basic or 6 for SegNet. If you have a smaller GPU, make the batch size as large as will fit; even a batch size as low as 2 or 3 should still train well. Secondly, open the solver file `Models/segnet_solver.prototxt` and change two lines: the `net` and `snapshot_prefix` directories should match the directory of your data.
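If you would rather script the prototxt edits, a rough sketch using a regular expression could look like this (it assumes the data layers store their input list in a quoted `source:` field, as Caffe data layers conventionally do; check your prototxt before trusting it):

```python
import re

def set_data_source(prototxt, new_source):
    """Point every quoted 'source:' entry in a prototxt at new_source."""
    return re.sub(r'source:\s*"[^"]*"', 'source: "%s"' % new_source, prototxt)

# Example layer fragment in the style of a Caffe data layer:
snippet = 'dense_image_data_param {\n  source: "/SegNet/CamVid/train.txt"\n  batch_size: 4\n}'
```

Read the file, pass its contents through `set_data_source`, and write it back, as with the list files above.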

Repeat the above steps for the SegNet-Basic model, inference model and solver prototxt files. Then create a folder to store your training weights and solver details with `mkdir /SegNet/Models/Training`.

We are now ready to train SegNet! Open up a terminal and issue these commands:

```
/SegNet/caffe-segnet/build/tools/caffe train -gpu 0 -solver /SegNet/Models/segnet_solver.prototxt # This will begin training SegNet on GPU 0
/SegNet/caffe-segnet/build/tools/caffe train -gpu 0 -solver /SegNet/Models/segnet_basic_solver.prototxt # This will begin training SegNet-Basic on GPU 0
/SegNet/caffe-segnet/build/tools/caffe train -gpu 0 -solver /SegNet/Models/segnet_solver.prototxt -weights /SegNet/Models/VGG_ILSVRC_16_layers.caffemodel # This will begin training SegNet on GPU 0 with a pretrained encoder
```

The third command initialises the encoder weights from the VGG model trained on ImageNet. If you wish to try this, you can download these weights here.

Training on this small dataset shouldn't take too long; after about 50-100 epochs you should see the loss converge. Look for a training accuracy greater than 90%. Once you are happy that the model has converged, we can test it.
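Caffe's solver counts iterations rather than epochs, where one epoch is one pass over the 367 training images. A rough conversion (using the batch sizes suggested above) can be sketched as:

```python
import math

def epochs_to_iters(num_images, batch_size, epochs):
    """Convert a target number of epochs into Caffe solver iterations."""
    iters_per_epoch = math.ceil(num_images / batch_size)
    return iters_per_epoch * epochs

# CamVid: 367 training images with a SegNet batch size of 6
iters = epochs_to_iters(367, 6, 100)  # 6200 iterations for 100 epochs
```

Use a figure like this to set `max_iter` and to interpret the snapshot filenames (e.g. `segnet_iter_10000.caffemodel`) used in the testing commands below.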

**Testing SegNet**

First, open the scripts `Scripts/compute_bn_statistics.py` and `Scripts/test_segmentation_camvid.py` and change line 10 to the path of your SegNet Caffe installation.

The Batch Normalisation layers [3] in SegNet shift the input feature maps according to their mean and variance statistics for each mini-batch during training. At test time we must instead use the statistics for the entire dataset. To compute these, run the script `Scripts/compute_bn_statistics.py` using the following commands, making sure you change the training weight file to the one you wish to use.

```
python /SegNet/Scripts/compute_bn_statistics.py /SegNet/Models/segnet_train.prototxt /SegNet/Models/Training/segnet_iter_10000.caffemodel /SegNet/Models/Inference/ # compute BN statistics for SegNet
python /SegNet/Scripts/compute_bn_statistics.py /SegNet/Models/segnet_basic_train.prototxt /SegNet/Models/Training/segnet_basic_iter_10000.caffemodel /SegNet/Models/Inference/ # compute BN statistics for SegNet-Basic
```

The script saves the final test weights in the output directory as `/SegNet/Models/Inference/test_weights.caffemodel`. Please rename this file to something more descriptive.
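Conceptually, what the script computes can be sketched as accumulating per-channel statistics over every mini-batch in the dataset (an illustrative NumPy sketch of the idea, not the actual script):

```python
import numpy as np

def dataset_bn_statistics(feature_batches):
    """Accumulate per-channel mean and variance over an entire dataset.

    feature_batches: iterable of (N, C, H, W) feature map arrays,
    one per mini-batch. Returns (mean, var), each of shape (C,).
    """
    count, total, total_sq = 0, 0.0, 0.0
    for x in feature_batches:
        # Reduce over batch and spatial axes, keeping the channel axis
        count += x.shape[0] * x.shape[2] * x.shape[3]
        total = total + x.sum(axis=(0, 2, 3))
        total_sq = total_sq + (x ** 2).sum(axis=(0, 2, 3))
    mean = total / count
    var = total_sq / count - mean ** 2
    return mean, var
```

These dataset-wide statistics then replace the mini-batch statistics inside each batch normalisation layer of the inference weights.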

Now we can view the output of SegNet! The script `test_segmentation_camvid.py` will display the input image, ground truth and segmentation prediction for each test image. Try these commands, changing the weight file to the one you just processed above with the correct inference statistics:

```
python /SegNet/Scripts/test_segmentation_camvid.py --model /SegNet/Models/segnet_inference.prototxt --weights /SegNet/Models/Inference/test_weights.caffemodel --iter 233 # Test SegNet
python /SegNet/Scripts/test_segmentation_camvid.py --model /SegNet/Models/segnet_basic_inference.prototxt --weights /SegNet/Models/Inference/test_weights.caffemodel --iter 233 # Test SegNetBasic
```

**Results**

The following table shows the performance we achieved with SegNet on the CamVid dataset. If you have followed this tutorial correctly, you should be able to achieve the first two results. The final result was trained on 3.5K additional labelled images from publicly available datasets, see the paper for further details. The webdemo has been trained on further data which is not publicly available, and on an extra class (road marking).

Model | Global Accuracy | Class Accuracy | Mean I/U
---|---|---|---
SegNet-Basic | 82.8% | 62.3% | 46.3%
SegNet (Pretrained Encoder) | 88.6% | 65.9% | 50.2%
SegNet (3.5K dataset) | 86.8% | 81.3% | 69.1%
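For reference, the three metrics in the table can be computed from a class confusion matrix (an illustrative sketch; it assumes every class appears at least once in both the ground truth and the predictions, so no division is by zero):

```python
import numpy as np

def segmentation_metrics(conf):
    """Compute global accuracy, class accuracy and mean I/U.

    conf[i, j] = number of pixels with ground-truth class i
    that the model predicted as class j.
    """
    tp = np.diag(conf).astype(float)        # correctly labelled pixels
    gt_per_class = conf.sum(axis=1)         # ground-truth pixels per class
    pred_per_class = conf.sum(axis=0)       # predicted pixels per class
    global_acc = tp.sum() / conf.sum()
    class_acc = np.mean(tp / gt_per_class)
    iou = tp / (gt_per_class + pred_per_class - tp)
    return global_acc, class_acc, np.mean(iou)
```

Note how mean I/U penalises false positives as well as false negatives, which is why it is lower than class accuracy in the table.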

(Figure: input image, SegNet-Basic segmentation, and SegNet segmentation.)

Congratulations, that's it! How does it look? You can try out our trained model on the SegNet webdemo here.

## Bayesian SegNet

This is a tutorial on Bayesian SegNet [4], a probabilistic extension to SegNet. By the end of this tutorial you will be able to train a model which can take an image like the one on the left, and produce a segmentation (center) and a measure of model uncertainty (right).

Bayesian SegNet is an implementation of a Bayesian convolutional neural network which can produce an estimate of model uncertainty for semantic segmentation. It uses Monte Carlo Dropout [5] at test time to generate a posterior distribution of pixel class labels. [4] shows that this gives a significant increase in segmentation accuracy and provides a measure of model uncertainty.
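An illustrative NumPy sketch of the idea (not the SegNet code itself): keep dropout active at test time, run several stochastic forward passes on the same image, average the softmax outputs for the prediction, and use the sample variance as the uncertainty. The `stochastic_forward` function is a placeholder for a network forward pass with dropout left on:

```python
import numpy as np

def mc_dropout_predict(stochastic_forward, image, num_samples=8):
    """Monte Carlo dropout inference.

    stochastic_forward(image) -> (num_classes, H, W) softmax output;
    dropout stays ON, so every call draws a different sample.
    """
    samples = np.stack([stochastic_forward(image) for _ in range(num_samples)])
    mean_probs = samples.mean(axis=0)               # averaged posterior
    prediction = mean_probs.argmax(axis=0)          # per-pixel class label
    uncertainty = samples.var(axis=0).mean(axis=0)  # per-pixel sample variance
    return prediction, uncertainty
```

With a deterministic forward pass the uncertainty collapses to zero; it is the disagreement between dropout samples that carries the information.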

Model uncertainty can be used to understand with what confidence we can trust image segmentations and to determine to what degree of specificity we can assign a semantic label. For example, can we say that the label is a truck, or simply a moving vehicle? This can have a strong effect on a robot's behavioural decisions.

This model uncertainty is significantly different to the 'probabilities' obtained from a softmax classifier. The softmax function approximates relative probabilities between the class labels, but not an overall measure of the model's uncertainty. For a more in-depth explanation, check out Yarin's blog post "What My Deep Model Doesn't Know...".

**Training**

The Bayesian SegNet model is identical in architecture to SegNet, with Dropout layers introduced after the deepest six encoder and decoder units. To train the model, follow the same procedure outlined above, this time using `Models/bayesian_segnet_train.prototxt` and `Models/bayesian_segnet_solver.prototxt`. The model will take slightly longer to train because Dropout also acts as a regulariser. The batch normalisation statistics can then be calculated from the trained model as described above. Note that the dropout layers use the weight averaging technique when computing these batch normalisation statistics.

**Testing**

First, open the script `Scripts/test_bayesian_segnet.py` and change line 14 to the path of your SegNet Caffe installation.

Bayesian SegNet is a stochastic model and uses Monte Carlo dropout sampling to obtain uncertainty over the weights. To test it, we need to prepare a minibatch of samples in which every image is the same image. To do this, use `test_bayesian_segnet.py`, which will display the input image, ground truth, segmentation prediction and model uncertainty for each test image. Try these commands, changing the weight file to the one you just processed above with the correct inference statistics:

```
python /SegNet/Scripts/test_bayesian_segnet.py --model /SegNet/Models/bayesian_segnet_inference.prototxt --weights /SegNet/Models/Inference/test_weights.caffemodel --colours /SegNet/Scripts/camvid11.png --data /SegNet/CamVid/test.txt # Test Bayesian SegNet
python /SegNet/Scripts/test_bayesian_segnet.py --model /SegNet/Models/bayesian_segnet_basic_inference.prototxt --weights /SegNet/Models/Inference/test_weights.caffemodel --colours /SegNet/Scripts/camvid11.png --data /SegNet/CamVid/test.txt # Test Bayesian SegNet Basic
```

**Results**

Here are some example qualitative results from the CamVid dataset. It is also possible to view the model uncertainty for individual classes - some examples are shown here.

(Figure: input image, Bayesian SegNet segmentation, average model uncertainty, and per-class model uncertainty for the car, road and building classes.)

Thank you for your interest in SegNet! I'd love to hear about any exciting results you produce or any further questions you have, so please get in touch; my contact details are below. To discuss any issues you experience with the tutorial, please open an issue on the GitHub repository.

**Alex Kendall**, November 2015

**References**

[1] Badrinarayanan, Vijay, Alex Kendall, and Roberto Cipolla. "SegNet: A Deep Convolutional Encoder-Decoder Architecture for Image Segmentation." arXiv preprint arXiv:1511.00561 (2015).

[2] Brostow, Gabriel J., Julien Fauqueur, and Roberto Cipolla. "Semantic object classes in video: A high-definition ground truth database." Pattern Recognition Letters 30.2 (2009): 88-97.

[3] Ioffe, Sergey, and Christian Szegedy. "Batch normalization: Accelerating deep network training by reducing internal covariate shift." arXiv preprint arXiv:1502.03167 (2015).

[4] Kendall, Alex, Vijay Badrinarayanan, and Roberto Cipolla. "Bayesian SegNet: Model Uncertainty in Deep Convolutional Encoder-Decoder Architectures for Scene Understanding." arXiv preprint arXiv:1511.02680 (2015).

[5] Gal, Yarin, and Zoubin Ghahramani. "Dropout as a Bayesian approximation: Representing model uncertainty in deep learning." arXiv preprint arXiv:1506.02142 (2015).