{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Weight Sampling Tutorial\n",
"\n",
"If you want to fine-tune one of the trained original SSD models on your own dataset, chances are that your dataset doesn't have the same number of classes as the trained model you're trying to fine-tune.\n",
"\n",
"This notebook explains a few options for how to deal with this situation. In particular, one solution is to sub-sample (or up-sample) the weight tensors of all the classification layers so that their shapes correspond to the number of classes in your dataset.\n",
"\n",
"The rest of this notebook shows how this is done."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 0. Our example\n",
"\n",
"I'll use a concrete example to make the process clear, but of course the process explained here is the same for any dataset.\n",
"\n",
"Consider the following example. You have a dataset on road traffic objects. Let this dataset contain annotations for the following object classes of interest:\n",
"\n",
"`['car', 'truck', 'pedestrian', 'bicyclist', 'traffic_light', 'motorcycle', 'bus', 'stop_sign']`\n",
"\n",
"That is, your dataset contains annotations for 8 object classes.\n",
"\n",
"You would now like to train an SSD300 on this dataset. However, instead of going through all the trouble of training a new model from scratch, you would instead like to use the fully trained original SSD300 model that was trained on MS COCO and fine-tune it on your dataset.\n",
"\n",
"The problem is: The SSD300 that was trained on MS COCO predicts 80 different classes, but your dataset has only 8 classes. The weight tensors of the classification layers of the MS COCO model don't have the right shape for your model that is supposed to learn only 8 classes. Bummer.\n",
"\n",
"So what options do we have?\n",
"\n",
"### Option 1: Just ignore the fact that we need only 8 classes\n",
"\n",
"The first option is also the simplest: We could just ignore the fact that the trained MS COCO model predicts 80 different classes while we only want to fine-tune it on 8 classes. We could simply map the 8 classes in our annotated dataset to any 8 indices out of the 80 that the MS COCO model predicts. The class IDs in our dataset could be the indices 1-8, or (together with index 0 for the background class) the indices `[0, 3, 8, 1, 2, 10, 4, 6, 12]` that we'll use further below, or any other 8 out of the 80. The point is that we would be training only 8 out of every 80 neurons that predict the class for a given box, and the other 72 would simply not be trained. Nothing would happen to them: since their indices never appear in our dataset, the gradient for them would always be zero.\n",
"\n",
"This would work, and it wouldn't even be a terrible option. Since only 8 out of the 80 classes would get trained, the model might get gradually worse at predicting the other 72 classes, but we don't care about them anyway, at least not right now. And if we ever realize that we now want to predict more than 8 different classes, our model would be expandable in that sense. Any new class we want to add could just get any one of the remaining free indices as its ID. We wouldn't need to change anything about the model, it would just be a matter of having the dataset annotated accordingly.\n",
"\n",
"Still, in this example we don't want to take this route. We don't want to carry around the computational overhead of overly complex classifier layers, 90 percent of whose outputs we never use, but which still need to be computed in full on every forward pass.\n",
"\n",
"So what else could we do instead?\n",
"\n",
"### Option 2: Just ignore those weights that are causing problems\n",
"\n",
"We could build a new SSD300 with 8 classes and load into it the weights of the MS COCO SSD300 for all layers except the classification layers (a minimal code sketch of this approach follows right after this introduction). Would that work? Yes, that would work. The only conflict is with the weights of the classification layers, and we can avoid this conflict by simply ignoring them. While this solution would be easy, it has a significant downside: If we're not loading trained weights for the classification layers of our new SSD300 model, then they will be initialized randomly. We'd still benefit from the trained weights for all the other layers, but the classifier layers would need to be trained from scratch.\n",
"\n",
"Not the end of the world, but we like pre-trained stuff, because it saves us a lot of training time. So what else could we do?\n",
"\n",
"### Option 3: Sub-sample the weights that are causing problems\n",
"\n",
"Instead of throwing the problematic weights away like in option 2, we could also sub-sample them. If the weight tensors of the classification layers of the MS COCO model don't have the right shape for our new model, we'll just **make** them have the right shape. This way we can still benefit from the pre-trained weights in those classification layers. Seems much better than option 2.\n",
"\n",
"The great thing in this example is: MS COCO happens to contain all of the eight classes that we care about. So when we sub-sample the weight tensors of the classification layers, we won't just do so randomly. Instead, we'll pick exactly those elements from the tensors that are responsible for the classification of the 8 classes that we care about.\n",
"\n",
"However, even if the classes in your dataset were entirely different from the classes in any of the fully trained models, it would still make a lot of sense to use the weights of the fully trained model. Any trained weights are always a better starting point for the training than random initialization, even if your model will be trained on entirely different object classes.\n",
"\n",
"And of course, in case you happen to have the opposite problem, where your dataset has **more** classes than the trained model you would like to fine-tune, then you can simply do the same thing in the opposite direction: Instead of sub-sampling the classification layer weights, you would then **up-sample** them. This works just the same way as what we'll be doing below.\n",
"\n",
"Let's get to it."
]
},
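{
"cell_type": "markdown",
"metadata": {},
"source": [
"First, though, here is a minimal sketch of what option 2 could look like in Keras, just for contrast. This is not part of this tutorial's workflow: it assumes that you have already built an 8-class `ssd_300` model the way section 5.2 below does. `by_name=True` loads only the layers whose names match between the model and the weights file, and `skip_mismatch=True` (available in recent Keras versions, and only in combination with `by_name=True`) skips the layers whose weight shapes don't fit, i.e. exactly the '`*_mbox_conf`' classification layers, which would then keep their random initialization."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"# Option 2 sketch (assumes `model` is an 8-class SSD300 as built in section 5.2 below).\n",
"# With `by_name=True`, layers are matched by name; with `skip_mismatch=True`, layers\n",
"# whose weight shapes differ (here: the classification layers) are silently skipped\n",
"# and keep their random initialization. Uncomment to try it once `model` exists.\n",
"\n",
"# model.load_weights('../../trained_weights/SSD/VGG_coco_SSD_300x300_iter_400000.h5',\n",
"#                    by_name=True,\n",
"#                    skip_mismatch=True)"
]
},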
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"import h5py\n",
"import numpy as np\n",
"import shutil\n",
"\n",
"from misc_utils.tensor_sampling_utils import sample_tensors"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 1. Load the trained weights file and make a copy\n",
"\n",
"First, we'll load the HDF5 file that contains the trained weights that we need (the source file). In our case this is \"`VGG_coco_SSD_300x300_iter_400000.h5`\" (download link available in the README of this repo), which are the weights of the original SSD300 model that was trained on MS COCO.\n",
"\n",
"Then, we'll make a copy of that weights file. That copy will be our output file (the destination file)."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"# TODO: Set the path for the source weights file you want to load.\n",
"\n",
"weights_source_path = '../../trained_weights/SSD/VGG_coco_SSD_300x300_iter_400000.h5'\n",
"\n",
"# TODO: Set the path and name for the destination weights file\n",
"# that you want to create.\n",
"\n",
"weights_destination_path = '../../trained_weights/SSD/VGG_coco_SSD_300x300_iter_400000_subsampled_8_classes.h5'\n",
"\n",
"# Make a copy of the weights file.\n",
"shutil.copy(weights_source_path, weights_destination_path)"
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"# Load both the source weights file and the copy we made.\n",
"# We will load the original weights file in read-only mode so that we can't mess up anything.\n",
"weights_source_file = h5py.File(weights_source_path, 'r')\n",
"# We open the copy in read/write mode ('r+') because we'll overwrite some of its\n",
"# datasets below. Recent h5py versions require an explicit mode here.\n",
"weights_destination_file = h5py.File(weights_destination_path, 'r+')"
]
},
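{
"cell_type": "markdown",
"metadata": {},
"source": [
"If you'd like to see what's actually stored in such a weights file, you can list its top-level groups. Keras weights files contain one HDF5 group per layer, named after the layer, and `h5py` file objects behave like dictionaries, so `keys()` works as you'd expect:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"# List the layer groups stored in the source weights file,\n",
"# one HDF5 group per layer, named after the layer it belongs to.\n",
"print(sorted(weights_source_file.keys()))"
]
},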
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 2. Figure out which weight tensors we need to sub-sample\n",
"\n",
"Next, we need to figure out exactly which weight tensors we need to sub-sample. As mentioned above, the weights for all layers except the classification layers are fine; we don't need to change anything about those.\n",
"\n",
"So which are the classification layers in SSD300? Their names are:"
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"classifier_names = ['conv4_3_norm_mbox_conf',\n",
"                    'fc7_mbox_conf',\n",
"                    'conv6_2_mbox_conf',\n",
"                    'conv7_2_mbox_conf',\n",
"                    'conv8_2_mbox_conf',\n",
"                    'conv9_2_mbox_conf']"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 3. Figure out which slices to pick\n",
"\n",
"The following section is optional. I'll look at one classification layer and explain what we want to do, just for your understanding. If you don't care about that, just skip ahead to the next section.\n",
"\n",
"We know which weight tensors we want to sub-sample, but we still need to decide which (or at least how many) elements of those tensors we want to keep. Let's take a look at the first of the classifier layers, \"`conv4_3_norm_mbox_conf`\". Its two weight tensors, the kernel and the bias, have the following shapes:"
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Shape of the 'conv4_3_norm_mbox_conf' weights:\n",
"\n",
"kernel:\t (3, 3, 512, 324)\n",
"bias:\t (324,)\n"
]
}
],
"source": [
"conv4_3_norm_mbox_conf_kernel = weights_source_file[classifier_names[0]][classifier_names[0]]['kernel:0']\n",
"conv4_3_norm_mbox_conf_bias = weights_source_file[classifier_names[0]][classifier_names[0]]['bias:0']\n",
"\n",
"print(\"Shape of the '{}' weights:\".format(classifier_names[0]))\n",
"print()\n",
"print(\"kernel:\\t\", conv4_3_norm_mbox_conf_kernel.shape)\n",
"print(\"bias:\\t\", conv4_3_norm_mbox_conf_bias.shape)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"So the last axis has 324 elements. Why is that?\n",
"\n",
"- MS COCO has 80 classes, but the model also has one 'background' class, so that makes 81 classes effectively.\n",
"- The 'conv4_3_norm_mbox_loc' layer predicts 4 boxes for each spatial position, so the 'conv4_3_norm_mbox_conf' layer has to predict one of the 81 classes for each of those 4 boxes.\n",
"\n",
"That's why the last axis has 4 * 81 = 324 elements.\n",
"\n",
"So how many elements do we want in the last axis for this layer?\n",
"\n",
"Let's do the same calculation as above:\n",
"\n",
"- Our dataset has 8 classes, but our model will also have a 'background' class, so that makes 9 classes effectively.\n",
"- We need to predict one of those 9 classes for each of the four boxes at each spatial position.\n",
"\n",
"That makes 4 * 9 = 36 elements.\n",
"\n",
"Now we know that we want to keep 36 elements in the last axis and leave all other axes unchanged. But which 36 elements out of the original 324 elements do we want?\n",
"\n",
"Should we just pick them randomly? If the object classes in our dataset had absolutely nothing to do with the classes in MS COCO, then choosing those 36 elements randomly would be fine (and the next section covers this case, too). But in our particular example case, choosing these elements randomly would be a waste. Since MS COCO happens to contain exactly the 8 classes that we need, instead of sub-sampling randomly, we'll just take exactly those elements that were trained to predict our 8 classes.\n",
"\n",
"Here are the indices of the 9 classes in MS COCO that we are interested in:\n",
"\n",
"`[0, 1, 2, 3, 4, 6, 8, 10, 12]`\n",
"\n",
"The indices above represent the following classes in the MS COCO dataset:\n",
"\n",
"`['background', 'person', 'bicycle', 'car', 'motorcycle', 'bus', 'truck', 'traffic_light', 'stop_sign']`\n",
"\n",
"How did I find out those indices? I just looked them up in the annotations of the MS COCO dataset.\n",
"\n",
"While these are the classes we want, we don't want them in this order. In our dataset, the classes happen to be in the following order (the 8 classes stated at the top of this notebook, preceded by the background class):\n",
"\n",
"`['background', 'car', 'truck', 'pedestrian', 'bicyclist', 'traffic_light', 'motorcycle', 'bus', 'stop_sign']`\n",
"\n",
"For example, '`traffic_light`' is class ID 5 in our dataset but class ID 10 in the SSD300 MS COCO model. So the order in which I actually want to pick the 9 indices above is this:\n",
"\n",
"`[0, 3, 8, 1, 2, 10, 4, 6, 12]`\n",
"\n",
"So out of each consecutive block of 81 elements within the 324, I want to pick the 9 elements above. This gives us the following 36 indices:"
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"[0, 3, 8, 1, 2, 10, 4, 6, 12, 81, 84, 89, 82, 83, 91, 85, 87, 93, 162, 165, 170, 163, 164, 172, 166, 168, 174, 243, 246, 251, 244, 245, 253, 247, 249, 255]\n"
]
}
],
"source": [
"n_classes_source = 81\n",
"classes_of_interest = [0, 3, 8, 1, 2, 10, 4, 6, 12]\n",
"\n",
"subsampling_indices = []\n",
"for i in range(int(324/n_classes_source)):\n",
"    indices = np.array(classes_of_interest) + i * n_classes_source\n",
"    subsampling_indices.append(indices)\n",
"subsampling_indices = list(np.concatenate(subsampling_indices))\n",
"\n",
"print(subsampling_indices)"
]
},
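{
"cell_type": "markdown",
"metadata": {},
"source": [
"As a quick sanity check, the list we just built should contain exactly 4 * 9 = 36 distinct indices, all of them smaller than 324:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"# Sanity check: 4 boxes * 9 classes = 36 distinct indices, all within the 324 elements.\n",
"assert len(subsampling_indices) == 36\n",
"assert len(set(subsampling_indices)) == 36\n",
"assert max(subsampling_indices) < 324"
]
},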
{
"cell_type": "markdown",
"metadata": {},
"source": [
"These are the indices of the 36 elements that we want to pick from both the bias vector and from the last axis of the kernel tensor.\n",
"\n",
"This was the detailed example for the '`conv4_3_norm_mbox_conf`' layer. And of course we haven't actually sub-sampled the weights for this layer yet, we have only figured out which elements we want to keep. The piece of code in the next section will perform the sub-sampling for all the classifier layers."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 4. Sub-sample the classifier weights\n",
"\n",
"The code in this section iterates over all the classifier layers of the source weights file and performs the following steps for each classifier layer:\n",
"\n",
"1. Get the kernel and bias tensors from the source weights file.\n",
"2. Compute the sub-sampling indices for the last axis. The first three axes of the kernel remain unchanged.\n",
"3. Overwrite the corresponding kernel and bias tensors in the destination weights file with our newly created sub-sampled kernel and bias tensors.\n",
"\n",
"The second step does what was explained in the previous section.\n",
"\n",
"In case you want to **up-sample** the last axis rather than sub-sample it, simply set the `classes_of_interest` variable below to an integer, namely the number of classes (including the background class) that you want the destination weights to have. The added elements will be initialized either randomly or optionally with zeros. Check out the documentation of `sample_tensors()` for details; a small self-contained illustration follows after the sub-sampling cell below."
]
},
{
"cell_type": "code",
"execution_count": 8,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"# TODO: Set the number of classes in the source weights file. Note that this number must include\n",
"# the background class, so for MS COCO's 80 classes, this must be 80 + 1 = 81.\n",
"n_classes_source = 81\n",
"# TODO: Set the indices of the classes that you want to pick for the sub-sampled weight tensors.\n",
"# In case you would like to just randomly sample a certain number of classes, you can just set\n",
"# `classes_of_interest` to an integer instead of the list below. Either way, don't forget to\n",
"# include the background class. That is, if you set an integer, and you want `n` positive classes,\n",
"# then you must set `classes_of_interest = n + 1`.\n",
"classes_of_interest = [0, 3, 8, 1, 2, 10, 4, 6, 12]\n",
"# classes_of_interest = 9 # Uncomment this in case you want to just randomly sub-sample the last axis instead of providing a list of indices.\n",
"\n",
"for name in classifier_names:\n",
"    # Get the trained weights for this layer from the source HDF5 weights file.\n",
"    # (`[()]` reads the full array into memory; it replaces the `.value` attribute that h5py 3.x removed.)\n",
"    kernel = weights_source_file[name][name]['kernel:0'][()]\n",
"    bias = weights_source_file[name][name]['bias:0'][()]\n",
"\n",
"    # Get the shape of the kernel. We're interested in sub-sampling\n",
"    # the last dimension, 'o'.\n",
"    height, width, in_channels, out_channels = kernel.shape\n",
"    \n",
"    # Compute the indices of the elements we want to sub-sample.\n",
"    # Keep in mind that each classification predictor layer predicts multiple\n",
"    # bounding boxes for every spatial location, so we want to sub-sample\n",
"    # the relevant classes for each of these boxes.\n",
"    if isinstance(classes_of_interest, (list, tuple)):\n",
"        subsampling_indices = []\n",
"        for i in range(int(out_channels/n_classes_source)):\n",
"            indices = np.array(classes_of_interest) + i * n_classes_source\n",
"            subsampling_indices.append(indices)\n",
"        subsampling_indices = list(np.concatenate(subsampling_indices))\n",
"    elif isinstance(classes_of_interest, int):\n",
"        subsampling_indices = int(classes_of_interest * (out_channels/n_classes_source))\n",
"    else:\n",
"        raise ValueError(\"`classes_of_interest` must be either an integer or a list/tuple.\")\n",
"    \n",
"    # Sub-sample the kernel and bias.\n",
"    # The `sample_tensors()` function used below provides extensive\n",
"    # documentation, so don't hesitate to read it if you want to know\n",
"    # what exactly is going on here.\n",
"    new_kernel, new_bias = sample_tensors(weights_list=[kernel, bias],\n",
"                                          sampling_instructions=[height, width, in_channels, subsampling_indices],\n",
"                                          axes=[[3]], # The one bias dimension corresponds to the last kernel dimension.\n",
"                                          init=['gaussian', 'zeros'],\n",
"                                          mean=0.0,\n",
"                                          stddev=0.005)\n",
"    \n",
"    # Delete the old weights from the destination file.\n",
"    del weights_destination_file[name][name]['kernel:0']\n",
"    del weights_destination_file[name][name]['bias:0']\n",
"    # Create new datasets for the sub-sampled weights.\n",
"    weights_destination_file[name][name].create_dataset(name='kernel:0', data=new_kernel)\n",
"    weights_destination_file[name][name].create_dataset(name='bias:0', data=new_bias)\n",
"\n",
"# Make sure all data is written to our output file before we move on.\n",
"weights_destination_file.flush()"
]
},
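{
"cell_type": "markdown",
"metadata": {},
"source": [
"To illustrate the **up-sampling** case mentioned above without touching our weights files, here is a small self-contained sketch on dummy tensors. It relies on the behavior described above: if the integer passed as the last sampling instruction is larger than the current size of that axis, `sample_tensors()` grows the axis and initializes the new elements according to `init` (the exact semantics are documented in `sample_tensors()` itself). The class counts here are made up purely for illustration."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"# Up-sampling sketch on dummy tensors: a hypothetical source model with 3 positive\n",
"# classes (+ background = 4) and 4 boxes per spatial location, i.e. 16 output channels.\n",
"dummy_kernel = np.random.randn(3, 3, 512, 16).astype(np.float32)\n",
"dummy_bias = np.zeros((16,), dtype=np.float32)\n",
"\n",
"# Request 4 * (5 + 1) = 24 elements in the last axis, i.e. a target model with\n",
"# 5 positive classes. That's more than the 16 elements we have, so the axis gets\n",
"# up-sampled: the new kernel elements are drawn from the Gaussian below, the new\n",
"# bias elements are set to zero.\n",
"up_kernel, up_bias = sample_tensors(weights_list=[dummy_kernel, dummy_bias],\n",
"                                    sampling_instructions=[3, 3, 512, 24],\n",
"                                    axes=[[3]],\n",
"                                    init=['gaussian', 'zeros'],\n",
"                                    mean=0.0,\n",
"                                    stddev=0.005)\n",
"\n",
"print(up_kernel.shape) # Expected: (3, 3, 512, 24)\n",
"print(up_bias.shape)   # Expected: (24,)"
]
},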
{
"cell_type": "markdown",
"metadata": {},
"source": [
"That's it, we're done.\n",
"\n",
"Let's just quickly inspect the shapes of the weights of the '`conv4_3_norm_mbox_conf`' layer in the destination weights file:"
]
},
{
"cell_type": "code",
"execution_count": 9,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Shape of the 'conv4_3_norm_mbox_conf' weights:\n",
"\n",
"kernel:\t (3, 3, 512, 36)\n",
"bias:\t (36,)\n"
]
}
],
"source": [
"conv4_3_norm_mbox_conf_kernel = weights_destination_file[classifier_names[0]][classifier_names[0]]['kernel:0']\n",
"conv4_3_norm_mbox_conf_bias = weights_destination_file[classifier_names[0]][classifier_names[0]]['bias:0']\n",
"\n",
"print(\"Shape of the '{}' weights:\".format(classifier_names[0]))\n",
"print()\n",
"print(\"kernel:\\t\", conv4_3_norm_mbox_conf_kernel.shape)\n",
"print(\"bias:\\t\", conv4_3_norm_mbox_conf_bias.shape)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Nice! Exactly what we wanted, 36 elements in the last axis. Now the weights are compatible with our new SSD300 model that predicts 8 positive classes.\n",
"\n",
"This is the end of the relevant part of this tutorial, but we can do one more thing and verify that the sub-sampled weights actually work. Let's do that in the next section."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 5. Verify that our sub-sampled weights actually work\n",
"\n",
"In our example case above we sub-sampled the fully trained weights of the SSD300 model trained on MS COCO from 80 classes down to just the 8 classes that we need.\n",
"\n",
"We can now create a new SSD300 with 8 classes, load our sub-sampled weights into it, and see how the model performs on a few test images that contain objects for some of those 8 classes. Let's do it."
]
},
{
"cell_type": "code",
"execution_count": 10,
"metadata": {},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"Using TensorFlow backend.\n"
]
}
],
"source": [
"from keras.optimizers import Adam\n",
"from keras import backend as K\n",
"from keras.models import load_model\n",
"\n",
"from models.keras_ssd300 import ssd_300\n",
"from keras_loss_function.keras_ssd_loss import SSDLoss\n",
"from keras_layers.keras_layer_AnchorBoxes import AnchorBoxes\n",
"from keras_layers.keras_layer_DecodeDetections import DecodeDetections\n",
"from keras_layers.keras_layer_DecodeDetectionsFast import DecodeDetectionsFast\n",
"from keras_layers.keras_layer_L2Normalization import L2Normalization\n",
"\n",
"from data_generator.object_detection_2d_data_generator import DataGenerator\n",
"from data_generator.object_detection_2d_photometric_ops import ConvertTo3Channels\n",
"from data_generator.object_detection_2d_patch_sampling_ops import RandomMaxCropFixedAR\n",
"from data_generator.object_detection_2d_geometric_ops import Resize"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 5.1. Set the parameters for the model\n",
"\n",
"As always, set the parameters for the model. We're going to use the configuration of the original SSD300 MS COCO model."
]
},
{
"cell_type": "code",
"execution_count": 11,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"img_height = 300 # Height of the input images\n",
"img_width = 300 # Width of the input images\n",
"img_channels = 3 # Number of color channels of the input images\n",
"subtract_mean = [123, 117, 104] # The per-channel mean of the images in the dataset\n",
"swap_channels = [2, 1, 0] # The color channel order in the original SSD is BGR, so we'll have the model reverse the color channel order of the input images.\n",
"# TODO: Set the number of classes.\n",
"n_classes = 8 # Number of positive classes, e.g. 20 for Pascal VOC, 80 for MS COCO\n",
"scales = [0.07, 0.15, 0.33, 0.51, 0.69, 0.87, 1.05] # The anchor box scaling factors used in the original SSD300 for the MS COCO dataset.\n",
"# scales = [0.1, 0.2, 0.37, 0.54, 0.71, 0.88, 1.05] # The anchor box scaling factors used in the original SSD300 for the Pascal VOC datasets.\n",
"aspect_ratios = [[1.0, 2.0, 0.5],\n",
"                 [1.0, 2.0, 0.5, 3.0, 1.0/3.0],\n",
"                 [1.0, 2.0, 0.5, 3.0, 1.0/3.0],\n",
"                 [1.0, 2.0, 0.5, 3.0, 1.0/3.0],\n",
"                 [1.0, 2.0, 0.5],\n",
"                 [1.0, 2.0, 0.5]] # The anchor box aspect ratios used in the original SSD300; the order matters\n",
"two_boxes_for_ar1 = True\n",
"steps = [8, 16, 32, 64, 100, 300] # The space between two adjacent anchor box center points for each predictor layer.\n",
"offsets = [0.5, 0.5, 0.5, 0.5, 0.5, 0.5] # The offsets of the first anchor box center points from the top and left borders of the image as a fraction of the step size for each predictor layer.\n",
"clip_boxes = False # Whether or not you want to limit the anchor boxes to lie entirely within the image boundaries\n",
"variances = [0.1, 0.1, 0.2, 0.2] # The variances by which the encoded target coordinates are scaled as in the original implementation\n",
"normalize_coords = True"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 5.2. Build the model\n",
"\n",
"Build the model and load our newly created, sub-sampled weights into it."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"# 1: Build the Keras model\n",
"\n",
"K.clear_session() # Clear previous models from memory.\n",
"\n",
"model = ssd_300(image_size=(img_height, img_width, img_channels),\n",
"                n_classes=n_classes,\n",
"                mode='inference',\n",
"                l2_regularization=0.0005,\n",
"                scales=scales,\n",
"                aspect_ratios_per_layer=aspect_ratios,\n",
"                two_boxes_for_ar1=two_boxes_for_ar1,\n",
"                steps=steps,\n",
"                offsets=offsets,\n",
"                clip_boxes=clip_boxes,\n",
"                variances=variances,\n",
"                normalize_coords=normalize_coords,\n",
"                subtract_mean=subtract_mean,\n",
"                divide_by_stddev=None,\n",
"                swap_channels=swap_channels,\n",
"                confidence_thresh=0.5,\n",
"                iou_threshold=0.45,\n",
"                top_k=200,\n",
"                nms_max_output_size=400,\n",
"                return_predictor_sizes=False)\n",
"\n",
"print(\"Model built.\")\n",
"\n",
"# 2: Load the sub-sampled weights into the model.\n",
"\n",
"# Load the weights that we've just created via sub-sampling.\n",
"weights_path = weights_destination_path\n",
"\n",
"model.load_weights(weights_path, by_name=True)\n",
"\n",
"print(\"Weights file loaded:\", weights_path)\n",
"\n",
"# 3: Instantiate an Adam optimizer and the SSD loss function and compile the model.\n",
"\n",
"adam = Adam(lr=0.001, beta_1=0.9, beta_2=0.999, epsilon=1e-08, decay=0.0)\n",
"\n",
"ssd_loss = SSDLoss(neg_pos_ratio=3, alpha=1.0)\n",
"\n",
"model.compile(optimizer=adam, loss=ssd_loss.compute_loss)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 5.3. Load some images to test our model on\n",
"\n",
"We sub-sampled some of the road traffic categories from the trained SSD300 MS COCO weights, so let's try out our model on a few road traffic images. The Udacity road traffic dataset linked to in the `ssd7_training.ipynb` notebook lends itself to this task. Let's instantiate a `DataGenerator` and load the Udacity dataset. Everything here is preset already, but if you'd like to learn more about the data generator and its capabilities, take a look at the detailed tutorial in [this](https://github.com/pierluigiferrari/data_generator_object_detection_2d) repository."
]
},
{
"cell_type": "code",
"execution_count": 13,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Number of images in the dataset: 22241\n"
]
}
],
"source": [
"dataset = DataGenerator()\n",
"\n",
"# TODO: Set the paths to your dataset here.\n",
"images_path = '../../datasets/Udacity_Driving/driving_dataset_consolidated_small/'\n",
"labels_path = '../../datasets/Udacity_Driving/driving_dataset_consolidated_small/labels.csv'\n",
"\n",
"dataset.parse_csv(images_dir=images_path,\n",
"                  labels_filename=labels_path,\n",
"                  input_format=['image_name', 'xmin', 'xmax', 'ymin', 'ymax', 'class_id'], # This is the order of the first six columns in the CSV file that contains the labels for your dataset. If your labels are in XML format, maybe the XML parser will be helpful, check the documentation.\n",
"                  include_classes='all',\n",
"                  random_sample=False)\n",
"\n",
"print(\"Number of images in the dataset:\", dataset.get_dataset_size())"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Make sure the batch generator generates images of size `(300, 300)`. We'll first randomly crop the largest possible patch with aspect ratio 1.0 and then resize to `(300, 300)`."
]
},
{
"cell_type": "code",
"execution_count": 14,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"convert_to_3_channels = ConvertTo3Channels()\n",
"random_max_crop = RandomMaxCropFixedAR(patch_aspect_ratio=img_width/img_height)\n",
"resize = Resize(height=img_height, width=img_width)\n",
"\n",
"generator = dataset.generate(batch_size=1,\n",
"                             shuffle=True,\n",
"                             transformations=[convert_to_3_channels,\n",
"                                              random_max_crop,\n",
"                                              resize],\n",
"                             returns={'processed_images',\n",
"                                      'processed_labels',\n",
"                                      'filenames'},\n",
"                             keep_images_without_gt=False)"
]
},
{
"cell_type": "code",
"execution_count": 23,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Image: ../../datasets/Udacity_Driving/driving_dataset_consolidated_small/1479505696943867867.jpg\n",
"\n",
"Ground truth boxes:\n",
"\n",
"[[  1   0 148  37 173]\n",
" [  1  40 139  86 172]\n",
" [  1  79 143  95 158]\n",
" [  1 128 143 144 154]\n",
" [  1 149 111 256 210]]\n"
]
}
],
"source": [
"# Generate samples\n",
"\n",
"batch_images, batch_labels, batch_filenames = next(generator)\n",
"\n",
"i = 0 # Which batch item to look at\n",
"\n",
"print(\"Image:\", batch_filenames[i])\n",
"print()\n",
"print(\"Ground truth boxes:\\n\")\n",
"print(batch_labels[i])"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 5.4. Make predictions and visualize them"
]
},
{
"cell_type": "code",
"execution_count": 24,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"# Make a prediction\n",
"\n",
"y_pred = model.predict(batch_images)"
]
},
{
"cell_type": "code",
"execution_count": 25,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Predicted boxes:\n",
"\n",
"   class   conf xmin   ymin   xmax   ymax\n",
"[[  1.     0.95  40.68 137.04  87.31 167.75]\n",
" [  1.     0.81   0.43 148.85  35.93 172.36]\n",
" [  2.     0.8  148.55 113.82 259.65 209.92]\n",
" [  5.     0.31  75.24  24.65  85.85  52.44]]\n"
]
}
],
"source": [
"# Decode the raw prediction.\n",
"\n",
"i = 0\n",
"\n",
"confidence_threshold = 0.5\n",
"\n",
"y_pred_thresh = [y_pred[k][y_pred[k,:,1] > confidence_threshold] for k in range(y_pred.shape[0])]\n",
"\n",
"np.set_printoptions(precision=2, suppress=True, linewidth=90)\n",
"print(\"Predicted boxes:\\n\")\n",
"print('   class   conf xmin   ymin   xmax   ymax')\n",
"print(y_pred_thresh[0])"
]
},
{
"cell_type": "code",
"execution_count": 26,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"<matplotlib.figure.Figure at 0x7f74d44e4668>"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"# Visualize the predictions.\n",
"\n",
"from matplotlib import pyplot as plt\n",
"\n",
"%matplotlib inline\n",
"\n",
"plt.figure(figsize=(20,12))\n",
"plt.imshow(batch_images[i])\n",
"\n",
"current_axis = plt.gca()\n",
"\n",
"classes = ['background', 'car', 'truck', 'pedestrian', 'bicyclist',\n",
"           'traffic_light', 'motorcycle', 'bus', 'stop_sign'] # Just so we can print class names onto the image instead of IDs\n",
"\n",
"# Draw the predicted boxes in blue\n",
"for box in y_pred_thresh[i]:\n",
"    class_id = box[0]\n",
"    confidence = box[1]\n",
"    xmin = box[2]\n",
"    ymin = box[3]\n",
"    xmax = box[4]\n",
"    ymax = box[5]\n",
"    label = '{}: {:.2f}'.format(classes[int(class_id)], confidence)\n",
"    current_axis.add_patch(plt.Rectangle((xmin, ymin), xmax-xmin, ymax-ymin, color='blue', fill=False, linewidth=2))\n",
"    current_axis.text(xmin, ymin, label, size='x-large', color='white', bbox={'facecolor':'blue', 'alpha':1.0})\n",
"\n",
"# Draw the ground truth boxes in green (omit the label for more clarity)\n",
"for box in batch_labels[i]:\n",
"    class_id = box[0]\n",
"    xmin = box[1]\n",
"    ymin = box[2]\n",
"    xmax = box[3]\n",
"    ymax = box[4]\n",
"    label = '{}'.format(classes[int(class_id)])\n",
"    current_axis.add_patch(plt.Rectangle((xmin, ymin), xmax-xmin, ymax-ymin, color='green', fill=False, linewidth=2))\n",
"    #current_axis.text(xmin, ymin, label, size='x-large', color='white', bbox={'facecolor':'green', 'alpha':1.0})"
]
},
{
"cell_type": "markdown",
"metadata": {
"collapsed": true
},
"source": [
"Seems as if our sub-sampled weights were doing a good job, sweet. Now we can fine-tune this model on our dataset with 8 classes."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.6.8"
}
},
"nbformat": 4,
"nbformat_minor": 2
}