{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# SSD300 Training Tutorial\n",
"\n",
"This tutorial explains how to train an SSD300 on the Pascal VOC datasets. The preset parameters reproduce the training of the original SSD300 \"07+12\" model. Training SSD512 works simiarly, so there's no extra tutorial for that. The same goes for training on other datasets.\n",
"\n",
"You can find a summary of a full training here to get an impression of what it should look like:\n",
"[SSD300 \"07+12\" training summary](https://github.com/pierluigiferrari/ssd_keras/blob/master/training_summaries/ssd300_pascal_07%2B12_training_summary.md)"
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"/home/dlsaavedra/anaconda3/lib/python3.6/site-packages/h5py/__init__.py:36: FutureWarning: Conversion of the second argument of issubdtype from `float` to `np.floating` is deprecated. In future, it will be treated as `np.float64 == np.dtype(float).type`.\n",
" from ._conv import register_converters as _register_converters\n",
"Using TensorFlow backend.\n"
]
}
],
"source": [
"from keras.optimizers import Adam, SGD\n",
"from keras.callbacks import ModelCheckpoint, LearningRateScheduler, TerminateOnNaN, CSVLogger\n",
"from keras import backend as K\n",
"from keras.models import load_model\n",
"from math import ceil\n",
"import numpy as np\n",
"from matplotlib import pyplot as plt\n",
"\n",
"from models.keras_ssd512 import ssd_512\n",
"from models.keras_ssd300 import ssd_300\n",
"from keras_loss_function.keras_ssd_loss import SSDLoss\n",
"from keras_layers.keras_layer_AnchorBoxes import AnchorBoxes\n",
"from keras_layers.keras_layer_DecodeDetections import DecodeDetections\n",
"from keras_layers.keras_layer_DecodeDetectionsFast import DecodeDetectionsFast\n",
"from keras_layers.keras_layer_L2Normalization import L2Normalization\n",
"\n",
"from ssd_encoder_decoder.ssd_input_encoder import SSDInputEncoder\n",
"from ssd_encoder_decoder.ssd_output_decoder import decode_detections, decode_detections_fast\n",
"\n",
"from data_generator.object_detection_2d_data_generator import DataGenerator\n",
"from data_generator.object_detection_2d_geometric_ops import Resize\n",
"from data_generator.object_detection_2d_photometric_ops import ConvertTo3Channels\n",
"from data_generator.data_augmentation_chain_original_ssd import SSDDataAugmentation\n",
"from data_generator.object_detection_2d_misc_utils import apply_inverse_transforms\n",
"\n",
"%matplotlib inline"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 0. Preliminary note\n",
"\n",
"All places in the code where you need to make any changes are marked `TODO` and explained accordingly. All code cells that don't contain `TODO` markers just need to be executed."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 1. Set the model configuration parameters\n",
"\n",
"This section sets the configuration parameters for the model definition. The parameters set here are being used both by the `ssd_300()` function that builds the SSD300 model as well as further down by the constructor for the `SSDInputEncoder` object that is needed to run the training. Most of these parameters are needed to define the anchor boxes.\n",
"\n",
"The parameters as set below produce the original SSD300 architecture that was trained on the Pascal VOC datsets, i.e. they are all chosen to correspond exactly to their respective counterparts in the `.prototxt` file that defines the original Caffe implementation. Note that the anchor box scaling factors of the original SSD implementation vary depending on the datasets on which the models were trained. The scaling factors used for the MS COCO datasets are smaller than the scaling factors used for the Pascal VOC datasets. The reason why the list of scaling factors has 7 elements while there are only 6 predictor layers is that the last scaling factor is used for the second aspect-ratio-1 box of the last predictor layer. Refer to the documentation for details.\n",
"\n",
"As mentioned above, the parameters set below are not only needed to build the model, but are also passed to the `SSDInputEncoder` constructor further down, which is responsible for matching and encoding ground truth boxes and anchor boxes during the training. In order to do that, it needs to know the anchor box parameters."
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {},
"outputs": [],
"source": [
"img_height = 300 # Height of the model input images\n",
"img_width = 300 # Width of the model input images\n",
"img_channels = 3 # Number of color channels of the model input images\n",
"mean_color = [123, 117, 104] # The per-channel mean of the images in the dataset. Do not change this value if you're using any of the pre-trained weights.\n",
"swap_channels = [2, 1, 0] # The color channel order in the original SSD is BGR, so we'll have the model reverse the color channel order of the input images.\n",
"n_classes = 20 # Number of positive classes, e.g. 20 for Pascal VOC, 80 for MS COCO\n",
"scales_pascal = [0.1, 0.2, 0.37, 0.54, 0.71, 0.88, 1.05] # The anchor box scaling factors used in the original SSD300 for the Pascal VOC datasets\n",
"scales_coco = [0.07, 0.15, 0.33, 0.51, 0.69, 0.87, 1.05] # The anchor box scaling factors used in the original SSD300 for the MS COCO datasets\n",
"scales = scales_pascal\n",
"aspect_ratios = [[1.0, 2.0, 0.5],\n",
" [1.0, 2.0, 0.5, 3.0, 1.0/3.0],\n",
" [1.0, 2.0, 0.5, 3.0, 1.0/3.0],\n",
" [1.0, 2.0, 0.5, 3.0, 1.0/3.0],\n",
" [1.0, 2.0, 0.5],\n",
" [1.0, 2.0, 0.5]] # The anchor box aspect ratios used in the original SSD300; the order matters\n",
"two_boxes_for_ar1 = True\n",
"steps = [8, 16, 32, 64, 100, 300] # The space between two adjacent anchor box center points for each predictor layer.\n",
"offsets = [0.5, 0.5, 0.5, 0.5, 0.5, 0.5] # The offsets of the first anchor box center points from the top and left borders of the image as a fraction of the step size for each predictor layer.\n",
"clip_boxes = False # Whether or not to clip the anchor boxes to lie entirely within the image boundaries\n",
"variances = [0.1, 0.1, 0.2, 0.2] # The variances by which the encoded target coordinates are divided as in the original implementation\n",
"normalize_coords = True"
]
},
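{
"cell_type": "markdown",
"metadata": {},
"source": [
"To make the role of the 7 scaling factors concrete, here is a minimal sketch (not part of this repository) of how the anchor box dimensions follow from `scales` and `aspect_ratios`, assuming the geometric-mean rule of the original SSD paper for the second aspect-ratio-1 box:\n",
"\n",
"```python\n",
"import numpy as np\n",
"\n",
"def anchor_box_sizes(scale, next_scale, aspect_ratios, img_size=300):\n",
"    # One (width, height) pair per aspect ratio at this layer's scale.\n",
"    sizes = [(scale * np.sqrt(ar) * img_size, scale / np.sqrt(ar) * img_size)\n",
"             for ar in aspect_ratios]\n",
"    # The second aspect-ratio-1 box uses the geometric mean of this layer's\n",
"    # scale and the next layer's scale -- hence 7 scales for 6 layers.\n",
"    s_prime = np.sqrt(scale * next_scale)\n",
"    sizes.append((s_prime * img_size, s_prime * img_size))\n",
"    return sizes\n",
"\n",
"# Example: the first predictor layer (conv4_3) with scales 0.1 and 0.2.\n",
"print(anchor_box_sizes(0.1, 0.2, [1.0, 2.0, 0.5]))\n",
"```"
]
},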
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 2. Build or load the model\n",
"\n",
"You will want to execute either of the two code cells in the subsequent two sub-sections, not both."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 2.1 Create a new model and load trained VGG-16 weights into it (or trained SSD weights)\n",
"\n",
"If you want to create a new SSD300 model, this is the relevant section for you. If you want to load a previously saved SSD300 model, skip ahead to section 2.2.\n",
"\n",
"The code cell below does the following things:\n",
"1. It calls the function `ssd_300()` to build the model.\n",
"2. It then loads the weights file that is found at `weights_path` into the model. You could load the trained VGG-16 weights or you could load the weights of a trained model. If you want to reproduce the original SSD training, load the pre-trained VGG-16 weights. In any case, you need to set the path to the weights file you want to load on your local machine. Download links to all the trained weights are provided in the [README](https://github.com/pierluigiferrari/ssd_keras/blob/master/README.md) of this repository.\n",
"3. Finally, it compiles the model for the training. In order to do so, we're defining an optimizer (Adam) and a loss function (SSDLoss) to be passed to the `compile()` method.\n",
"\n",
"Normally, the optimizer of choice would be Adam (commented out below), but since the original implementation uses plain SGD with momentum, we'll do the same in order to reproduce the original training. Adam is generally the superior optimizer, so if your goal is not to have everything exactly as in the original training, feel free to switch to Adam. You might need to adjust the learning rate scheduler below slightly in case you use Adam.\n",
"\n",
"Note that the learning rate that is being set here doesn't matter, because further below we'll pass a learning rate scheduler to the training function, which will overwrite any learning rate set here, i.e. what matters are the learning rates that are defined by the learning rate scheduler.\n",
"\n",
"`SSDLoss` is a custom Keras loss function that implements the multi-task that consists of a log loss for classification and a smooth L1 loss for localization. `neg_pos_ratio` and `alpha` are set as in the paper."
]
},
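{
"cell_type": "markdown",
"metadata": {},
"source": [
"For reference, the multi-task loss from the SSD paper that `SSDLoss` implements is\n",
"\n",
"$$L(x, c, l, g) = \\frac{1}{N} \\big( L_{conf}(x, c) + \\alpha L_{loc}(x, l, g) \\big),$$\n",
"\n",
"where $N$ is the number of matched anchor boxes, $L_{conf}$ is the softmax log loss over the class confidences, $L_{loc}$ is the smooth L1 loss over the box offsets, and $\\alpha$ is set via the `alpha` argument below."
]
},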
{
"cell_type": "code",
"execution_count": 3,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"__________________________________________________________________________________________________\n",
"Layer (type) Output Shape Param # Connected to \n",
"==================================================================================================\n",
"input_1 (InputLayer) (None, 300, 300, 3) 0 \n",
"__________________________________________________________________________________________________\n",
"identity_layer (Lambda) (None, 300, 300, 3) 0 input_1[0][0] \n",
"__________________________________________________________________________________________________\n",
"input_mean_normalization (Lambd (None, 300, 300, 3) 0 identity_layer[0][0] \n",
"__________________________________________________________________________________________________\n",
"input_channel_swap (Lambda) (None, 300, 300, 3) 0 input_mean_normalization[0][0] \n",
"__________________________________________________________________________________________________\n",
"conv1_1 (Conv2D) (None, 300, 300, 64) 1792 input_channel_swap[0][0] \n",
"__________________________________________________________________________________________________\n",
"conv1_2 (Conv2D) (None, 300, 300, 64) 36928 conv1_1[0][0] \n",
"__________________________________________________________________________________________________\n",
"pool1 (MaxPooling2D) (None, 150, 150, 64) 0 conv1_2[0][0] \n",
"__________________________________________________________________________________________________\n",
"conv2_1 (Conv2D) (None, 150, 150, 128 73856 pool1[0][0] \n",
"__________________________________________________________________________________________________\n",
"conv2_2 (Conv2D) (None, 150, 150, 128 147584 conv2_1[0][0] \n",
"__________________________________________________________________________________________________\n",
"pool2 (MaxPooling2D) (None, 75, 75, 128) 0 conv2_2[0][0] \n",
"__________________________________________________________________________________________________\n",
"conv3_1 (Conv2D) (None, 75, 75, 256) 295168 pool2[0][0] \n",
"__________________________________________________________________________________________________\n",
"conv3_2 (Conv2D) (None, 75, 75, 256) 590080 conv3_1[0][0] \n",
"__________________________________________________________________________________________________\n",
"conv3_3 (Conv2D) (None, 75, 75, 256) 590080 conv3_2[0][0] \n",
"__________________________________________________________________________________________________\n",
"pool3 (MaxPooling2D) (None, 38, 38, 256) 0 conv3_3[0][0] \n",
"__________________________________________________________________________________________________\n",
"conv4_1 (Conv2D) (None, 38, 38, 512) 1180160 pool3[0][0] \n",
"__________________________________________________________________________________________________\n",
"conv4_2 (Conv2D) (None, 38, 38, 512) 2359808 conv4_1[0][0] \n",
"__________________________________________________________________________________________________\n",
"conv4_3 (Conv2D) (None, 38, 38, 512) 2359808 conv4_2[0][0] \n",
"__________________________________________________________________________________________________\n",
"pool4 (MaxPooling2D) (None, 19, 19, 512) 0 conv4_3[0][0] \n",
"__________________________________________________________________________________________________\n",
"conv5_1 (Conv2D) (None, 19, 19, 512) 2359808 pool4[0][0] \n",
"__________________________________________________________________________________________________\n",
"conv5_2 (Conv2D) (None, 19, 19, 512) 2359808 conv5_1[0][0] \n",
"__________________________________________________________________________________________________\n",
"conv5_3 (Conv2D) (None, 19, 19, 512) 2359808 conv5_2[0][0] \n",
"__________________________________________________________________________________________________\n",
"pool5 (MaxPooling2D) (None, 19, 19, 512) 0 conv5_3[0][0] \n",
"__________________________________________________________________________________________________\n",
"fc6 (Conv2D) (None, 19, 19, 1024) 4719616 pool5[0][0] \n",
"__________________________________________________________________________________________________\n",
"fc7 (Conv2D) (None, 19, 19, 1024) 1049600 fc6[0][0] \n",
"__________________________________________________________________________________________________\n",
"conv6_1 (Conv2D) (None, 19, 19, 256) 262400 fc7[0][0] \n",
"__________________________________________________________________________________________________\n",
"conv6_padding (ZeroPadding2D) (None, 21, 21, 256) 0 conv6_1[0][0] \n",
"__________________________________________________________________________________________________\n",
"conv6_2 (Conv2D) (None, 10, 10, 512) 1180160 conv6_padding[0][0] \n",
"__________________________________________________________________________________________________\n",
"conv7_1 (Conv2D) (None, 10, 10, 128) 65664 conv6_2[0][0] \n",
"__________________________________________________________________________________________________\n",
"conv7_padding (ZeroPadding2D) (None, 12, 12, 128) 0 conv7_1[0][0] \n",
"__________________________________________________________________________________________________\n",
"conv7_2 (Conv2D) (None, 5, 5, 256) 295168 conv7_padding[0][0] \n",
"__________________________________________________________________________________________________\n",
"conv8_1 (Conv2D) (None, 5, 5, 128) 32896 conv7_2[0][0] \n",
"__________________________________________________________________________________________________\n",
"conv8_2 (Conv2D) (None, 3, 3, 256) 295168 conv8_1[0][0] \n",
"__________________________________________________________________________________________________\n",
"conv9_1 (Conv2D) (None, 3, 3, 128) 32896 conv8_2[0][0] \n",
"__________________________________________________________________________________________________\n",
"conv4_3_norm (L2Normalization) (None, 38, 38, 512) 512 conv4_3[0][0] \n",
"__________________________________________________________________________________________________\n",
"conv9_2 (Conv2D) (None, 1, 1, 256) 295168 conv9_1[0][0] \n",
"__________________________________________________________________________________________________\n",
"conv4_3_norm_mbox_conf (Conv2D) (None, 38, 38, 84) 387156 conv4_3_norm[0][0] \n",
"__________________________________________________________________________________________________\n",
"fc7_mbox_conf (Conv2D) (None, 19, 19, 126) 1161342 fc7[0][0] \n",
"__________________________________________________________________________________________________\n",
"conv6_2_mbox_conf (Conv2D) (None, 10, 10, 126) 580734 conv6_2[0][0] \n",
"__________________________________________________________________________________________________\n",
"conv7_2_mbox_conf (Conv2D) (None, 5, 5, 126) 290430 conv7_2[0][0] \n",
"__________________________________________________________________________________________________\n",
"conv8_2_mbox_conf (Conv2D) (None, 3, 3, 84) 193620 conv8_2[0][0] \n",
"__________________________________________________________________________________________________\n",
"conv9_2_mbox_conf (Conv2D) (None, 1, 1, 84) 193620 conv9_2[0][0] \n",
"__________________________________________________________________________________________________\n",
"conv4_3_norm_mbox_loc (Conv2D) (None, 38, 38, 16) 73744 conv4_3_norm[0][0] \n",
"__________________________________________________________________________________________________\n",
"fc7_mbox_loc (Conv2D) (None, 19, 19, 24) 221208 fc7[0][0] \n",
"__________________________________________________________________________________________________\n",
"conv6_2_mbox_loc (Conv2D) (None, 10, 10, 24) 110616 conv6_2[0][0] \n",
"__________________________________________________________________________________________________\n",
"conv7_2_mbox_loc (Conv2D) (None, 5, 5, 24) 55320 conv7_2[0][0] \n",
"__________________________________________________________________________________________________\n",
"conv8_2_mbox_loc (Conv2D) (None, 3, 3, 16) 36880 conv8_2[0][0] \n",
"__________________________________________________________________________________________________\n",
"conv9_2_mbox_loc (Conv2D) (None, 1, 1, 16) 36880 conv9_2[0][0] \n",
"__________________________________________________________________________________________________\n",
"conv4_3_norm_mbox_conf_reshape (None, 5776, 21) 0 conv4_3_norm_mbox_conf[0][0] \n",
"__________________________________________________________________________________________________\n",
"fc7_mbox_conf_reshape (Reshape) (None, 2166, 21) 0 fc7_mbox_conf[0][0] \n",
"__________________________________________________________________________________________________\n",
"conv6_2_mbox_conf_reshape (Resh (None, 600, 21) 0 conv6_2_mbox_conf[0][0] \n",
"__________________________________________________________________________________________________\n",
"conv7_2_mbox_conf_reshape (Resh (None, 150, 21) 0 conv7_2_mbox_conf[0][0] \n",
"__________________________________________________________________________________________________\n",
"conv8_2_mbox_conf_reshape (Resh (None, 36, 21) 0 conv8_2_mbox_conf[0][0] \n",
"__________________________________________________________________________________________________\n",
"conv9_2_mbox_conf_reshape (Resh (None, 4, 21) 0 conv9_2_mbox_conf[0][0] \n",
"__________________________________________________________________________________________________\n",
"conv4_3_norm_mbox_priorbox (Anc (None, 38, 38, 4, 8) 0 conv4_3_norm_mbox_loc[0][0] \n",
"__________________________________________________________________________________________________\n",
"fc7_mbox_priorbox (AnchorBoxes) (None, 19, 19, 6, 8) 0 fc7_mbox_loc[0][0] \n",
"__________________________________________________________________________________________________\n",
"conv6_2_mbox_priorbox (AnchorBo (None, 10, 10, 6, 8) 0 conv6_2_mbox_loc[0][0] \n",
"__________________________________________________________________________________________________\n",
"conv7_2_mbox_priorbox (AnchorBo (None, 5, 5, 6, 8) 0 conv7_2_mbox_loc[0][0] \n",
"__________________________________________________________________________________________________\n",
"conv8_2_mbox_priorbox (AnchorBo (None, 3, 3, 4, 8) 0 conv8_2_mbox_loc[0][0] \n",
"__________________________________________________________________________________________________\n",
"conv9_2_mbox_priorbox (AnchorBo (None, 1, 1, 4, 8) 0 conv9_2_mbox_loc[0][0] \n",
"__________________________________________________________________________________________________\n",
"mbox_conf (Concatenate) (None, 8732, 21) 0 conv4_3_norm_mbox_conf_reshape[0]\n",
" fc7_mbox_conf_reshape[0][0] \n",
" conv6_2_mbox_conf_reshape[0][0] \n",
" conv7_2_mbox_conf_reshape[0][0] \n",
" conv8_2_mbox_conf_reshape[0][0] \n",
" conv9_2_mbox_conf_reshape[0][0] \n",
"__________________________________________________________________________________________________\n",
"conv4_3_norm_mbox_loc_reshape ( (None, 5776, 4) 0 conv4_3_norm_mbox_loc[0][0] \n",
"__________________________________________________________________________________________________\n",
"fc7_mbox_loc_reshape (Reshape) (None, 2166, 4) 0 fc7_mbox_loc[0][0] \n",
"__________________________________________________________________________________________________\n",
"conv6_2_mbox_loc_reshape (Resha (None, 600, 4) 0 conv6_2_mbox_loc[0][0] \n",
"__________________________________________________________________________________________________\n",
"conv7_2_mbox_loc_reshape (Resha (None, 150, 4) 0 conv7_2_mbox_loc[0][0] \n",
"__________________________________________________________________________________________________\n",
"conv8_2_mbox_loc_reshape (Resha (None, 36, 4) 0 conv8_2_mbox_loc[0][0] \n",
"__________________________________________________________________________________________________\n",
"conv9_2_mbox_loc_reshape (Resha (None, 4, 4) 0 conv9_2_mbox_loc[0][0] \n",
"__________________________________________________________________________________________________\n",
"conv4_3_norm_mbox_priorbox_resh (None, 5776, 8) 0 conv4_3_norm_mbox_priorbox[0][0] \n",
"__________________________________________________________________________________________________\n",
"fc7_mbox_priorbox_reshape (Resh (None, 2166, 8) 0 fc7_mbox_priorbox[0][0] \n",
"__________________________________________________________________________________________________\n",
"conv6_2_mbox_priorbox_reshape ( (None, 600, 8) 0 conv6_2_mbox_priorbox[0][0] \n",
"__________________________________________________________________________________________________\n",
"conv7_2_mbox_priorbox_reshape ( (None, 150, 8) 0 conv7_2_mbox_priorbox[0][0] \n",
"__________________________________________________________________________________________________\n",
"conv8_2_mbox_priorbox_reshape ( (None, 36, 8) 0 conv8_2_mbox_priorbox[0][0] \n",
"__________________________________________________________________________________________________\n",
"conv9_2_mbox_priorbox_reshape ( (None, 4, 8) 0 conv9_2_mbox_priorbox[0][0] \n",
"__________________________________________________________________________________________________\n",
"mbox_conf_softmax (Activation) (None, 8732, 21) 0 mbox_conf[0][0] \n",
"__________________________________________________________________________________________________\n",
"mbox_loc (Concatenate) (None, 8732, 4) 0 conv4_3_norm_mbox_loc_reshape[0][\n",
" fc7_mbox_loc_reshape[0][0] \n",
" conv6_2_mbox_loc_reshape[0][0] \n",
" conv7_2_mbox_loc_reshape[0][0] \n",
" conv8_2_mbox_loc_reshape[0][0] \n",
" conv9_2_mbox_loc_reshape[0][0] \n",
"__________________________________________________________________________________________________\n",
"mbox_priorbox (Concatenate) (None, 8732, 8) 0 conv4_3_norm_mbox_priorbox_reshap\n",
" fc7_mbox_priorbox_reshape[0][0] \n",
" conv6_2_mbox_priorbox_reshape[0][\n",
" conv7_2_mbox_priorbox_reshape[0][\n",
" conv8_2_mbox_priorbox_reshape[0][\n",
" conv9_2_mbox_priorbox_reshape[0][\n",
"__________________________________________________________________________________________________\n",
"predictions (Concatenate) (None, 8732, 33) 0 mbox_conf_softmax[0][0] \n",
" mbox_loc[0][0] \n",
" mbox_priorbox[0][0] \n",
"==================================================================================================\n",
"Total params: 26,285,486\n",
"Trainable params: 26,285,486\n",
"Non-trainable params: 0\n",
"__________________________________________________________________________________________________\n"
]
}
],
"source": [
"# 1: Build the Keras model.\n",
"\n",
"K.clear_session() # Clear previous models from memory.\n",
"\n",
"model = ssd_300(image_size=(img_height, img_width, img_channels),\n",
" n_classes=n_classes,\n",
" mode='training',\n",
" l2_regularization=0.0005,\n",
" scales=scales,\n",
" aspect_ratios_per_layer=aspect_ratios,\n",
" two_boxes_for_ar1=two_boxes_for_ar1,\n",
" steps=steps,\n",
" offsets=offsets,\n",
" clip_boxes=clip_boxes,\n",
" variances=variances,\n",
" normalize_coords=normalize_coords,\n",
" subtract_mean=mean_color,\n",
" swap_channels=swap_channels)\n",
"\n",
"# 2: Load some weights into the model.\n",
"\n",
"# TODO: Set the path to the weights you want to load.\n",
"#weights_path = 'VGG_VOC0712Plus_SSD_300x300_ft_iter_160000.h5'\n",
"weights_path = 'VGG_ILSVRC_16_layers_fc_reduced.h5'\n",
"\n",
"model.load_weights(weights_path, by_name=True)\n",
"\n",
"# 3: Instantiate an optimizer and the SSD loss function and compile the model.\n",
"# If you want to follow the original Caffe implementation, use the preset SGD\n",
"# optimizer, otherwise I'd recommend the commented-out Adam optimizer.\n",
"\n",
"#adam = Adam(lr=0.001, beta_1=0.9, beta_2=0.999, epsilon=1e-08, decay=0.0)\n",
"sgd = SGD(lr=0.001, momentum=0.9, decay=0.0, nesterov=False)\n",
"\n",
"ssd_loss = SSDLoss(neg_pos_ratio=3, alpha=1.0)\n",
"\n",
"model.compile(optimizer=sgd, loss=ssd_loss.compute_loss)\n",
"model.summary()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 2.2 Load a previously created model\n",
"\n",
"If you have previously created and saved a model and would now like to load it, execute the next code cell. The only thing you need to do here is to set the path to the saved model HDF5 file that you would like to load.\n",
"\n",
"The SSD model contains custom objects: Neither the loss function nor the anchor box or L2-normalization layer types are contained in the Keras core library, so we need to provide them to the model loader.\n",
"\n",
"This next code cell assumes that you want to load a model that was created in 'training' mode. If you want to load a model that was created in 'inference' or 'inference_fast' mode, you'll have to add the `DecodeDetections` or `DecodeDetectionsFast` layer type to the `custom_objects` dictionary below."
]
},
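{
"cell_type": "markdown",
"metadata": {},
"source": [
"As a sketch, loading a model that was saved in 'inference' mode would look like this (use `DecodeDetectionsFast` instead for a model saved in 'inference_fast' mode):\n",
"\n",
"```python\n",
"model = load_model(model_path, custom_objects={'AnchorBoxes': AnchorBoxes,\n",
"                                               'L2Normalization': L2Normalization,\n",
"                                               'DecodeDetections': DecodeDetections,\n",
"                                               'compute_loss': ssd_loss.compute_loss})\n",
"```"
]
},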
{
"cell_type": "code",
"execution_count": 23,
"metadata": {},
"outputs": [
{
"ename": "ValueError",
"evalue": "Cannot create group in read only mode.",
"output_type": "error",
"traceback": [
"\u001b[0;31m---------------------------------------------------------------------------\u001b[0m",
"\u001b[0;31mValueError\u001b[0m Traceback (most recent call last)",
"\u001b[0;32m<ipython-input-23-9dcc0e6f5023>\u001b[0m in \u001b[0;36m<module>\u001b[0;34m()\u001b[0m\n\u001b[1;32m 9\u001b[0m model = load_model(model_path, custom_objects={'AnchorBoxes': AnchorBoxes,\n\u001b[1;32m 10\u001b[0m \u001b[0;34m'L2Normalization'\u001b[0m\u001b[0;34m:\u001b[0m \u001b[0mL2Normalization\u001b[0m\u001b[0;34m,\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m---> 11\u001b[0;31m 'compute_loss': ssd_loss.compute_loss})\n\u001b[0m",
"\u001b[0;32m~/anaconda3/envs/model/lib/python3.6/site-packages/keras/engine/saving.py\u001b[0m in \u001b[0;36mload_model\u001b[0;34m(filepath, custom_objects, compile)\u001b[0m\n\u001b[1;32m 417\u001b[0m \u001b[0mf\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mh5dict\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mfilepath\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0;34m'r'\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 418\u001b[0m \u001b[0;32mtry\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m--> 419\u001b[0;31m \u001b[0mmodel\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0m_deserialize_model\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mf\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mcustom_objects\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mcompile\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 420\u001b[0m \u001b[0;32mfinally\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 421\u001b[0m \u001b[0;32mif\u001b[0m \u001b[0mopened_new_file\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n",
"\u001b[0;32m~/anaconda3/envs/model/lib/python3.6/site-packages/keras/engine/saving.py\u001b[0m in \u001b[0;36m_deserialize_model\u001b[0;34m(f, custom_objects, compile)\u001b[0m\n\u001b[1;32m 219\u001b[0m \u001b[0;32mreturn\u001b[0m \u001b[0mobj\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 220\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m--> 221\u001b[0;31m \u001b[0mmodel_config\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mf\u001b[0m\u001b[0;34m[\u001b[0m\u001b[0;34m'model_config'\u001b[0m\u001b[0;34m]\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 222\u001b[0m \u001b[0;32mif\u001b[0m \u001b[0mmodel_config\u001b[0m \u001b[0;32mis\u001b[0m \u001b[0;32mNone\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 223\u001b[0m \u001b[0;32mraise\u001b[0m \u001b[0mValueError\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m'No model found in config.'\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n",
"\u001b[0;32m~/anaconda3/envs/model/lib/python3.6/site-packages/keras/utils/io_utils.py\u001b[0m in \u001b[0;36m__getitem__\u001b[0;34m(self, attr)\u001b[0m\n\u001b[1;32m 300\u001b[0m \u001b[0;32melse\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 301\u001b[0m \u001b[0;32mif\u001b[0m \u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mread_only\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m--> 302\u001b[0;31m \u001b[0;32mraise\u001b[0m \u001b[0mValueError\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m'Cannot create group in read only mode.'\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 303\u001b[0m \u001b[0mval\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mH5Dict\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mdata\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mcreate_group\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mattr\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 304\u001b[0m \u001b[0;32mreturn\u001b[0m \u001b[0mval\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n",
"\u001b[0;31mValueError\u001b[0m: Cannot create group in read only mode."
]
}
],
"source": [
"# TODO: Set the path to the `.h5` file of the model to be loaded. debió ser guardado como model.save('')\n",
"model_path = 'VGG_VOC0712Plus_SSD_512x512_ft_iter_160000.h5'\n",
"\n",
"# We need to create an SSDLoss object in order to pass that to the model loader.\n",
"ssd_loss = SSDLoss(neg_pos_ratio=3, alpha=1.0)\n",
"\n",
"K.clear_session() # Clear previous models from memory.\n",
"\n",
"model = load_model(model_path, custom_objects={'AnchorBoxes': AnchorBoxes,\n",
" 'L2Normalization': L2Normalization,\n",
" 'compute_loss': ssd_loss.compute_loss})"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 3. Set up the data generators for the training\n",
"\n",
"The code cells below set up the data generators for the training and validation datasets to train the model. The settings below reproduce the original SSD training on Pascal VOC 2007 `trainval` plus 2012 `trainval` and validation on Pascal VOC 2007 `test`.\n",
"\n",
"The only thing you need to change here are the filepaths to the datasets on your local machine. Note that parsing the labels from the XML annotations files can take a while.\n",
"\n",
"Note that the generator provides two options to speed up the training. By default, it loads the individual images for a batch from disk. This has two disadvantages. First, for compressed image formats like JPG, this is a huge computational waste, because every image needs to be decompressed again and again every time it is being loaded. Second, the images on disk are likely not stored in a contiguous block of memory, which may also slow down the loading process. The first option that `DataGenerator` provides to deal with this is to load the entire dataset into memory, which reduces the access time for any image to a negligible amount, but of course this is only an option if you have enough free memory to hold the whole dataset. As a second option, `DataGenerator` provides the possibility to convert the dataset into a single HDF5 file. This HDF5 file stores the images as uncompressed arrays in a contiguous block of memory, which dramatically speeds up the loading time. It's not as good as having the images in memory, but it's a lot better than the default option of loading them from their compressed JPG state every time they are needed. Of course such an HDF5 dataset may require significantly more disk space than the compressed images (around 9 GB total for Pascal VOC 2007 `trainval` plus 2012 `trainval` and another 2.6 GB for 2007 `test`). You can later load these HDF5 datasets directly in the constructor.\n",
"\n",
"The original SSD implementation uses a batch size of 32 for the training. In case you run into GPU memory issues, reduce the batch size accordingly. You need at least 7 GB of free GPU memory to train an SSD300 with 20 object classes with a batch size of 32.\n",
"\n",
"The `DataGenerator` itself is fairly generic. I doesn't contain any data augmentation or bounding box encoding logic. Instead, you pass a list of image transformations and an encoder for the bounding boxes in the `transformations` and `label_encoder` arguments of the data generator's `generate()` method, and the data generator will then apply those given transformations and the encoding to the data. Everything here is preset already, but if you'd like to learn more about the data generator and its data augmentation capabilities, take a look at the detailed tutorial in [this](https://github.com/pierluigiferrari/data_generator_object_detection_2d) repository.\n",
"\n",
"The data augmentation settings defined further down reproduce the data augmentation pipeline of the original SSD training. The training generator receives an object `ssd_data_augmentation`, which is a transformation object that is itself composed of a whole chain of transformations that replicate the data augmentation procedure used to train the original Caffe implementation. The validation generator receives an object `resize`, which simply resizes the input images.\n",
"\n",
"An `SSDInputEncoder` object, `ssd_input_encoder`, is passed to both the training and validation generators. As explained above, it matches the ground truth labels to the model's anchor boxes and encodes the box coordinates into the format that the model needs.\n",
"\n",
"In order to train the model on a dataset other than Pascal VOC, either choose `DataGenerator`'s appropriate parser method that corresponds to your data format, or, if `DataGenerator` does not provide a suitable parser for your data format, you can write an additional parser and add it. Out of the box, `DataGenerator` can handle datasets that use the Pascal VOC format (use `parse_xml()`), the MS COCO format (use `parse_json()`) and a wide range of CSV formats (use `parse_csv()`)."
]
},
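{
"cell_type": "markdown",
"metadata": {},
"source": [
"As a sketch, reloading a previously created HDF5 dataset in a later session is just a matter of passing its path to the constructor (the filename below assumes the `create_hdf5_dataset()` call shown further down):\n",
"\n",
"```python\n",
"train_dataset = DataGenerator(load_images_into_memory=False,\n",
"                              hdf5_dataset_path='dataset_pascal_voc_07+12_trainval.h5')\n",
"```"
]
},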
{
"cell_type": "code",
"execution_count": 4,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Processing image set 'trainval.txt': 100%|██████████| 11540/11540 [00:28<00:00, 406.84it/s]\n",
"Processing image set 'trainval.txt': 100%|██████████| 11540/11540 [00:31<00:00, 368.34it/s]\n"
]
}
],
"source": [
"# 1: Instantiate two `DataGenerator` objects: One for training, one for validation.\n",
"\n",
"# Optional: If you have enough memory, consider loading the images into memory for the reasons explained above.\n",
"\n",
"train_dataset = DataGenerator(load_images_into_memory=False, hdf5_dataset_path=None)\n",
"val_dataset = DataGenerator(load_images_into_memory=False, hdf5_dataset_path=None)\n",
"\n",
"# 2: Parse the image and label lists for the training and validation datasets. This can take a while.\n",
"\n",
"# TODO: Set the paths to the datasets here.\n",
"\n",
"# The directories that contain the images.\n",
"VOC_2007_images_dir = '../VOCdevkit/VOC2007/JPEGImages/'\n",
"VOC_2012_images_dir = '../VOCdevkit/VOC2012/JPEGImages/'\n",
"\n",
"# The directories that contain the annotations.\n",
"VOC_2007_annotations_dir = '../VOCdevkit/VOC2007/Annotations/'\n",
"VOC_2012_annotations_dir = '../VOCdevkit/VOC2012/Annotations/'\n",
"\n",
"# The paths to the image sets.\n",
"VOC_2007_train_image_set_filename = '../VOCdevkit/VOC2007/ImageSets/Main/train.txt'\n",
"VOC_2012_train_image_set_filename = '../VOCdevkit/VOC2012/ImageSets/Main/train.txt'\n",
"VOC_2007_val_image_set_filename = '../VOCdevkit/VOC2007/ImageSets/Main/val.txt'\n",
"VOC_2012_val_image_set_filename = '../VOCdevkit/VOC2012/ImageSets/Main/val.txt'\n",
"VOC_2007_trainval_image_set_filename = '../VOCdevkit/VOC2007/ImageSets/Main/trainval.txt'\n",
"VOC_2012_trainval_image_set_filename = '../VOCdevkit/VOC2012/ImageSets/Main/trainval.txt'\n",
"VOC_2007_test_image_set_filename = '../VOCdevkit/VOC2007/ImageSets/Main/test.txt'\n",
"\n",
"# The XML parser needs to now what object class names to look for and in which order to map them to integers.\n",
"classes = ['background',\n",
" 'aeroplane', 'bicycle', 'bird', 'boat',\n",
" 'bottle', 'bus', 'car', 'cat',\n",
" 'chair', 'cow', 'diningtable', 'dog',\n",
" 'horse', 'motorbike', 'person', 'pottedplant',\n",
" 'sheep', 'sofa', 'train', 'tvmonitor']\n",
"\n",
"train_dataset.parse_xml(images_dirs=[VOC_2012_images_dir],\n",
" image_set_filenames=[VOC_2012_trainval_image_set_filename],\n",
" annotations_dirs=[VOC_2012_annotations_dir],\n",
" classes=classes,\n",
" include_classes='all',\n",
" exclude_truncated=False,\n",
" exclude_difficult=False,\n",
" ret=False)\n",
"\n",
"val_dataset.parse_xml(images_dirs=[VOC_2012_images_dir],\n",
" image_set_filenames=[VOC_2012_trainval_image_set_filename],\n",
" annotations_dirs=[VOC_2012_annotations_dir],\n",
" classes=classes,\n",
" include_classes='all',\n",
" exclude_truncated=False,\n",
" exclude_difficult=True,\n",
" ret=False)\n",
"\n",
"# Optional: Convert the dataset into an HDF5 dataset. This will require more disk space, but will\n",
"# speed up the training. Doing this is not relevant in case you activated the `load_images_into_memory`\n",
"# option in the constructor, because in that cas the images are in memory already anyway. If you don't\n",
"# want to create HDF5 datasets, comment out the subsequent two function calls.\n",
"\n",
"#train_dataset.create_hdf5_dataset(file_path='dataset_pascal_voc_07+12_trainval.h5',\n",
" # resize=False,\n",
" # variable_image_size=True,\n",
" # verbose=True)\n",
"\n",
"#val_dataset.create_hdf5_dataset(file_path='dataset_pascal_voc_07_test.h5',\n",
" # resize=False,\n",
" # variable_image_size=True,\n",
" # verbose=True)"
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Number of images in the training dataset:\t 11540\n",
"Number of images in the validation dataset:\t 11540\n"
]
}
],
"source": [
"# 3: Set the batch size.\n",
"\n",
"batch_size = 32 # Change the batch size if you like, or if you run into GPU memory issues.\n",
"\n",
"# 4: Set the image transformations for pre-processing and data augmentation options.\n",
"\n",
"# For the training generator:\n",
"ssd_data_augmentation = SSDDataAugmentation(img_height=img_height,\n",
" img_width=img_width,\n",
" background=mean_color)\n",
"\n",
"# For the validation generator:\n",
"convert_to_3_channels = ConvertTo3Channels()\n",
"resize = Resize(height=img_height, width=img_width)\n",
"\n",
"# 5: Instantiate an encoder that can encode ground truth labels into the format needed by the SSD loss function.\n",
"\n",
"# The encoder constructor needs the spatial dimensions of the model's predictor layers to create the anchor boxes.\n",
"predictor_sizes = [model.get_layer('conv4_3_norm_mbox_conf').output_shape[1:3],\n",
" model.get_layer('fc7_mbox_conf').output_shape[1:3],\n",
" model.get_layer('conv6_2_mbox_conf').output_shape[1:3],\n",
" model.get_layer('conv7_2_mbox_conf').output_shape[1:3],\n",
" model.get_layer('conv8_2_mbox_conf').output_shape[1:3],\n",
" model.get_layer('conv9_2_mbox_conf').output_shape[1:3]]\n",
"\n",
"ssd_input_encoder = SSDInputEncoder(img_height=img_height,\n",
" img_width=img_width,\n",
" n_classes=n_classes,\n",
" predictor_sizes=predictor_sizes,\n",
" scales=scales,\n",
" aspect_ratios_per_layer=aspect_ratios,\n",
" two_boxes_for_ar1=two_boxes_for_ar1,\n",
" steps=steps,\n",
" offsets=offsets,\n",
" clip_boxes=clip_boxes,\n",
" variances=variances,\n",
" matching_type='multi',\n",
" pos_iou_threshold=0.5,\n",
" neg_iou_limit=0.5,\n",
" normalize_coords=normalize_coords)\n",
"\n",
"# 6: Create the generator handles that will be passed to Keras' `fit_generator()` function.\n",
"\n",
"train_generator = train_dataset.generate(batch_size=batch_size,\n",
" shuffle=True,\n",
" transformations=[ssd_data_augmentation],\n",
" label_encoder=ssd_input_encoder,\n",
" returns={'processed_images',\n",
" 'encoded_labels'},\n",
" keep_images_without_gt=False)\n",
"\n",
"val_generator = val_dataset.generate(batch_size=batch_size,\n",
" shuffle=False,\n",
" transformations=[convert_to_3_channels,\n",
" resize],\n",
" label_encoder=ssd_input_encoder,\n",
" returns={'processed_images',\n",
" 'encoded_labels'},\n",
" keep_images_without_gt=False)\n",
"\n",
"# Get the number of samples in the training and validations datasets.\n",
"train_dataset_size = train_dataset.get_dataset_size()\n",
"val_dataset_size = val_dataset.get_dataset_size()\n",
"\n",
"print(\"Number of images in the training dataset:\\t{:>6}\".format(train_dataset_size))\n",
"print(\"Number of images in the validation dataset:\\t{:>6}\".format(val_dataset_size))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 4. Set the remaining training parameters\n",
"\n",
"We've already chosen an optimizer and set the batch size above, now let's set the remaining training parameters. I'll set one epoch to consist of 1,000 training steps. The next code cell defines a learning rate schedule that replicates the learning rate schedule of the original Caffe implementation for the training of the SSD300 Pascal VOC \"07+12\" model. That model was trained for 120,000 steps with a learning rate of 0.001 for the first 80,000 steps, 0.0001 for the next 20,000 steps, and 0.00001 for the last 20,000 steps. If you're training on a different dataset, define the learning rate schedule however you see fit.\n",
"\n",
"I'll set only a few essential Keras callbacks below, feel free to add more callbacks if you want TensorBoard summaries or whatever. We obviously need the learning rate scheduler and we want to save the best models during the training. It also makes sense to continuously stream our training history to a CSV log file after every epoch, because if we didn't do that, in case the training terminates with an exception at some point or if the kernel of this Jupyter notebook dies for some reason or anything like that happens, we would lose the entire history for the trained epochs. Finally, we'll also add a callback that makes sure that the training terminates if the loss becomes `NaN`. Depending on the optimizer you use, it can happen that the loss becomes `NaN` during the first iterations of the training. In later iterations it's less of a risk. For example, I've never seen a `NaN` loss when I trained SSD using an Adam optimizer, but I've seen a `NaN` loss a couple of times during the very first couple of hundred training steps of training a new model when I used an SGD optimizer."
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {},
"outputs": [],
"source": [
"# Define a learning rate schedule.\n",
"\n",
"def lr_schedule(epoch):\n",
" if epoch < 80:\n",
" return 0.001\n",
" elif epoch < 100:\n",
" return 0.0001\n",
" else:\n",
" return 0.00001"
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {},
"outputs": [],
"source": [
"# Define model callbacks.\n",
"\n",
"# TODO: Set the filepath under which you want to save the model.\n",
"model_checkpoint = ModelCheckpoint(filepath='ssd300_pascal_07+12_epoch-{epoch:02d}_loss-{loss:.4f}_val_loss-{val_loss:.4f}.h5',\n",
" monitor='val_loss',\n",
" verbose=1,\n",
" save_best_only=True,\n",
" save_weights_only=False,\n",
" mode='auto',\n",
" period=1)\n",
"#model_checkpoint.best = \n",
"\n",
"csv_logger = CSVLogger(filename='ssd300_pascal_07+12_training_log.csv',\n",
" separator=',',\n",
" append=True)\n",
"\n",
"learning_rate_scheduler = LearningRateScheduler(schedule=lr_schedule,\n",
" verbose=1)\n",
"\n",
"terminate_on_nan = TerminateOnNaN()\n",
"\n",
"callbacks = [model_checkpoint,\n",
" csv_logger,\n",
" learning_rate_scheduler,\n",
" terminate_on_nan]\n"
]
},
{
"cell_type": "code",
"execution_count": 21,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"<generator object DataGenerator.generate at 0x7f6f479c47d8>"
]
},
"execution_count": 21,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"#s\n",
"train_generator"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 5. Train"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"In order to reproduce the training of the \"07+12\" model mentioned above, at 1,000 training steps per epoch you'd have to train for 120 epochs. That is going to take really long though, so you might not want to do all 120 epochs in one go and instead train only for a few epochs at a time. You can find a summary of a full training [here](https://github.com/pierluigiferrari/ssd_keras/blob/master/training_summaries/ssd300_pascal_07%2B12_training_summary.md).\n",
"\n",
"In order to only run a partial training and resume smoothly later on, there are a few things you should note:\n",
"1. Always load the full model if you can, rather than building a new model and loading previously saved weights into it. Optimizers like SGD or Adam keep running averages of past gradient moments internally. If you always save and load full models when resuming a training, then the state of the optimizer is maintained and the training picks up exactly where it left off. If you build a new model and load weights into it, the optimizer is being initialized from scratch, which, especially in the case of Adam, leads to small but unnecessary setbacks every time you resume the training with previously saved weights.\n",
"2. In order for the learning rate scheduler callback above to work properly, `fit_generator()` needs to know which epoch we're in, otherwise it will start with epoch 0 every time you resume the training. Set `initial_epoch` to be the next epoch of your training. Note that this parameter is zero-based, i.e. the first epoch is epoch 0. If you had trained for 10 epochs previously and now you'd want to resume the training from there, you'd set `initial_epoch = 10` (since epoch 10 is the eleventh epoch). Furthermore, set `final_epoch` to the last epoch you want to run. To stick with the previous example, if you had trained for 10 epochs previously and now you'd want to train for another 10 epochs, you'd set `initial_epoch = 10` and `final_epoch = 20`.\n",
"3. In order for the model checkpoint callback above to work correctly after a kernel restart, set `model_checkpoint.best` to the best validation loss from the previous training. If you don't do this and a new `ModelCheckpoint` object is created after a kernel restart, that object obviously won't know what the last best validation loss was, so it will always save the weights of the first epoch of your new training and record that loss as its new best loss. This isn't super-important, I just wanted to mention it."
]
},
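{
"cell_type": "markdown",
"metadata": {},
"source": [
"Putting those three points together, resuming a training after a kernel restart might look like the following sketch (the checkpoint filename and the loss value are hypothetical placeholders):\n",
"\n",
"```python\n",
"model = load_model('ssd300_pascal_07+12_epoch-10_loss-4.8913_val_loss-5.2310.h5',\n",
"                   custom_objects={'AnchorBoxes': AnchorBoxes,\n",
"                                   'L2Normalization': L2Normalization,\n",
"                                   'compute_loss': ssd_loss.compute_loss})\n",
"\n",
"model_checkpoint.best = 5.2310  # Best val_loss from the previous run.\n",
"\n",
"initial_epoch = 10  # Zero-based: the previous run ended after 10 epochs.\n",
"final_epoch = 20    # Train for another 10 epochs.\n",
"```"
]
},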
{
"cell_type": "code",
"execution_count": 8,
"metadata": {
"scrolled": true
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Epoch 1/120\n",
"\n",
"Epoch 00001: LearningRateScheduler setting learning rate to 0.001.\n"
]
},
{
"ename": "FileNotFoundError",
"evalue": "[Errno 2] No such file or directory: '../VOCdevkit/VOC2012/JPEGImages/2010_001385.png'",
"output_type": "error",
"traceback": [
"\u001b[0;31m---------------------------------------------------------------------------\u001b[0m",
"\u001b[0;31mFileNotFoundError\u001b[0m Traceback (most recent call last)",
"\u001b[0;32m<ipython-input-8-9e051c003cc7>\u001b[0m in \u001b[0;36m<module>\u001b[0;34m()\u001b[0m\n\u001b[1;32m 10\u001b[0m \u001b[0mvalidation_data\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0mval_generator\u001b[0m\u001b[0;34m,\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 11\u001b[0m \u001b[0mvalidation_steps\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0mceil\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mval_dataset_size\u001b[0m\u001b[0;34m/\u001b[0m\u001b[0mbatch_size\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m,\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m---> 12\u001b[0;31m initial_epoch=initial_epoch)\n\u001b[0m",
"\u001b[0;32m~/anaconda3/lib/python3.6/site-packages/keras/legacy/interfaces.py\u001b[0m in \u001b[0;36mwrapper\u001b[0;34m(*args, **kwargs)\u001b[0m\n\u001b[1;32m 89\u001b[0m warnings.warn('Update your `' + object_name + '` call to the ' +\n\u001b[1;32m 90\u001b[0m 'Keras 2 API: ' + signature, stacklevel=2)\n\u001b[0;32m---> 91\u001b[0;31m \u001b[0;32mreturn\u001b[0m \u001b[0mfunc\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m*\u001b[0m\u001b[0margs\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0;34m**\u001b[0m\u001b[0mkwargs\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 92\u001b[0m \u001b[0mwrapper\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0m_original_function\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mfunc\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 93\u001b[0m \u001b[0;32mreturn\u001b[0m \u001b[0mwrapper\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n",
"\u001b[0;32m~/anaconda3/lib/python3.6/site-packages/keras/engine/training.py\u001b[0m in \u001b[0;36mfit_generator\u001b[0;34m(self, generator, steps_per_epoch, epochs, verbose, callbacks, validation_data, validation_steps, class_weight, max_queue_size, workers, use_multiprocessing, shuffle, initial_epoch)\u001b[0m\n\u001b[1;32m 1416\u001b[0m \u001b[0muse_multiprocessing\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0muse_multiprocessing\u001b[0m\u001b[0;34m,\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 1417\u001b[0m \u001b[0mshuffle\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0mshuffle\u001b[0m\u001b[0;34m,\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m-> 1418\u001b[0;31m initial_epoch=initial_epoch)\n\u001b[0m\u001b[1;32m 1419\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 1420\u001b[0m \u001b[0;34m@\u001b[0m\u001b[0minterfaces\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mlegacy_generator_methods_support\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n",
"\u001b[0;32m~/anaconda3/lib/python3.6/site-packages/keras/engine/training_generator.py\u001b[0m in \u001b[0;36mfit_generator\u001b[0;34m(model, generator, steps_per_epoch, epochs, verbose, callbacks, validation_data, validation_steps, class_weight, max_queue_size, workers, use_multiprocessing, shuffle, initial_epoch)\u001b[0m\n\u001b[1;32m 179\u001b[0m \u001b[0mbatch_index\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0;36m0\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 180\u001b[0m \u001b[0;32mwhile\u001b[0m \u001b[0msteps_done\u001b[0m \u001b[0;34m<\u001b[0m \u001b[0msteps_per_epoch\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m--> 181\u001b[0;31m \u001b[0mgenerator_output\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mnext\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0moutput_generator\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 182\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 183\u001b[0m \u001b[0;32mif\u001b[0m \u001b[0;32mnot\u001b[0m \u001b[0mhasattr\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mgenerator_output\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0;34m'__len__'\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n",
"\u001b[0;32m~/anaconda3/lib/python3.6/site-packages/keras/utils/data_utils.py\u001b[0m in \u001b[0;36mget\u001b[0;34m(self)\u001b[0m\n\u001b[1;32m 707\u001b[0m \u001b[0;34m\"`use_multiprocessing=False, workers > 1`.\"\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 708\u001b[0m \"For more information see issue #1638.\")\n\u001b[0;32m--> 709\u001b[0;31m \u001b[0msix\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mreraise\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m*\u001b[0m\u001b[0msys\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mexc_info\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m",
"\u001b[0;32m~/anaconda3/lib/python3.6/site-packages/six.py\u001b[0m in \u001b[0;36mreraise\u001b[0;34m(tp, value, tb)\u001b[0m\n\u001b[1;32m 691\u001b[0m \u001b[0;32mif\u001b[0m \u001b[0mvalue\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0m__traceback__\u001b[0m \u001b[0;32mis\u001b[0m \u001b[0;32mnot\u001b[0m \u001b[0mtb\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 692\u001b[0m \u001b[0;32mraise\u001b[0m \u001b[0mvalue\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mwith_traceback\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mtb\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m--> 693\u001b[0;31m \u001b[0;32mraise\u001b[0m \u001b[0mvalue\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 694\u001b[0m \u001b[0;32mfinally\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 695\u001b[0m \u001b[0mvalue\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0;32mNone\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n",
"\u001b[0;32m~/anaconda3/lib/python3.6/site-packages/keras/utils/data_utils.py\u001b[0m in \u001b[0;36mget\u001b[0;34m(self)\u001b[0m\n\u001b[1;32m 683\u001b[0m \u001b[0;32mtry\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 684\u001b[0m \u001b[0;32mwhile\u001b[0m \u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mis_running\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m--> 685\u001b[0;31m \u001b[0minputs\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mqueue\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mget\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mblock\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0;32mTrue\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mget\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 686\u001b[0m \u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mqueue\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mtask_done\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 687\u001b[0m \u001b[0;32mif\u001b[0m \u001b[0minputs\u001b[0m \u001b[0;32mis\u001b[0m \u001b[0;32mnot\u001b[0m \u001b[0;32mNone\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n",
"\u001b[0;32m~/anaconda3/lib/python3.6/multiprocessing/pool.py\u001b[0m in \u001b[0;36mget\u001b[0;34m(self, timeout)\u001b[0m\n\u001b[1;32m 642\u001b[0m \u001b[0;32mreturn\u001b[0m \u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0m_value\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 643\u001b[0m \u001b[0;32melse\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m--> 644\u001b[0;31m \u001b[0;32mraise\u001b[0m \u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0m_value\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 645\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 646\u001b[0m \u001b[0;32mdef\u001b[0m \u001b[0m_set\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mself\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mi\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mobj\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n",
"\u001b[0;32m~/anaconda3/lib/python3.6/multiprocessing/pool.py\u001b[0m in \u001b[0;36mworker\u001b[0;34m(inqueue, outqueue, initializer, initargs, maxtasks, wrap_exception)\u001b[0m\n\u001b[1;32m 117\u001b[0m \u001b[0mjob\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mi\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mfunc\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0margs\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mkwds\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mtask\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 118\u001b[0m \u001b[0;32mtry\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m--> 119\u001b[0;31m \u001b[0mresult\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0;34m(\u001b[0m\u001b[0;32mTrue\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mfunc\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m*\u001b[0m\u001b[0margs\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0;34m**\u001b[0m\u001b[0mkwds\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 120\u001b[0m \u001b[0;32mexcept\u001b[0m \u001b[0mException\u001b[0m \u001b[0;32mas\u001b[0m \u001b[0me\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 121\u001b[0m \u001b[0;32mif\u001b[0m \u001b[0mwrap_exception\u001b[0m \u001b[0;32mand\u001b[0m \u001b[0mfunc\u001b[0m \u001b[0;32mis\u001b[0m \u001b[0;32mnot\u001b[0m \u001b[0m_helper_reraises_exception\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n",
"\u001b[0;32m~/anaconda3/lib/python3.6/site-packages/keras/utils/data_utils.py\u001b[0m in \u001b[0;36mnext_sample\u001b[0;34m(uid)\u001b[0m\n\u001b[1;32m 624\u001b[0m \u001b[0mThe\u001b[0m \u001b[0mnext\u001b[0m \u001b[0mvalue\u001b[0m \u001b[0mof\u001b[0m \u001b[0mgenerator\u001b[0m\u001b[0;31m \u001b[0m\u001b[0;31m`\u001b[0m\u001b[0muid\u001b[0m\u001b[0;31m`\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 625\u001b[0m \"\"\"\n\u001b[0;32m--> 626\u001b[0;31m \u001b[0;32mreturn\u001b[0m \u001b[0msix\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mnext\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0m_SHARED_SEQUENCES\u001b[0m\u001b[0;34m[\u001b[0m\u001b[0muid\u001b[0m\u001b[0;34m]\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 627\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 628\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n",
"\u001b[0;32m~/Desktop/Tesis/8.-Object_Detection/keras-ssd-master/data_generator/object_detection_2d_data_generator.py\u001b[0m in \u001b[0;36mgenerate\u001b[0;34m(self, batch_size, shuffle, transformations, label_encoder, returns, keep_images_without_gt, degenerate_box_handling)\u001b[0m\n\u001b[1;32m 1016\u001b[0m \u001b[0mbatch_filenames\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mfilenames\u001b[0m\u001b[0;34m[\u001b[0m\u001b[0mcurrent\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0mcurrent\u001b[0m\u001b[0;34m+\u001b[0m\u001b[0mbatch_size\u001b[0m\u001b[0;34m]\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 1017\u001b[0m \u001b[0;32mfor\u001b[0m \u001b[0mfilename\u001b[0m \u001b[0;32min\u001b[0m \u001b[0mbatch_filenames\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m-> 1018\u001b[0;31m \u001b[0;32mwith\u001b[0m \u001b[0mImage\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mopen\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mfilename\u001b[0m\u001b[0;34m)\u001b[0m \u001b[0;32mas\u001b[0m \u001b[0mimage\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 1019\u001b[0m \u001b[0mbatch_X\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mappend\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mnp\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0marray\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mimage\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mdtype\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0mnp\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0muint8\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 1020\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n",
"\u001b[0;32m~/anaconda3/lib/python3.6/site-packages/PIL/Image.py\u001b[0m in \u001b[0;36mopen\u001b[0;34m(fp, mode)\u001b[0m\n\u001b[1;32m 2546\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 2547\u001b[0m \u001b[0;32mif\u001b[0m \u001b[0mfilename\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m-> 2548\u001b[0;31m \u001b[0mfp\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mbuiltins\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mopen\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mfilename\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0;34m\"rb\"\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 2549\u001b[0m \u001b[0mexclusive_fp\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0;32mTrue\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 2550\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n",
"\u001b[0;31mFileNotFoundError\u001b[0m: [Errno 2] No such file or directory: '../VOCdevkit/VOC2012/JPEGImages/2010_001385.png'"
]
}
],
"source": [
"# If you're resuming a previous training, set `initial_epoch` and `final_epoch` accordingly.\n",
"initial_epoch = 0\n",
"final_epoch = 120\n",
"steps_per_epoch = 1000\n",
"\n",
"history = model.fit_generator(generator=train_generator,\n",
" steps_per_epoch=steps_per_epoch,\n",
" epochs=final_epoch,\n",
" callbacks=callbacks,\n",
" validation_data=val_generator,\n",
" validation_steps=ceil(val_dataset_size/batch_size),\n",
" initial_epoch=initial_epoch)"
]
},
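  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "If the run above gets interrupted (as happened here with the `FileNotFoundError` in the output), you can resume from the last checkpoint that the `ModelCheckpoint` callback saved. The cell below is a minimal sketch, not part of the original tutorial: the checkpoint filename is hypothetical, and it assumes the `ssd_loss` object from the compilation step further up is still in scope."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "# Minimal sketch for resuming training from a saved checkpoint.\n",
    "# NOTE: The filename below is hypothetical; use whatever file your\n",
    "# ModelCheckpoint callback actually wrote.\n",
    "\n",
    "K.clear_session() # Free memory from the previous model instance.\n",
    "\n",
    "model = load_model('ssd300_pascal_07+12_epoch-10.h5',\n",
    "                   custom_objects={'AnchorBoxes': AnchorBoxes,\n",
    "                                   'L2Normalization': L2Normalization,\n",
    "                                   'compute_loss': ssd_loss.compute_loss})\n",
    "\n",
    "# Then set `initial_epoch` to the epoch of the checkpoint and rerun\n",
    "# the `fit_generator()` cell above.\n",
    "initial_epoch = 10"
   ]
  },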
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 6. Make predictions\n",
"\n",
"Now let's make some predictions on the validation dataset with the trained model. For convenience we'll use the validation generator that we've already set up above. Feel free to change the batch size.\n",
"\n",
"You can set the `shuffle` option to `False` if you would like to check the model's progress on the same image(s) over the course of the training."
]
},
{
"cell_type": "code",
"execution_count": 9,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"# 1: Set the generator for the predictions.\n",
"\n",
"predict_generator = val_dataset.generate(batch_size=1,\n",
" shuffle=True,\n",
" transformations=[convert_to_3_channels,\n",
" resize],\n",
" label_encoder=None,\n",
" returns={'processed_images',\n",
" 'filenames',\n",
" 'inverse_transform',\n",
" 'original_images',\n",
" 'original_labels'},\n",
" keep_images_without_gt=False)"
]
},
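  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Note that `returns` is a Python set, so the order in which you list the keys doesn't matter: per the generator's documentation, `generate()` yields the requested outputs in a fixed order, which is what the tuple unpacking in the next cell relies on."
   ]
  },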
{
"cell_type": "code",
"execution_count": 10,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Image: ../../datasets/VOCdevkit/VOC2007/JPEGImages/003819.jpg\n",
"\n",
"Ground truth boxes:\n",
"\n",
"[[ 12 146 52 386 264]\n",
" [ 12 69 208 322 360]\n",
" [ 15 1 1 221 235]]\n"
]
}
],
"source": [
"# 2: Generate samples.\n",
"\n",
"batch_images, batch_filenames, batch_inverse_transforms, batch_original_images, batch_original_labels = next(predict_generator)\n",
"\n",
"i = 0 # Which batch item to look at\n",
"\n",
"print(\"Image:\", batch_filenames[i])\n",
"print()\n",
"print(\"Ground truth boxes:\\n\")\n",
"print(np.array(batch_original_labels[i]))"
]
},
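  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Each ground truth row above has the format `[class_id, xmin, ymin, xmax, ymax]`, with the corner coordinates in absolute pixels of the original image. The decoded predictions further down use the same corner format, but with a confidence score inserted after the class ID."
   ]
  },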
{
"cell_type": "code",
"execution_count": 11,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"# 3: Make predictions.\n",
"\n",
"y_pred = model.predict(batch_images)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Now let's decode the raw predictions in `y_pred`.\n",
"\n",
"Had we created the model in 'inference' or 'inference_fast' mode, then the model's final layer would be a `DecodeDetections` layer and `y_pred` would already contain the decoded predictions, but since we created the model in 'training' mode, the model outputs raw predictions that still need to be decoded and filtered. This is what the `decode_detections()` function is for. It does exactly what the `DecodeDetections` layer would do, but using Numpy instead of TensorFlow (i.e. on the CPU instead of the GPU).\n",
"\n",
"`decode_detections()` with default argument values follows the procedure of the original SSD implementation: First, a very low confidence threshold of 0.01 is applied to filter out the majority of the predicted boxes, then greedy non-maximum suppression is performed per class with an intersection-over-union threshold of 0.45, and out of what is left after that, the top 200 highest confidence boxes are returned. Those settings are for precision-recall scoring purposes though. In order to get some usable final predictions, we'll set the confidence threshold much higher, e.g. to 0.5, since we're only interested in the very confident predictions."
]
},
{
"cell_type": "code",
"execution_count": 12,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"# 4: Decode the raw predictions in `y_pred`.\n",
"\n",
"y_pred_decoded = decode_detections(y_pred,\n",
" confidence_thresh=0.5,\n",
" iou_threshold=0.4,\n",
" top_k=200,\n",
" normalize_coords=normalize_coords,\n",
" img_height=img_height,\n",
" img_width=img_width)"
]
},
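  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "As an aside, the repository also ships `decode_detections_fast()` (imported at the top of this notebook), a faster decoding variant; see its docstring for how exactly it differs from the per-class procedure described above. A minimal sketch, assuming it accepts the same keyword arguments as `decode_detections()`:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "# Alternative: the faster decoder variant shipped with the repository.\n",
    "# The keyword arguments mirror the decode_detections() call above.\n",
    "y_pred_decoded_fast = decode_detections_fast(y_pred,\n",
    "                                             confidence_thresh=0.5,\n",
    "                                             iou_threshold=0.4,\n",
    "                                             top_k=200,\n",
    "                                             normalize_coords=normalize_coords,\n",
    "                                             img_height=img_height,\n",
    "                                             img_width=img_width)"
   ]
  },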
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We made the predictions on the resized images, but we'd like to visualize the outcome on the original input images, so we'll convert the coordinates accordingly. Don't worry about that opaque `apply_inverse_transforms()` function below, in this simple case it just aplies `(* original_image_size / resized_image_size)` to the box coordinates."
]
},
{
"cell_type": "code",
"execution_count": 13,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Predicted boxes:\n",
"\n",
" class conf xmin ymin xmax ymax\n",
"[[ 9. 0.8 364.79 5.24 496.51 203.59]\n",
" [ 12. 1. 115.44 50. 384.22 330.76]\n",
" [ 12. 0.86 68.99 212.78 331.63 355.72]\n",
" [ 15. 0.95 2.62 20.18 235.83 253.07]]\n"
]
}
],
"source": [
"# 5: Convert the predictions for the original image.\n",
"\n",
"y_pred_decoded_inv = apply_inverse_transforms(y_pred_decoded, batch_inverse_transforms)\n",
"\n",
"np.set_printoptions(precision=2, suppress=True, linewidth=90)\n",
"print(\"Predicted boxes:\\n\")\n",
"print(' class conf xmin ymin xmax ymax')\n",
"print(y_pred_decoded_inv[i])"
]
},
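  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "For the curious: since the resize is the only geometric transformation in the prediction pipeline above (`ConvertTo3Channels` doesn't touch coordinates), the inverse transform boils down to rescaling the corner coordinates, as described earlier. The cell below is an illustrative manual equivalent, not how the library does it internally, assuming the variables from the previous cells:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "# Illustrative manual equivalent of apply_inverse_transforms() for the\n",
    "# resize-only case. batch_original_images[i] is a (height, width, 3) array.\n",
    "orig_height, orig_width = batch_original_images[i].shape[:2]\n",
    "\n",
    "y_manual = np.copy(y_pred_decoded[i])\n",
    "y_manual[:, [2, 4]] *= orig_width / img_width    # Rescale xmin, xmax.\n",
    "y_manual[:, [3, 5]] *= orig_height / img_height  # Rescale ymin, ymax.\n",
    "\n",
    "print(y_manual) # Should match y_pred_decoded_inv[i] above."
   ]
  },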
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Finally, let's draw the predicted boxes onto the image. Each predicted box says its confidence next to the category name. The ground truth boxes are also drawn onto the image in green for comparison."
]
},
{
"cell_type": "code",
"execution_count": 14,
"metadata": {},
"outputs": [
{
"data": {
"image/png": "iVBORw0KGgoAAAANSUhEUgAAA48AAAK0CAYAAACjowjVAAAABHNCSVQICAgIfAhkiAAAAAlwSFlz\nAAALEgAACxIB0t1+/AAAIABJREFUeJzsvXnYLUdVLv5WT3vvbzhTQiYSyQQEAkiQMCPmAhJGFUEU\nnitERREREYSrEARvkEdAo8DFG9QA8hMEblBwYggCIYwJhMyEkORkPvPwDfvbQ3dX/f6otVZVV/f+\nThII5wD1Pk+yz7e7d3V1dXUNa73rXcoYg4iIiIiIiIiIiIiIiIiI9ZAc7ApERERERERERERERERE\nHPqIm8eIiIiIiIiIiIiIiIiIAyI72BVQudqOCkce7Hrcq8iww5TmqINdjYiIiIiIiIiIiIiIiHsK\ndbBjHpVSBm8+qFW49/FmwBijDnY1IiIiIiIiIiIiIiIi7ikibTUiIiIiIiIiIiIiIiLigIibx4iI\niIiIiIiIiIiIiIgDIm4eIyIiIiIiIiIiIiIiIg6IuHmMiIiIiIiIiIiIiIiIOCB+YjaPeZIf7CpE\nRERERERERERERET8yOKgp+rowhde/AXctO8m7BzuxG894rdQpAU+cvVH8MpPvRKTegIAeMWjXoHf\nO/33cPym43Hb0m34wBUfwNu+/DbUpgYAbP2DrfinK/8JWwZb8IJTX4Ab9t6Ax5z/GPzmab+J1zz2\nNThh8wlYK9dw9c6r8cKPvxB3rNwBAHj6yU/HOWecg4cc8RAsTZZwwbUX4LUXvhZr5RoA4P2/8H4c\nu+FYfOyaj+ENT3wDNg8244s3fxEv/feXYudw58FpsIiIiIiIiIiIQwRqUGzHuPzxTsP2w0Y/32FG\n05j2LeKg45DcPALA8x78PHz0mo/iie9/Ik7ecjLOf875GJZDvPozr8abnvQmnPXws/Cqz7wKl2+/\nHA86/EE471nnoZ/18adf+FMp45WPfiXO/dq5eOz5j0WWZHjE0Y/Aec86D7/xyd/ARbdchA29DXj0\nfR8t5z/0iIfi337t3/DuS96NF/3Li3DC5hPw3me9F4vFIn79E78u551+zOnYNdyFZ374mVjsLeLD\nz/0w/vKpf9k4JyIiIiIiIiLiJxLj8kiYjx7sWvx4Qb0gbsYjDgnca5tHpdSZAN4JIAXwD8aYv7g7\nv9872ouX/cfLoI3Gdbuvw9mfPxvvevq7cPbnz8brHv86PPejz8VnbvwMAODm/TfLcX/zeOkdl+LP\nLvoz+fsXT/lFDKdDfOK6T2BlugIAuHrn1XL8tY97LS7bdhle/ZlXAwC+u+e7+P1P/T7+9QX/irO/\ncDZuXboVADCpJ3jJJ1+CaT0FAJz3rfPwqke/6m63UURERERERERERERExI8K7pXNo1IqBfAeAE8F\ncDuAS5VS/2aMufaulnHJHZdAGy1/f+W2r6Cf9fHIYx6JuXwOH/+Vj8PAyPFUpRjkAxw+dzh2r+22\nZdx5SaPMC2+8EDftuwlb/2ArLrzpQnx+6+fxL9/5F+wZ7QEAnHrEqfj81s83fnPRzRchUQkefJ8H\ny+bxut3XycYRAO5cuRNHLkSDUERERERERERERETEjy/uLc/jowDcYIy5CQCUUh8B8AsA7vLm8UB4\n/v97Pq7fc33r+72jvfLv4XTYODYsh3jk3z8Sjz/u8XjKiU/Byx75Mrz9qW/Hkz/4ZFy27bK7fG1/\n4wgAxhgk6idGeygiIiIiIiIiIiIi4icQ99bm8b4AbvP+vh3Ao2ec24nTjzkdiUrE+/i44x6HcTXG\n5dsvx6gc4cTNJ+JTN3zqbldMG42Lb70YF996Md70xTfh2pdfixc+9IW4bNtluGbnNfjZ+/1s4/wn\nHf8kaKNxzc5r7va1fOR5j9ykCoDdcCr7T/GfKjlbwZjG6VDeyeEx9yvV8MY2fufB+OdTWeF5SvnH\n7HdaaylAqmMQ/N5A8++ogiqhe9ZG6teuVePiUpYxwSGv/eBOa9ZBodWoSZKgqqyYUpokjWM+wvsx\nRss1Ifds5LrufprnJCqBDo75TVxreyxNbV20NlCtsrw21XVQU0XXSeVm09R+t7gwBwDI0sS1jdyX\nu0mqAhJqj0bfCdrdAEiC9nLPxjT6Z2cBcNLOSik5qoMHbJ+r94zdrcJod64KvrPnmsZ3FRSmpTX0\njEbjVq3CZ+b3W3erpvE7AyPHsiS159PvjGq/t/DeSa2DYx3NLf3ba1MTtJFSCq0Xw7urVr82zSuE\np8t73lGXWf0b3nOCN4a0q+Vft9mWaB1p/yxJvDYNauj3o8ZdmRnX8Z9P61hYI9cuJug1jesZI++P\nj4RfFtV8940x6Pes8vd0WgIAeFjVtTNCGmbduAsBdKymH7ieqKR8RS/6wtwAwzVrPK1rGZT4bKmf\nrrksvopCfzAAAGzZvJmu58YeaSfqy+O1IdbWrJhcr9dvnKNr05gX7P+9Z0dlJPQe+e85/6qmZ5D3\nbdlpngPUNs1HlviXQaL4vmrs27ufviwAABs2bqbfa6TK3ttwuAoAWBuO6NxUnqtrZbqEac+Xfh9p\nHTOm8V43qt0xPzeOS/nt37T6sjGd4wjW+a5VD57a+J323j/um9ro9vn0j7qukabpuve0HrbimfgH\n3IQ/x3fucRk/yHIORSRJOmPwb47VBmi9f/4AbZRqnO/O0VA8DtER5Y3S/pAkmLE28Kcq6bcd/a81\njivVsZboWN92/hWU5s1VbqzuGju9cTxY38qI5M0hMj54bZzKeO+vS2yBvL7QMq94te9wPoVDA5fp\nr9vd8/SP6eYPvTFXyjLNeUkpJcdXh6u7jTH3aVUowEETzFFK/TaA3551/LC5w/CeZ7wH7/zGO3Hi\n5hNxzhnn4L3fei+WJ8t465ffirc++a0wMPjcTZ9DlmR46BEPxWlHn4Y//twfz7zmcx74HJy4+UR8\n6ZYvYddwF37mmJ/BcRuPw7W7rEP0HV99By77nctw7tPOxXu/+V4cv+l4vPvp78aHrvwQblu+bWa5\ndwWbthzL9w0AKMsSWWabP9zAKaVko8Pf5bldcNR1jaqqAEAGaf59mqaiNsvf5XkhdeCyKupbSZKg\nru35YV3yPEc1sQvvomePra7aCdbUGnmeUX10oy6AwWQyaZRZFLYO0+lUruc2el5Hp++4LK21nM/1\nyijlSlmWch6fw22EpP2Cz88vYM+ePfTv+cZ1auMWOWFZ5WTcGiR4E11Vlfybz+Eye70eplWzbdM0\nlXoNaTOzuLgIABiPx8h5cKCBJFH2d0lqsLK6r3EdGFu/XrEIGLsIPWyTbecnPO5hAIAjNi8CE9ow\n0wKySBRA9zimjtCnhZkYBxr/TqRdenlzuNDUDwGNPKPnr3kRUXq/tujTOVmWSR8c0eaOxjKUlYZJ\nuJ/WjXsej8eg/TEy+m5C7ViXFQzs+dxvd5oCt91xOwDgmu9cS2Xa3yttoDLXXwBgUto652mKzPAC\nzZZZQtNnJccOn1sAABQlbVbzBNOpvXb
K74dSqGkwX10bN45VlQZ44x70V6219JuS6qVS1w9V3Zwg\n/LGja/EqzzPY6CilpPxwPEmSRH6XBe+Av0jkc9I0bfQhwC3+/Y0bn+PXuXWM/u73+1K/JGgr/31K\nZbI38h23u/IWDlx+OOamib9g5udj+/e0rrzFMr3nbI3RBnO8aUq4TGAw12vUOc9tW02nUzzwpKMB\nALfduh0AMF6jMWF1in5hN27VdExlUn0TA0Vj+RL1o6mh91Fl6BW2/DyxdX7M6Q/Bty79KgBg/9Ce\npwpbpywrMD9ny1pZWrbXM/b3kzrF8afY8eNFv/orAIDlZcvm0TAyF6ixrcN3LrsEl116BQDgxJMe\nCADo9W1brS2PJUWWpnG+MnZu6OUZyrF9rv25DbbMlN6ZyRgJLXhWp/Z+jnrgKQCAjccchWpKZfA7\nkGgkqX0Xufv1M7txNnoJH/nYJ+1p/Z8CADzt2c+37WBWsZDYOe1bl3wFAHDpt660xwaLGPStES7n\nOZje+7Gu3VxDqOu6NVc
"text/plain": [
"<matplotlib.figure.Figure at 0x7f83f41e2cc0>"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"# 5: Draw the predicted boxes onto the image\n",
"\n",
"# Set the colors for the bounding boxes\n",
"colors = plt.cm.hsv(np.linspace(0, 1, n_classes+1)).tolist()\n",
"classes = ['background',\n",
" 'aeroplane', 'bicycle', 'bird', 'boat',\n",
" 'bottle', 'bus', 'car', 'cat',\n",
" 'chair', 'cow', 'diningtable', 'dog',\n",
" 'horse', 'motorbike', 'person', 'pottedplant',\n",
" 'sheep', 'sofa', 'train', 'tvmonitor']\n",
"\n",
"plt.figure(figsize=(20,12))\n",
"plt.imshow(batch_original_images[i])\n",
"\n",
"current_axis = plt.gca()\n",
"\n",
"for box in batch_original_labels[i]:\n",
" xmin = box[1]\n",
" ymin = box[2]\n",
" xmax = box[3]\n",
" ymax = box[4]\n",
" label = '{}'.format(classes[int(box[0])])\n",
" current_axis.add_patch(plt.Rectangle((xmin, ymin), xmax-xmin, ymax-ymin, color='green', fill=False, linewidth=2)) \n",
" current_axis.text(xmin, ymin, label, size='x-large', color='white', bbox={'facecolor':'green', 'alpha':1.0})\n",
"\n",
"for box in y_pred_decoded_inv[i]:\n",
" xmin = box[2]\n",
" ymin = box[3]\n",
" xmax = box[4]\n",
" ymax = box[5]\n",
" color = colors[int(box[0])]\n",
" label = '{}: {:.2f}'.format(classes[int(box[0])], box[1])\n",
" current_axis.add_patch(plt.Rectangle((xmin, ymin), xmax-xmin, ymax-ymin, color=color, fill=False, linewidth=2)) \n",
" current_axis.text(xmin, ymin, label, size='x-large', color='white', bbox={'facecolor':color, 'alpha':1.0})"
]
  }
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.6.5"
},
"widgets": {
"state": {},
"version": "1.1.2"
}
},
"nbformat": 4,
"nbformat_minor": 2
}