This commit is contained in:
dl-desktop
2020-02-06 16:47:03 -03:00
parent 6328265287
commit b586f22bf0
318 changed files with 25111 additions and 664 deletions

4
ssd_keras-master/.gitignore vendored Executable file

@@ -0,0 +1,4 @@
*.jpg
*.jpeg
*.weights
*.h5

File diff suppressed because one or more lines are too long


@@ -0,0 +1,22 @@
# Contributing Guidelines
---
Contributions to this repository are welcome, but before you create a pull request, consider the following guidelines:
1. The To-do list in the README of this repository defines the main topics for which contributions are welcome. If you want to contribute, ideally contribute to one of the topics listed there.
2. If you'd like to contribute features that are not mentioned on the to-do list in the README, make sure to explain why your proposed change adds value, i.e. what relevant use case it solves. The benefit of any new feature will be compared against the cost of maintaining it, and your contribution will be accepted or rejected based on this trade-off.
3. One pull request should be about one specific feature or improvement, i.e. it should not contain multiple unrelated changes. If you want to contribute multiple features and/or improvements, create a separate pull request for every individual feature or improvement.
4. When you create a pull request, make sure to explain properly
* why your proposed change adds value, i.e. what problem or use case it solves,
* all the API changes it will introduce, if any,
* all behavioral changes in any existing parts of the project it will introduce, if any.
5. This should go without saying, but you are responsible for updating any parts of the code or the tutorial notebooks that are affected by your introduced changes.
6. Any submitted code must conform to the coding standards and style of this repository. There is no formal guide for coding standards and style, but here are a few things to note:
* Any new modules, classes or functions must provide proper docstrings unless they are trivial. These docstrings must have sections for Arguments, Returns, and Raises (if applicable). For every argument of a function, the docstring must explain precisely what the argument does, what data type it expects, whether or not it is optional, and any requirements for the range of values it expects. The same goes for the returns. Use existing docstrings as templates.
* Naming:
* `ClassNames` consist of capitalized words without underscores.
* `module_names.py` consist of lower case words connected with underscores.
* `function_names` consist of lower case words connected with underscores.
* `variable_names` consist of lower case words connected with underscores.
* All module, class, function, and variable names must be descriptive so that the code is as self-explanatory as possible. A longer, descriptive name is always preferable to a shorter, non-descriptive one. Abbreviations are generally to be avoided unless the full words would make the name unreasonably long.
* More in-line comments are better than fewer, and all comments should be precise and succinct.


@@ -0,0 +1,29 @@
### If you open a GitHub issue, here is the policy:
Your issue must be about one of the following:
1. a bug,
2. a feature request,
3. a documentation issue, or
4. a question that is **specific to this SSD implementation**.
You will only get help if you adhere to the following guidelines:
* Before you open an issue, search the open **and closed** issues first. Your problem/question might already have been solved/answered before.
* If you're getting unexpected behavior from code I wrote, open an issue and I'll try to help. If you're getting unexpected behavior from code **you** wrote, you'll have to fix it yourself. E.g. if you made a ton of changes to the code or the tutorials and now it doesn't work anymore, that's your own problem. I don't want to spend my time debugging your code.
* Make sure you're using the latest master. If you're 30 commits behind and have a problem, the only answer you'll likely get is to pull the latest master and try again.
* Read the documentation. All of it. If the answer to your problem/question can be found in the documentation, you might not get an answer, because, seriously, you could really have figured this out yourself.
* If you're asking a question, it must be specific to this SSD implementation. General deep learning or object detection questions will likely get closed without an answer. E.g. a question like "How do I get the mAP of an SSD for my own dataset?" has nothing to do with this particular SSD implementation, because computing the mAP works the same way for any object detection model. You should ask such a question in an appropriate forum or on the [Data Science section of StackOverflow](https://datascience.stackexchange.com/) instead.
* If you get an error:
* Provide the full stack trace of the error you're getting, not just the error message itself.
* Make sure any code you post is properly formatted as such.
* Provide any useful information about your environment, e.g.:
* Operating System
* Which commit of this repository you're on
* Keras version
* TensorFlow version
* Provide a minimal reproducible example, i.e. post code and explain clearly how you ended up with this error.
* Provide any useful information about your specific use case and parameters:
* What model are you trying to use/train?
* Describe the dataset you're using.
* List the values of any parameters you changed that might be relevant.


@@ -0,0 +1,176 @@
Copyright 2018 Pierluigi Ferrari.
Apache License
Version 2.0, January 2004
http://www.apache.org/licenses/
TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION
1. Definitions.
"License" shall mean the terms and conditions for use, reproduction,
and distribution as defined by Sections 1 through 9 of this document.
"Licensor" shall mean the copyright owner or entity authorized by
the copyright owner that is granting the License.
"Legal Entity" shall mean the union of the acting entity and all
other entities that control, are controlled by, or are under common
control with that entity. For the purposes of this definition,
"control" means (i) the power, direct or indirect, to cause the
direction or management of such entity, whether by contract or
otherwise, or (ii) ownership of fifty percent (50%) or more of the
outstanding shares, or (iii) beneficial ownership of such entity.
"You" (or "Your") shall mean an individual or Legal Entity
exercising permissions granted by this License.
"Source" form shall mean the preferred form for making modifications,
including but not limited to software source code, documentation
source, and configuration files.
"Object" form shall mean any form resulting from mechanical
transformation or translation of a Source form, including but
not limited to compiled object code, generated documentation,
and conversions to other media types.
"Work" shall mean the work of authorship, whether in Source or
Object form, made available under the License, as indicated by a
copyright notice that is included in or attached to the work
(an example is provided in the Appendix below).
"Derivative Works" shall mean any work, whether in Source or Object
form, that is based on (or derived from) the Work and for which the
editorial revisions, annotations, elaborations, or other modifications
represent, as a whole, an original work of authorship. For the purposes
of this License, Derivative Works shall not include works that remain
separable from, or merely link (or bind by name) to the interfaces of,
the Work and Derivative Works thereof.
"Contribution" shall mean any work of authorship, including
the original version of the Work and any modifications or additions
to that Work or Derivative Works thereof, that is intentionally
submitted to Licensor for inclusion in the Work by the copyright owner
or by an individual or Legal Entity authorized to submit on behalf of
the copyright owner. For the purposes of this definition, "submitted"
means any form of electronic, verbal, or written communication sent
to the Licensor or its representatives, including but not limited to
communication on electronic mailing lists, source code control systems,
and issue tracking systems that are managed by, or on behalf of, the
Licensor for the purpose of discussing and improving the Work, but
excluding communication that is conspicuously marked or otherwise
designated in writing by the copyright owner as "Not a Contribution."
"Contributor" shall mean Licensor and any individual or Legal Entity
on behalf of whom a Contribution has been received by Licensor and
subsequently incorporated within the Work.
2. Grant of Copyright License. Subject to the terms and conditions of
this License, each Contributor hereby grants to You a perpetual,
worldwide, non-exclusive, no-charge, royalty-free, irrevocable
copyright license to reproduce, prepare Derivative Works of,
publicly display, publicly perform, sublicense, and distribute the
Work and such Derivative Works in Source or Object form.
3. Grant of Patent License. Subject to the terms and conditions of
this License, each Contributor hereby grants to You a perpetual,
worldwide, non-exclusive, no-charge, royalty-free, irrevocable
(except as stated in this section) patent license to make, have made,
use, offer to sell, sell, import, and otherwise transfer the Work,
where such license applies only to those patent claims licensable
by such Contributor that are necessarily infringed by their
Contribution(s) alone or by combination of their Contribution(s)
with the Work to which such Contribution(s) was submitted. If You
institute patent litigation against any entity (including a
cross-claim or counterclaim in a lawsuit) alleging that the Work
or a Contribution incorporated within the Work constitutes direct
or contributory patent infringement, then any patent licenses
granted to You under this License for that Work shall terminate
as of the date such litigation is filed.
4. Redistribution. You may reproduce and distribute copies of the
Work or Derivative Works thereof in any medium, with or without
modifications, and in Source or Object form, provided that You
meet the following conditions:
(a) You must give any other recipients of the Work or
Derivative Works a copy of this License; and
(b) You must cause any modified files to carry prominent notices
stating that You changed the files; and
(c) You must retain, in the Source form of any Derivative Works
that You distribute, all copyright, patent, trademark, and
attribution notices from the Source form of the Work,
excluding those notices that do not pertain to any part of
the Derivative Works; and
(d) If the Work includes a "NOTICE" text file as part of its
distribution, then any Derivative Works that You distribute must
include a readable copy of the attribution notices contained
within such NOTICE file, excluding those notices that do not
pertain to any part of the Derivative Works, in at least one
of the following places: within a NOTICE text file distributed
as part of the Derivative Works; within the Source form or
documentation, if provided along with the Derivative Works; or,
within a display generated by the Derivative Works, if and
wherever such third-party notices normally appear. The contents
of the NOTICE file are for informational purposes only and
do not modify the License. You may add Your own attribution
notices within Derivative Works that You distribute, alongside
or as an addendum to the NOTICE text from the Work, provided
that such additional attribution notices cannot be construed
as modifying the License.
You may add Your own copyright statement to Your modifications and
may provide additional or different license terms and conditions
for use, reproduction, or distribution of Your modifications, or
for any such Derivative Works as a whole, provided Your use,
reproduction, and distribution of the Work otherwise complies with
the conditions stated in this License.
5. Submission of Contributions. Unless You explicitly state otherwise,
any Contribution intentionally submitted for inclusion in the Work
by You to the Licensor shall be under the terms and conditions of
this License, without any additional terms or conditions.
Notwithstanding the above, nothing herein shall supersede or modify
the terms of any separate license agreement you may have executed
with Licensor regarding such Contributions.
6. Trademarks. This License does not grant permission to use the trade
names, trademarks, service marks, or product names of the Licensor,
except as required for reasonable and customary use in describing the
origin of the Work and reproducing the content of the NOTICE file.
7. Disclaimer of Warranty. Unless required by applicable law or
agreed to in writing, Licensor provides the Work (and each
Contributor provides its Contributions) on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
implied, including, without limitation, any warranties or conditions
of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A
PARTICULAR PURPOSE. You are solely responsible for determining the
appropriateness of using or redistributing the Work and assume any
risks associated with Your exercise of permissions under this License.
8. Limitation of Liability. In no event and under no legal theory,
whether in tort (including negligence), contract, or otherwise,
unless required by applicable law (such as deliberate and grossly
negligent acts) or agreed to in writing, shall any Contributor be
liable to You for damages, including any direct, indirect, special,
incidental, or consequential damages of any character arising as a
result of this License or out of the use or inability to use the
Work (including but not limited to damages for loss of goodwill,
work stoppage, computer failure or malfunction, or any and all
other commercial damages or losses), even if such Contributor
has been advised of the possibility of such damages.
9. Accepting Warranty or Additional Liability. While redistributing
the Work or Derivative Works thereof, You may choose to offer,
and charge a fee for, acceptance of support, warranty, indemnity,
or other liability obligations and/or rights consistent with this
License. However, in accepting such obligations, You may act only
on Your own behalf and on Your sole responsibility, not on behalf
of any other Contributor, and only if You agree to indemnify,
defend, and hold each Contributor harmless for any liability
incurred by, or claims asserted against, such Contributor by reason
of your accepting any such warranty or additional liability.


@@ -0,0 +1,283 @@
#!/usr/bin/env python3
# -*- coding: utf-8 -*-
"""
Created on Thu May 16 16:09:31 2019
@author: dlsaavedra
"""
from keras.optimizers import Adam, SGD
from keras.callbacks import ModelCheckpoint, LearningRateScheduler, TerminateOnNaN, CSVLogger
from keras import backend as K
from keras.models import load_model
from math import ceil
import numpy as np
from matplotlib import pyplot as plt
from models.keras_ssd512 import ssd_512
from models.keras_ssd300 import ssd_300
from keras_loss_function.keras_ssd_loss import SSDLoss
from keras_layers.keras_layer_AnchorBoxes import AnchorBoxes
from keras_layers.keras_layer_DecodeDetections import DecodeDetections
from keras_layers.keras_layer_DecodeDetectionsFast import DecodeDetectionsFast
from keras_layers.keras_layer_L2Normalization import L2Normalization
from ssd_encoder_decoder.ssd_input_encoder import SSDInputEncoder
from ssd_encoder_decoder.ssd_output_decoder import decode_detections, decode_detections_fast
from data_generator.object_detection_2d_data_generator import DataGenerator
from data_generator.object_detection_2d_geometric_ops import Resize
from data_generator.object_detection_2d_photometric_ops import ConvertTo3Channels
from data_generator.data_augmentation_chain_original_ssd import SSDDataAugmentation
from data_generator.object_detection_2d_misc_utils import apply_inverse_transforms
#%%
img_height = 300 # Height of the model input images
img_width = 300 # Width of the model input images
img_channels = 3 # Number of color channels of the model input images
mean_color = [123, 117, 104] # The per-channel mean of the images in the dataset. Do not change this value if you're using any of the pre-trained weights.
swap_channels = [2, 1, 0] # The color channel order in the original SSD is BGR, so we'll have the model reverse the color channel order of the input images.
n_classes = 20 # Number of positive classes, e.g. 20 for Pascal VOC, 80 for MS COCO
scales_pascal = [0.1, 0.2, 0.37, 0.54, 0.71, 0.88, 1.05] # The anchor box scaling factors used in the original SSD300 for the Pascal VOC datasets
scales_coco = [0.07, 0.15, 0.33, 0.51, 0.69, 0.87, 1.05] # The anchor box scaling factors used in the original SSD300 for the MS COCO datasets
scales = scales_pascal
aspect_ratios = [[1.0, 2.0, 0.5],
[1.0, 2.0, 0.5, 3.0, 1.0/3.0],
[1.0, 2.0, 0.5, 3.0, 1.0/3.0],
[1.0, 2.0, 0.5, 3.0, 1.0/3.0],
[1.0, 2.0, 0.5],
[1.0, 2.0, 0.5]] # The anchor box aspect ratios used in the original SSD300; the order matters
two_boxes_for_ar1 = True
steps = [8, 16, 32, 64, 100, 300] # The space between two adjacent anchor box center points for each predictor layer.
offsets = [0.5, 0.5, 0.5, 0.5, 0.5, 0.5] # The offsets of the first anchor box center points from the top and left borders of the image as a fraction of the step size for each predictor layer.
clip_boxes = False # Whether or not to clip the anchor boxes to lie entirely within the image boundaries
variances = [0.1, 0.1, 0.2, 0.2] # The variances by which the encoded target coordinates are divided as in the original implementation
normalize_coords = True
K.clear_session() # Clear previous models from memory.
model = ssd_300(image_size=(img_height, img_width, img_channels),
n_classes=n_classes,
mode='training',
l2_regularization=0.0005,
scales=scales,
aspect_ratios_per_layer=aspect_ratios,
two_boxes_for_ar1=two_boxes_for_ar1,
steps=steps,
offsets=offsets,
clip_boxes=clip_boxes,
variances=variances,
normalize_coords=normalize_coords,
subtract_mean=mean_color,
swap_channels=swap_channels)
#%%
# 2: Load some weights into the model.
# TODO: Set the path to the weights you want to load.
#weights_path = 'VGG_VOC0712Plus_SSD_300x300_ft_iter_160000.h5'
weights_path = 'VGG_ILSVRC_16_layers_fc_reduced.h5'
model.load_weights(weights_path, by_name=True)
# 3: Instantiate an optimizer and the SSD loss function and compile the model.
# If you want to follow the original Caffe implementation, use the preset SGD
# optimizer, otherwise I'd recommend the commented-out Adam optimizer.
#adam = Adam(lr=0.001, beta_1=0.9, beta_2=0.999, epsilon=1e-08, decay=0.0)
sgd = SGD(lr=0.001, momentum=0.9, decay=0.0, nesterov=False)
ssd_loss = SSDLoss(neg_pos_ratio=3, alpha=1.0)
model.compile(optimizer=sgd, loss=ssd_loss.compute_loss)
model.summary()
#%%
# 1: Instantiate two `DataGenerator` objects: One for training, one for validation.
# Optional: If you have enough memory, consider loading the images into memory for the reasons explained above.
train_dataset = DataGenerator(load_images_into_memory=False, hdf5_dataset_path=None)
val_dataset = DataGenerator(load_images_into_memory=False, hdf5_dataset_path=None)
# 2: Parse the image and label lists for the training and validation datasets. This can take a while.
# TODO: Set the paths to the datasets here.
# The directories that contain the images.
VOC_2007_images_dir = '../VOCdevkit/VOC2007/JPEGImages/'
VOC_2012_images_dir = '../VOCdevkit/VOC2012/JPEGImages/'
# The directories that contain the annotations.
VOC_2007_annotations_dir = '../VOCdevkit/VOC2007/Annotations/'
VOC_2012_annotations_dir = '../VOCdevkit/VOC2012/Annotations/'
# The paths to the image sets.
VOC_2007_train_image_set_filename = '../VOCdevkit/VOC2007/ImageSets/Main/train.txt'
VOC_2012_train_image_set_filename = '../VOCdevkit/VOC2012/ImageSets/Main/train.txt'
VOC_2007_val_image_set_filename = '../VOCdevkit/VOC2007/ImageSets/Main/val.txt'
VOC_2012_val_image_set_filename = '../VOCdevkit/VOC2012/ImageSets/Main/val.txt'
VOC_2007_trainval_image_set_filename = '../VOCdevkit/VOC2007/ImageSets/Main/trainval.txt'
VOC_2012_trainval_image_set_filename = '../VOCdevkit/VOC2012/ImageSets/Main/trainval.txt'
VOC_2007_test_image_set_filename = '../VOCdevkit/VOC2007/ImageSets/Main/test.txt'
# The XML parser needs to know what object class names to look for and in which order to map them to integers.
classes = ['background',
'aeroplane', 'bicycle', 'bird', 'boat',
'bottle', 'bus', 'car', 'cat',
'chair', 'cow', 'diningtable', 'dog',
'horse', 'motorbike', 'person', 'pottedplant',
'sheep', 'sofa', 'train', 'tvmonitor']
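# The assignment below overrides the Pascal VOC class list above with the custom
# classes used for this training run (background plus four object categories).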
classes = ['background', 'Gun', 'Knife', 'Razor', 'Shuriken']
train_dataset.parse_xml(images_dirs= ['/home/dlsaavedra/Desktop/Tesis/8.-Object_Detection/Experimento_3/Training/images'],
image_set_filenames=["/home/dlsaavedra/Desktop/Tesis/8.-Object_Detection/Experimento_3/Training/train.txt"],
annotations_dirs=["/home/dlsaavedra/Desktop/Tesis/8.-Object_Detection/Experimento_3/Training/anns"],
classes=classes,
include_classes='all',
exclude_truncated=False,
exclude_difficult=False,
ret=False)
val_dataset.parse_xml(images_dirs= ['/home/dlsaavedra/Desktop/Tesis/8.-Object_Detection/Experimento_3/Training/images'],
image_set_filenames=["/home/dlsaavedra/Desktop/Tesis/8.-Object_Detection/Experimento_3/Training/train.txt"],
annotations_dirs=["/home/dlsaavedra/Desktop/Tesis/8.-Object_Detection/Experimento_3/Training/anns"],
classes=classes,
include_classes='all',
exclude_truncated=False,
exclude_difficult=False,
ret=False)
#train_dataset.parse_xml(images_dirs=[VOC_2012_images_dir],
# image_set_filenames=[VOC_2012_trainval_image_set_filename],
# annotations_dirs=[VOC_2012_annotations_dir],
# classes=classes,
# include_classes='all',
# exclude_truncated=False,
# exclude_difficult=False,
# ret=False)
#
#val_dataset.parse_xml(images_dirs=[VOC_2012_images_dir],
# image_set_filenames=[VOC_2012_trainval_image_set_filename],
# annotations_dirs=[VOC_2012_annotations_dir],
# classes=classes,
# include_classes='all',
# exclude_truncated=False,
# exclude_difficult=True,
# ret=False)
#%%
# 3: Set the batch size.
batch_size = 32 # Change the batch size if you like, or if you run into GPU memory issues.
# 4: Set the image transformations for pre-processing and data augmentation options.
# For the training generator:
ssd_data_augmentation = SSDDataAugmentation(img_height=img_height,
img_width=img_width,
background=mean_color)
# For the validation generator:
convert_to_3_channels = ConvertTo3Channels()
resize = Resize(height=img_height, width=img_width)
# 5: Instantiate an encoder that can encode ground truth labels into the format needed by the SSD loss function.
# The encoder constructor needs the spatial dimensions of the model's predictor layers to create the anchor boxes.
predictor_sizes = [model.get_layer('conv4_3_norm_mbox_conf').output_shape[1:3],
model.get_layer('fc7_mbox_conf').output_shape[1:3],
model.get_layer('conv6_2_mbox_conf').output_shape[1:3],
model.get_layer('conv7_2_mbox_conf').output_shape[1:3],
model.get_layer('conv8_2_mbox_conf').output_shape[1:3],
model.get_layer('conv9_2_mbox_conf').output_shape[1:3]]
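# `output_shape[1:3]` is the spatial (height, width) of each confidence predictor's output;
# the encoder uses these grid sizes to lay out the matching anchor boxes.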
ssd_input_encoder = SSDInputEncoder(img_height=img_height,
img_width=img_width,
n_classes=n_classes,
predictor_sizes=predictor_sizes,
scales=scales,
aspect_ratios_per_layer=aspect_ratios,
two_boxes_for_ar1=two_boxes_for_ar1,
steps=steps,
offsets=offsets,
clip_boxes=clip_boxes,
variances=variances,
matching_type='multi',
pos_iou_threshold=0.5,
neg_iou_limit=0.5,
normalize_coords=normalize_coords)
# 6: Create the generator handles that will be passed to Keras' `fit_generator()` function.
train_generator = train_dataset.generate(batch_size=batch_size,
shuffle=True,
transformations=[ssd_data_augmentation],
label_encoder=ssd_input_encoder,
returns={'processed_images',
'encoded_labels'},
keep_images_without_gt=False)
val_generator = val_dataset.generate(batch_size=batch_size,
shuffle=False,
transformations=[convert_to_3_channels,
resize],
label_encoder=ssd_input_encoder,
returns={'processed_images',
'encoded_labels'},
keep_images_without_gt=False)
# Get the number of samples in the training and validation datasets.
train_dataset_size = train_dataset.get_dataset_size()
val_dataset_size = val_dataset.get_dataset_size()
print("Number of images in the training dataset:\t{:>6}".format(train_dataset_size))
print("Number of images in the validation dataset:\t{:>6}".format(val_dataset_size))
#%%
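# Piecewise-constant learning rate schedule: 1e-3 for the first 80 epochs,
# 1e-4 until epoch 100, and 1e-5 afterwards.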
def lr_schedule(epoch):
if epoch < 80:
return 0.001
elif epoch < 100:
return 0.0001
else:
return 0.00001
# Define model callbacks.
# TODO: Set the filepath under which you want to save the model.
model_checkpoint = ModelCheckpoint(filepath='ssd300_pascal_07+12_epoch-{epoch:02d}_loss-{loss:.4f}_val_loss-{val_loss:.4f}.h5',
monitor='val_loss',
verbose=1,
save_best_only=True,
save_weights_only=False,
mode='auto',
period=1)
#model_checkpoint.best =
csv_logger = CSVLogger(filename='ssd300_pascal_07+12_training_log.csv',
separator=',',
append=True)
learning_rate_scheduler = LearningRateScheduler(schedule=lr_schedule,
verbose=1)
terminate_on_nan = TerminateOnNaN()
callbacks = [model_checkpoint,
csv_logger,
learning_rate_scheduler,
terminate_on_nan]
#%%
initial_epoch = 0
final_epoch = 120
steps_per_epoch = 1000
history = model.fit_generator(generator=train_generator,
steps_per_epoch=steps_per_epoch,
epochs=final_epoch,
callbacks=callbacks,
validation_data=val_generator,
validation_steps=ceil(val_dataset_size/batch_size),
initial_epoch=initial_epoch)

266
ssd_keras-master/README.md Executable file

@@ -0,0 +1,266 @@
## SSD: Single-Shot MultiBox Detector implementation in Keras
---
### Contents
1. [Overview](#overview)
2. [Performance](#performance)
3. [Examples](#examples)
4. [Dependencies](#dependencies)
5. [How to use it](#how-to-use-it)
6. [Download the convolutionalized VGG-16 weights](#download-the-convolutionalized-vgg-16-weights)
7. [Download the original trained model weights](#download-the-original-trained-model-weights)
8. [How to fine-tune one of the trained models on your own dataset](#how-to-fine-tune-one-of-the-trained-models-on-your-own-dataset)
9. [ToDo](#todo)
10. [Important notes](#important-notes)
11. [Terminology](#terminology)
### Overview
This is a Keras port of the SSD model architecture introduced by Wei Liu et al. in the paper [SSD: Single Shot MultiBox Detector](https://arxiv.org/abs/1512.02325).
Ports of the trained weights of all the original models are provided below. This implementation is accurate, meaning that both the ported weights and models trained from scratch produce the same mAP values as the respective models of the original Caffe implementation (see performance section below).
The main goal of this project is to create an SSD implementation that is well documented for those who are interested in a low-level understanding of the model. The provided tutorials, documentation and detailed comments hopefully make it a bit easier to dig into the code and adapt or build upon the model than with most other implementations out there (Keras or otherwise) that provide little to no documentation and comments.
The repository currently provides the following network architectures:
* SSD300: [`keras_ssd300.py`](models/keras_ssd300.py)
* SSD512: [`keras_ssd512.py`](models/keras_ssd512.py)
* SSD7: [`keras_ssd7.py`](models/keras_ssd7.py) - a smaller 7-layer version that can be trained from scratch relatively quickly even on a mid-tier GPU, yet is capable enough for less complex object detection tasks and testing. You're obviously not going to get state-of-the-art results with that one, but it's fast.
If you would like to use one of the provided trained models for transfer learning (i.e. fine-tune one of the trained models on your own dataset), there is a [Jupyter notebook tutorial](weight_sampling_tutorial.ipynb) that helps you sub-sample the trained weights so that they are compatible with your dataset; see further below.
If you would like to build an SSD with your own base network architecture, you can use [`keras_ssd7.py`](models/keras_ssd7.py) as a template; it provides documentation and comments to help you.
### Performance
Here are the mAP evaluation results of the ported weights and below that the evaluation results of a model trained from scratch using this implementation. All models were evaluated using the official Pascal VOC test server (for 2012 `test`) or the official Pascal VOC Matlab evaluation script (for 2007 `test`). In all cases the results match (or slightly surpass) those of the original Caffe models. Download links to all ported weights are available further below.
<table width="70%">
<tr>
<td></td>
<td colspan=3 align=center>Mean Average Precision</td>
</tr>
<tr>
<td>evaluated on</td>
<td colspan=2 align=center>VOC2007 test</td>
<td align=center>VOC2012 test</td>
</tr>
<tr>
<td>trained on<br>IoU rule</td>
<td align=center width="25%">07+12<br>0.5</td>
<td align=center width="25%">07+12+COCO<br>0.5</td>
<td align=center width="25%">07++12+COCO<br>0.5</td>
</tr>
<tr>
<td><b>SSD300</td>
<td align=center><b>77.5</td>
<td align=center><b>81.2</td>
<td align=center><b>79.4</td>
</tr>
<tr>
<td><b>SSD512</td>
<td align=center><b>79.8</td>
<td align=center><b>83.2</td>
<td align=center><b>82.3</td>
</tr>
</table>
Training an SSD300 from scratch to convergence on Pascal VOC 2007 `trainval` and 2012 `trainval` produces the same mAP on Pascal VOC 2007 `test` as the original Caffe SSD300 "07+12" model. You can find a summary of the training [here](training_summaries/ssd300_pascal_07+12_training_summary.md).
<table width="95%">
<tr>
<td></td>
<td colspan=3 align=center>Mean Average Precision</td>
</tr>
<tr>
<td></td>
<td align=center>Original Caffe Model</td>
<td align=center>Ported Weights</td>
<td align=center>Trained from Scratch</td>
</tr>
<tr>
<td><b>SSD300 "07+12"</td>
<td align=center width="26%"><b>0.772</td>
<td align=center width="26%"><b>0.775</td>
<td align=center width="26%"><b><a href="https://drive.google.com/file/d/1-MYYaZbIHNPtI2zzklgVBAjssbP06BeA/view">0.771</a></td>
</tr>
</table>
The models achieve the following average number of frames per second (FPS) on Pascal VOC on an NVIDIA GeForce GTX 1070 mobile (i.e. the laptop version) and cuDNN v6. There are two things to note here. First, the benchmark prediction speeds of the original Caffe implementation were achieved using a TitanX GPU and cuDNN v4. Second, the paper says the prediction speed was measured at batch size 8, which I don't think is a meaningful way of measuring it. The whole point of measuring the speed of a detection model is to know how many individual sequential images the model can process per second, so measuring the prediction speed on batches of images and then deducing the time spent on each individual image in the batch defeats the purpose. For the sake of comparability, the table below lists the prediction speed of the original Caffe SSD implementation and of this implementation under the same conditions, i.e. at batch size 8. In addition, it lists the prediction speed of this implementation at batch size 1, which in my opinion is the more meaningful number.
<table width>
<tr>
<td></td>
<td colspan=3 align=center>Frames per Second</td>
</tr>
<tr>
<td></td>
<td align=center>Original Caffe Implementation</td>
<td colspan=2 align=center>This Implementation</td>
</tr>
<tr>
<td width="14%">Batch Size</td>
<td width="27%" align=center>8</td>
<td width="27%" align=center>8</td>
<td width="27%" align=center>1</td>
</tr>
<tr>
<td><b>SSD300</td>
<td align=center><b>46</td>
<td align=center><b>49</td>
<td align=center><b>39</td>
</tr>
<tr>
<td><b>SSD512</td>
<td align=center><b>19</td>
<td align=center><b>25</td>
<td align=center><b>20</td>
</tr>
<tr>
<td><b>SSD7</td>
<td align=center><b></td>
<td align=center><b>216</td>
<td align=center><b>127</td>
</tr>
</table>
### Examples
Below are some prediction examples of the fully trained original SSD300 "07+12" model (i.e. trained on Pascal VOC2007 `trainval` and VOC2012 `trainval`). The predictions were made on Pascal VOC2007 `test`.
| | |
|---|---|
| ![img01](./examples/trained_ssd300_pascalVOC2007_test_pred_05_no_gt.png) | ![img01](./examples/trained_ssd300_pascalVOC2007_test_pred_04_no_gt.png) |
| ![img01](./examples/trained_ssd300_pascalVOC2007_test_pred_01_no_gt.png) | ![img01](./examples/trained_ssd300_pascalVOC2007_test_pred_02_no_gt.png) |
Here are some prediction examples of an SSD7 (i.e. the small 7-layer version) partially trained on two road traffic datasets released by [Udacity](https://github.com/udacity/self-driving-car/tree/master/annotations) with roughly 20,000 images in total and 5 object categories (more info in [`ssd7_training.ipynb`](ssd7_training.ipynb)). The predictions you see below were made after 10,000 training steps at batch size 32. Admittedly, cars are comparatively easy objects to detect and I picked a few of the better examples, but it is nonetheless remarkable what such a small model can do after only 10,000 training iterations.
| | |
|---|---|
| ![img01](./examples/ssd7_udacity_traffic_pred_01.png) | ![img01](./examples/ssd7_udacity_traffic_pred_02.png) |
| ![img01](./examples/ssd7_udacity_traffic_pred_03.png) | ![img01](./examples/ssd7_udacity_traffic_pred_04.png) |
### Dependencies
* Python 3.x
* Numpy
* TensorFlow 1.x
* Keras 2.x
* OpenCV
* Beautiful Soup 4.x
The Theano and CNTK backends are currently not supported.
Python 2 compatibility: This implementation seems to work with Python 2.7, but I don't provide any support for it. It's 2018 and nobody should be using Python 2 anymore.
### How to use it
This repository provides Jupyter notebook tutorials that explain training, inference and evaluation, and there are a bunch of explanations in the subsequent sections that complement the notebooks.
How to use a trained model for inference:
* [`ssd300_inference.ipynb`](ssd300_inference.ipynb)
* [`ssd512_inference.ipynb`](ssd512_inference.ipynb)
How to train a model:
* [`ssd300_training.ipynb`](ssd300_training.ipynb)
* [`ssd7_training.ipynb`](ssd7_training.ipynb)
How to use one of the provided trained models for transfer learning on your own dataset:
* [Read below](#how-to-fine-tune-one-of-the-trained-models-on-your-own-dataset)
How to evaluate a trained model:
* In general: [`ssd300_evaluation.ipynb`](ssd300_evaluation.ipynb)
* On MS COCO: [`ssd300_evaluation_COCO.ipynb`](ssd300_evaluation_COCO.ipynb)
How to use the data generator:
* The data generator used here has its own repository with a detailed tutorial [here](https://github.com/pierluigiferrari/data_generator_object_detection_2d)
#### Training details
The general training setup is laid out and explained in [`ssd7_training.ipynb`](ssd7_training.ipynb) and in [`ssd300_training.ipynb`](ssd300_training.ipynb). The setup and explanations are similar in both notebooks for the most part, so it doesn't matter which one you look at to understand the general training setup, but the parameters in [`ssd300_training.ipynb`](ssd300_training.ipynb) are preset to copy the setup of the original Caffe implementation for training on Pascal VOC, while the parameters in [`ssd7_training.ipynb`](ssd7_training.ipynb) are preset to train on the [Udacity traffic datasets](https://github.com/udacity/self-driving-car/tree/master/annotations).
To train the original SSD300 model on Pascal VOC:
1. Download the datasets:
```bash
wget http://host.robots.ox.ac.uk/pascal/VOC/voc2012/VOCtrainval_11-May-2012.tar
wget http://host.robots.ox.ac.uk/pascal/VOC/voc2007/VOCtrainval_06-Nov-2007.tar
wget http://host.robots.ox.ac.uk/pascal/VOC/voc2007/VOCtest_06-Nov-2007.tar
```
2. Download the weights for the convolutionalized VGG-16 or for one of the trained original models provided below.
3. Set the file paths for the datasets and model weights accordingly in [`ssd300_training.ipynb`](ssd300_training.ipynb) and execute the cells.
The procedure for training SSD512 is the same of course. It is imperative that you load the pre-trained VGG-16 weights when attempting to train an SSD300 or SSD512 from scratch, otherwise the training will probably fail. Here is a summary of a full training of the SSD300 "07+12" model for comparison with your own training:
* [SSD300 Pascal VOC "07+12" training summary](training_summaries/ssd300_pascal_07+12_training_summary.md)
#### Encoding and decoding boxes
The [`ssd_encoder_decoder`](ssd_encoder_decoder) sub-package contains all functions and classes related to encoding and decoding boxes. Encoding boxes means converting ground truth labels into the target format that the loss function needs during training. It is this encoding process in which the matching of ground truth boxes to anchor boxes (the paper calls them default boxes and in the original C++ code they are called priors - all the same thing) happens. Decoding boxes means converting raw model output back to the input label format, which entails various conversion and filtering processes such as non-maximum suppression (NMS).
In order to train the model, you need to create an instance of `SSDInputEncoder` and pass it to the data generator. The data generator does the rest, so you don't usually need to call any of `SSDInputEncoder`'s methods manually.
Models can be created in 'training' or 'inference' mode. In 'training' mode, the model outputs the raw prediction tensor that still needs to be post-processed with coordinate conversion, confidence thresholding, non-maximum suppression, etc. The functions `decode_detections()` and `decode_detections_fast()` are responsible for that. The former follows the original Caffe implementation, which entails performing NMS per object class, while the latter performs NMS globally across all object classes and is thus more efficient, but also behaves slightly differently. Read the documentation for details about both functions. If a model is created in 'inference' mode, its last layer is the `DecodeDetections` layer, which performs all the post-processing that `decode_detections()` does, but in TensorFlow. That means the output of the model is already the post-processed output. In order to be trainable, a model must be created in 'training' mode. The trained weights can then later be loaded into a model that was created in 'inference' mode.
A note on the anchor box offset coordinates used internally by the model: This may or may not be obvious to you, but it is important to understand that it is not possible for the model to predict absolute coordinates for the predicted bounding boxes. In order to be able to predict absolute box coordinates, the convolutional layers responsible for localization would need to produce different output values for the same object instance at different locations within the input image. This isn't possible of course: For a given input to the filter of a convolutional layer, the filter will produce the same output regardless of the spatial position within the image because of the shared weights. This is the reason why the model predicts offsets to anchor boxes instead of absolute coordinates, and why during training, absolute ground truth coordinates are converted to anchor box offsets in the encoding process. The fact that the model predicts offsets to anchor box coordinates is in turn the reason why the model contains anchor box layers that do nothing but output the anchor box coordinates so that the model's output tensor can include those. If the model's output tensor did not contain the anchor box coordinates, the information to convert the predicted offsets back to absolute coordinates would be missing in the model output.
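To illustrate the decoding step described above, here is a minimal sketch of decoding the raw output of a model created in 'training' mode. The parameter values are just examples and `y_pred` is assumed to be the output of `model.predict()`; refer to the docstring of `decode_detections()` for the full argument list.
```python
from ssd_encoder_decoder.ssd_output_decoder import decode_detections

# `y_pred` is the raw prediction tensor of a model built in 'training' mode,
# e.g. y_pred = model.predict(batch_images).
y_pred_decoded = decode_detections(y_pred,
                                   confidence_thresh=0.5,    # Keep only boxes with confidence >= 0.5.
                                   iou_threshold=0.45,       # NMS threshold, applied per class.
                                   top_k=200,                # Keep at most 200 boxes per image.
                                   input_coords='centroids',
                                   normalize_coords=True,
                                   img_height=img_height,
                                   img_width=img_width)

# One list entry per image in the batch; each row has the format
# [class_id, confidence, xmin, ymin, xmax, ymax].
print(y_pred_decoded[0])
```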
#### Using a different base network architecture
If you want to build a different base network architecture, you could use [`keras_ssd7.py`](models/keras_ssd7.py) as a template. It provides documentation and comments to help you turn it into a different base network. Put together the base network you want and add a predictor layer on top of each network layer from which you would like to make predictions. Create two predictor heads for each, one for localization and one for classification. Create an anchor box layer for each predictor layer and set the respective localization head's output as the input for the anchor box layer. The structure of all tensor reshaping and concatenation operations remains the same; you just have to make sure to include all of your predictor and anchor box layers, of course.
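To make this concrete, here is a minimal, hypothetical sketch of a single predictor stage; all names, shapes, and parameter values are illustrative and not taken from the provided models:
```python
from keras.layers import Input, Conv2D, Reshape
from keras_layers.keras_layer_AnchorBoxes import AnchorBoxes

img_height, img_width = 300, 300
n_classes = 21   # Number of classes including the background class.
n_boxes = 4      # Anchor boxes per cell for this layer (3 aspect ratios + 1 extra box for ar=1).

# Stand-in for one feature map of your base network.
feature_map = Input(shape=(38, 38, 512))

# One classification head and one localization head on top of the feature map.
classes_head = Conv2D(n_boxes * n_classes, (3, 3), padding='same', name='my_mbox_conf')(feature_map)
boxes_head = Conv2D(n_boxes * 4, (3, 3), padding='same', name='my_mbox_loc')(feature_map)

# The anchor box layer takes the localization head's output as its input and outputs
# the anchor box coordinates and variances for this predictor layer.
anchors = AnchorBoxes(img_height, img_width,
                      this_scale=0.2, next_scale=0.37,   # Example scaling factors.
                      aspect_ratios=[1.0, 2.0, 0.5],
                      two_boxes_for_ar1=True,
                      variances=[0.1, 0.1, 0.2, 0.2],
                      normalize_coords=True,
                      name='my_mbox_priorbox')(boxes_head)

# The reshaping and concatenation structure then stays the same as in `keras_ssd7.py`.
classes_reshaped = Reshape((-1, n_classes), name='my_mbox_conf_reshape')(classes_head)
boxes_reshaped = Reshape((-1, 4), name='my_mbox_loc_reshape')(boxes_head)
anchors_reshaped = Reshape((-1, 8), name='my_mbox_priorbox_reshape')(anchors)
```
Repeat this for every feature map you want to predict from, then concatenate the reshaped class, box, and anchor tensors along the box axis exactly as the existing model definitions do.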
### Download the convolutionalized VGG-16 weights
In order to train an SSD300 or SSD512 from scratch, download the weights of the fully convolutionalized VGG-16 model trained to convergence on ImageNet classification here:
[`VGG_ILSVRC_16_layers_fc_reduced.h5`](https://drive.google.com/open?id=1sBmajn6vOE7qJ8GnxUJt4fGPuffVUZox).
As with all other weights files below, this is a direct port of the corresponding `.caffemodel` file that is provided in the repository of the original Caffe implementation.
### Download the original trained model weights
Here are the ported weights for all the original trained models. The filenames correspond to their respective `.caffemodel` counterparts. The asterisks and footnotes refer to those in the README of the [original Caffe implementation](https://github.com/weiliu89/caffe/tree/ssd#models).
1. PASCAL VOC models:
* 07+12: [SSD300*](https://drive.google.com/open?id=121-kCXaOHOkJE_Kf5lKcJvC_5q1fYb_q), [SSD512*](https://drive.google.com/open?id=19NIa0baRCFYT3iRxQkOKCD7CpN6BFO8p)
* 07++12: [SSD300*](https://drive.google.com/open?id=1M99knPZ4DpY9tI60iZqxXsAxX2bYWDvZ), [SSD512*](https://drive.google.com/open?id=18nFnqv9fG5Rh_fx6vUtOoQHOLySt4fEx)
* COCO[1]: [SSD300*](https://drive.google.com/open?id=17G1J4zEpFwiOzgBmq886ci4P3YaIz8bY), [SSD512*](https://drive.google.com/open?id=1wGc368WyXSHZOv4iow2tri9LnB0vm9X-)
* 07+12+COCO: [SSD300*](https://drive.google.com/open?id=1vtNI6kSnv7fkozl7WxyhGyReB6JvDM41), [SSD512*](https://drive.google.com/open?id=14mELuzm0OvXnwjb0mzAiG-Ake9_NP_LQ)
* 07++12+COCO: [SSD300*](https://drive.google.com/open?id=1fyDDUcIOSjeiP08vl1WCndcFdtboFXua), [SSD512*](https://drive.google.com/open?id=1a-64b6y6xsQr5puUsHX_wxI1orQDercM)
2. COCO models:
* trainval35k: [SSD300*](https://drive.google.com/open?id=1vmEF7FUsWfHquXyCqO17UaXOPpRbwsdj), [SSD512*](https://drive.google.com/open?id=1IJWZKmjkcFMlvaz2gYukzFx4d6mH3py5)
3. ILSVRC models:
* trainval1: [SSD300*](https://drive.google.com/open?id=1VWkj1oQS2RUhyJXckx3OaDYs5fx2mMCq), [SSD500](https://drive.google.com/open?id=1LcBPsd9CJbuBw4KiSuE1o1fMA-Pz2Zvw)
### How to fine-tune one of the trained models on your own dataset
If you want to fine-tune one of the provided trained models on your own dataset, chances are your dataset doesn't have the same number of classes as the trained model. The following tutorial explains how to deal with this problem:
[`weight_sampling_tutorial.ipynb`](weight_sampling_tutorial.ipynb)
### ToDo
The following things are on the to-do list, ranked by priority. Contributions are welcome, but please read the [contributing guidelines](CONTRIBUTING.md).
1. Add model definitions and trained weights for SSDs based on other base networks such as MobileNet, InceptionResNetV2, or DenseNet.
2. Add support for the Theano and CNTK backends. Requires porting the custom layers and the loss function from TensorFlow to the abstract Keras backend.
Currently in the works:
* A new [Focal Loss](https://arxiv.org/abs/1708.02002) loss function.
### Important notes
* All trained models that were trained on MS COCO use the smaller anchor box scaling factors provided in all of the Jupyter notebooks. In particular, note that the '07+12+COCO' and '07++12+COCO' models use the smaller scaling factors.
### Terminology
* "Anchor boxes": The paper calls them "default boxes", in the original C++ code they are called "prior boxes" or "priors", and the Faster R-CNN paper calls them "anchor boxes". All terms mean the same thing, but I slightly prefer the name "anchor boxes" because I find it to be the most descriptive of these names. I call them "prior boxes" or "priors" in `keras_ssd300.py` and `keras_ssd512.py` to stay consistent with the original Caffe implementation, but everywhere else I use the name "anchor boxes" or "anchors".
* "Labels": For the purpose of this project, datasets consist of "images" and "labels". Everything that belongs to the annotations of a given image is the "labels" of that image: Not just object category labels, but also bounding box coordinates. "Labels" is just shorter than "annotations". I also use the terms "labels" and "targets" more or less interchangeably throughout the documentation, although "targets" means labels specifically in the context of training.
* "Predictor layer": The "predictor layers" or "predictors" are all the last convolution layers of the network, i.e. all convolution layers that do not feed into any subsequent convolution layers.


Binary file not shown.


@@ -0,0 +1,383 @@
'''
Includes:
* Function to compute the IoU similarity for axis-aligned, rectangular, 2D bounding boxes
* Function for coordinate conversion for axis-aligned, rectangular, 2D bounding boxes
Copyright (C) 2018 Pierluigi Ferrari
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
'''
from __future__ import division
import numpy as np
def convert_coordinates(tensor, start_index, conversion, border_pixels='half'):
'''
Convert coordinates for axis-aligned 2D boxes between two coordinate formats.
Creates a copy of `tensor`, i.e. does not operate in place. Currently there are
three supported coordinate formats that can be converted from and to each other:
1) (xmin, xmax, ymin, ymax) - the 'minmax' format
2) (xmin, ymin, xmax, ymax) - the 'corners' format
3) (cx, cy, w, h) - the 'centroids' format
Arguments:
tensor (array): A Numpy nD array containing the four consecutive coordinates
to be converted somewhere in the last axis.
start_index (int): The index of the first coordinate in the last axis of `tensor`.
conversion (str): The conversion direction. Can be 'minmax2centroids',
'centroids2minmax', 'corners2centroids', 'centroids2corners', 'minmax2corners',
or 'corners2minmax'.
border_pixels (str, optional): How to treat the border pixels of the bounding boxes.
Can be 'include', 'exclude', or 'half'. If 'include', the border pixels belong
to the boxes. If 'exclude', the border pixels do not belong to the boxes.
If 'half', then one of each of the two horizontal and vertical borders belongs
to the boxes, but not the other.
Returns:
A Numpy nD array, a copy of the input tensor with the converted coordinates
in place of the original coordinates and the unaltered elements of the original
tensor elsewhere.
'''
if border_pixels == 'half':
d = 0
elif border_pixels == 'include':
d = 1
elif border_pixels == 'exclude':
d = -1
ind = start_index
tensor1 = np.copy(tensor).astype(np.float)
if conversion == 'minmax2centroids':
tensor1[..., ind] = (tensor[..., ind] + tensor[..., ind+1]) / 2.0 # Set cx
tensor1[..., ind+1] = (tensor[..., ind+2] + tensor[..., ind+3]) / 2.0 # Set cy
tensor1[..., ind+2] = tensor[..., ind+1] - tensor[..., ind] + d # Set w
tensor1[..., ind+3] = tensor[..., ind+3] - tensor[..., ind+2] + d # Set h
elif conversion == 'centroids2minmax':
tensor1[..., ind] = tensor[..., ind] - tensor[..., ind+2] / 2.0 # Set xmin
tensor1[..., ind+1] = tensor[..., ind] + tensor[..., ind+2] / 2.0 # Set xmax
tensor1[..., ind+2] = tensor[..., ind+1] - tensor[..., ind+3] / 2.0 # Set ymin
tensor1[..., ind+3] = tensor[..., ind+1] + tensor[..., ind+3] / 2.0 # Set ymax
elif conversion == 'corners2centroids':
tensor1[..., ind] = (tensor[..., ind] + tensor[..., ind+2]) / 2.0 # Set cx
tensor1[..., ind+1] = (tensor[..., ind+1] + tensor[..., ind+3]) / 2.0 # Set cy
tensor1[..., ind+2] = tensor[..., ind+2] - tensor[..., ind] + d # Set w
tensor1[..., ind+3] = tensor[..., ind+3] - tensor[..., ind+1] + d # Set h
elif conversion == 'centroids2corners':
tensor1[..., ind] = tensor[..., ind] - tensor[..., ind+2] / 2.0 # Set xmin
tensor1[..., ind+1] = tensor[..., ind+1] - tensor[..., ind+3] / 2.0 # Set ymin
tensor1[..., ind+2] = tensor[..., ind] + tensor[..., ind+2] / 2.0 # Set xmax
tensor1[..., ind+3] = tensor[..., ind+1] + tensor[..., ind+3] / 2.0 # Set ymax
elif (conversion == 'minmax2corners') or (conversion == 'corners2minmax'):
tensor1[..., ind+1] = tensor[..., ind+2]
tensor1[..., ind+2] = tensor[..., ind+1]
else:
raise ValueError("Unexpected conversion value. Supported values are 'minmax2centroids', 'centroids2minmax', 'corners2centroids', 'centroids2corners', 'minmax2corners', and 'corners2minmax'.")
return tensor1
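# Example (illustrative): converting a single box from 'corners' to 'centroids' format:
#     convert_coordinates(np.array([20, 30, 60, 80]), start_index=0, conversion='corners2centroids')
#     # -> [40., 55., 40., 50.], i.e. (cx, cy, w, h) for the box (xmin=20, ymin=30, xmax=60, ymax=80).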
def convert_coordinates2(tensor, start_index, conversion):
'''
A matrix multiplication implementation of `convert_coordinates()`.
Supports only conversion between the 'centroids' and 'minmax' formats.
This function is marginally slower on average than `convert_coordinates()`,
probably because it involves more (unnecessary) arithmetic operations (unnecessary
because the two matrices are sparse).
For details please refer to the documentation of `convert_coordinates()`.
'''
ind = start_index
tensor1 = np.copy(tensor).astype(np.float)
if conversion == 'minmax2centroids':
M = np.array([[0.5, 0. , -1., 0.],
[0.5, 0. , 1., 0.],
[0. , 0.5, 0., -1.],
[0. , 0.5, 0., 1.]])
tensor1[..., ind:ind+4] = np.dot(tensor1[..., ind:ind+4], M)
elif conversion == 'centroids2minmax':
M = np.array([[ 1. , 1. , 0. , 0. ],
[ 0. , 0. , 1. , 1. ],
[-0.5, 0.5, 0. , 0. ],
[ 0. , 0. , -0.5, 0.5]]) # The multiplicative inverse of the matrix above
tensor1[..., ind:ind+4] = np.dot(tensor1[..., ind:ind+4], M)
else:
raise ValueError("Unexpected conversion value. Supported values are 'minmax2centroids' and 'centroids2minmax'.")
return tensor1
def intersection_area(boxes1, boxes2, coords='centroids', mode='outer_product', border_pixels='half'):
'''
Computes the intersection areas of two sets of axis-aligned 2D rectangular boxes.
Let `boxes1` and `boxes2` contain `m` and `n` boxes, respectively.
In 'outer_product' mode, returns an `(m,n)` matrix with the intersection areas for all possible
combinations of the boxes in `boxes1` and `boxes2`.
In 'element-wise' mode, `m` and `n` must be broadcast-compatible. Refer to the explanation
of the `mode` argument for details.
Arguments:
boxes1 (array): Either a 1D Numpy array of shape `(4, )` containing the coordinates for one box in the
format specified by `coords` or a 2D Numpy array of shape `(m, 4)` containing the coordinates for `m` boxes.
If `mode` is set to 'element-wise', the shape must be broadcast-compatible with `boxes2`.
boxes2 (array): Either a 1D Numpy array of shape `(4, )` containing the coordinates for one box in the
format specified by `coords` or a 2D Numpy array of shape `(n, 4)` containing the coordinates for `n` boxes.
If `mode` is set to 'element-wise', the shape must be broadcast-compatible with `boxes1`.
coords (str, optional): The coordinate format in the input arrays. Can be either 'centroids' for the format
`(cx, cy, w, h)`, 'minmax' for the format `(xmin, xmax, ymin, ymax)`, or 'corners' for the format
`(xmin, ymin, xmax, ymax)`.
mode (str, optional): Can be one of 'outer_product' and 'element-wise'. In 'outer_product' mode, returns an
`(m,n)` matrix with the intersection areas for all possible combinations of the `m` boxes in `boxes1` with the
`n` boxes in `boxes2`. In 'element-wise' mode, returns a 1D array and the shapes of `boxes1` and `boxes2`
must be broadcast-compatible. If both `boxes1` and `boxes2` have `m` boxes, then this returns an array of
length `m` where the i-th position contains the intersection area of `boxes1[i]` with `boxes2[i]`.
border_pixels (str, optional): How to treat the border pixels of the bounding boxes.
Can be 'include', 'exclude', or 'half'. If 'include', the border pixels belong
to the boxes. If 'exclude', the border pixels do not belong to the boxes.
If 'half', then one of each of the two horizontal and vertical borders belongs
to the boxes, but not the other.
Returns:
A 1D or 2D Numpy array (refer to the `mode` argument for details) of dtype float containing
the intersection areas of the boxes in `boxes1` and `boxes2`.
'''
# Make sure the boxes have the right shapes.
if boxes1.ndim > 2: raise ValueError("boxes1 must have rank either 1 or 2, but has rank {}.".format(boxes1.ndim))
if boxes2.ndim > 2: raise ValueError("boxes2 must have rank either 1 or 2, but has rank {}.".format(boxes2.ndim))
if boxes1.ndim == 1: boxes1 = np.expand_dims(boxes1, axis=0)
if boxes2.ndim == 1: boxes2 = np.expand_dims(boxes2, axis=0)
if not (boxes1.shape[1] == boxes2.shape[1] == 4): raise ValueError("All boxes must consist of 4 coordinates, but the boxes in `boxes1` and `boxes2` have {} and {} coordinates, respectively.".format(boxes1.shape[1], boxes2.shape[1]))
if mode not in {'outer_product', 'element-wise'}: raise ValueError("`mode` must be one of 'outer_product' and 'element-wise', but got '{}'.".format(mode))
# Convert the coordinates if necessary.
if coords == 'centroids':
boxes1 = convert_coordinates(boxes1, start_index=0, conversion='centroids2corners')
boxes2 = convert_coordinates(boxes2, start_index=0, conversion='centroids2corners')
coords = 'corners'
elif not (coords in {'minmax', 'corners'}):
raise ValueError("Unexpected value for `coords`. Supported values are 'minmax', 'corners' and 'centroids'.")
m = boxes1.shape[0] # The number of boxes in `boxes1`
n = boxes2.shape[0] # The number of boxes in `boxes2`
# Set the correct coordinate indices for the respective formats.
if coords == 'corners':
xmin = 0
ymin = 1
xmax = 2
ymax = 3
elif coords == 'minmax':
xmin = 0
xmax = 1
ymin = 2
ymax = 3
if border_pixels == 'half':
d = 0
elif border_pixels == 'include':
d = 1 # If border pixels are supposed to belong to the bounding boxes, we have to add one pixel to any difference `xmax - xmin` or `ymax - ymin`.
elif border_pixels == 'exclude':
d = -1 # If border pixels are not supposed to belong to the bounding boxes, we have to subtract one pixel from any difference `xmax - xmin` or `ymax - ymin`.
# Compute the intersection areas.
if mode == 'outer_product':
# For all possible box combinations, get the greater xmin and ymin values.
# This is a tensor of shape (m,n,2).
min_xy = np.maximum(np.tile(np.expand_dims(boxes1[:,[xmin,ymin]], axis=1), reps=(1, n, 1)),
np.tile(np.expand_dims(boxes2[:,[xmin,ymin]], axis=0), reps=(m, 1, 1)))
# For all possible box combinations, get the smaller xmax and ymax values.
# This is a tensor of shape (m,n,2).
max_xy = np.minimum(np.tile(np.expand_dims(boxes1[:,[xmax,ymax]], axis=1), reps=(1, n, 1)),
np.tile(np.expand_dims(boxes2[:,[xmax,ymax]], axis=0), reps=(m, 1, 1)))
# Compute the side lengths of the intersection rectangles.
side_lengths = np.maximum(0, max_xy - min_xy + d)
return side_lengths[:,:,0] * side_lengths[:,:,1]
elif mode == 'element-wise':
min_xy = np.maximum(boxes1[:,[xmin,ymin]], boxes2[:,[xmin,ymin]])
max_xy = np.minimum(boxes1[:,[xmax,ymax]], boxes2[:,[xmax,ymax]])
# Compute the side lengths of the intersection rectangles.
side_lengths = np.maximum(0, max_xy - min_xy + d)
return side_lengths[:,0] * side_lengths[:,1]
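# Example (illustrative), with boxes given in 'corners' format:
#     b1 = np.array([[0, 0, 10, 10], [5, 5, 15, 15]])
#     b2 = np.array([[0, 0, 10, 10], [20, 20, 30, 30]])
#     intersection_area(b1, b2, coords='corners', mode='outer_product')  # -> [[100, 0], [25, 0]]
#     intersection_area(b1, b2, coords='corners', mode='element-wise')   # -> [100, 0]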
def intersection_area_(boxes1, boxes2, coords='corners', mode='outer_product', border_pixels='half'):
'''
The same as 'intersection_area()' but for internal use, i.e. without all the safety checks.
'''
m = boxes1.shape[0] # The number of boxes in `boxes1`
n = boxes2.shape[0] # The number of boxes in `boxes2`
# Set the correct coordinate indices for the respective formats.
if coords == 'corners':
xmin = 0
ymin = 1
xmax = 2
ymax = 3
elif coords == 'minmax':
xmin = 0
xmax = 1
ymin = 2
ymax = 3
if border_pixels == 'half':
d = 0
elif border_pixels == 'include':
d = 1 # If border pixels are supposed to belong to the bounding boxes, we have to add one pixel to any difference `xmax - xmin` or `ymax - ymin`.
elif border_pixels == 'exclude':
d = -1 # If border pixels are not supposed to belong to the bounding boxes, we have to subtract one pixel from any difference `xmax - xmin` or `ymax - ymin`.
# Compute the intersection areas.
if mode == 'outer_product':
# For all possible box combinations, get the greater xmin and ymin values.
# This is a tensor of shape (m,n,2).
min_xy = np.maximum(np.tile(np.expand_dims(boxes1[:,[xmin,ymin]], axis=1), reps=(1, n, 1)),
np.tile(np.expand_dims(boxes2[:,[xmin,ymin]], axis=0), reps=(m, 1, 1)))
# For all possible box combinations, get the smaller xmax and ymax values.
# This is a tensor of shape (m,n,2).
max_xy = np.minimum(np.tile(np.expand_dims(boxes1[:,[xmax,ymax]], axis=1), reps=(1, n, 1)),
np.tile(np.expand_dims(boxes2[:,[xmax,ymax]], axis=0), reps=(m, 1, 1)))
# Compute the side lengths of the intersection rectangles.
side_lengths = np.maximum(0, max_xy - min_xy + d)
return side_lengths[:,:,0] * side_lengths[:,:,1]
elif mode == 'element-wise':
min_xy = np.maximum(boxes1[:,[xmin,ymin]], boxes2[:,[xmin,ymin]])
max_xy = np.minimum(boxes1[:,[xmax,ymax]], boxes2[:,[xmax,ymax]])
# Compute the side lengths of the intersection rectangles.
side_lengths = np.maximum(0, max_xy - min_xy + d)
return side_lengths[:,0] * side_lengths[:,1]
def iou(boxes1, boxes2, coords='centroids', mode='outer_product', border_pixels='half'):
'''
Computes the intersection-over-union similarity (also known as Jaccard similarity)
of two sets of axis-aligned 2D rectangular boxes.
Let `boxes1` and `boxes2` contain `m` and `n` boxes, respectively.
In 'outer_product' mode, returns an `(m,n)` matrix with the IoUs for all possible
combinations of the boxes in `boxes1` and `boxes2`.
In 'element-wise' mode, `m` and `n` must be broadcast-compatible. Refer to the explanation
of the `mode` argument for details.
Arguments:
boxes1 (array): Either a 1D Numpy array of shape `(4, )` containing the coordinates for one box in the
format specified by `coords` or a 2D Numpy array of shape `(m, 4)` containing the coordinates for `m` boxes.
            If `mode` is set to 'element-wise', the shape must be broadcast-compatible with `boxes2`.
boxes2 (array): Either a 1D Numpy array of shape `(4, )` containing the coordinates for one box in the
format specified by `coords` or a 2D Numpy array of shape `(n, 4)` containing the coordinates for `n` boxes.
            If `mode` is set to 'element-wise', the shape must be broadcast-compatible with `boxes1`.
coords (str, optional): The coordinate format in the input arrays. Can be either 'centroids' for the format
`(cx, cy, w, h)`, 'minmax' for the format `(xmin, xmax, ymin, ymax)`, or 'corners' for the format
`(xmin, ymin, xmax, ymax)`.
mode (str, optional): Can be one of 'outer_product' and 'element-wise'. In 'outer_product' mode, returns an
`(m,n)` matrix with the IoU overlaps for all possible combinations of the `m` boxes in `boxes1` with the
`n` boxes in `boxes2`. In 'element-wise' mode, returns a 1D array and the shapes of `boxes1` and `boxes2`
            must be broadcast-compatible. If both `boxes1` and `boxes2` have `m` boxes, then this returns an array of
length `m` where the i-th position contains the IoU overlap of `boxes1[i]` with `boxes2[i]`.
border_pixels (str, optional): How to treat the border pixels of the bounding boxes.
Can be 'include', 'exclude', or 'half'. If 'include', the border pixels belong
to the boxes. If 'exclude', the border pixels do not belong to the boxes.
If 'half', then one of each of the two horizontal and vertical borders belong
            to the boxes, but not the other.
Returns:
A 1D or 2D Numpy array (refer to the `mode` argument for details) of dtype float containing values in [0,1],
the Jaccard similarity of the boxes in `boxes1` and `boxes2`. 0 means there is no overlap between two given
boxes, 1 means their coordinates are identical.
'''
# Make sure the boxes have the right shapes.
if boxes1.ndim > 2: raise ValueError("boxes1 must have rank either 1 or 2, but has rank {}.".format(boxes1.ndim))
if boxes2.ndim > 2: raise ValueError("boxes2 must have rank either 1 or 2, but has rank {}.".format(boxes2.ndim))
if boxes1.ndim == 1: boxes1 = np.expand_dims(boxes1, axis=0)
if boxes2.ndim == 1: boxes2 = np.expand_dims(boxes2, axis=0)
if not (boxes1.shape[1] == boxes2.shape[1] == 4): raise ValueError("All boxes must consist of 4 coordinates, but the boxes in `boxes1` and `boxes2` have {} and {} coordinates, respectively.".format(boxes1.shape[1], boxes2.shape[1]))
if not mode in {'outer_product', 'element-wise'}: raise ValueError("`mode` must be one of 'outer_product' and 'element-wise', but got '{}'.".format(mode))
# Convert the coordinates if necessary.
if coords == 'centroids':
boxes1 = convert_coordinates(boxes1, start_index=0, conversion='centroids2corners')
boxes2 = convert_coordinates(boxes2, start_index=0, conversion='centroids2corners')
coords = 'corners'
elif not (coords in {'minmax', 'corners'}):
raise ValueError("Unexpected value for `coords`. Supported values are 'minmax', 'corners' and 'centroids'.")
# Compute the IoU.
    # Compute the intersection areas.
intersection_areas = intersection_area_(boxes1, boxes2, coords=coords, mode=mode)
m = boxes1.shape[0] # The number of boxes in `boxes1`
n = boxes2.shape[0] # The number of boxes in `boxes2`
# Compute the union areas.
# Set the correct coordinate indices for the respective formats.
if coords == 'corners':
xmin = 0
ymin = 1
xmax = 2
ymax = 3
elif coords == 'minmax':
xmin = 0
xmax = 1
ymin = 2
ymax = 3
if border_pixels == 'half':
d = 0
elif border_pixels == 'include':
d = 1 # If border pixels are supposed to belong to the bounding boxes, we have to add one pixel to any difference `xmax - xmin` or `ymax - ymin`.
elif border_pixels == 'exclude':
d = -1 # If border pixels are not supposed to belong to the bounding boxes, we have to subtract one pixel from any difference `xmax - xmin` or `ymax - ymin`.
if mode == 'outer_product':
boxes1_areas = np.tile(np.expand_dims((boxes1[:,xmax] - boxes1[:,xmin] + d) * (boxes1[:,ymax] - boxes1[:,ymin] + d), axis=1), reps=(1,n))
boxes2_areas = np.tile(np.expand_dims((boxes2[:,xmax] - boxes2[:,xmin] + d) * (boxes2[:,ymax] - boxes2[:,ymin] + d), axis=0), reps=(m,1))
elif mode == 'element-wise':
boxes1_areas = (boxes1[:,xmax] - boxes1[:,xmin] + d) * (boxes1[:,ymax] - boxes1[:,ymin] + d)
boxes2_areas = (boxes2[:,xmax] - boxes2[:,xmin] + d) * (boxes2[:,ymax] - boxes2[:,ymin] + d)
union_areas = boxes1_areas + boxes2_areas - intersection_areas
return intersection_areas / union_areas
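# A minimal usage sketch (illustrative only, not part of the original module): with
# `coords='corners'` and the default `border_pixels='half'`, two 10x10 boxes that
# overlap in a 5x5 region have an IoU of 25 / (100 + 100 - 25) ~= 0.143.
if __name__ == '__main__':
    import numpy as np
    box_a = np.array([[0, 0, 10, 10]])  # (xmin, ymin, xmax, ymax)
    box_b = np.array([[5, 5, 15, 15]])
    # 'outer_product' mode returns an (m,n) matrix; here m = n = 1.
    print(iou(box_a, box_b, coords='corners', mode='outer_product'))  # -> [[0.14285714]]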

View File

@@ -0,0 +1,33 @@
{
"model" : {
"backend": "ssd512",
"input": 512,
"labels": ["Gun" ,"Knife", "Razor", "Shuriken"]
},
"train": {
"train_image_folder": "/home/dlsaavedra/Desktop/Tesis/8.-Object_Detection/Experimento_3/Training/images",
"train_annot_folder": "/home/dlsaavedra/Desktop/Tesis/8.-Object_Detection/Experimento_3/Training/anns",
"train_image_set_filename": "/home/dlsaavedra/Desktop/Tesis/8.-Object_Detection/Experimento_3/Training/train.txt",
"train_times": 1,
"batch_size": 16,
"learning_rate": 1e-4,
"nb_epochs": 50,
"warmup_epochs": 3,
"saved_weights_name": "experimento_3_ssd512.h5",
"debug": false
},
"valid": {
"valid_image_folder": "/home/dlsaavedra/Desktop/Tesis/8.-Object_Detection/Experimento_3/Training/images",
"valid_annot_folder": "/home/dlsaavedra/Desktop/Tesis/8.-Object_Detection/Experimento_3/Training/anns",
"valid_image_set_filename": "/home/dlsaavedra/Desktop/Tesis/8.-Object_Detection/Experimento_3/Training/train.txt",
"valid_times": 1
},
"test": {
"test_image_folder": "/home/dlsaavedra/Desktop/Tesis/8.-Object_Detection/Experimento_3/Baggages/Testing/images",
"test_annot_folder": "/home/dlsaavedra/Desktop/Tesis/8.-Object_Detection/Experimento_3/Baggages/Testing/anns",
"test_image_set_filename": "/home/dlsaavedra/Desktop/Tesis/8.-Object_Detection/Experimento_3/Baggages/Testing/test.txt",
}
}

View File

@@ -0,0 +1,33 @@
{
"model" : {
"backend": "ssd300",
"input": 300,
"labels": ["Gun" ,"Knife", "Razor", "Shuriken"]
},
"train": {
"train_image_folder": "../Experimento_5/Training/images/",
"train_annot_folder": "../Experimento_5/Training/anns/",
"train_image_set_filename": "../Experimento_5/Training/train_no_original.txt",
"train_times": 1,
"batch_size": 8,
"learning_rate": 1e-4,
"nb_epochs": 100,
"warmup_epochs": 3,
"saved_weights_name": "../Experimento_5/Resultados_ssd/ssd300/experimento_5_ssd300.h5",
"debug": false
},
"valid": {
"valid_image_folder": "../Experimento_5/Training/images/",
"valid_annot_folder": "../Experimento_5/Training/anns/",
"valid_image_set_filename": "../Experimento_5/Training/train_no_original.txt",
"valid_times": 1
},
"test": {
"test_image_folder": "../Experimento_3/Baggages/Testing_3/images/",
"test_annot_folder": "../Experimento_3/Baggages/Testing_3/anns/",
"test_image_set_filename": "../Experimento_3/Baggages/Testing_3/test.txt"
}
}

View File

@@ -0,0 +1,33 @@
{
"model" : {
"backend": "ssd512",
"input": 512,
"labels": ["Gun" ,"Knife", "Razor", "Shuriken"]
},
"train": {
"train_image_folder": "../Experimento_3/Training/images/",
"train_annot_folder": "../Experimento_3/Training/anns/",
"train_image_set_filename": "../Experimento_3/Training/train_no_original.txt",
"train_times": 1,
"batch_size": 1,
"learning_rate": 1e-4,
"nb_epochs": 100,
"warmup_epochs": 3,
"saved_weights_name": "../Experimento_3/Resultados_ssd/ssd512/experimento_3_ssd512.h5",
"debug": false
},
"valid": {
"valid_image_folder": "../Experimento_3/Training/images/",
"valid_annot_folder": "../Experimento_3/Training/anns/",
"valid_image_set_filename": "../Experimento_3/Training/train_no_original.txt",
"valid_times": 1
},
"test": {
"test_image_folder": "../Experimento_3/Baggages/Testing_small/images/",
"test_annot_folder": "../Experimento_3/Baggages/Testing_small/anns/",
"test_image_set_filename": "../Experimento_3/Baggages/Testing_small/test.txt"
}
}

View File

@@ -0,0 +1,33 @@
{
"model" : {
"backend": "ssd7",
"input": 448,
"labels": ["Gun" ,"Knife", "Razor", "Shuriken"]
},
"train": {
"train_image_folder": "../Experimento_3/Training/images/",
"train_annot_folder": "../Experimento_3/Training/anns/",
"train_image_set_filename": "../Experimento_3/Training/train.txt",
"train_times": 1,
"batch_size": 8,
"learning_rate": 1e-4,
"nb_epochs": 100,
"warmup_epochs": 3,
"saved_weights_name": "../Experimento_3/Resultados_ssd/ssd7/experimento_3_ssd7.h5",
"debug": false
},
"valid": {
"valid_image_folder": "../Experimento_3/Training/images/",
"valid_annot_folder": "../Experimento_3/Training/anns/",
"valid_image_set_filename": "../Experimento_3/Training/train.txt",
"valid_times": 1
},
"test": {
"test_image_folder": "../Experimento_3/Baggages/Testing_678/images/",
"test_annot_folder": "../Experimento_3/Baggages/Testing_678/anns/",
"test_image_set_filename": "../Experimento_3/Baggages/Testing_678/test.txt"
}
}

Binary file not shown.

View File

@@ -0,0 +1,183 @@
'''
The data augmentation operations of the original SSD implementation.
Copyright (C) 2018 Pierluigi Ferrari
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
'''
from __future__ import division
import numpy as np
from data_generator.object_detection_2d_photometric_ops import ConvertColor, ConvertDataType, ConvertTo3Channels, RandomBrightness, RandomContrast, RandomHue, RandomSaturation
from data_generator.object_detection_2d_geometric_ops import RandomFlip, RandomTranslate, RandomScale
from data_generator.object_detection_2d_image_boxes_validation_utils import BoundGenerator, BoxFilter, ImageValidator
class DataAugmentationConstantInputSize:
'''
Applies a chain of photometric and geometric image transformations. For documentation, please refer
to the documentation of the individual transformations involved.
Important: This augmentation chain is suitable for constant-size images only.
'''
def __init__(self,
random_brightness=(-48, 48, 0.5),
random_contrast=(0.5, 1.8, 0.5),
random_saturation=(0.5, 1.8, 0.5),
random_hue=(18, 0.5),
random_flip=0.5,
random_translate=((0.03,0.5), (0.03,0.5), 0.5),
random_scale=(0.5, 2.0, 0.5),
n_trials_max=3,
clip_boxes=True,
overlap_criterion='area',
bounds_box_filter=(0.3, 1.0),
bounds_validator=(0.5, 1.0),
n_boxes_min=1,
background=(0,0,0),
labels_format={'class_id': 0, 'xmin': 1, 'ymin': 2, 'xmax': 3, 'ymax': 4}):
if (random_scale[0] >= 1) or (random_scale[1] <= 1):
raise ValueError("This sequence of transformations only makes sense if the minimum scaling factor is <1 and the maximum scaling factor is >1.")
self.n_trials_max = n_trials_max
self.clip_boxes = clip_boxes
self.overlap_criterion = overlap_criterion
self.bounds_box_filter = bounds_box_filter
self.bounds_validator = bounds_validator
self.n_boxes_min = n_boxes_min
self.background = background
self.labels_format = labels_format
# Determines which boxes are kept in an image after the transformations have been applied.
self.box_filter = BoxFilter(check_overlap=True,
check_min_area=True,
check_degenerate=True,
overlap_criterion=self.overlap_criterion,
overlap_bounds=self.bounds_box_filter,
min_area=16,
labels_format=self.labels_format)
# Determines whether the result of the transformations is a valid training image.
self.image_validator = ImageValidator(overlap_criterion=self.overlap_criterion,
bounds=self.bounds_validator,
n_boxes_min=self.n_boxes_min,
labels_format=self.labels_format)
# Utility distortions
self.convert_RGB_to_HSV = ConvertColor(current='RGB', to='HSV')
self.convert_HSV_to_RGB = ConvertColor(current='HSV', to='RGB')
self.convert_to_float32 = ConvertDataType(to='float32')
self.convert_to_uint8 = ConvertDataType(to='uint8')
self.convert_to_3_channels = ConvertTo3Channels() # Make sure all images end up having 3 channels.
# Photometric transformations
self.random_brightness = RandomBrightness(lower=random_brightness[0], upper=random_brightness[1], prob=random_brightness[2])
self.random_contrast = RandomContrast(lower=random_contrast[0], upper=random_contrast[1], prob=random_contrast[2])
self.random_saturation = RandomSaturation(lower=random_saturation[0], upper=random_saturation[1], prob=random_saturation[2])
self.random_hue = RandomHue(max_delta=random_hue[0], prob=random_hue[1])
# Geometric transformations
self.random_flip = RandomFlip(dim='horizontal', prob=random_flip, labels_format=self.labels_format)
self.random_translate = RandomTranslate(dy_minmax=random_translate[0],
dx_minmax=random_translate[1],
prob=random_translate[2],
clip_boxes=self.clip_boxes,
box_filter=self.box_filter,
image_validator=self.image_validator,
n_trials_max=self.n_trials_max,
background=self.background,
labels_format=self.labels_format)
self.random_zoom_in = RandomScale(min_factor=1.0,
max_factor=random_scale[1],
prob=random_scale[2],
clip_boxes=self.clip_boxes,
box_filter=self.box_filter,
image_validator=self.image_validator,
n_trials_max=self.n_trials_max,
background=self.background,
labels_format=self.labels_format)
self.random_zoom_out = RandomScale(min_factor=random_scale[0],
max_factor=1.0,
prob=random_scale[2],
clip_boxes=self.clip_boxes,
box_filter=self.box_filter,
image_validator=self.image_validator,
n_trials_max=self.n_trials_max,
background=self.background,
labels_format=self.labels_format)
# If we zoom in, do translation before scaling.
self.sequence1 = [self.convert_to_3_channels,
self.convert_to_float32,
self.random_brightness,
self.random_contrast,
self.convert_to_uint8,
self.convert_RGB_to_HSV,
self.convert_to_float32,
self.random_saturation,
self.random_hue,
self.convert_to_uint8,
self.convert_HSV_to_RGB,
self.random_translate,
self.random_zoom_in,
self.random_flip]
# If we zoom out, do scaling before translation.
self.sequence2 = [self.convert_to_3_channels,
self.convert_to_float32,
self.random_brightness,
self.convert_to_uint8,
self.convert_RGB_to_HSV,
self.convert_to_float32,
self.random_saturation,
self.random_hue,
self.convert_to_uint8,
self.convert_HSV_to_RGB,
self.convert_to_float32,
self.random_contrast,
self.convert_to_uint8,
self.random_zoom_out,
self.random_translate,
self.random_flip]
def __call__(self, image, labels=None):
self.random_translate.labels_format = self.labels_format
self.random_zoom_in.labels_format = self.labels_format
self.random_zoom_out.labels_format = self.labels_format
self.random_flip.labels_format = self.labels_format
# Choose sequence 1 with probability 0.5.
if np.random.choice(2):
if not (labels is None):
for transform in self.sequence1:
image, labels = transform(image, labels)
return image, labels
else:
for transform in self.sequence1:
image = transform(image)
return image
# Choose sequence 2 with probability 0.5.
else:
if not (labels is None):
for transform in self.sequence2:
image, labels = transform(image, labels)
return image, labels
else:
for transform in self.sequence2:
image = transform(image)
return image
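# A minimal usage sketch (illustrative only, not part of the original module). It assumes
# synthetic data: a random 300x300 RGB image and one ground truth box in the default
# (class_id, xmin, ymin, xmax, ymax) labels format.
if __name__ == '__main__':
    augmentation = DataAugmentationConstantInputSize()
    dummy_image = np.random.randint(0, 256, size=(300, 300, 3), dtype=np.uint8)
    dummy_labels = np.array([[1, 50, 60, 120, 140]])  # one box of class 1
    augmented_image, augmented_labels = augmentation(dummy_image, dummy_labels)
    print(augmented_image.shape, augmented_labels.shape)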

View File

@@ -0,0 +1,280 @@
'''
The data augmentation operations of the original SSD implementation.
Copyright (C) 2018 Pierluigi Ferrari
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
'''
from __future__ import division
import numpy as np
import cv2
import inspect
from data_generator.object_detection_2d_photometric_ops import ConvertColor, ConvertDataType, ConvertTo3Channels, RandomBrightness, RandomContrast, RandomHue, RandomSaturation, RandomChannelSwap
from data_generator.object_detection_2d_patch_sampling_ops import PatchCoordinateGenerator, RandomPatch, RandomPatchInf
from data_generator.object_detection_2d_geometric_ops import ResizeRandomInterp, RandomFlip
from data_generator.object_detection_2d_image_boxes_validation_utils import BoundGenerator, BoxFilter, ImageValidator
class SSDRandomCrop:
'''
Performs the same random crops as defined by the `batch_sampler` instructions
of the original Caffe implementation of SSD. A description of this random cropping
strategy can also be found in the data augmentation section of the paper:
https://arxiv.org/abs/1512.02325
'''
def __init__(self, labels_format={'class_id': 0, 'xmin': 1, 'ymin': 2, 'xmax': 3, 'ymax': 4}):
'''
Arguments:
labels_format (dict, optional): A dictionary that defines which index in the last axis of the labels
of an image contains which bounding box coordinate. The dictionary maps at least the keywords
'xmin', 'ymin', 'xmax', and 'ymax' to their respective indices within last axis of the labels array.
'''
self.labels_format = labels_format
# This randomly samples one of the lower IoU bounds defined
# by the `sample_space` every time it is called.
self.bound_generator = BoundGenerator(sample_space=((None, None),
(0.1, None),
(0.3, None),
(0.5, None),
(0.7, None),
(0.9, None)),
weights=None)
# Produces coordinates for candidate patches such that the height
# and width of the patches are between 0.3 and 1.0 of the height
# and width of the respective image and the aspect ratio of the
# patches is between 0.5 and 2.0.
self.patch_coord_generator = PatchCoordinateGenerator(must_match='h_w',
min_scale=0.3,
max_scale=1.0,
scale_uniformly=False,
min_aspect_ratio = 0.5,
max_aspect_ratio = 2.0)
# Filters out boxes whose center point does not lie within the
# chosen patches.
self.box_filter = BoxFilter(check_overlap=True,
check_min_area=False,
check_degenerate=False,
overlap_criterion='center_point',
labels_format=self.labels_format)
# Determines whether a given patch is considered a valid patch.
# Defines a patch to be valid if at least one ground truth bounding box
# (n_boxes_min == 1) has an IoU overlap with the patch that
# meets the requirements defined by `bound_generator`.
self.image_validator = ImageValidator(overlap_criterion='iou',
n_boxes_min=1,
labels_format=self.labels_format,
border_pixels='half')
# Performs crops according to the parameters set in the objects above.
# Runs until either a valid patch is found or the original input image
# is returned unaltered. Runs a maximum of 50 trials to find a valid
# patch for each new sampled IoU threshold. Every 50 trials, the original
# image is returned as is with probability (1 - prob) = 0.143.
self.random_crop = RandomPatchInf(patch_coord_generator=self.patch_coord_generator,
box_filter=self.box_filter,
image_validator=self.image_validator,
bound_generator=self.bound_generator,
n_trials_max=50,
clip_boxes=True,
prob=0.857,
labels_format=self.labels_format)
def __call__(self, image, labels=None, return_inverter=False):
self.random_crop.labels_format = self.labels_format
return self.random_crop(image, labels, return_inverter)
class SSDExpand:
'''
Performs the random image expansion as defined by the `train_transform_param` instructions
of the original Caffe implementation of SSD. A description of this expansion strategy
can also be found in section 3.6 ("Data Augmentation for Small Object Accuracy") of the paper:
https://arxiv.org/abs/1512.02325
'''
def __init__(self, background=(123, 117, 104), labels_format={'class_id': 0, 'xmin': 1, 'ymin': 2, 'xmax': 3, 'ymax': 4}):
'''
Arguments:
background (list/tuple, optional): A 3-tuple specifying the RGB color value of the
background pixels of the translated images.
labels_format (dict, optional): A dictionary that defines which index in the last axis of the labels
of an image contains which bounding box coordinate. The dictionary maps at least the keywords
'xmin', 'ymin', 'xmax', and 'ymax' to their respective indices within last axis of the labels array.
'''
self.labels_format = labels_format
# Generate coordinates for patches that are between 1.0 and 4.0 times
# the size of the input image in both spatial dimensions.
self.patch_coord_generator = PatchCoordinateGenerator(must_match='h_w',
min_scale=1.0,
max_scale=4.0,
scale_uniformly=True)
# With probability 0.5, place the input image randomly on a canvas filled with
# mean color values according to the parameters set above. With probability 0.5,
# return the input image unaltered.
self.expand = RandomPatch(patch_coord_generator=self.patch_coord_generator,
box_filter=None,
image_validator=None,
n_trials_max=1,
clip_boxes=False,
prob=0.5,
background=background,
labels_format=self.labels_format)
def __call__(self, image, labels=None, return_inverter=False):
self.expand.labels_format = self.labels_format
return self.expand(image, labels, return_inverter)
class SSDPhotometricDistortions:
'''
Performs the photometric distortions defined by the `train_transform_param` instructions
of the original Caffe implementation of SSD.
'''
def __init__(self):
self.convert_RGB_to_HSV = ConvertColor(current='RGB', to='HSV')
self.convert_HSV_to_RGB = ConvertColor(current='HSV', to='RGB')
self.convert_to_float32 = ConvertDataType(to='float32')
self.convert_to_uint8 = ConvertDataType(to='uint8')
self.convert_to_3_channels = ConvertTo3Channels()
self.random_brightness = RandomBrightness(lower=-32, upper=32, prob=0.5)
self.random_contrast = RandomContrast(lower=0.5, upper=1.5, prob=0.5)
self.random_saturation = RandomSaturation(lower=0.5, upper=1.5, prob=0.5)
self.random_hue = RandomHue(max_delta=18, prob=0.5)
self.random_channel_swap = RandomChannelSwap(prob=0.0)
self.sequence1 = [self.convert_to_3_channels,
self.convert_to_float32,
self.random_brightness,
self.random_contrast,
self.convert_to_uint8,
self.convert_RGB_to_HSV,
self.convert_to_float32,
self.random_saturation,
self.random_hue,
self.convert_to_uint8,
self.convert_HSV_to_RGB,
self.random_channel_swap]
self.sequence2 = [self.convert_to_3_channels,
self.convert_to_float32,
self.random_brightness,
self.convert_to_uint8,
self.convert_RGB_to_HSV,
self.convert_to_float32,
self.random_saturation,
self.random_hue,
self.convert_to_uint8,
self.convert_HSV_to_RGB,
self.convert_to_float32,
self.random_contrast,
self.convert_to_uint8,
self.random_channel_swap]
def __call__(self, image, labels):
# Choose sequence 1 with probability 0.5.
if np.random.choice(2):
for transform in self.sequence1:
image, labels = transform(image, labels)
return image, labels
# Choose sequence 2 with probability 0.5.
else:
for transform in self.sequence2:
image, labels = transform(image, labels)
return image, labels
class SSDDataAugmentation:
'''
Reproduces the data augmentation pipeline used in the training of the original
Caffe implementation of SSD.
'''
def __init__(self,
img_height=300,
img_width=300,
background=(123, 117, 104),
labels_format={'class_id': 0, 'xmin': 1, 'ymin': 2, 'xmax': 3, 'ymax': 4}):
'''
Arguments:
height (int): The desired height of the output images in pixels.
width (int): The desired width of the output images in pixels.
background (list/tuple, optional): A 3-tuple specifying the RGB color value of the
background pixels of the translated images.
labels_format (dict, optional): A dictionary that defines which index in the last axis of the labels
of an image contains which bounding box coordinate. The dictionary maps at least the keywords
'xmin', 'ymin', 'xmax', and 'ymax' to their respective indices within last axis of the labels array.
'''
self.labels_format = labels_format
self.photometric_distortions = SSDPhotometricDistortions()
self.expand = SSDExpand(background=background, labels_format=self.labels_format)
self.random_crop = SSDRandomCrop(labels_format=self.labels_format)
self.random_flip = RandomFlip(dim='horizontal', prob=0.5, labels_format=self.labels_format)
# This box filter makes sure that the resized images don't contain any degenerate boxes.
        # Resizing the images could cause the boxes to become smaller. For boxes that are already
# pretty small, that might result in boxes with height and/or width zero, which we obviously
# cannot allow.
self.box_filter = BoxFilter(check_overlap=False,
check_min_area=False,
check_degenerate=True,
labels_format=self.labels_format)
self.resize = ResizeRandomInterp(height=img_height,
width=img_width,
interpolation_modes=[cv2.INTER_NEAREST,
cv2.INTER_LINEAR,
cv2.INTER_CUBIC,
cv2.INTER_AREA,
cv2.INTER_LANCZOS4],
box_filter=self.box_filter,
labels_format=self.labels_format)
self.sequence = [self.photometric_distortions,
self.expand,
self.random_crop,
self.random_flip,
self.resize]
def __call__(self, image, labels, return_inverter=False):
self.expand.labels_format = self.labels_format
self.random_crop.labels_format = self.labels_format
self.random_flip.labels_format = self.labels_format
self.resize.labels_format = self.labels_format
inverters = []
for transform in self.sequence:
if return_inverter and ('return_inverter' in inspect.signature(transform).parameters):
image, labels, inverter = transform(image, labels, return_inverter=True)
inverters.append(inverter)
else:
image, labels = transform(image, labels)
if return_inverter:
return image, labels, inverters[::-1]
else:
return image, labels
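# A minimal usage sketch (illustrative only, not part of the original module). It assumes
# synthetic inputs: the chain needs both an image and a labels array in the default
# (class_id, xmin, ymin, xmax, ymax) format and resizes the output to (img_height, img_width).
if __name__ == '__main__':
    ssd_augmentation = SSDDataAugmentation(img_height=300, img_width=300)
    dummy_image = np.random.randint(0, 256, size=(480, 640, 3), dtype=np.uint8)
    dummy_labels = np.array([[1, 100, 120, 260, 300]])
    out_image, out_labels = ssd_augmentation(dummy_image, dummy_labels)
    print(out_image.shape, out_labels.shape)  # image shape -> (300, 300, 3)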

View File

@@ -0,0 +1,157 @@
'''
A data augmentation pipeline for datasets in bird's eye view, i.e. where there is
no "up" or "down" in the images.
Copyright (C) 2018 Pierluigi Ferrari
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
'''
from __future__ import division
import numpy as np
from data_generator.object_detection_2d_photometric_ops import ConvertColor, ConvertDataType, ConvertTo3Channels, RandomBrightness, RandomContrast, RandomHue, RandomSaturation
from data_generator.object_detection_2d_geometric_ops import Resize, RandomFlip, RandomRotate
from data_generator.object_detection_2d_patch_sampling_ops import PatchCoordinateGenerator, RandomPatch
from data_generator.object_detection_2d_image_boxes_validation_utils import BoxFilter, ImageValidator
class DataAugmentationSatellite:
'''
A data augmentation pipeline for datasets in bird's eye view, i.e. where there is
no "up" or "down" in the images.
Applies a chain of photometric and geometric image transformations. For documentation, please refer
to the documentation of the individual transformations involved.
'''
def __init__(self,
resize_height,
resize_width,
random_brightness=(-48, 48, 0.5),
random_contrast=(0.5, 1.8, 0.5),
random_saturation=(0.5, 1.8, 0.5),
random_hue=(18, 0.5),
random_flip=0.5,
random_rotate=([90, 180, 270], 0.5),
min_scale=0.3,
max_scale=2.0,
min_aspect_ratio = 0.8,
max_aspect_ratio = 1.25,
n_trials_max=3,
clip_boxes=True,
overlap_criterion='area',
bounds_box_filter=(0.3, 1.0),
bounds_validator=(0.5, 1.0),
n_boxes_min=1,
background=(0,0,0),
labels_format={'class_id': 0, 'xmin': 1, 'ymin': 2, 'xmax': 3, 'ymax': 4}):
self.n_trials_max = n_trials_max
self.clip_boxes = clip_boxes
self.overlap_criterion = overlap_criterion
self.bounds_box_filter = bounds_box_filter
self.bounds_validator = bounds_validator
self.n_boxes_min = n_boxes_min
self.background = background
self.labels_format = labels_format
# Determines which boxes are kept in an image after the transformations have been applied.
self.box_filter_patch = BoxFilter(check_overlap=True,
check_min_area=False,
check_degenerate=False,
overlap_criterion=self.overlap_criterion,
overlap_bounds=self.bounds_box_filter,
labels_format=self.labels_format)
self.box_filter_resize = BoxFilter(check_overlap=False,
check_min_area=True,
check_degenerate=True,
min_area=16,
labels_format=self.labels_format)
# Determines whether the result of the transformations is a valid training image.
self.image_validator = ImageValidator(overlap_criterion=self.overlap_criterion,
bounds=self.bounds_validator,
n_boxes_min=self.n_boxes_min,
labels_format=self.labels_format)
# Utility transformations
self.convert_to_3_channels = ConvertTo3Channels() # Make sure all images end up having 3 channels.
self.convert_RGB_to_HSV = ConvertColor(current='RGB', to='HSV')
self.convert_HSV_to_RGB = ConvertColor(current='HSV', to='RGB')
self.convert_to_float32 = ConvertDataType(to='float32')
self.convert_to_uint8 = ConvertDataType(to='uint8')
self.resize = Resize(height=resize_height,
width=resize_width,
box_filter=self.box_filter_resize,
labels_format=self.labels_format)
# Photometric transformations
self.random_brightness = RandomBrightness(lower=random_brightness[0], upper=random_brightness[1], prob=random_brightness[2])
self.random_contrast = RandomContrast(lower=random_contrast[0], upper=random_contrast[1], prob=random_contrast[2])
self.random_saturation = RandomSaturation(lower=random_saturation[0], upper=random_saturation[1], prob=random_saturation[2])
self.random_hue = RandomHue(max_delta=random_hue[0], prob=random_hue[1])
# Geometric transformations
self.random_horizontal_flip = RandomFlip(dim='horizontal', prob=random_flip, labels_format=self.labels_format)
self.random_vertical_flip = RandomFlip(dim='vertical', prob=random_flip, labels_format=self.labels_format)
self.random_rotate = RandomRotate(angles=random_rotate[0], prob=random_rotate[1], labels_format=self.labels_format)
self.patch_coord_generator = PatchCoordinateGenerator(must_match='w_ar',
min_scale=min_scale,
max_scale=max_scale,
scale_uniformly=False,
min_aspect_ratio = min_aspect_ratio,
max_aspect_ratio = max_aspect_ratio)
self.random_patch = RandomPatch(patch_coord_generator=self.patch_coord_generator,
box_filter=self.box_filter_patch,
image_validator=self.image_validator,
n_trials_max=self.n_trials_max,
clip_boxes=self.clip_boxes,
prob=1.0,
can_fail=False,
labels_format=self.labels_format)
# Define the processing chain.
self.transformations = [self.convert_to_3_channels,
self.convert_to_float32,
self.random_brightness,
self.random_contrast,
self.convert_to_uint8,
self.convert_RGB_to_HSV,
self.convert_to_float32,
self.random_saturation,
self.random_hue,
self.convert_to_uint8,
self.convert_HSV_to_RGB,
self.random_horizontal_flip,
self.random_vertical_flip,
self.random_rotate,
self.random_patch,
self.resize]
def __call__(self, image, labels=None):
self.random_patch.labels_format = self.labels_format
self.random_horizontal_flip.labels_format = self.labels_format
self.random_vertical_flip.labels_format = self.labels_format
self.random_rotate.labels_format = self.labels_format
self.resize.labels_format = self.labels_format
if not (labels is None):
for transform in self.transformations:
image, labels = transform(image, labels)
return image, labels
else:
            for transform in self.transformations:
image = transform(image)
return image

View File

@@ -0,0 +1,152 @@
'''
A data augmentation pipeline suitable for variable-size images that produces effects
that are similar (but not identical) to those of the original SSD data augmentation
pipeline while being faster.
Copyright (C) 2018 Pierluigi Ferrari
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
'''
from __future__ import division
import numpy as np
from data_generator.object_detection_2d_photometric_ops import ConvertColor, ConvertDataType, ConvertTo3Channels, RandomBrightness, RandomContrast, RandomHue, RandomSaturation
from data_generator.object_detection_2d_geometric_ops import Resize, RandomFlip
from data_generator.object_detection_2d_patch_sampling_ops import PatchCoordinateGenerator, RandomPatch
from data_generator.object_detection_2d_image_boxes_validation_utils import BoxFilter, ImageValidator
class DataAugmentationVariableInputSize:
'''
A data augmentation pipeline suitable for variable-size images that produces effects
that are similar (but not identical!) to those of the original SSD data augmentation
pipeline while being faster.
Applies a chain of photometric and geometric image transformations. For documentation, please refer
to the documentation of the individual transformations involved.
'''
def __init__(self,
resize_height,
resize_width,
random_brightness=(-48, 48, 0.5),
random_contrast=(0.5, 1.8, 0.5),
random_saturation=(0.5, 1.8, 0.5),
random_hue=(18, 0.5),
random_flip=0.5,
min_scale=0.3,
max_scale=2.0,
min_aspect_ratio = 0.5,
max_aspect_ratio = 2.0,
n_trials_max=3,
clip_boxes=True,
overlap_criterion='area',
bounds_box_filter=(0.3, 1.0),
bounds_validator=(0.5, 1.0),
n_boxes_min=1,
background=(0,0,0),
labels_format={'class_id': 0, 'xmin': 1, 'ymin': 2, 'xmax': 3, 'ymax': 4}):
self.n_trials_max = n_trials_max
self.clip_boxes = clip_boxes
self.overlap_criterion = overlap_criterion
self.bounds_box_filter = bounds_box_filter
self.bounds_validator = bounds_validator
self.n_boxes_min = n_boxes_min
self.background = background
self.labels_format = labels_format
# Determines which boxes are kept in an image after the transformations have been applied.
self.box_filter_patch = BoxFilter(check_overlap=True,
check_min_area=False,
check_degenerate=False,
overlap_criterion=self.overlap_criterion,
overlap_bounds=self.bounds_box_filter,
labels_format=self.labels_format)
self.box_filter_resize = BoxFilter(check_overlap=False,
check_min_area=True,
check_degenerate=True,
min_area=16,
labels_format=self.labels_format)
# Determines whether the result of the transformations is a valid training image.
self.image_validator = ImageValidator(overlap_criterion=self.overlap_criterion,
bounds=self.bounds_validator,
n_boxes_min=self.n_boxes_min,
labels_format=self.labels_format)
# Utility transformations
self.convert_to_3_channels = ConvertTo3Channels() # Make sure all images end up having 3 channels.
self.convert_RGB_to_HSV = ConvertColor(current='RGB', to='HSV')
self.convert_HSV_to_RGB = ConvertColor(current='HSV', to='RGB')
self.convert_to_float32 = ConvertDataType(to='float32')
self.convert_to_uint8 = ConvertDataType(to='uint8')
self.resize = Resize(height=resize_height,
width=resize_width,
box_filter=self.box_filter_resize,
labels_format=self.labels_format)
# Photometric transformations
self.random_brightness = RandomBrightness(lower=random_brightness[0], upper=random_brightness[1], prob=random_brightness[2])
self.random_contrast = RandomContrast(lower=random_contrast[0], upper=random_contrast[1], prob=random_contrast[2])
self.random_saturation = RandomSaturation(lower=random_saturation[0], upper=random_saturation[1], prob=random_saturation[2])
self.random_hue = RandomHue(max_delta=random_hue[0], prob=random_hue[1])
# Geometric transformations
self.random_flip = RandomFlip(dim='horizontal', prob=random_flip, labels_format=self.labels_format)
self.patch_coord_generator = PatchCoordinateGenerator(must_match='w_ar',
min_scale=min_scale,
max_scale=max_scale,
scale_uniformly=False,
min_aspect_ratio = min_aspect_ratio,
max_aspect_ratio = max_aspect_ratio)
self.random_patch = RandomPatch(patch_coord_generator=self.patch_coord_generator,
box_filter=self.box_filter_patch,
image_validator=self.image_validator,
n_trials_max=self.n_trials_max,
clip_boxes=self.clip_boxes,
prob=1.0,
can_fail=False,
labels_format=self.labels_format)
# Define the processing chain
self.transformations = [self.convert_to_3_channels,
self.convert_to_float32,
self.random_brightness,
self.random_contrast,
self.convert_to_uint8,
self.convert_RGB_to_HSV,
self.convert_to_float32,
self.random_saturation,
self.random_hue,
self.convert_to_uint8,
self.convert_HSV_to_RGB,
self.random_patch,
self.random_flip,
self.resize]
def __call__(self, image, labels=None):
self.random_patch.labels_format = self.labels_format
self.random_flip.labels_format = self.labels_format
self.resize.labels_format = self.labels_format
if not (labels is None):
for transform in self.transformations:
image, labels = transform(image, labels)
return image, labels
else:
            for transform in self.transformations:
image = transform(image)
return image

File diff suppressed because it is too large Load Diff

View File

@@ -0,0 +1,779 @@
'''
Various geometric image transformations for 2D object detection, both deterministic
and probabilistic.
Copyright (C) 2018 Pierluigi Ferrari
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
'''
from __future__ import division
import numpy as np
import cv2
import random
from data_generator.object_detection_2d_image_boxes_validation_utils import BoxFilter, ImageValidator
class Resize:
'''
Resizes images to a specified height and width in pixels.
'''
def __init__(self,
height,
width,
interpolation_mode=cv2.INTER_LINEAR,
box_filter=None,
labels_format={'class_id': 0, 'xmin': 1, 'ymin': 2, 'xmax': 3, 'ymax': 4}):
'''
Arguments:
height (int): The desired height of the output images in pixels.
width (int): The desired width of the output images in pixels.
interpolation_mode (int, optional): An integer that denotes a valid
OpenCV interpolation mode. For example, integers 0 through 5 are
valid interpolation modes.
box_filter (BoxFilter, optional): Only relevant if ground truth bounding boxes are given.
A `BoxFilter` object to filter out bounding boxes that don't meet the given criteria
after the transformation. Refer to the `BoxFilter` documentation for details. If `None`,
the validity of the bounding boxes is not checked.
labels_format (dict, optional): A dictionary that defines which index in the last axis of the labels
of an image contains which bounding box coordinate. The dictionary maps at least the keywords
'xmin', 'ymin', 'xmax', and 'ymax' to their respective indices within last axis of the labels array.
'''
if not (isinstance(box_filter, BoxFilter) or box_filter is None):
raise ValueError("`box_filter` must be either `None` or a `BoxFilter` object.")
self.out_height = height
self.out_width = width
self.interpolation_mode = interpolation_mode
self.box_filter = box_filter
self.labels_format = labels_format
def __call__(self, image, labels=None, return_inverter=False):
img_height, img_width = image.shape[:2]
xmin = self.labels_format['xmin']
ymin = self.labels_format['ymin']
xmax = self.labels_format['xmax']
ymax = self.labels_format['ymax']
image = cv2.resize(image,
dsize=(self.out_width, self.out_height),
interpolation=self.interpolation_mode)
if return_inverter:
def inverter(labels):
labels = np.copy(labels)
labels[:, [ymin+1, ymax+1]] = np.round(labels[:, [ymin+1, ymax+1]] * (img_height / self.out_height), decimals=0)
labels[:, [xmin+1, xmax+1]] = np.round(labels[:, [xmin+1, xmax+1]] * (img_width / self.out_width), decimals=0)
return labels
if labels is None:
if return_inverter:
return image, inverter
else:
return image
else:
labels = np.copy(labels)
labels[:, [ymin, ymax]] = np.round(labels[:, [ymin, ymax]] * (self.out_height / img_height), decimals=0)
labels[:, [xmin, xmax]] = np.round(labels[:, [xmin, xmax]] * (self.out_width / img_width), decimals=0)
if not (self.box_filter is None):
self.box_filter.labels_format = self.labels_format
labels = self.box_filter(labels=labels,
image_height=self.out_height,
image_width=self.out_width)
if return_inverter:
return image, labels, inverter
else:
return image, labels
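# A minimal usage sketch (illustrative only, not part of the original module). Note that
# the inverter returned above shifts the coordinate indices by +1, so it expects arrays
# with one extra leading column compared to the ground truth format, e.g. decoded
# predictions of the form (class_id, confidence, xmin, ymin, xmax, ymax).
if __name__ == '__main__':
    resize_demo = Resize(height=300, width=300)
    demo_image = np.random.randint(0, 256, size=(600, 900, 3), dtype=np.uint8)
    demo_labels = np.array([[1, 90, 60, 450, 300]])  # (class_id, xmin, ymin, xmax, ymax)
    resized_image, resized_labels, inverter = resize_demo(demo_image, demo_labels, return_inverter=True)
    demo_predictions = np.array([[1.0, 0.9, 30, 30, 150, 150]])  # (class_id, conf, xmin, ymin, xmax, ymax)
    print(inverter(demo_predictions))  # coordinates mapped back to the 900x600 input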
class ResizeRandomInterp:
'''
    Resizes images to a specified height and width in pixels using a randomly
selected interpolation mode.
'''
def __init__(self,
height,
width,
interpolation_modes=[cv2.INTER_NEAREST,
cv2.INTER_LINEAR,
cv2.INTER_CUBIC,
cv2.INTER_AREA,
cv2.INTER_LANCZOS4],
box_filter=None,
labels_format={'class_id': 0, 'xmin': 1, 'ymin': 2, 'xmax': 3, 'ymax': 4}):
'''
Arguments:
height (int): The desired height of the output image in pixels.
width (int): The desired width of the output image in pixels.
interpolation_modes (list/tuple, optional): A list/tuple of integers
that represent valid OpenCV interpolation modes. For example,
integers 0 through 5 are valid interpolation modes.
box_filter (BoxFilter, optional): Only relevant if ground truth bounding boxes are given.
A `BoxFilter` object to filter out bounding boxes that don't meet the given criteria
after the transformation. Refer to the `BoxFilter` documentation for details. If `None`,
the validity of the bounding boxes is not checked.
labels_format (dict, optional): A dictionary that defines which index in the last axis of the labels
of an image contains which bounding box coordinate. The dictionary maps at least the keywords
'xmin', 'ymin', 'xmax', and 'ymax' to their respective indices within last axis of the labels array.
'''
if not (isinstance(interpolation_modes, (list, tuple))):
            raise ValueError("`interpolation_modes` must be a list or tuple.")
self.height = height
self.width = width
self.interpolation_modes = interpolation_modes
self.box_filter = box_filter
self.labels_format = labels_format
self.resize = Resize(height=self.height,
width=self.width,
box_filter=self.box_filter,
labels_format=self.labels_format)
def __call__(self, image, labels=None, return_inverter=False):
self.resize.interpolation_mode = np.random.choice(self.interpolation_modes)
self.resize.labels_format = self.labels_format
return self.resize(image, labels, return_inverter)
class Flip:
'''
Flips images horizontally or vertically.
'''
def __init__(self,
dim='horizontal',
labels_format={'class_id': 0, 'xmin': 1, 'ymin': 2, 'xmax': 3, 'ymax': 4}):
'''
Arguments:
dim (str, optional): Can be either of 'horizontal' and 'vertical'.
If 'horizontal', images will be flipped horizontally, i.e. along
                the vertical axis. If 'vertical', images will be flipped vertically,
i.e. along the horizontal axis.
labels_format (dict, optional): A dictionary that defines which index in the last axis of the labels
of an image contains which bounding box coordinate. The dictionary maps at least the keywords
'xmin', 'ymin', 'xmax', and 'ymax' to their respective indices within last axis of the labels array.
'''
if not (dim in {'horizontal', 'vertical'}): raise ValueError("`dim` can be one of 'horizontal' and 'vertical'.")
self.dim = dim
self.labels_format = labels_format
def __call__(self, image, labels=None, return_inverter=False):
img_height, img_width = image.shape[:2]
xmin = self.labels_format['xmin']
ymin = self.labels_format['ymin']
xmax = self.labels_format['xmax']
ymax = self.labels_format['ymax']
if self.dim == 'horizontal':
image = image[:,::-1]
if labels is None:
return image
else:
labels = np.copy(labels)
labels[:, [xmin, xmax]] = img_width - labels[:, [xmax, xmin]]
return image, labels
else:
image = image[::-1]
if labels is None:
return image
else:
labels = np.copy(labels)
labels[:, [ymin, ymax]] = img_height - labels[:, [ymax, ymin]]
return image, labels
class RandomFlip:
'''
Randomly flips images horizontally or vertically. The randomness only refers
to whether or not the image will be flipped.
'''
def __init__(self,
dim='horizontal',
prob=0.5,
labels_format={'class_id': 0, 'xmin': 1, 'ymin': 2, 'xmax': 3, 'ymax': 4}):
'''
Arguments:
dim (str, optional): Can be either of 'horizontal' and 'vertical'.
If 'horizontal', images will be flipped horizontally, i.e. along
                the vertical axis. If 'vertical', images will be flipped vertically,
i.e. along the horizontal axis.
prob (float, optional): `(1 - prob)` determines the probability with which the original,
unaltered image is returned.
labels_format (dict, optional): A dictionary that defines which index in the last axis of the labels
of an image contains which bounding box coordinate. The dictionary maps at least the keywords
'xmin', 'ymin', 'xmax', and 'ymax' to their respective indices within last axis of the labels array.
'''
self.dim = dim
self.prob = prob
self.labels_format = labels_format
self.flip = Flip(dim=self.dim, labels_format=self.labels_format)
def __call__(self, image, labels=None):
p = np.random.uniform(0,1)
if p >= (1.0-self.prob):
self.flip.labels_format = self.labels_format
return self.flip(image, labels)
elif labels is None:
return image
else:
return image, labels
class Translate:
'''
Translates images horizontally and/or vertically.
'''
def __init__(self,
dy,
dx,
clip_boxes=True,
box_filter=None,
background=(0,0,0),
labels_format={'class_id': 0, 'xmin': 1, 'ymin': 2, 'xmax': 3, 'ymax': 4}):
'''
Arguments:
dy (float): The fraction of the image height by which to translate images along the
vertical axis. Positive values translate images downwards, negative values
translate images upwards.
dx (float): The fraction of the image width by which to translate images along the
horizontal axis. Positive values translate images to the right, negative values
translate images to the left.
clip_boxes (bool, optional): Only relevant if ground truth bounding boxes are given.
If `True`, any ground truth bounding boxes will be clipped to lie entirely within the
image after the translation.
box_filter (BoxFilter, optional): Only relevant if ground truth bounding boxes are given.
A `BoxFilter` object to filter out bounding boxes that don't meet the given criteria
after the transformation. Refer to the `BoxFilter` documentation for details. If `None`,
the validity of the bounding boxes is not checked.
background (list/tuple, optional): A 3-tuple specifying the RGB color value of the
background pixels of the translated images.
labels_format (dict, optional): A dictionary that defines which index in the last axis of the labels
of an image contains which bounding box coordinate. The dictionary maps at least the keywords
'xmin', 'ymin', 'xmax', and 'ymax' to their respective indices within last axis of the labels array.
'''
if not (isinstance(box_filter, BoxFilter) or box_filter is None):
raise ValueError("`box_filter` must be either `None` or a `BoxFilter` object.")
self.dy_rel = dy
self.dx_rel = dx
self.clip_boxes = clip_boxes
self.box_filter = box_filter
self.background = background
self.labels_format = labels_format
def __call__(self, image, labels=None):
img_height, img_width = image.shape[:2]
# Compute the translation matrix.
dy_abs = int(round(img_height * self.dy_rel))
dx_abs = int(round(img_width * self.dx_rel))
M = np.float32([[1, 0, dx_abs],
[0, 1, dy_abs]])
# Translate the image.
image = cv2.warpAffine(image,
M=M,
dsize=(img_width, img_height),
borderMode=cv2.BORDER_CONSTANT,
borderValue=self.background)
if labels is None:
return image
else:
xmin = self.labels_format['xmin']
ymin = self.labels_format['ymin']
xmax = self.labels_format['xmax']
ymax = self.labels_format['ymax']
labels = np.copy(labels)
# Translate the box coordinates to the translated image's coordinate system.
labels[:,[xmin,xmax]] += dx_abs
labels[:,[ymin,ymax]] += dy_abs
# Compute all valid boxes for this patch.
if not (self.box_filter is None):
self.box_filter.labels_format = self.labels_format
labels = self.box_filter(labels=labels,
image_height=img_height,
image_width=img_width)
if self.clip_boxes:
labels[:,[ymin,ymax]] = np.clip(labels[:,[ymin,ymax]], a_min=0, a_max=img_height-1)
labels[:,[xmin,xmax]] = np.clip(labels[:,[xmin,xmax]], a_min=0, a_max=img_width-1)
return image, labels
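# A minimal usage sketch (illustrative only, not part of the original module). With
# `dy=0.1` and `dx=0.2` on a 100x200 image, the content is shifted 10 pixels down and
# 40 pixels to the right, and the box coordinates are shifted accordingly.
if __name__ == '__main__':
    translate_demo = Translate(dy=0.1, dx=0.2)
    demo_image = np.random.randint(0, 256, size=(100, 200, 3), dtype=np.uint8)
    demo_labels = np.array([[1, 20, 30, 80, 70]])  # (class_id, xmin, ymin, xmax, ymax)
    translated_image, translated_labels = translate_demo(demo_image, demo_labels)
    print(translated_labels)  # box becomes (xmin, ymin, xmax, ymax) = (60, 40, 120, 80)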
class RandomTranslate:
'''
Randomly translates images horizontally and/or vertically.
'''
def __init__(self,
dy_minmax=(0.03,0.3),
dx_minmax=(0.03,0.3),
prob=0.5,
clip_boxes=True,
box_filter=None,
image_validator=None,
n_trials_max=3,
background=(0,0,0),
labels_format={'class_id': 0, 'xmin': 1, 'ymin': 2, 'xmax': 3, 'ymax': 4}):
'''
Arguments:
dy_minmax (list/tuple, optional): A 2-tuple `(min, max)` of non-negative floats that
determines the minimum and maximum relative translation of images along the vertical
axis both upward and downward. That is, images will be randomly translated by at least
`min` and at most `max` either upward or downward. For example, if `dy_minmax == (0.05,0.3)`,
an image of size `(100,100)` will be translated by at least 5 and at most 30 pixels
either upward or downward. The translation direction is chosen randomly.
dx_minmax (list/tuple, optional): A 2-tuple `(min, max)` of non-negative floats that
determines the minimum and maximum relative translation of images along the horizontal
axis both to the left and right. That is, images will be randomly translated by at least
`min` and at most `max` either left or right. For example, if `dx_minmax == (0.05,0.3)`,
an image of size `(100,100)` will be translated by at least 5 and at most 30 pixels
either left or right. The translation direction is chosen randomly.
prob (float, optional): `(1 - prob)` determines the probability with which the original,
unaltered image is returned.
clip_boxes (bool, optional): Only relevant if ground truth bounding boxes are given.
If `True`, any ground truth bounding boxes will be clipped to lie entirely within the
image after the translation.
box_filter (BoxFilter, optional): Only relevant if ground truth bounding boxes are given.
A `BoxFilter` object to filter out bounding boxes that don't meet the given criteria
after the transformation. Refer to the `BoxFilter` documentation for details. If `None`,
the validity of the bounding boxes is not checked.
image_validator (ImageValidator, optional): Only relevant if ground truth bounding boxes are given.
An `ImageValidator` object to determine whether a translated image is valid. If `None`,
any outcome is valid.
n_trials_max (int, optional): Only relevant if ground truth bounding boxes are given.
                Determines the maximal number of trials to produce a valid image. If no valid image could
be produced in `n_trials_max` trials, returns the unaltered input image.
background (list/tuple, optional): A 3-tuple specifying the RGB color value of the
background pixels of the translated images.
labels_format (dict, optional): A dictionary that defines which index in the last axis of the labels
of an image contains which bounding box coordinate. The dictionary maps at least the keywords
'xmin', 'ymin', 'xmax', and 'ymax' to their respective indices within last axis of the labels array.
'''
if dy_minmax[0] > dy_minmax[1]:
raise ValueError("It must be `dy_minmax[0] <= dy_minmax[1]`.")
if dx_minmax[0] > dx_minmax[1]:
raise ValueError("It must be `dx_minmax[0] <= dx_minmax[1]`.")
if dy_minmax[0] < 0 or dx_minmax[0] < 0:
raise ValueError("It must be `dy_minmax[0] >= 0` and `dx_minmax[0] >= 0`.")
if not (isinstance(image_validator, ImageValidator) or image_validator is None):
raise ValueError("`image_validator` must be either `None` or an `ImageValidator` object.")
self.dy_minmax = dy_minmax
self.dx_minmax = dx_minmax
self.prob = prob
self.clip_boxes = clip_boxes
self.box_filter = box_filter
self.image_validator = image_validator
self.n_trials_max = n_trials_max
self.background = background
self.labels_format = labels_format
self.translate = Translate(dy=0,
dx=0,
clip_boxes=self.clip_boxes,
box_filter=self.box_filter,
background=self.background,
labels_format=self.labels_format)
def __call__(self, image, labels=None):
p = np.random.uniform(0,1)
if p >= (1.0-self.prob):
img_height, img_width = image.shape[:2]
xmin = self.labels_format['xmin']
ymin = self.labels_format['ymin']
xmax = self.labels_format['xmax']
ymax = self.labels_format['ymax']
# Override the preset labels format.
if not self.image_validator is None:
self.image_validator.labels_format = self.labels_format
self.translate.labels_format = self.labels_format
for _ in range(max(1, self.n_trials_max)):
# Pick the relative amount by which to translate.
dy_abs = np.random.uniform(self.dy_minmax[0], self.dy_minmax[1])
dx_abs = np.random.uniform(self.dx_minmax[0], self.dx_minmax[1])
# Pick the direction in which to translate.
dy = np.random.choice([-dy_abs, dy_abs])
dx = np.random.choice([-dx_abs, dx_abs])
self.translate.dy_rel = dy
self.translate.dx_rel = dx
if (labels is None) or (self.image_validator is None):
# We either don't have any boxes or if we do, we will accept any outcome as valid.
return self.translate(image, labels)
else:
# Translate the box coordinates to the translated image's coordinate system.
new_labels = np.copy(labels)
new_labels[:, [ymin, ymax]] += int(round(img_height * dy))
new_labels[:, [xmin, xmax]] += int(round(img_width * dx))
# Check if the patch is valid.
if self.image_validator(labels=new_labels,
image_height=img_height,
image_width=img_width):
return self.translate(image, labels)
# If all attempts failed, return the unaltered input image.
if labels is None:
return image
else:
return image, labels
elif labels is None:
return image
else:
return image, labels
class Scale:
'''
Scales images, i.e. zooms in or out.
'''
def __init__(self,
factor,
clip_boxes=True,
box_filter=None,
background=(0,0,0),
labels_format={'class_id': 0, 'xmin': 1, 'ymin': 2, 'xmax': 3, 'ymax': 4}):
'''
Arguments:
factor (float): The fraction of the image size by which to scale images. Must be positive.
clip_boxes (bool, optional): Only relevant if ground truth bounding boxes are given.
If `True`, any ground truth bounding boxes will be clipped to lie entirely within the
image after the translation.
box_filter (BoxFilter, optional): Only relevant if ground truth bounding boxes are given.
A `BoxFilter` object to filter out bounding boxes that don't meet the given criteria
after the transformation. Refer to the `BoxFilter` documentation for details. If `None`,
the validity of the bounding boxes is not checked.
background (list/tuple, optional): A 3-tuple specifying the RGB color value of the potential
background pixels of the scaled images.
labels_format (dict, optional): A dictionary that defines which index in the last axis of the labels
of an image contains which bounding box coordinate. The dictionary maps at least the keywords
'xmin', 'ymin', 'xmax', and 'ymax' to their respective indices within last axis of the labels array.
'''
if factor <= 0:
raise ValueError("It must be `factor > 0`.")
if not (isinstance(box_filter, BoxFilter) or box_filter is None):
raise ValueError("`box_filter` must be either `None` or a `BoxFilter` object.")
self.factor = factor
self.clip_boxes = clip_boxes
self.box_filter = box_filter
self.background = background
self.labels_format = labels_format
def __call__(self, image, labels=None):
img_height, img_width = image.shape[:2]
# Compute the scaling matrix (a rotation matrix with zero rotation angle).
M = cv2.getRotationMatrix2D(center=(img_width / 2, img_height / 2),
angle=0,
scale=self.factor)
# Scale the image.
image = cv2.warpAffine(image,
M=M,
dsize=(img_width, img_height),
borderMode=cv2.BORDER_CONSTANT,
borderValue=self.background)
if labels is None:
return image
else:
xmin = self.labels_format['xmin']
ymin = self.labels_format['ymin']
xmax = self.labels_format['xmax']
ymax = self.labels_format['ymax']
labels = np.copy(labels)
# Scale the bounding boxes accordingly.
# Transform two opposite corner points of the rectangular boxes using the rotation matrix `M`.
toplefts = np.array([labels[:,xmin], labels[:,ymin], np.ones(labels.shape[0])])
bottomrights = np.array([labels[:,xmax], labels[:,ymax], np.ones(labels.shape[0])])
new_toplefts = (np.dot(M, toplefts)).T
new_bottomrights = (np.dot(M, bottomrights)).T
labels[:,[xmin,ymin]] = np.round(new_toplefts, decimals=0).astype(np.int)
labels[:,[xmax,ymax]] = np.round(new_bottomrights, decimals=0).astype(np.int)
# Compute all valid boxes for this patch.
if not (self.box_filter is None):
self.box_filter.labels_format = self.labels_format
labels = self.box_filter(labels=labels,
image_height=img_height,
image_width=img_width)
if self.clip_boxes:
labels[:,[ymin,ymax]] = np.clip(labels[:,[ymin,ymax]], a_min=0, a_max=img_height-1)
labels[:,[xmin,xmax]] = np.clip(labels[:,[xmin,xmax]], a_min=0, a_max=img_width-1)
return image, labels
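# Usage sketch (added illustration, not part of the original code): applying `Scale` to a
# synthetic image and one ground truth box in the default labels format. The image
# contents and box values below are made-up placeholders.
def _example_scale_usage():
    image = np.zeros((300, 400, 3), dtype=np.uint8)  # synthetic 400x300 image
    labels = np.array([[1, 50, 60, 200, 180]])  # [class_id, xmin, ymin, xmax, ymax]
    zoom_out = Scale(factor=0.5, clip_boxes=True, background=(0, 0, 0))
    scaled_image, scaled_labels = zoom_out(image, labels)
    return scaled_image, scaled_labels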
class RandomScale:
'''
Randomly scales images.
'''
def __init__(self,
min_factor=0.5,
max_factor=1.5,
prob=0.5,
clip_boxes=True,
box_filter=None,
image_validator=None,
n_trials_max=3,
background=(0,0,0),
labels_format={'class_id': 0, 'xmin': 1, 'ymin': 2, 'xmax': 3, 'ymax': 4}):
'''
Arguments:
min_factor (float, optional): The minimum fraction of the image size by which to scale images.
Must be positive.
max_factor (float, optional): The maximum fraction of the image size by which to scale images.
Must be positive.
prob (float, optional): `(1 - prob)` determines the probability with which the original,
unaltered image is returned.
clip_boxes (bool, optional): Only relevant if ground truth bounding boxes are given.
If `True`, any ground truth bounding boxes will be clipped to lie entirely within the
image after the scaling.
box_filter (BoxFilter, optional): Only relevant if ground truth bounding boxes are given.
A `BoxFilter` object to filter out bounding boxes that don't meet the given criteria
after the transformation. Refer to the `BoxFilter` documentation for details. If `None`,
the validity of the bounding boxes is not checked.
image_validator (ImageValidator, optional): Only relevant if ground truth bounding boxes are given.
An `ImageValidator` object to determine whether a scaled image is valid. If `None`,
any outcome is valid.
n_trials_max (int, optional): Only relevant if ground truth bounding boxes are given.
Determines the maximal number of trials to produce a valid image. If no valid image could
be produced in `n_trials_max` trials, returns the unaltered input image.
background (list/tuple, optional): A 3-tuple specifying the RGB color value of the potential
background pixels of the scaled images.
labels_format (dict, optional): A dictionary that defines which index in the last axis of the labels
of an image contains which bounding box coordinate. The dictionary maps at least the keywords
'xmin', 'ymin', 'xmax', and 'ymax' to their respective indices within last axis of the labels array.
'''
if not (0 < min_factor <= max_factor):
raise ValueError("It must be `0 < min_factor <= max_factor`.")
if not (isinstance(image_validator, ImageValidator) or image_validator is None):
raise ValueError("`image_validator` must be either `None` or an `ImageValidator` object.")
self.min_factor = min_factor
self.max_factor = max_factor
self.prob = prob
self.clip_boxes = clip_boxes
self.box_filter = box_filter
self.image_validator = image_validator
self.n_trials_max = n_trials_max
self.background = background
self.labels_format = labels_format
self.scale = Scale(factor=1.0,
clip_boxes=self.clip_boxes,
box_filter=self.box_filter,
background=self.background,
labels_format=self.labels_format)
def __call__(self, image, labels=None):
p = np.random.uniform(0,1)
if p >= (1.0-self.prob):
img_height, img_width = image.shape[:2]
xmin = self.labels_format['xmin']
ymin = self.labels_format['ymin']
xmax = self.labels_format['xmax']
ymax = self.labels_format['ymax']
# Override the preset labels format.
if not self.image_validator is None:
self.image_validator.labels_format = self.labels_format
self.scale.labels_format = self.labels_format
for _ in range(max(1, self.n_trials_max)):
# Pick a scaling factor.
factor = np.random.uniform(self.min_factor, self.max_factor)
self.scale.factor = factor
if (labels is None) or (self.image_validator is None):
# We either don't have any boxes or if we do, we will accept any outcome as valid.
return self.scale(image, labels)
else:
# Scale the bounding boxes accordingly.
# Transform two opposite corner points of the rectangular boxes using the rotation matrix `M`.
toplefts = np.array([labels[:,xmin], labels[:,ymin], np.ones(labels.shape[0])])
bottomrights = np.array([labels[:,xmax], labels[:,ymax], np.ones(labels.shape[0])])
# Compute the scaling matrix (a rotation matrix with zero rotation angle).
M = cv2.getRotationMatrix2D(center=(img_width / 2, img_height / 2),
angle=0,
scale=factor)
new_toplefts = (np.dot(M, toplefts)).T
new_bottomrights = (np.dot(M, bottomrights)).T
new_labels = np.copy(labels)
new_labels[:,[xmin,ymin]] = np.around(new_toplefts, decimals=0).astype(np.int)
new_labels[:,[xmax,ymax]] = np.around(new_bottomrights, decimals=0).astype(np.int)
# Check if the patch is valid.
if self.image_validator(labels=new_labels,
image_height=img_height,
image_width=img_width):
return self.scale(image, labels)
# If all attempts failed, return the unaltered input image.
if labels is None:
return image
else:
return image, labels
elif labels is None:
return image
else:
return image, labels
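# Usage sketch (added illustration): `RandomScale` picks a factor uniformly from
# `[min_factor, max_factor]` and applies `Scale` with probability `prob`. The values
# below are placeholders.
def _example_random_scale_usage():
    image = np.zeros((300, 400, 3), dtype=np.uint8)
    labels = np.array([[1, 50, 60, 200, 180]])
    random_scale = RandomScale(min_factor=0.8, max_factor=1.2, prob=0.5)
    image, labels = random_scale(image, labels)
    return image, labels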
class Rotate:
'''
Rotates images counter-clockwise by 90, 180, or 270 degrees.
'''
def __init__(self,
angle,
labels_format={'class_id': 0, 'xmin': 1, 'ymin': 2, 'xmax': 3, 'ymax': 4}):
'''
Arguments:
angle (int): The angle in degrees by which to rotate the images counter-clockwise.
Only 90, 180, and 270 are valid values.
labels_format (dict, optional): A dictionary that defines which index in the last axis of the labels
of an image contains which bounding box coordinate. The dictionary maps at least the keywords
'xmin', 'ymin', 'xmax', and 'ymax' to their respective indices within last axis of the labels array.
'''
if not angle in {90, 180, 270}:
raise ValueError("`angle` must be in the set {90, 180, 270}.")
self.angle = angle
self.labels_format = labels_format
def __call__(self, image, labels=None):
img_height, img_width = image.shape[:2]
# Compute the rotation matrix.
M = cv2.getRotationMatrix2D(center=(img_width / 2, img_height / 2),
angle=self.angle,
scale=1)
# Get the sine and cosine from the rotation matrix.
cos_angle = np.abs(M[0, 0])
sin_angle = np.abs(M[0, 1])
# Compute the new bounding dimensions of the image.
img_width_new = int(img_height * sin_angle + img_width * cos_angle)
img_height_new = int(img_height * cos_angle + img_width * sin_angle)
# Adjust the rotation matrix to take into account the translation.
M[1, 2] += (img_height_new - img_height) / 2
M[0, 2] += (img_width_new - img_width) / 2
# Rotate the image.
image = cv2.warpAffine(image,
M=M,
dsize=(img_width_new, img_height_new))
if labels is None:
return image
else:
xmin = self.labels_format['xmin']
ymin = self.labels_format['ymin']
xmax = self.labels_format['xmax']
ymax = self.labels_format['ymax']
labels = np.copy(labels)
# Rotate the bounding boxes accordingly.
# Transform two opposite corner points of the rectangular boxes using the rotation matrix `M`.
toplefts = np.array([labels[:,xmin], labels[:,ymin], np.ones(labels.shape[0])])
bottomrights = np.array([labels[:,xmax], labels[:,ymax], np.ones(labels.shape[0])])
new_toplefts = (np.dot(M, toplefts)).T
new_bottomrights = (np.dot(M, bottomrights)).T
labels[:,[xmin,ymin]] = np.round(new_toplefts, decimals=0).astype(np.int)
labels[:,[xmax,ymax]] = np.round(new_bottomrights, decimals=0).astype(np.int)
if self.angle == 90:
# ymin and ymax were switched by the rotation.
labels[:,[ymax,ymin]] = labels[:,[ymin,ymax]]
elif self.angle == 180:
# ymin and ymax were switched by the rotation,
# and also xmin and xmax were switched.
labels[:,[ymax,ymin]] = labels[:,[ymin,ymax]]
labels[:,[xmax,xmin]] = labels[:,[xmin,xmax]]
elif self.angle == 270:
# xmin and xmax were switched by the rotation.
labels[:,[xmax,xmin]] = labels[:,[xmin,xmax]]
return image, labels
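# Usage sketch (added illustration): rotating a synthetic image and its box by 90 degrees
# counter-clockwise. The box values below are placeholders.
def _example_rotate_usage():
    image = np.zeros((300, 400, 3), dtype=np.uint8)
    labels = np.array([[1, 50, 60, 200, 180]])
    rotate90 = Rotate(angle=90)
    rotated_image, rotated_labels = rotate90(image, labels)
    return rotated_image, rotated_labels  # the rotated image has shape (400, 300, 3)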
class RandomRotate:
'''
Randomly rotates images counter-clockwise.
'''
def __init__(self,
angles=[90, 180, 270],
prob=0.5,
labels_format={'class_id': 0, 'xmin': 1, 'ymin': 2, 'xmax': 3, 'ymax': 4}):
'''
Arguments:
angles (list): The list of angles in degrees from which one is randomly selected to rotate
the images counter-clockwise. Only 90, 180, and 270 are valid values.
prob (float, optional): `(1 - prob)` determines the probability with which the original,
unaltered image is returned.
labels_format (dict, optional): A dictionary that defines which index in the last axis of the labels
of an image contains which bounding box coordinate. The dictionary maps at least the keywords
'xmin', 'ymin', 'xmax', and 'ymax' to their respective indices within last axis of the labels array.
'''
for angle in angles:
if not angle in {90, 180, 270}:
raise ValueError("`angles` can only contain the values 90, 180, and 270.")
self.angles = angles
self.prob = prob
self.labels_format = labels_format
self.rotate = Rotate(angle=90, labels_format=self.labels_format)
def __call__(self, image, labels=None):
p = np.random.uniform(0,1)
if p >= (1.0-self.prob):
# Pick a rotation angle.
self.rotate.angle = random.choice(self.angles)
self.rotate.labels_format = self.labels_format
return self.rotate(image, labels)
elif labels is None:
return image
else:
return image, labels
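# Usage sketch (added illustration): with probability `prob`, rotates by an angle drawn
# from `angles`; otherwise the input is returned unchanged. Values are placeholders.
def _example_random_rotate_usage():
    image = np.zeros((300, 400, 3), dtype=np.uint8)
    labels = np.array([[1, 50, 60, 200, 180]])
    random_rotate = RandomRotate(angles=[90, 180, 270], prob=0.5)
    image, labels = random_rotate(image, labels)
    return image, labels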

View File

@@ -0,0 +1,322 @@
'''
Utilities for 2D object detection related to answering the following questions:
1. Given an image size and bounding boxes, which bounding boxes meet certain
requirements with respect to the image size?
2. Given an image size and bounding boxes, is an image of that size valid with
respect to the bounding boxes according to certain requirements?
Copyright (C) 2018 Pierluigi Ferrari
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
'''
from __future__ import division
import numpy as np
from bounding_box_utils.bounding_box_utils import iou
class BoundGenerator:
'''
Generates pairs of floating point values that represent lower and upper bounds
from a given sample space.
'''
def __init__(self,
sample_space=((0.1, None),
(0.3, None),
(0.5, None),
(0.7, None),
(0.9, None),
(None, None)),
weights=None):
'''
Arguments:
sample_space (list or tuple): A list, tuple, or array-like object of shape
`(n, 2)` that contains `n` samples to choose from, where each sample
is a 2-tuple of scalars and/or `None` values.
weights (list or tuple, optional): A list or tuple representing the distribution
over the sample space. If `None`, a uniform distribution will be assumed.
'''
if (not (weights is None)) and len(weights) != len(sample_space):
raise ValueError("`weights` must either be `None` for uniform distribution or have the same length as `sample_space`.")
self.sample_space = []
for bound_pair in sample_space:
if len(bound_pair) != 2:
raise ValueError("All elements of the sample space must be 2-tuples.")
bound_pair = list(bound_pair)
if bound_pair[0] is None: bound_pair[0] = 0.0
if bound_pair[1] is None: bound_pair[1] = 1.0
if bound_pair[0] > bound_pair[1]:
raise ValueError("For all sample space elements, the lower bound cannot be greater than the upper bound.")
self.sample_space.append(bound_pair)
self.sample_space_size = len(self.sample_space)
if weights is None:
self.weights = [1.0/self.sample_space_size] * self.sample_space_size
else:
self.weights = weights
def __call__(self):
'''
Returns:
An item of the sample space, i.e. a 2-tuple of scalars.
'''
i = np.random.choice(self.sample_space_size, p=self.weights)
return self.sample_space[i]
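# Usage sketch (added illustration): drawing a (lower, upper) bound pair from the default
# sample space; `None` entries are converted to 0.0 and 1.0 respectively.
def _example_bound_generator_usage():
    bound_generator = BoundGenerator()  # uniform over the six default bound pairs
    lower, upper = bound_generator()
    return lower, upper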
class BoxFilter:
'''
Returns all bounding boxes that are valid with respect to the defined criteria.
'''
def __init__(self,
check_overlap=True,
check_min_area=True,
check_degenerate=True,
overlap_criterion='center_point',
overlap_bounds=(0.3, 1.0),
min_area=16,
labels_format={'class_id': 0, 'xmin': 1, 'ymin': 2, 'xmax': 3, 'ymax': 4},
border_pixels='half'):
'''
Arguments:
check_overlap (bool, optional): Whether or not to enforce the overlap requirements defined by
`overlap_criterion` and `overlap_bounds`. Sometimes you might want to use the box filter only
to enforce a certain minimum area for all boxes (see next argument), in such cases you can
turn the overlap requirements off.
check_min_area (bool, optional): Whether or not to enforce the minimum area requirement defined
by `min_area`. If `True`, any boxes that have an area (in pixels) that is smaller than `min_area`
will be removed from the labels of an image. Bounding boxes below a certain area aren't useful
training examples. An object that takes up only, say, 5 pixels in an image is probably not
recognizable anymore, neither for a human, nor for an object detection model. It makes sense
to remove such boxes.
check_degenerate (bool, optional): Whether or not to check for and remove degenerate bounding boxes.
Degenerate bounding boxes are boxes that have `xmax <= xmin` and/or `ymax <= ymin`. In particular,
boxes with a width and/or height of zero are degenerate. It is obviously important to filter out
such boxes, so you should only set this option to `False` if you are certain that degenerate
boxes are not possible in your data and processing chain.
overlap_criterion (str, optional): Can be either of 'center_point', 'iou', or 'area'. Determines
which boxes are considered valid with respect to a given image. If set to 'center_point',
a given bounding box is considered valid if its center point lies within the image.
If set to 'area', a given bounding box is considered valid if the quotient of its intersection
area with the image and its own area is within the given `overlap_bounds`. If set to 'iou', a given
bounding box is considered valid if its IoU with the image is within the given `overlap_bounds`.
overlap_bounds (list or BoundGenerator, optional): Only relevant if `overlap_criterion` is 'area' or 'iou'.
Determines the lower and upper bounds for `overlap_criterion`. Can be either a 2-tuple of scalars
representing a lower bound and an upper bound, or a `BoundGenerator` object, which provides
the possibility to generate bounds randomly.
min_area (int, optional): Only relevant if `check_min_area` is `True`. Defines the minimum area in
pixels that a bounding box must have in order to be valid. Boxes with an area smaller than this
will be removed.
labels_format (dict, optional): A dictionary that defines which index in the last axis of the labels
of an image contains which bounding box coordinate. The dictionary maps at least the keywords
'xmin', 'ymin', 'xmax', and 'ymax' to their respective indices within last axis of the labels array.
border_pixels (str, optional): How to treat the border pixels of the bounding boxes.
Can be 'include', 'exclude', or 'half'. If 'include', the border pixels belong
to the boxes. If 'exclude', the border pixels do not belong to the boxes.
If 'half', then one of each of the two horizontal and vertical borders belong
to the boxes, but not the other.
'''
if not isinstance(overlap_bounds, (list, tuple, BoundGenerator)):
raise ValueError("`overlap_bounds` must be either a 2-tuple of scalars or a `BoundGenerator` object.")
if isinstance(overlap_bounds, (list, tuple)) and (overlap_bounds[0] > overlap_bounds[1]):
raise ValueError("The lower bound must not be greater than the upper bound.")
if not (overlap_criterion in {'iou', 'area', 'center_point'}):
raise ValueError("`overlap_criterion` must be one of 'iou', 'area', or 'center_point'.")
self.overlap_criterion = overlap_criterion
self.overlap_bounds = overlap_bounds
self.min_area = min_area
self.check_overlap = check_overlap
self.check_min_area = check_min_area
self.check_degenerate = check_degenerate
self.labels_format = labels_format
self.border_pixels = border_pixels
def __call__(self,
labels,
image_height=None,
image_width=None):
'''
Arguments:
labels (array): The labels to be filtered. This is an array with shape `(m,n)`, where
`m` is the number of bounding boxes and `n` is the number of elements that defines
each bounding box (box coordinates, class ID, etc.). The box coordinates are expected
to be in the image's coordinate system.
image_height (int): Only relevant if `check_overlap == True`. The height of the image
(in pixels) to compare the box coordinates to.
image_width (int): Only relevant if `check_overlap == True`. The width of the image (in pixels)
to compare the box coordinates to.
Returns:
An array containing the labels of all boxes that are valid.
'''
labels = np.copy(labels)
xmin = self.labels_format['xmin']
ymin = self.labels_format['ymin']
xmax = self.labels_format['xmax']
ymax = self.labels_format['ymax']
# Record the boxes that pass all checks here.
requirements_met = np.ones(shape=labels.shape[0], dtype=np.bool)
if self.check_degenerate:
non_degenerate = (labels[:,xmax] > labels[:,xmin]) * (labels[:,ymax] > labels[:,ymin])
requirements_met *= non_degenerate
if self.check_min_area:
min_area_met = (labels[:,xmax] - labels[:,xmin]) * (labels[:,ymax] - labels[:,ymin]) >= self.min_area
requirements_met *= min_area_met
if self.check_overlap:
# Get the lower and upper bounds.
if isinstance(self.overlap_bounds, BoundGenerator):
lower, upper = self.overlap_bounds()
else:
lower, upper = self.overlap_bounds
# Compute which boxes are valid.
if self.overlap_criterion == 'iou':
# Compute the patch coordinates.
image_coords = np.array([0, 0, image_width, image_height])
# Compute the IoU between the patch and all of the ground truth boxes.
image_boxes_iou = iou(image_coords, labels[:, [xmin, ymin, xmax, ymax]], coords='corners', mode='element-wise', border_pixels=self.border_pixels)
requirements_met *= (image_boxes_iou > lower) * (image_boxes_iou <= upper)
elif self.overlap_criterion == 'area':
if self.border_pixels == 'half':
d = 0
elif self.border_pixels == 'include':
d = 1 # If border pixels are supposed to belong to the bounding boxes, we have to add one pixel to any difference `xmax - xmin` or `ymax - ymin`.
elif self.border_pixels == 'exclude':
d = -1 # If border pixels are not supposed to belong to the bounding boxes, we have to subtract one pixel from any difference `xmax - xmin` or `ymax - ymin`.
# Compute the areas of the boxes.
box_areas = (labels[:,xmax] - labels[:,xmin] + d) * (labels[:,ymax] - labels[:,ymin] + d)
# Compute the intersection area between the patch and all of the ground truth boxes.
clipped_boxes = np.copy(labels)
clipped_boxes[:,[ymin,ymax]] = np.clip(labels[:,[ymin,ymax]], a_min=0, a_max=image_height-1)
clipped_boxes[:,[xmin,xmax]] = np.clip(labels[:,[xmin,xmax]], a_min=0, a_max=image_width-1)
intersection_areas = (clipped_boxes[:,xmax] - clipped_boxes[:,xmin] + d) * (clipped_boxes[:,ymax] - clipped_boxes[:,ymin] + d) # `+ d` accounts for the border pixel convention chosen above.
# Check which boxes meet the overlap requirements.
if lower == 0.0:
mask_lower = intersection_areas > lower * box_areas # If `self.lower == 0`, we want to make sure that boxes with area 0 don't count, hence the ">" sign instead of the ">=" sign.
else:
mask_lower = intersection_areas >= lower * box_areas # Especially for the case `self.lower == 1` we want the ">=" sign, otherwise no boxes would count at all.
mask_upper = intersection_areas <= upper * box_areas
requirements_met *= mask_lower * mask_upper
elif self.overlap_criterion == 'center_point':
# Compute the center points of the boxes.
cy = (labels[:,ymin] + labels[:,ymax]) / 2
cx = (labels[:,xmin] + labels[:,xmax]) / 2
# Check which of the boxes have center points within the cropped patch and remove those that don't.
requirements_met *= (cy >= 0.0) * (cy <= image_height-1) * (cx >= 0.0) * (cx <= image_width-1)
return labels[requirements_met]
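# Usage sketch (added illustration): keeping only boxes whose center point lies inside a
# 300x300 image and whose area is at least 16 pixels. The box values are placeholders.
def _example_box_filter_usage():
    labels = np.array([[1,  10,  20, 120, 220],   # center point inside the image: kept
                       [2, 280, 280, 600, 600]])  # center point outside the image: removed
    box_filter = BoxFilter(check_overlap=True,
                           check_min_area=True,
                           check_degenerate=True,
                           overlap_criterion='center_point',
                           min_area=16)
    return box_filter(labels=labels, image_height=300, image_width=300)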
class ImageValidator:
'''
Returns `True` if a given minimum number of bounding boxes meets given overlap
requirements with an image of a given height and width.
'''
def __init__(self,
overlap_criterion='center_point',
bounds=(0.3, 1.0),
n_boxes_min=1,
labels_format={'class_id': 0, 'xmin': 1, 'ymin': 2, 'xmax': 3, 'ymax': 4},
border_pixels='half'):
'''
Arguments:
overlap_criterion (str, optional): Can be either of 'center_point', 'iou', or 'area'. Determines
which boxes are considered valid with respect to a given image. If set to 'center_point',
a given bounding box is considered valid if its center point lies within the image.
If set to 'area', a given bounding box is considered valid if the quotient of its intersection
area with the image and its own area is within `lower` and `upper`. If set to 'iou', a given
bounding box is considered valid if its IoU with the image is within `lower` and `upper`.
bounds (list or BoundGenerator, optional): Only relevant if `overlap_criterion` is 'area' or 'iou'.
Determines the lower and upper bounds for `overlap_criterion`. Can be either a 2-tuple of scalars
representing a lower bound and an upper bound, or a `BoundGenerator` object, which provides
the possibility to generate bounds randomly.
n_boxes_min (int or str, optional): Either a positive integer or the string 'all'.
Determines the minimum number of boxes that must meet the `overlap_criterion` with respect to
an image of the given height and width in order for the image to be a valid image.
If set to 'all', an image is considered valid if all given boxes meet the `overlap_criterion`.
labels_format (dict, optional): A dictionary that defines which index in the last axis of the labels
of an image contains which bounding box coordinate. The dictionary maps at least the keywords
'xmin', 'ymin', 'xmax', and 'ymax' to their respective indices within last axis of the labels array.
border_pixels (str, optional): How to treat the border pixels of the bounding boxes.
Can be 'include', 'exclude', or 'half'. If 'include', the border pixels belong
to the boxes. If 'exclude', the border pixels do not belong to the boxes.
If 'half', then one of each of the two horizontal and vertical borders belong
to the boxes, but not the other.
'''
if not ((isinstance(n_boxes_min, int) and n_boxes_min > 0) or n_boxes_min == 'all'):
raise ValueError("`n_boxes_min` must be a positive integer or 'all'.")
self.overlap_criterion = overlap_criterion
self.bounds = bounds
self.n_boxes_min = n_boxes_min
self.labels_format = labels_format
self.border_pixels = border_pixels
self.box_filter = BoxFilter(check_overlap=True,
check_min_area=False,
check_degenerate=False,
overlap_criterion=self.overlap_criterion,
overlap_bounds=self.bounds,
labels_format=self.labels_format,
border_pixels=self.border_pixels)
def __call__(self,
labels,
image_height,
image_width):
'''
Arguments:
labels (array): The labels to be tested. The box coordinates are expected
to be in the image's coordinate system.
image_height (int): The height of the image to compare the box coordinates to.
image_width (int): The width of the image to compare the box coordinates to.
Returns:
A boolean indicating whether an image of the given height and width is
valid with respect to the given bounding boxes.
'''
self.box_filter.overlap_bounds = self.bounds
self.box_filter.labels_format = self.labels_format
# Get all boxes that meet the overlap requirements.
valid_labels = self.box_filter(labels=labels,
image_height=image_height,
image_width=image_width)
# Check whether enough boxes meet the requirements.
if isinstance(self.n_boxes_min, int):
# The image is valid if at least `self.n_boxes_min` ground truth boxes meet the requirements.
if len(valid_labels) >= self.n_boxes_min:
return True
else:
return False
elif self.n_boxes_min == 'all':
# The image is valid if all ground truth boxes meet the requirements.
if len(valid_labels) == len(labels):
return True
else:
return False
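# Usage sketch (added illustration): checking whether at least one ground truth box keeps
# its center point inside a 300x300 image. The box values are placeholders.
def _example_image_validator_usage():
    labels = np.array([[1, 10, 20, 120, 220]])
    image_validator = ImageValidator(overlap_criterion='center_point', n_boxes_min=1)
    return image_validator(labels=labels, image_height=300, image_width=300)  # True here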

View File

@@ -0,0 +1,73 @@
'''
Miscellaneous data generator utilities.
Copyright (C) 2018 Pierluigi Ferrari
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
'''
from __future__ import division
import numpy as np
def apply_inverse_transforms(y_pred_decoded, inverse_transforms):
'''
Takes a list or Numpy array of decoded predictions and applies a given list of
inverse transforms to them. The list of inverse transforms would usually contain the
inverter functions that some of the image transformations that come with this
data generator return. This function would normally be used to transform predictions
that were made on a transformed image back to the original image.
Arguments:
y_pred_decoded (list or array): Either a list of length `batch_size` that
contains Numpy arrays that contain the predictions for each batch item
or a Numpy array. If this is a list of Numpy arrays, the arrays would
usually have the shape `(num_predictions, 6)`, where `num_predictions`
is different for each batch item. If this is a Numpy array, it would
usually have the shape `(batch_size, num_predictions, 6)`. The last axis
would usually contain the class ID, confidence score, and four bounding
box coordinates for each prediction.
inverse_transforms (list): A nested list of length `batch_size` that contains
for each batch item a list of functions that take one argument (one element
of `y_pred_decoded` if it is a list or one slice along the first axis of
`y_pred_decoded` if it is an array) and return an output of the same shape
and data type.
Returns:
The transformed predictions, which have the same structure as `y_pred_decoded`.
'''
if isinstance(y_pred_decoded, list):
y_pred_decoded_inv = []
for i in range(len(y_pred_decoded)):
y_pred_decoded_inv.append(np.copy(y_pred_decoded[i]))
if y_pred_decoded_inv[i].size > 0: # If there are any predictions for this batch item.
for inverter in inverse_transforms[i]:
if not (inverter is None):
y_pred_decoded_inv[i] = inverter(y_pred_decoded_inv[i])
elif isinstance(y_pred_decoded, np.ndarray):
y_pred_decoded_inv = np.copy(y_pred_decoded)
for i in range(len(y_pred_decoded)):
if y_pred_decoded_inv[i].size > 0: # If there are any predictions for this batch item.
for inverter in inverse_transforms[i]:
if not (inverter is None):
y_pred_decoded_inv[i] = inverter(y_pred_decoded_inv[i])
else:
raise ValueError("`y_pred_decoded` must be either a list or a Numpy array.")
return y_pred_decoded_inv
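# Usage sketch (added illustration): undoing a hypothetical horizontal shift of 10 pixels
# for the predictions of a single batch item. The inverter below is a stand-in for the
# inverter functions returned by transformations such as `CropPad`; the prediction values
# are placeholders in the format [class_id, confidence, xmin, ymin, xmax, ymax].
def _example_apply_inverse_transforms():
    y_pred_decoded = [np.array([[1, 0.9, 30., 40., 130., 240.]])]  # one batch item
    def shift_back_10px(batch_item_predictions):
        batch_item_predictions = np.copy(batch_item_predictions)
        batch_item_predictions[:, [2, 4]] += 10  # shift xmin and xmax back
        return batch_item_predictions
    inverse_transforms = [[shift_back_10px]]  # one list of inverters per batch item
    return apply_inverse_transforms(y_pred_decoded, inverse_transforms)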

View File

@@ -0,0 +1,881 @@
'''
Various patch sampling operations for data augmentation in 2D object detection.
Copyright (C) 2018 Pierluigi Ferrari
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
'''
from __future__ import division
import numpy as np
from data_generator.object_detection_2d_image_boxes_validation_utils import BoundGenerator, BoxFilter, ImageValidator
class PatchCoordinateGenerator:
'''
Generates random patch coordinates that meet specified requirements.
'''
def __init__(self,
img_height=None,
img_width=None,
must_match='h_w',
min_scale=0.3,
max_scale=1.0,
scale_uniformly=False,
min_aspect_ratio = 0.5,
max_aspect_ratio = 2.0,
patch_ymin=None,
patch_xmin=None,
patch_height=None,
patch_width=None,
patch_aspect_ratio=None):
'''
Arguments:
img_height (int): The height of the image for which the patch coordinates
shall be generated. Doesn't have to be known upon construction.
img_width (int): The width of the image for which the patch coordinates
shall be generated. Doesn't have to be known upon construction.
must_match (str, optional): Can be either of 'h_w', 'h_ar', and 'w_ar'.
Specifies which two of the three quantities height, width, and aspect
ratio determine the shape of the generated patch. The respective third
quantity will be computed from the other two. For example,
if `must_match == 'h_w'`, then the patch's height and width will be
set to lie within [min_scale, max_scale] of the image size or to
`patch_height` and/or `patch_width`, if given. The patch's aspect ratio
is the dependent variable in this case, it will be computed from the
height and width. Any given values for `patch_aspect_ratio`,
`min_aspect_ratio`, or `max_aspect_ratio` will be ignored.
min_scale (float, optional): The minimum size of a dimension of the patch
as a fraction of the respective dimension of the image. Can be greater
than 1. For example, if the image width is 200 and `min_scale == 0.5`,
then the width of the generated patch will be at least 100. If `min_scale == 1.5`,
the width of the generated patch will be at least 300.
max_scale (float, optional): The maximum size of a dimension of the patch
as a fraction of the respective dimension of the image. Can be greater
than 1. For example, if the image width is 200 and `max_scale == 1.0`,
then the width of the generated patch will be at most 200. If `max_scale == 1.5`,
the width of the generated patch will be at most 300. Must be greater than
`min_scale`.
scale_uniformly (bool, optional): If `True` and if `must_match == 'h_w'`,
the patch height and width will be scaled uniformly, otherwise they will
be scaled independently.
min_aspect_ratio (float, optional): Determines the minimum aspect ratio
for the generated patches.
max_aspect_ratio (float, optional): Determines the maximum aspect ratio
for the generated patches.
patch_ymin (int, optional): `None` or the vertical coordinate of the top left
corner of the generated patches. If this is not `None`, the position of the
patches along the vertical axis is fixed. If this is `None`, then the
vertical position of generated patches will be chosen randomly such that
the overlap of a patch and the image along the vertical dimension is
always maximal.
patch_xmin (int, optional): `None` or the horizontal coordinate of the top left
corner of the generated patches. If this is not `None`, the position of the
patches along the horizontal axis is fixed. If this is `None`, then the
horizontal position of generated patches will be chosen randomly such that
the overlap of a patch and the image along the horizontal dimension is
always maximal.
patch_height (int, optional): `None` or the fixed height of the generated patches.
patch_width (int, optional): `None` or the fixed width of the generated patches.
patch_aspect_ratio (float, optional): `None` or the fixed aspect ratio of the
generated patches.
'''
if not (must_match in {'h_w', 'h_ar', 'w_ar'}):
raise ValueError("`must_match` must be either of 'h_w', 'h_ar' and 'w_ar'.")
if min_scale >= max_scale:
raise ValueError("It must be `min_scale < max_scale`.")
if min_aspect_ratio >= max_aspect_ratio:
raise ValueError("It must be `min_aspect_ratio < max_aspect_ratio`.")
if scale_uniformly and not ((patch_height is None) and (patch_width is None)):
raise ValueError("If `scale_uniformly == True`, `patch_height` and `patch_width` must both be `None`.")
self.img_height = img_height
self.img_width = img_width
self.must_match = must_match
self.min_scale = min_scale
self.max_scale = max_scale
self.scale_uniformly = scale_uniformly
self.min_aspect_ratio = min_aspect_ratio
self.max_aspect_ratio = max_aspect_ratio
self.patch_ymin = patch_ymin
self.patch_xmin = patch_xmin
self.patch_height = patch_height
self.patch_width = patch_width
self.patch_aspect_ratio = patch_aspect_ratio
def __call__(self):
'''
Returns:
A 4-tuple `(ymin, xmin, height, width)` that represents the coordinates
of the generated patch.
'''
# Get the patch height and width.
if self.must_match == 'h_w': # Aspect is the dependent variable.
if not self.scale_uniformly:
# Get the height.
if self.patch_height is None:
patch_height = int(np.random.uniform(self.min_scale, self.max_scale) * self.img_height)
else:
patch_height = self.patch_height
# Get the width.
if self.patch_width is None:
patch_width = int(np.random.uniform(self.min_scale, self.max_scale) * self.img_width)
else:
patch_width = self.patch_width
else:
scaling_factor = np.random.uniform(self.min_scale, self.max_scale)
patch_height = int(scaling_factor * self.img_height)
patch_width = int(scaling_factor * self.img_width)
elif self.must_match == 'h_ar': # Width is the dependent variable.
# Get the height.
if self.patch_height is None:
patch_height = int(np.random.uniform(self.min_scale, self.max_scale) * self.img_height)
else:
patch_height = self.patch_height
# Get the aspect ratio.
if self.patch_aspect_ratio is None:
patch_aspect_ratio = np.random.uniform(self.min_aspect_ratio, self.max_aspect_ratio)
else:
patch_aspect_ratio = self.patch_aspect_ratio
# Get the width.
patch_width = int(patch_height * patch_aspect_ratio)
elif self.must_match == 'w_ar': # Height is the dependent variable.
# Get the width.
if self.patch_width is None:
patch_width = int(np.random.uniform(self.min_scale, self.max_scale) * self.img_width)
else:
patch_width = self.patch_width
# Get the aspect ratio.
if self.patch_aspect_ratio is None:
patch_aspect_ratio = np.random.uniform(self.min_aspect_ratio, self.max_aspect_ratio)
else:
patch_aspect_ratio = self.patch_aspect_ratio
# Get the height.
patch_height = int(patch_width / patch_aspect_ratio)
# Get the top left corner coordinates of the patch.
if self.patch_ymin is None:
# Compute how much room we have along the vertical axis to place the patch.
# A negative number here means that we want to sample a patch that is larger than the original image
# in the vertical dimension, in which case the patch will be placed such that it fully contains the
# image in the vertical dimension.
y_range = self.img_height - patch_height
# Select a random top left corner for the sample position from the possible positions.
if y_range >= 0: patch_ymin = np.random.randint(0, y_range + 1) # There are y_range + 1 possible positions for the crop in the vertical dimension.
else: patch_ymin = np.random.randint(y_range, 1) # The possible positions for the image on the background canvas in the vertical dimension.
else:
patch_ymin = self.patch_ymin
if self.patch_xmin is None:
# Compute how much room we have along the horizontal axis to place the patch.
# A negative number here means that we want to sample a patch that is larger than the original image
# in the horizontal dimension, in which case the patch will be placed such that it fully contains the
# image in the horizontal dimension.
x_range = self.img_width - patch_width
# Select a random top left corner for the sample position from the possible positions.
if x_range >= 0: patch_xmin = np.random.randint(0, x_range + 1) # There are x_range + 1 possible positions for the crop in the horizontal dimension.
else: patch_xmin = np.random.randint(x_range, 1) # The possible positions for the image on the background canvas in the horizontal dimension.
else:
patch_xmin = self.patch_xmin
return (patch_ymin, patch_xmin, patch_height, patch_width)
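# Usage sketch (added illustration): generating coordinates for a random patch whose
# height and width lie between 30% and 100% of an image of height 300 and width 400.
# The values below are placeholders.
def _example_patch_coord_generator_usage():
    patch_coord_generator = PatchCoordinateGenerator(img_height=300,
                                                     img_width=400,
                                                     must_match='h_w',
                                                     min_scale=0.3,
                                                     max_scale=1.0)
    patch_ymin, patch_xmin, patch_height, patch_width = patch_coord_generator()
    return patch_ymin, patch_xmin, patch_height, patch_width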
class CropPad:
'''
Crops and/or pads an image deterministically.
Depending on the given output patch size and the position (top left corner) relative
to the input image, the image will be cropped and/or padded along one or both spatial
dimensions.
For example, if the output patch lies entirely within the input image, this will result
in a regular crop. If the input image lies entirely within the output patch, this will
result in the image being padded in every direction. All other cases are mixed cases
where the image might be cropped in some directions and padded in others.
The output patch can be arbitrary in both size and position as long as it overlaps
with the input image.
'''
def __init__(self,
patch_ymin,
patch_xmin,
patch_height,
patch_width,
clip_boxes=True,
box_filter=None,
background=(0,0,0),
labels_format={'class_id': 0, 'xmin': 1, 'ymin': 2, 'xmax': 3, 'ymax': 4}):
'''
Arguments:
patch_ymin (int): The vertical coordinate of the top left corner of the output
patch relative to the image coordinate system. Can be negative (i.e. lie outside the image)
as long as the resulting patch still overlaps with the image.
patch_xmin (int): The horizontal coordinate of the top left corner of the output
patch relative to the image coordinate system. Can be negative (i.e. lie outside the image)
as long as the resulting patch still overlaps with the image.
patch_height (int): The height of the patch to be sampled from the image. Can be greater
than the height of the input image.
patch_width (int): The width of the patch to be sampled from the image. Can be greater
than the width of the input image.
clip_boxes (bool, optional): Only relevant if ground truth bounding boxes are given.
If `True`, any ground truth bounding boxes will be clipped to lie entirely within the
sampled patch.
box_filter (BoxFilter, optional): Only relevant if ground truth bounding boxes are given.
A `BoxFilter` object to filter out bounding boxes that don't meet the given criteria
after the transformation. Refer to the `BoxFilter` documentation for details. If `None`,
the validity of the bounding boxes is not checked.
background (list/tuple, optional): A 3-tuple specifying the RGB color value of the potential
background pixels of the scaled images. In the case of single-channel images,
the first element of `background` will be used as the background pixel value.
labels_format (dict, optional): A dictionary that defines which index in the last axis of the labels
of an image contains which bounding box coordinate. The dictionary maps at least the keywords
'xmin', 'ymin', 'xmax', and 'ymax' to their respective indices within last axis of the labels array.
'''
#if (patch_height <= 0) or (patch_width <= 0):
# raise ValueError("Patch height and width must both be positive.")
#if (patch_ymin + patch_height < 0) or (patch_xmin + patch_width < 0):
# raise ValueError("A patch with the given coordinates cannot overlap with an input image.")
if not (isinstance(box_filter, BoxFilter) or box_filter is None):
raise ValueError("`box_filter` must be either `None` or a `BoxFilter` object.")
self.patch_height = patch_height
self.patch_width = patch_width
self.patch_ymin = patch_ymin
self.patch_xmin = patch_xmin
self.clip_boxes = clip_boxes
self.box_filter = box_filter
self.background = background
self.labels_format = labels_format
def __call__(self, image, labels=None, return_inverter=False):
img_height, img_width = image.shape[:2]
if (self.patch_ymin > img_height) or (self.patch_xmin > img_width):
raise ValueError("The given patch doesn't overlap with the input image.")
labels = np.copy(labels)
xmin = self.labels_format['xmin']
ymin = self.labels_format['ymin']
xmax = self.labels_format['xmax']
ymax = self.labels_format['ymax']
# Top left corner of the patch relative to the image coordinate system:
patch_ymin = self.patch_ymin
patch_xmin = self.patch_xmin
# Create a canvas of the size of the patch we want to end up with.
if image.ndim == 3:
canvas = np.zeros(shape=(self.patch_height, self.patch_width, 3), dtype=np.uint8)
canvas[:, :] = self.background
elif image.ndim == 2:
canvas = np.zeros(shape=(self.patch_height, self.patch_width), dtype=np.uint8)
canvas[:, :] = self.background[0]
# Perform the crop.
if patch_ymin < 0 and patch_xmin < 0: # Pad the image at the top and on the left.
image_crop_height = min(img_height, self.patch_height + patch_ymin) # The number of pixels of the image that will end up on the canvas in the vertical direction.
image_crop_width = min(img_width, self.patch_width + patch_xmin) # The number of pixels of the image that will end up on the canvas in the horizontal direction.
canvas[-patch_ymin:-patch_ymin + image_crop_height, -patch_xmin:-patch_xmin + image_crop_width] = image[:image_crop_height, :image_crop_width]
elif patch_ymin < 0 and patch_xmin >= 0: # Pad the image at the top and crop it on the left.
image_crop_height = min(img_height, self.patch_height + patch_ymin) # The number of pixels of the image that will end up on the canvas in the vertical direction.
image_crop_width = min(self.patch_width, img_width - patch_xmin) # The number of pixels of the image that will end up on the canvas in the horizontal direction.
canvas[-patch_ymin:-patch_ymin + image_crop_height, :image_crop_width] = image[:image_crop_height, patch_xmin:patch_xmin + image_crop_width]
elif patch_ymin >= 0 and patch_xmin < 0: # Crop the image at the top and pad it on the left.
image_crop_height = min(self.patch_height, img_height - patch_ymin) # The number of pixels of the image that will end up on the canvas in the vertical direction.
image_crop_width = min(img_width, self.patch_width + patch_xmin) # The number of pixels of the image that will end up on the canvas in the horizontal direction.
canvas[:image_crop_height, -patch_xmin:-patch_xmin + image_crop_width] = image[patch_ymin:patch_ymin + image_crop_height, :image_crop_width]
elif patch_ymin >= 0 and patch_xmin >= 0: # Crop the image at the top and on the left.
image_crop_height = min(self.patch_height, img_height - patch_ymin) # The number of pixels of the image that will end up on the canvas in the vertical direction.
image_crop_width = min(self.patch_width, img_width - patch_xmin) # The number of pixels of the image that will end up on the canvas in the horizontal direction.
canvas[:image_crop_height, :image_crop_width] = image[patch_ymin:patch_ymin + image_crop_height, patch_xmin:patch_xmin + image_crop_width]
image = canvas
if return_inverter:
def inverter(labels):
labels = np.copy(labels)
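# Note (added clarification): these inverters are meant to be applied to decoded
# predictions, whose last axis carries an extra confidence value before the box
# coordinates, so the coordinate indices are shifted by +1 relative to the ground
# truth labels format.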
labels[:, [ymin+1, ymax+1]] += patch_ymin
labels[:, [xmin+1, xmax+1]] += patch_xmin
return labels
if not (labels is None):
# Translate the box coordinates to the patch's coordinate system.
labels[:, [ymin, ymax]] -= patch_ymin
labels[:, [xmin, xmax]] -= patch_xmin
# Compute all valid boxes for this patch.
if not (self.box_filter is None):
self.box_filter.labels_format = self.labels_format
labels = self.box_filter(labels=labels,
image_height=self.patch_height,
image_width=self.patch_width)
if self.clip_boxes:
labels[:,[ymin,ymax]] = np.clip(labels[:,[ymin,ymax]], a_min=0, a_max=self.patch_height-1)
labels[:,[xmin,xmax]] = np.clip(labels[:,[xmin,xmax]], a_min=0, a_max=self.patch_width-1)
if return_inverter:
return image, labels, inverter
else:
return image, labels
else:
if return_inverter:
return image, inverter
else:
return image
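# Usage sketch (added illustration): sampling a 200x200 patch whose top left corner lies
# 50 pixels inside the image in both dimensions; the box is translated into the patch's
# coordinate system. Image contents and box values are placeholders.
def _example_crop_pad_usage():
    image = np.zeros((300, 400, 3), dtype=np.uint8)
    labels = np.array([[1, 60, 70, 250, 280]])  # [class_id, xmin, ymin, xmax, ymax]
    crop_pad = CropPad(patch_ymin=50, patch_xmin=50, patch_height=200, patch_width=200,
                       clip_boxes=True)
    patch, patch_labels = crop_pad(image, labels)
    return patch, patch_labels  # the patch has shape (200, 200, 3)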
class Crop:
'''
Crops off the specified numbers of pixels from the borders of images.
This is just a convenience interface for `CropPad`.
'''
def __init__(self,
crop_top,
crop_bottom,
crop_left,
crop_right,
clip_boxes=True,
box_filter=None,
labels_format={'class_id': 0, 'xmin': 1, 'ymin': 2, 'xmax': 3, 'ymax': 4}):
self.crop_top = crop_top
self.crop_bottom = crop_bottom
self.crop_left = crop_left
self.crop_right = crop_right
self.clip_boxes = clip_boxes
self.box_filter = box_filter
self.labels_format = labels_format
self.crop = CropPad(patch_ymin=self.crop_top,
patch_xmin=self.crop_left,
patch_height=None,
patch_width=None,
clip_boxes=self.clip_boxes,
box_filter=self.box_filter,
labels_format=self.labels_format)
def __call__(self, image, labels=None, return_inverter=False):
img_height, img_width = image.shape[:2]
self.crop.patch_height = img_height - self.crop_top - self.crop_bottom
self.crop.patch_width = img_width - self.crop_left - self.crop_right
self.crop.labels_format = self.labels_format
return self.crop(image, labels, return_inverter)
class Pad:
'''
Pads images by the specified numbers of pixels on each side.
This is just a convenience interface for `CropPad`.
'''
def __init__(self,
pad_top,
pad_bottom,
pad_left,
pad_right,
background=(0,0,0),
labels_format={'class_id': 0, 'xmin': 1, 'ymin': 2, 'xmax': 3, 'ymax': 4}):
self.pad_top = pad_top
self.pad_bottom = pad_bottom
self.pad_left = pad_left
self.pad_right = pad_right
self.background = background
self.labels_format = labels_format
self.pad = CropPad(patch_ymin=-self.pad_top,
patch_xmin=-self.pad_left,
patch_height=None,
patch_width=None,
clip_boxes=False,
box_filter=None,
background=self.background,
labels_format=self.labels_format)
def __call__(self, image, labels=None, return_inverter=False):
img_height, img_width = image.shape[:2]
self.pad.patch_height = img_height + self.pad_top + self.pad_bottom
self.pad.patch_width = img_width + self.pad_left + self.pad_right
self.pad.labels_format = self.labels_format
return self.pad(image, labels, return_inverter)
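# Usage sketch (added illustration): `Crop` removes a fixed number of pixels from each
# border, `Pad` adds a constant-color border; both are thin wrappers around `CropPad`.
# Image contents and box values are placeholders.
def _example_crop_and_pad_usage():
    image = np.zeros((300, 400, 3), dtype=np.uint8)
    labels = np.array([[1, 60, 70, 250, 280]])
    crop = Crop(crop_top=10, crop_bottom=10, crop_left=20, crop_right=20)
    cropped_image, cropped_labels = crop(image, labels)  # cropped_image: (280, 360, 3)
    pad = Pad(pad_top=10, pad_bottom=10, pad_left=20, pad_right=20, background=(128, 128, 128))
    padded_image, padded_labels = pad(image, labels)     # padded_image: (320, 440, 3)
    return cropped_image, cropped_labels, padded_image, padded_labels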
class RandomPatch:
'''
Randomly samples a patch from an image. The randomness refers to whatever
randomness may be introduced by the patch coordinate generator, the box filter,
and the patch validator.
Input images may be cropped and/or padded along either or both of the two
spatial dimensions as necessary in order to obtain the required patch.
As opposed to `RandomPatchInf`, it is possible for this transform to fail to produce
an output image at all, in which case it will return `None`. This is useful, because
if this transform is used to generate patches of a fixed size or aspect ratio, then
the caller needs to be able to rely on the output image satisfying the set size or
aspect ratio. It might therefore not be an option to return the unaltered input image
as other random transforms do when they fail to produce a valid transformed image.
'''
def __init__(self,
patch_coord_generator,
box_filter=None,
image_validator=None,
n_trials_max=3,
clip_boxes=True,
prob=1.0,
background=(0,0,0),
can_fail=False,
labels_format={'class_id': 0, 'xmin': 1, 'ymin': 2, 'xmax': 3, 'ymax': 4}):
'''
Arguments:
patch_coord_generator (PatchCoordinateGenerator): A `PatchCoordinateGenerator` object
to generate the positions and sizes of the patches to be sampled from the input images.
box_filter (BoxFilter, optional): Only relevant if ground truth bounding boxes are given.
A `BoxFilter` object to filter out bounding boxes that don't meet the given criteria
after the transformation. Refer to the `BoxFilter` documentation for details. If `None`,
the validity of the bounding boxes is not checked.
image_validator (ImageValidator, optional): Only relevant if ground truth bounding boxes are given.
An `ImageValidator` object to determine whether a sampled patch is valid. If `None`,
any outcome is valid.
n_trials_max (int, optional): Only relevant if ground truth bounding boxes are given.
Determines the maximal number of trials to sample a valid patch. If no valid patch could
be sampled in `n_trials_max` trials, returns one `None` in place of each regular output.
clip_boxes (bool, optional): Only relevant if ground truth bounding boxes are given.
If `True`, any ground truth bounding boxes will be clipped to lie entirely within the
sampled patch.
prob (float, optional): `(1 - prob)` determines the probability with which the original,
unaltered image is returned.
background (list/tuple, optional): A 3-tuple specifying the RGB color value of the potential
background pixels of the scaled images. In the case of single-channel images,
the first element of `background` will be used as the background pixel value.
can_fail (bool, optional): If `True`, will return `None` if no valid patch could be found after
`n_trials_max` trials. If `False`, will return the unaltered input image in such a case.
labels_format (dict, optional): A dictionary that defines which index in the last axis of the labels
of an image contains which bounding box coordinate. The dictionary maps at least the keywords
'xmin', 'ymin', 'xmax', and 'ymax' to their respective indices within last axis of the labels array.
'''
if not isinstance(patch_coord_generator, PatchCoordinateGenerator):
raise ValueError("`patch_coord_generator` must be an instance of `PatchCoordinateGenerator`.")
if not (isinstance(image_validator, ImageValidator) or image_validator is None):
raise ValueError("`image_validator` must be either `None` or an `ImageValidator` object.")
self.patch_coord_generator = patch_coord_generator
self.box_filter = box_filter
self.image_validator = image_validator
self.n_trials_max = n_trials_max
self.clip_boxes = clip_boxes
self.prob = prob
self.background = background
self.can_fail = can_fail
self.labels_format = labels_format
self.sample_patch = CropPad(patch_ymin=None,
patch_xmin=None,
patch_height=None,
patch_width=None,
clip_boxes=self.clip_boxes,
box_filter=self.box_filter,
background=self.background,
labels_format=self.labels_format)
def __call__(self, image, labels=None, return_inverter=False):
p = np.random.uniform(0,1)
if p >= (1.0-self.prob):
img_height, img_width = image.shape[:2]
self.patch_coord_generator.img_height = img_height
self.patch_coord_generator.img_width = img_width
xmin = self.labels_format['xmin']
ymin = self.labels_format['ymin']
xmax = self.labels_format['xmax']
ymax = self.labels_format['ymax']
# Override the preset labels format.
if not self.image_validator is None:
self.image_validator.labels_format = self.labels_format
self.sample_patch.labels_format = self.labels_format
for _ in range(max(1, self.n_trials_max)):
# Generate patch coordinates.
patch_ymin, patch_xmin, patch_height, patch_width = self.patch_coord_generator()
self.sample_patch.patch_ymin = patch_ymin
self.sample_patch.patch_xmin = patch_xmin
self.sample_patch.patch_height = patch_height
self.sample_patch.patch_width = patch_width
if (labels is None) or (self.image_validator is None):
# We either don't have any boxes or if we do, we will accept any outcome as valid.
return self.sample_patch(image, labels, return_inverter)
else:
# Translate the box coordinates to the patch's coordinate system.
new_labels = np.copy(labels)
new_labels[:, [ymin, ymax]] -= patch_ymin
new_labels[:, [xmin, xmax]] -= patch_xmin
# Check if the patch is valid.
if self.image_validator(labels=new_labels,
image_height=patch_height,
image_width=patch_width):
return self.sample_patch(image, labels, return_inverter)
# If we weren't able to sample a valid patch...
if self.can_fail:
# ...return `None`.
if labels is None:
if return_inverter:
return None, None
else:
return None
else:
if return_inverter:
return None, None, None
else:
return None, None
else:
# ...return the unaltered input image.
if labels is None:
if return_inverter:
return image, None
else:
return image
else:
if return_inverter:
return image, labels, None
else:
return image, labels
else:
if return_inverter:
def inverter(labels):
return labels
if labels is None:
if return_inverter:
return image, inverter
else:
return image
else:
if return_inverter:
return image, labels, inverter
else:
return image, labels
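# Usage sketch (added illustration): sampling a random patch that covers 30-100% of the
# image in each dimension and keeps the center point of at least one box inside the
# patch. Image contents and box values are placeholders.
def _example_random_patch_usage():
    image = np.zeros((300, 400, 3), dtype=np.uint8)
    labels = np.array([[1, 60, 70, 250, 280]])
    patch_coord_generator = PatchCoordinateGenerator(must_match='h_w',
                                                     min_scale=0.3,
                                                     max_scale=1.0)
    image_validator = ImageValidator(overlap_criterion='center_point', n_boxes_min=1)
    random_patch = RandomPatch(patch_coord_generator=patch_coord_generator,
                               image_validator=image_validator,
                               n_trials_max=3,
                               prob=1.0)
    return random_patch(image, labels)  # falls back to the unaltered input if no valid patch is found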
class RandomPatchInf:
'''
Randomly samples a patch from an image. The randomness refers to whatever
randomness may be introduced by the patch coordinate generator, the box filter,
and the patch validator.
Input images may be cropped and/or padded along either or both of the two
spatial dimensions as necessary in order to obtain the required patch.
This operation is very similar to `RandomPatch`, except that:
1. This operation runs indefinitely until either a valid patch is found or
the input image is returned unaltered, i.e. it cannot fail.
2. If a bound generator is given, a new pair of bounds will be generated
every `n_trials_max` iterations.
'''
def __init__(self,
patch_coord_generator,
box_filter=None,
image_validator=None,
bound_generator=None,
n_trials_max=50,
clip_boxes=True,
prob=0.857,
background=(0,0,0),
labels_format={'class_id': 0, 'xmin': 1, 'ymin': 2, 'xmax': 3, 'ymax': 4}):
'''
Arguments:
patch_coord_generator (PatchCoordinateGenerator): A `PatchCoordinateGenerator` object
to generate the positions and sizes of the patches to be sampled from the input images.
box_filter (BoxFilter, optional): Only relevant if ground truth bounding boxes are given.
A `BoxFilter` object to filter out bounding boxes that don't meet the given criteria
after the transformation. Refer to the `BoxFilter` documentation for details. If `None`,
the validity of the bounding boxes is not checked.
image_validator (ImageValidator, optional): Only relevant if ground truth bounding boxes are given.
An `ImageValidator` object to determine whether a sampled patch is valid. If `None`,
any outcome is valid.
bound_generator (BoundGenerator, optional): A `BoundGenerator` object to generate upper and
lower bound values for the patch validator. Every `n_trials_max` trials, a new pair of
upper and lower bounds will be generated until a valid patch is found or the original image
is returned. This bound generator overrides the bound generator of the patch validator.
n_trials_max (int, optional): Only relevant if ground truth bounding boxes are given.
The sampler will run indefinitely until either a valid patch is found or the original image
is returned, but this determines the maximal number of trials to sample a valid patch for each
selected pair of lower and upper bounds before a new pair is picked.
clip_boxes (bool, optional): Only relevant if ground truth bounding boxes are given.
If `True`, any ground truth bounding boxes will be clipped to lie entirely within the
sampled patch.
prob (float, optional): `(1 - prob)` determines the probability with which the original,
unaltered image is returned.
background (list/tuple, optional): A 3-tuple specifying the RGB color value of the potential
background pixels of the scaled images. In the case of single-channel images,
the first element of `background` will be used as the background pixel value.
labels_format (dict, optional): A dictionary that defines which index in the last axis of the labels
of an image contains which bounding box coordinate. The dictionary maps at least the keywords
'xmin', 'ymin', 'xmax', and 'ymax' to their respective indices within last axis of the labels array.
'''
if not isinstance(patch_coord_generator, PatchCoordinateGenerator):
raise ValueError("`patch_coord_generator` must be an instance of `PatchCoordinateGenerator`.")
if not (isinstance(image_validator, ImageValidator) or image_validator is None):
raise ValueError("`image_validator` must be either `None` or an `ImageValidator` object.")
if not (isinstance(bound_generator, BoundGenerator) or bound_generator is None):
raise ValueError("`bound_generator` must be either `None` or a `BoundGenerator` object.")
self.patch_coord_generator = patch_coord_generator
self.box_filter = box_filter
self.image_validator = image_validator
self.bound_generator = bound_generator
self.n_trials_max = n_trials_max
self.clip_boxes = clip_boxes
self.prob = prob
self.background = background
self.labels_format = labels_format
self.sample_patch = CropPad(patch_ymin=None,
patch_xmin=None,
patch_height=None,
patch_width=None,
clip_boxes=self.clip_boxes,
box_filter=self.box_filter,
background=self.background,
labels_format=self.labels_format)
def __call__(self, image, labels=None, return_inverter=False):
img_height, img_width = image.shape[:2]
self.patch_coord_generator.img_height = img_height
self.patch_coord_generator.img_width = img_width
xmin = self.labels_format['xmin']
ymin = self.labels_format['ymin']
xmax = self.labels_format['xmax']
ymax = self.labels_format['ymax']
# Override the preset labels format.
if not self.image_validator is None:
self.image_validator.labels_format = self.labels_format
self.sample_patch.labels_format = self.labels_format
while True: # Keep going until we either find a valid patch or return the original image.
p = np.random.uniform(0,1)
if p >= (1.0-self.prob):
# In case we have a bound generator, pick a lower and upper bound for the patch validator.
if not ((self.image_validator is None) or (self.bound_generator is None)):
self.image_validator.bounds = self.bound_generator()
# Use at most `self.n_trials_max` attempts to find a crop
# that meets our requirements.
for _ in range(max(1, self.n_trials_max)):
# Generate patch coordinates.
patch_ymin, patch_xmin, patch_height, patch_width = self.patch_coord_generator()
self.sample_patch.patch_ymin = patch_ymin
self.sample_patch.patch_xmin = patch_xmin
self.sample_patch.patch_height = patch_height
self.sample_patch.patch_width = patch_width
# Check if the resulting patch meets the aspect ratio requirements.
aspect_ratio = patch_width / patch_height
if not (self.patch_coord_generator.min_aspect_ratio <= aspect_ratio <= self.patch_coord_generator.max_aspect_ratio):
continue
if (labels is None) or (self.image_validator is None):
# Either we don't have any boxes, or if we do, we will accept any outcome as valid.
return self.sample_patch(image, labels, return_inverter)
else:
# Translate the box coordinates to the patch's coordinate system.
new_labels = np.copy(labels)
new_labels[:, [ymin, ymax]] -= patch_ymin
new_labels[:, [xmin, xmax]] -= patch_xmin
# Check if the patch contains the minimum number of boxes we require.
if self.image_validator(labels=new_labels,
image_height=patch_height,
image_width=patch_width):
return self.sample_patch(image, labels, return_inverter)
else:
if return_inverter:
def inverter(labels):
return labels
if labels is None:
if return_inverter:
return image, inverter
else:
return image
else:
if return_inverter:
return image, labels, inverter
else:
return image, labels
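# Illustrative usage sketch (not part of the committed module): sampling a fixed-size
# random patch with ground-truth validation. The constructor arguments below mirror the
# attributes set in `RandomPatch.__init__` above; the `ImageValidator()` defaults are an assumption.
def _random_patch_example():
    image = np.zeros((300, 450, 3), dtype=np.uint8)  # dummy image
    labels = np.array([[1, 100, 80, 220, 200]])      # one box: (class_id, xmin, ymin, xmax, ymax)
    patch_gen = PatchCoordinateGenerator(img_height=300,
                                         img_width=450,
                                         must_match='h_w',
                                         patch_height=200,
                                         patch_width=300)
    sampler = RandomPatch(patch_coord_generator=patch_gen,
                          image_validator=ImageValidator(),
                          n_trials_max=3,
                          clip_boxes=True,
                          prob=1.0)
    return sampler(image, labels)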
class RandomMaxCropFixedAR:
'''
Crops the largest possible patch of a given fixed aspect ratio
from an image.
Since the aspect ratio of the sampled patches is constant, they
can subsequently be resized to the same size without distortion.
'''
def __init__(self,
patch_aspect_ratio,
box_filter=None,
image_validator=None,
n_trials_max=3,
clip_boxes=True,
labels_format={'class_id': 0, 'xmin': 1, 'ymin': 2, 'xmax': 3, 'ymax': 4}):
'''
Arguments:
patch_aspect_ratio (float): The fixed aspect ratio that all sampled patches will have.
box_filter (BoxFilter, optional): Only relevant if ground truth bounding boxes are given.
A `BoxFilter` object to filter out bounding boxes that don't meet the given criteria
after the transformation. Refer to the `BoxFilter` documentation for details. If `None`,
the validity of the bounding boxes is not checked.
image_validator (ImageValidator, optional): Only relevant if ground truth bounding boxes are given.
An `ImageValidator` object to determine whether a sampled patch is valid. If `None`,
any outcome is valid.
n_trials_max (int, optional): Only relevant if ground truth bounding boxes are given.
Determines the maximal number of trials to sample a valid patch. If no valid patch could
be sampled in `n_trials_max` trials, returns `None`.
clip_boxes (bool, optional): Only relevant if ground truth bounding boxes are given.
If `True`, any ground truth bounding boxes will be clipped to lie entirely within the
sampled patch.
labels_format (dict, optional): A dictionary that defines which index in the last axis of the labels
of an image contains which bounding box coordinate. The dictionary maps at least the keywords
'xmin', 'ymin', 'xmax', and 'ymax' to their respective indices within the last axis of the labels array.
'''
self.patch_aspect_ratio = patch_aspect_ratio
self.box_filter = box_filter
self.image_validator = image_validator
self.n_trials_max = n_trials_max
self.clip_boxes = clip_boxes
self.labels_format = labels_format
self.random_patch = RandomPatch(patch_coord_generator=PatchCoordinateGenerator(), # Just a dummy object
box_filter=self.box_filter,
image_validator=self.image_validator,
n_trials_max=self.n_trials_max,
clip_boxes=self.clip_boxes,
prob=1.0,
can_fail=False,
labels_format=self.labels_format)
def __call__(self, image, labels=None, return_inverter=False):
img_height, img_width = image.shape[:2]
# The ratio of the input image aspect ratio and patch aspect ratio determines the maximal possible crop.
image_aspect_ratio = img_width / img_height
if image_aspect_ratio < self.patch_aspect_ratio:
patch_width = img_width
patch_height = int(round(patch_width / self.patch_aspect_ratio))
else:
patch_height = img_height
patch_width = int(round(patch_height * self.patch_aspect_ratio))
# Now that we know the desired height and width for the patch,
# instantiate an appropriate patch coordinate generator.
patch_coord_generator = PatchCoordinateGenerator(img_height=img_height,
img_width=img_width,
must_match='h_w',
patch_height=patch_height,
patch_width=patch_width)
# The rest of the work is done by `RandomPatch`.
self.random_patch.patch_coord_generator = patch_coord_generator
self.random_patch.labels_format = self.labels_format
return self.random_patch(image, labels, return_inverter)
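# Illustrative sketch (not part of the committed module): taking the largest possible square
# crop and then resizing it without distortion, assuming the `Resize` transform from this
# repository's geometric ops module is available.
def _max_crop_example(image, labels):
    max_crop = RandomMaxCropFixedAR(patch_aspect_ratio=1.0)  # square patches
    resize = Resize(height=300, width=300)
    image, labels = max_crop(image, labels)
    image, labels = resize(image, labels)
    return image, labels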
class RandomPadFixedAR:
'''
Adds the minimal possible padding to an image that results in a patch
of the given fixed aspect ratio that contains the entire image.
Since the aspect ratio of the resulting images is constant, they
can subsequently be resized to the same size without distortion.
'''
def __init__(self,
patch_aspect_ratio,
background=(0,0,0),
labels_format={'class_id': 0, 'xmin': 1, 'ymin': 2, 'xmax': 3, 'ymax': 4}):
'''
Arguments:
patch_aspect_ratio (float): The fixed aspect ratio that all sampled patches will have.
background (list/tuple, optional): A 3-tuple specifying the RGB color value of the potential
background pixels of the scaled images. In the case of single-channel images,
the first element of `background` will be used as the background pixel value.
labels_format (dict, optional): A dictionary that defines which index in the last axis of the labels
of an image contains which bounding box coordinate. The dictionary maps at least the keywords
'xmin', 'ymin', 'xmax', and 'ymax' to their respective indices within the last axis of the labels array.
'''
self.patch_aspect_ratio = patch_aspect_ratio
self.background = background
self.labels_format = labels_format
self.random_patch = RandomPatch(patch_coord_generator=PatchCoordinateGenerator(), # Just a dummy object
box_filter=None,
image_validator=None,
n_trials_max=1,
clip_boxes=False,
background=self.background,
prob=1.0,
labels_format=self.labels_format)
def __call__(self, image, labels=None, return_inverter=False):
img_height, img_width = image.shape[:2]
if img_width < img_height:
patch_height = img_height
patch_width = int(round(patch_height * self.patch_aspect_ratio))
else:
patch_width = img_width
patch_height = int(round(patch_width / self.patch_aspect_ratio))
# Now that we know the desired height and width for the patch,
# instantiate an appropriate patch coordinate generator.
patch_coord_generator = PatchCoordinateGenerator(img_height=img_height,
img_width=img_width,
must_match='h_w',
patch_height=patch_height,
patch_width=patch_width)
# The rest of the work is done by `RandomPatch`.
self.random_patch.patch_coord_generator = patch_coord_generator
self.random_patch.labels_format = self.labels_format
return self.random_patch(image, labels, return_inverter)
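# Illustrative sketch (not part of the committed module): aspect-ratio-preserving preprocessing
# via padding, the same pattern the evaluator's 'pad' mode uses further below. Assumes the
# `Resize` transform from this repository's geometric ops module is available.
def _pad_and_resize_example(image, labels, img_height=300, img_width=300):
    random_pad = RandomPadFixedAR(patch_aspect_ratio=img_width / img_height)
    resize = Resize(height=img_height, width=img_width)
    image, labels = random_pad(image, labels)
    image, labels = resize(image, labels)
    return image, labels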

View File

@@ -0,0 +1,485 @@
'''
Various photometric image transformations, both deterministic and probabilistic.
Copyright (C) 2018 Pierluigi Ferrari
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
'''
from __future__ import division
import numpy as np
import cv2
class ConvertColor:
'''
Converts images between RGB, HSV and grayscale color spaces. This is just a wrapper
around `cv2.cvtColor()`.
'''
def __init__(self, current='RGB', to='HSV', keep_3ch=True):
'''
Arguments:
current (str, optional): The current color space of the images. Can be
one of 'RGB' and 'HSV'.
to (str, optional): The target color space of the images. Can be one of
'RGB', 'HSV', and 'GRAY'.
keep_3ch (bool, optional): Only relevant if `to == 'GRAY'`.
If `True`, the resulting grayscale images will have three channels.
'''
if not ((current in {'RGB', 'HSV'}) and (to in {'RGB', 'HSV', 'GRAY'})):
raise NotImplementedError
self.current = current
self.to = to
self.keep_3ch = keep_3ch
def __call__(self, image, labels=None):
if self.current == 'RGB' and self.to == 'HSV':
image = cv2.cvtColor(image, cv2.COLOR_RGB2HSV)
elif self.current == 'RGB' and self.to == 'GRAY':
image = cv2.cvtColor(image, cv2.COLOR_RGB2GRAY)
if self.keep_3ch:
image = np.stack([image] * 3, axis=-1)
elif self.current == 'HSV' and self.to == 'RGB':
image = cv2.cvtColor(image, cv2.COLOR_HSV2RGB)
elif self.current == 'HSV' and self.to == 'GRAY':
image = cv2.cvtColor(image, cv2.COLOR_HSV2GRAY)
if self.keep_3ch:
image = np.stack([image] * 3, axis=-1)
if labels is None:
return image
else:
return image, labels
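# Illustrative sketch (not part of the committed module): converting an RGB image to a
# three-channel grayscale image so that its shape stays compatible with a model that
# expects three input channels.
def _to_grayscale_example(rgb_image):
    to_gray = ConvertColor(current='RGB', to='GRAY', keep_3ch=True)
    return to_gray(rgb_image)  # shape (H, W, 3) with three identical channels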
class ConvertDataType:
'''
Converts images represented as Numpy arrays between `uint8` and `float32`.
Serves as a helper for certain photometric distortions. This is just a wrapper
around `np.ndarray.astype()`.
'''
def __init__(self, to='uint8'):
'''
Arguments:
to (string, optional): To which datatype to convert the input images.
Can be either of 'uint8' and 'float32'.
'''
if not (to == 'uint8' or to == 'float32'):
raise ValueError("`to` can be either of 'uint8' or 'float32'.")
self.to = to
def __call__(self, image, labels=None):
if self.to == 'uint8':
image = np.round(image, decimals=0).astype(np.uint8)
else:
image = image.astype(np.float32)
if labels is None:
return image
else:
return image, labels
class ConvertTo3Channels:
'''
Converts 1-channel and 4-channel images to 3-channel images. Does nothing to images that
already have 3 channels. In the case of 4-channel images, the fourth channel will be
discarded.
'''
def __init__(self):
pass
def __call__(self, image, labels=None):
if image.ndim == 2:
image = np.stack([image] * 3, axis=-1)
elif image.ndim == 3:
if image.shape[2] == 1:
image = np.concatenate([image] * 3, axis=-1)
elif image.shape[2] == 4:
image = image[:,:,:3]
if labels is None:
return image
else:
return image, labels
class Hue:
'''
Changes the hue of HSV images.
Important:
- Expects HSV input.
- Expects input array to be of `dtype` `float`.
'''
def __init__(self, delta):
'''
Arguments:
delta (int): An integer in the closed interval `[-180, 180]` that determines the hue change, where
a change by integer `delta` means a change by `2 * delta` degrees. Read up on the HSV color format
if you need more information.
'''
if not (-180 <= delta <= 180): raise ValueError("`delta` must be in the closed interval `[-180, 180]`.")
self.delta = delta
def __call__(self, image, labels=None):
image[:, :, 0] = (image[:, :, 0] + self.delta) % 180.0
if labels is None:
return image
else:
return image, labels
class RandomHue:
'''
Randomly changes the hue of HSV images.
Important:
- Expects HSV input.
- Expects input array to be of `dtype` `float`.
'''
def __init__(self, max_delta=18, prob=0.5):
'''
Arguments:
max_delta (int): An integer in the closed interval `[0, 180]` that determines the maximal absolute
hue change.
prob (float, optional): `(1 - prob)` determines the probability with which the original,
unaltered image is returned.
'''
if not (0 <= max_delta <= 180): raise ValueError("`max_delta` must be in the closed interval `[0, 180]`.")
self.max_delta = max_delta
self.prob = prob
self.change_hue = Hue(delta=0)
def __call__(self, image, labels=None):
p = np.random.uniform(0,1)
if p >= (1.0-self.prob):
self.change_hue.delta = np.random.uniform(-self.max_delta, self.max_delta)
return self.change_hue(image, labels)
elif labels is None:
return image
else:
return image, labels
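# Illustrative sketch (not part of the committed module): the HSV-based distortions are
# typically bracketed by color space and dtype conversions. The RGB-to-HSV conversion is
# performed on uint8 images (hue range [0, 180)), which is why `Hue` wraps around at 180.
def _random_hue_example(image, labels):
    transforms = [ConvertColor(current='RGB', to='HSV'),
                  ConvertDataType(to='float32'),
                  RandomHue(max_delta=18, prob=0.5),
                  ConvertDataType(to='uint8'),
                  ConvertColor(current='HSV', to='RGB')]
    for transform in transforms:
        image, labels = transform(image, labels)
    return image, labels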
class Saturation:
'''
Changes the saturation of HSV images.
Important:
- Expects HSV input.
- Expects input array to be of `dtype` `float`.
'''
def __init__(self, factor):
'''
Arguments:
factor (float): A float greater than zero that determines saturation change, where
values less than one result in less saturation and values greater than one result
in more saturation.
'''
if factor <= 0.0: raise ValueError("It must be `factor > 0`.")
self.factor = factor
def __call__(self, image, labels=None):
image[:,:,1] = np.clip(image[:,:,1] * self.factor, 0, 255)
if labels is None:
return image
else:
return image, labels
class RandomSaturation:
'''
Randomly changes the saturation of HSV images.
Important:
- Expects HSV input.
- Expects input array to be of `dtype` `float`.
'''
def __init__(self, lower=0.3, upper=2.0, prob=0.5):
'''
Arguments:
lower (float, optional): A float greater than zero, the lower bound for the random
saturation change.
upper (float, optional): A float greater than zero, the upper bound for the random
saturation change. Must be greater than `lower`.
prob (float, optional): `(1 - prob)` determines the probability with which the original,
unaltered image is returned.
'''
if lower >= upper: raise ValueError("`upper` must be greater than `lower`.")
self.lower = lower
self.upper = upper
self.prob = prob
self.change_saturation = Saturation(factor=1.0)
def __call__(self, image, labels=None):
p = np.random.uniform(0,1)
if p >= (1.0-self.prob):
self.change_saturation.factor = np.random.uniform(self.lower, self.upper)
return self.change_saturation(image, labels)
elif labels is None:
return image
else:
return image, labels
class Brightness:
'''
Changes the brightness of RGB images.
Important:
- Expects RGB input.
- Expects input array to be of `dtype` `float`.
'''
def __init__(self, delta):
'''
Arguments:
delta (int): An integer, the amount to add to or subtract from the intensity
of every pixel.
'''
self.delta = delta
def __call__(self, image, labels=None):
image = np.clip(image + self.delta, 0, 255)
if labels is None:
return image
else:
return image, labels
class RandomBrightness:
'''
Randomly changes the brightness of RGB images.
Important:
- Expects RGB input.
- Expects input array to be of `dtype` `float`.
'''
def __init__(self, lower=-84, upper=84, prob=0.5):
'''
Arguments:
lower (int, optional): An integer, the lower bound for the random brightness change.
upper (int, optional): An integer, the upper bound for the random brightness change.
Must be greater than `lower`.
prob (float, optional): `(1 - prob)` determines the probability with which the original,
unaltered image is returned.
'''
if lower >= upper: raise ValueError("`upper` must be greater than `lower`.")
self.lower = float(lower)
self.upper = float(upper)
self.prob = prob
self.change_brightness = Brightness(delta=0)
def __call__(self, image, labels=None):
p = np.random.uniform(0,1)
if p >= (1.0-self.prob):
self.change_brightness.delta = np.random.uniform(self.lower, self.upper)
return self.change_brightness(image, labels)
elif labels is None:
return image
else:
return image, labels
class Contrast:
'''
Changes the contrast of RGB images.
Important:
- Expects RGB input.
- Expects input array to be of `dtype` `float`.
'''
def __init__(self, factor):
'''
Arguments:
factor (float): A float greater than zero that determines contrast change, where
values less than one result in less contrast and values greater than one result
in more contrast.
'''
if factor <= 0.0: raise ValueError("It must be `factor > 0`.")
self.factor = factor
def __call__(self, image, labels=None):
image = np.clip(127.5 + self.factor * (image - 127.5), 0, 255)
if labels is None:
return image
else:
return image, labels
class RandomContrast:
'''
Randomly changes the contrast of RGB images.
Important:
- Expects RGB input.
- Expects input array to be of `dtype` `float`.
'''
def __init__(self, lower=0.5, upper=1.5, prob=0.5):
'''
Arguments:
lower (float, optional): A float greater than zero, the lower bound for the random
contrast change.
upper (float, optional): A float greater than zero, the upper bound for the random
contrast change. Must be greater than `lower`.
prob (float, optional): `(1 - prob)` determines the probability with which the original,
unaltered image is returned.
'''
if lower >= upper: raise ValueError("`upper` must be greater than `lower`.")
self.lower = lower
self.upper = upper
self.prob = prob
self.change_contrast = Contrast(factor=1.0)
def __call__(self, image, labels=None):
p = np.random.uniform(0,1)
if p >= (1.0-self.prob):
self.change_contrast.factor = np.random.uniform(self.lower, self.upper)
return self.change_contrast(image, labels)
elif labels is None:
return image
else:
return image, labels
class Gamma:
'''
Changes the gamma value of RGB images.
Important: Expects RGB input.
'''
def __init__(self, gamma):
'''
Arguments:
gamma (float): A float greater than zero that determines gamma change.
'''
if gamma <= 0.0: raise ValueError("It must be `gamma > 0`.")
self.gamma = gamma
self.gamma_inv = 1.0 / gamma
# Build a lookup table mapping the pixel values [0, 255] to
# their adjusted gamma values.
self.table = np.array([((i / 255.0) ** self.gamma_inv) * 255 for i in np.arange(0, 256)]).astype("uint8")
def __call__(self, image, labels=None):
image = cv2.LUT(image, self.table)
if labels is None:
return image
else:
return image, labels
class RandomGamma:
'''
Randomly changes the gamma value of RGB images.
Important: Expects RGB input.
'''
def __init__(self, lower=0.25, upper=2.0, prob=0.5):
'''
Arguments:
lower (float, optional): A float greater than zero, the lower bound for the random
gamma change.
upper (float, optional): A float greater than zero, the upper bound for the random
gamma change. Must be greater than `lower`.
prob (float, optional): `(1 - prob)` determines the probability with which the original,
unaltered image is returned.
'''
if lower >= upper: raise ValueError("`upper` must be greater than `lower`.")
self.lower = lower
self.upper = upper
self.prob = prob
def __call__(self, image, labels=None):
p = np.random.uniform(0,1)
if p >= (1.0-self.prob):
gamma = np.random.uniform(self.lower, self.upper)
change_gamma = Gamma(gamma=gamma)
return change_gamma(image, labels)
elif labels is None:
return image
else:
return image, labels
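# Illustrative sketch (not part of the committed module): `Gamma` builds a 256-entry uint8
# lookup table, so it expects uint8 RGB input. With this table, `gamma > 1` brightens the
# image and `gamma < 1` darkens it.
def _gamma_example(image_uint8):
    brighten = Gamma(gamma=2.0)
    return brighten(image_uint8)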
class HistogramEqualization:
'''
Performs histogram equalization on HSV images.
Important: Expects HSV input.
'''
def __init__(self):
pass
def __call__(self, image, labels=None):
image[:,:,2] = cv2.equalizeHist(image[:,:,2])
if labels is None:
return image
else:
return image, labels
class RandomHistogramEqualization:
'''
Randomly performs histogram equalization on HSV images. The randomness only refers
to whether or not the equalization is performed.
Important: Expects HSV input.
'''
def __init__(self, prob=0.5):
'''
Arguments:
prob (float, optional): `(1 - prob)` determines the probability with which the original,
unaltered image is returned.
'''
self.prob = prob
self.equalize = HistogramEqualization()
def __call__(self, image, labels=None):
p = np.random.uniform(0,1)
if p >= (1.0-self.prob):
return self.equalize(image, labels)
elif labels is None:
return image
else:
return image, labels
class ChannelSwap:
'''
Swaps the channels of images.
'''
def __init__(self, order):
'''
Arguments:
order (tuple): A tuple of integers that defines the desired channel order
of the input images after the channel swap.
'''
self.order = order
def __call__(self, image, labels=None):
image = image[:,:,self.order]
if labels is None:
return image
else:
return image, labels
class RandomChannelSwap:
'''
Randomly swaps the channels of RGB images.
Important: Expects RGB input.
'''
def __init__(self, prob=0.5):
'''
Arguments:
prob (float, optional): `(1 - prob)` determines the probability with which the original,
unaltered image is returned.
'''
self.prob = prob
# All possible permutations of the three image channels except the original order.
self.permutations = ((0, 2, 1),
(1, 0, 2), (1, 2, 0),
(2, 0, 1), (2, 1, 0))
self.swap_channels = ChannelSwap(order=(0, 1, 2))
def __call__(self, image, labels=None):
p = np.random.uniform(0,1)
if p >= (1.0-self.prob):
i = np.random.randint(5) # There are 5 possible permutations other than the original channel order.
self.swap_channels.order = self.permutations[i]
return self.swap_channels(image, labels)
elif labels is None:
return image
else:
return image, labels
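# Illustrative sketch (not part of the committed module): chaining the transforms above into
# a full photometric distortion sequence in the spirit of the original SSD augmentation. The
# ordering mirrors the uint8/float32 requirements noted above; the parameter values are assumptions.
def _photometric_chain_example(image, labels):
    transforms = [ConvertDataType(to='float32'),
                  RandomBrightness(lower=-32, upper=32, prob=0.5),
                  RandomContrast(lower=0.5, upper=1.5, prob=0.5),
                  ConvertDataType(to='uint8'),
                  ConvertColor(current='RGB', to='HSV'),
                  ConvertDataType(to='float32'),
                  RandomSaturation(lower=0.5, upper=1.5, prob=0.5),
                  RandomHue(max_delta=18, prob=0.5),
                  ConvertDataType(to='uint8'),
                  ConvertColor(current='HSV', to='RGB'),
                  RandomChannelSwap(prob=0.0)]
    for transform in transforms:
        image, labels = transform(image, labels)
    return image, labels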

View File

View File

@@ -0,0 +1,906 @@
'''
An evaluator to compute the Pascal VOC-style mean average precision (both the pre-2010
and post-2010 algorithm versions) of a given Keras SSD model on a given dataset.
Copyright (C) 2018 Pierluigi Ferrari
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
'''
from __future__ import division
import numpy as np
from math import ceil
from tqdm import trange
import sys
import warnings
from data_generator.object_detection_2d_data_generator import DataGenerator
from data_generator.object_detection_2d_geometric_ops import Resize
from data_generator.object_detection_2d_patch_sampling_ops import RandomPadFixedAR
from data_generator.object_detection_2d_photometric_ops import ConvertTo3Channels
from ssd_encoder_decoder.ssd_output_decoder import decode_detections
from data_generator.object_detection_2d_misc_utils import apply_inverse_transforms
from bounding_box_utils.bounding_box_utils import iou
class Evaluator:
'''
Computes the mean average precision of the given Keras SSD model on the given dataset.
Can compute the Pascal-VOC-style average precision in both the pre-2010 (k-point sampling)
and post-2010 (integration) algorithm versions.
Optionally also returns the average precisions, precisions, and recalls.
The algorithm is identical to the official Pascal VOC pre-2010 detection evaluation algorithm
in its default settings, but can be customized in a number of ways.
'''
def __init__(self,
model,
n_classes,
data_generator,
model_mode='inference',
pred_format={'class_id': 0, 'conf': 1, 'xmin': 2, 'ymin': 3, 'xmax': 4, 'ymax': 5},
gt_format={'class_id': 0, 'xmin': 1, 'ymin': 2, 'xmax': 3, 'ymax': 4}):
'''
Arguments:
model (Keras model): A Keras SSD model object.
n_classes (int): The number of positive classes, e.g. 20 for Pascal VOC, 80 for MS COCO.
data_generator (DataGenerator): A `DataGenerator` object with the evaluation dataset.
model_mode (str, optional): The mode in which the model was created, i.e. 'training', 'inference' or 'inference_fast'.
This is needed in order to know whether the model output is already decoded or still needs to be decoded. Refer to
the model documentation for the meaning of the individual modes.
pred_format (dict, optional): A dictionary that defines which index in the last axis of the model's decoded predictions
contains which bounding box coordinate. The dictionary must map the keywords 'class_id', 'conf' (for the confidence),
'xmin', 'ymin', 'xmax', and 'ymax' to their respective indices within the last axis.
gt_format (dict, optional): A dictionary that defines which index of a ground truth bounding box contains which of the five
items class ID, xmin, ymin, xmax, ymax. The expected strings are 'xmin', 'ymin', 'xmax', 'ymax', 'class_id'.
'''
if not isinstance(data_generator, DataGenerator):
warnings.warn("`data_generator` is not a `DataGenerator` object, which will cause undefined behavior.")
self.model = model
self.data_generator = data_generator
self.n_classes = n_classes
self.model_mode = model_mode
self.pred_format = pred_format
self.gt_format = gt_format
# The following lists all contain per-class data, i.e. all lists have the length `n_classes + 1`,
# where one element is for the background class, i.e. that element is just a dummy entry.
self.prediction_results = None
self.num_gt_per_class = None
self.true_positives = None
self.false_positives = None
self.cumulative_true_positives = None
self.cumulative_false_positives = None
self.cumulative_precisions = None # "Cumulative" means that the i-th element in each list represents the precision for the first i highest confidence predictions for that class.
self.cumulative_recalls = None # "Cumulative" means that the i-th element in each list represents the recall for the first i highest confidence predictions for that class.
self.average_precisions = None
self.mean_average_precision = None
def __call__(self,
img_height,
img_width,
batch_size,
data_generator_mode='resize',
round_confidences=False,
matching_iou_threshold=0.5,
border_pixels='include',
sorting_algorithm='quicksort',
average_precision_mode='sample',
num_recall_points=11,
ignore_neutral_boxes=True,
return_precisions=False,
return_recalls=False,
return_average_precisions=False,
verbose=True,
decoding_confidence_thresh=0.01,
decoding_iou_threshold=0.45,
decoding_top_k=200,
decoding_pred_coords='centroids',
decoding_normalize_coords=True):
'''
Computes the mean average precision of the given Keras SSD model on the given dataset.
Optionally also returns the average precisions, precisions, and recalls.
All the individual steps of the overall evaluation algorithm can also be called separately
(check out the other methods of this class), but this runs the overall algorithm all at once.
Arguments:
img_height (int): The input image height for the model.
img_width (int): The input image width for the model.
batch_size (int): The batch size for the evaluation.
data_generator_mode (str, optional): Either of 'resize' and 'pad'. If 'resize', the input images will
be resized (i.e. warped) to `(img_height, img_width)`. This mode does not preserve the aspect ratios of the images.
If 'pad', the input images will be first padded so that they have the aspect ratio defined by `img_height`
and `img_width` and then resized to `(img_height, img_width)`. This mode preserves the aspect ratios of the images.
round_confidences (int, optional): `False` or an integer that is the number of decimals that the prediction
confidences will be rounded to. If `False`, the confidences will not be rounded.
matching_iou_threshold (float, optional): A prediction will be considered a true positive if it has a Jaccard overlap
of at least `matching_iou_threshold` with any ground truth bounding box of the same class.
border_pixels (str, optional): How to treat the border pixels of the bounding boxes.
Can be 'include', 'exclude', or 'half'. If 'include', the border pixels belong
to the boxes. If 'exclude', the border pixels do not belong to the boxes.
If 'half', then one of each of the two horizontal and vertical borders belongs
to the boxes, but not the other.
sorting_algorithm (str, optional): Which sorting algorithm the matching algorithm should use. This argument accepts
any valid sorting algorithm for Numpy's `argsort()` function. You will usually want to choose between 'quicksort'
(fastest and most memory efficient, but not stable) and 'mergesort' (slightly slower and less memory efficient, but stable).
The official Matlab evaluation algorithm uses a stable sorting algorithm, so this algorithm is only guaranteed
to behave identically if you choose 'mergesort' as the sorting algorithm, but it will almost always behave identically
even if you choose 'quicksort' (but no guarantees).
average_precision_mode (str, optional): Can be either 'sample' or 'integrate'. In the case of 'sample', the average precision
will be computed according to the Pascal VOC formula that was used up until VOC 2009, where the precision will be sampled
for `num_recall_points` recall values. In the case of 'integrate', the average precision will be computed according to the
Pascal VOC formula that was used from VOC 2010 onward, where the average precision will be computed by numerically integrating
over the whole precision-recall curve instead of sampling individual points from it. 'integrate' mode is basically just
the limit case of 'sample' mode as the number of sample points increases.
num_recall_points (int, optional): The number of points to sample from the precision-recall-curve to compute the average
precisions. In other words, this is the number of equidistant recall values for which the resulting precision will be
computed. 11 points is the value used in the official Pascal VOC 2007 detection evaluation algorithm.
ignore_neutral_boxes (bool, optional): In case the data generator provides annotations indicating whether a ground truth
bounding box is supposed to either count or be neutral for the evaluation, this argument decides what to do with these
annotations. If `False`, even boxes that are annotated as neutral will be counted into the evaluation. If `True`,
neutral boxes will be ignored for the evaluation. An example for evaluation-neutrality are the ground truth boxes
annotated as "difficult" in the Pascal VOC datasets, which are usually treated as neutral for the evaluation.
return_precisions (bool, optional): If `True`, returns a nested list containing the cumulative precisions for each class.
return_recalls (bool, optional): If `True`, returns a nested list containing the cumulative recalls for each class.
return_average_precisions (bool, optional): If `True`, returns a list containing the average precision for each class.
verbose (bool, optional): If `True`, will print out the progress during runtime.
decoding_confidence_thresh (float, optional): Only relevant if the model is in 'training' mode.
A float in [0,1), the minimum classification confidence in a specific positive class in order to be considered
for the non-maximum suppression stage for the respective class. A lower value will result in a larger part of the
selection process being done by the non-maximum suppression stage, while a larger value will result in a larger
part of the selection process happening in the confidence thresholding stage.
decoding_iou_threshold (float, optional): Only relevant if the model is in 'training' mode. A float in [0,1].
All boxes with a Jaccard similarity of greater than `iou_threshold` with a locally maximal box will be removed
from the set of predictions for a given class, where 'maximal' refers to the box score.
decoding_top_k (int, optional): Only relevant if the model is in 'training' mode. The number of highest scoring
predictions to be kept for each batch item after the non-maximum suppression stage.
decoding_pred_coords (str, optional): Only relevant if the model is in 'training' mode. The box coordinate format
that the model outputs. Can be either 'centroids' for the format `(cx, cy, w, h)` (box center coordinates, width, and height),
'minmax' for the format `(xmin, xmax, ymin, ymax)`, or 'corners' for the format `(xmin, ymin, xmax, ymax)`.
decoding_normalize_coords (bool, optional): Only relevant if the model is in 'training' mode. Set to `True` if the model
outputs relative coordinates. Do not set this to `True` if the model already outputs absolute coordinates,
as that would result in incorrect coordinates.
Returns:
A float, the mean average precision, plus any optional returns specified in the arguments.
'''
#############################################################################################
# Predict on the entire dataset.
#############################################################################################
self.predict_on_dataset(img_height=img_height,
img_width=img_width,
batch_size=batch_size,
data_generator_mode=data_generator_mode,
decoding_confidence_thresh=decoding_confidence_thresh,
decoding_iou_threshold=decoding_iou_threshold,
decoding_top_k=decoding_top_k,
decoding_pred_coords=decoding_pred_coords,
decoding_normalize_coords=decoding_normalize_coords,
decoding_border_pixels=border_pixels,
round_confidences=round_confidences,
verbose=verbose,
ret=False)
#############################################################################################
# Get the total number of ground truth boxes for each class.
#############################################################################################
self.get_num_gt_per_class(ignore_neutral_boxes=ignore_neutral_boxes,
verbose=False,
ret=False)
#############################################################################################
# Match predictions to ground truth boxes for all classes.
#############################################################################################
self.match_predictions(ignore_neutral_boxes=ignore_neutral_boxes,
matching_iou_threshold=matching_iou_threshold,
border_pixels=border_pixels,
sorting_algorithm=sorting_algorithm,
verbose=verbose,
ret=False)
#############################################################################################
# Compute the cumulative precision and recall for all classes.
#############################################################################################
self.compute_precision_recall(verbose=verbose, ret=False)
#############################################################################################
# Compute the average precision for each class.
#############################################################################################
self.compute_average_precisions(mode=average_precision_mode,
num_recall_points=num_recall_points,
verbose=verbose,
ret=False)
#############################################################################################
# Compute the mean average precision.
#############################################################################################
mean_average_precision = self.compute_mean_average_precision(ret=True)
#############################################################################################
# Compile the returns.
if return_precisions or return_recalls or return_average_precisions:
ret = [mean_average_precision]
if return_average_precisions:
ret.append(self.average_precisions)
if return_precisions:
ret.append(self.cumulative_precisions)
if return_recalls:
ret.append(self.cumulative_recalls)
return ret
else:
return mean_average_precision
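# Illustrative usage sketch (not part of the committed module): `trained_model` is a Keras SSD
# model created in 'inference' mode and `val_dataset` a `DataGenerator` holding the evaluation
# data; the concrete values (20 classes, 300x300 input, batch size 8) are assumptions.
def _evaluation_example(trained_model, val_dataset):
    evaluator = Evaluator(model=trained_model,
                          n_classes=20,
                          data_generator=val_dataset,
                          model_mode='inference')
    mean_average_precision, average_precisions = evaluator(img_height=300,
                                                           img_width=300,
                                                           batch_size=8,
                                                           data_generator_mode='resize',
                                                           average_precision_mode='sample',
                                                           num_recall_points=11,
                                                           return_average_precisions=True)
    return mean_average_precision, average_precisions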
def predict_on_dataset(self,
img_height,
img_width,
batch_size,
data_generator_mode='resize',
decoding_confidence_thresh=0.01,
decoding_iou_threshold=0.45,
decoding_top_k=200,
decoding_pred_coords='centroids',
decoding_normalize_coords=True,
decoding_border_pixels='include',
round_confidences=False,
verbose=True,
ret=False):
'''
Runs predictions for the given model over the entire dataset given by `data_generator`.
Arguments:
img_height (int): The input image height for the model.
img_width (int): The input image width for the model.
batch_size (int): The batch size for the evaluation.
data_generator_mode (str, optional): Either of 'resize' and 'pad'. If 'resize', the input images will
be resized (i.e. warped) to `(img_height, img_width)`. This mode does not preserve the aspect ratios of the images.
If 'pad', the input images will be first padded so that they have the aspect ratio defined by `img_height`
and `img_width` and then resized to `(img_height, img_width)`. This mode preserves the aspect ratios of the images.
decoding_confidence_thresh (float, optional): Only relevant if the model is in 'training' mode.
A float in [0,1), the minimum classification confidence in a specific positive class in order to be considered
for the non-maximum suppression stage for the respective class. A lower value will result in a larger part of the
selection process being done by the non-maximum suppression stage, while a larger value will result in a larger
part of the selection process happening in the confidence thresholding stage.
decoding_iou_threshold (float, optional): Only relevant if the model is in 'training' mode. A float in [0,1].
All boxes with a Jaccard similarity of greater than `iou_threshold` with a locally maximal box will be removed
from the set of predictions for a given class, where 'maximal' refers to the box score.
decoding_top_k (int, optional): Only relevant if the model is in 'training' mode. The number of highest scoring
predictions to be kept for each batch item after the non-maximum suppression stage.
decoding_pred_coords (str, optional): Only relevant if the model is in 'training' mode. The box coordinate format
that the model outputs. Can be either 'centroids' for the format `(cx, cy, w, h)` (box center coordinates, width, and height),
'minmax' for the format `(xmin, xmax, ymin, ymax)`, or 'corners' for the format `(xmin, ymin, xmax, ymax)`.
decoding_normalize_coords (bool, optional): Only relevant if the model is in 'training' mode. Set to `True` if the model
outputs relative coordinates. Do not set this to `True` if the model already outputs absolute coordinates,
as that would result in incorrect coordinates.
round_confidences (int, optional): `False` or an integer that is the number of decimals that the prediction
confidences will be rounded to. If `False`, the confidences will not be rounded.
verbose (bool, optional): If `True`, will print out the progress during runtime.
ret (bool, optional): If `True`, returns the predictions.
Returns:
None by default. Optionally, a nested list containing the predictions for each class.
'''
class_id_pred = self.pred_format['class_id']
conf_pred = self.pred_format['conf']
xmin_pred = self.pred_format['xmin']
ymin_pred = self.pred_format['ymin']
xmax_pred = self.pred_format['xmax']
ymax_pred = self.pred_format['ymax']
#############################################################################################
# Configure the data generator for the evaluation.
#############################################################################################
convert_to_3_channels = ConvertTo3Channels()
resize = Resize(height=img_height,width=img_width, labels_format=self.gt_format)
if data_generator_mode == 'resize':
transformations = [convert_to_3_channels,
resize]
elif data_generator_mode == 'pad':
random_pad = RandomPadFixedAR(patch_aspect_ratio=img_width/img_height, labels_format=self.gt_format)
transformations = [convert_to_3_channels,
random_pad,
resize]
else:
raise ValueError("`data_generator_mode` can be either of 'resize' or 'pad', but received '{}'.".format(data_generator_mode))
# Set the generator parameters.
generator = self.data_generator.generate(batch_size=batch_size,
shuffle=False,
transformations=transformations,
label_encoder=None,
returns={'processed_images',
'image_ids',
'evaluation-neutral',
'inverse_transform',
'original_labels'},
keep_images_without_gt=True,
degenerate_box_handling='remove')
# If we don't have any real image IDs, generate pseudo-image IDs.
# This is just to make the evaluator compatible both with datasets that do and don't
# have image IDs.
if self.data_generator.image_ids is None:
self.data_generator.image_ids = list(range(self.data_generator.get_dataset_size()))
#############################################################################################
# Predict over all batches of the dataset and store the predictions.
#############################################################################################
# We have to generate a separate results list for each class.
results = [list() for _ in range(self.n_classes + 1)]
# Create a dictionary that maps image IDs to ground truth annotations.
# We'll need it below.
image_ids_to_labels = {}
# Compute the number of batches to iterate over the entire dataset.
n_images = self.data_generator.get_dataset_size()
n_batches = int(ceil(n_images / batch_size))
if verbose:
print("Number of images in the evaluation dataset: {}".format(n_images))
print()
tr = trange(n_batches, file=sys.stdout)
tr.set_description('Producing predictions batch-wise')
else:
tr = range(n_batches)
# Loop over all batches.
for j in tr:
# Generate batch.
batch_X, batch_image_ids, batch_eval_neutral, batch_inverse_transforms, batch_orig_labels = next(generator)
# Predict.
y_pred = self.model.predict(batch_X)
# If the model was created in 'training' mode, the raw predictions need to
# be decoded and filtered, otherwise that's already taken care of.
if self.model_mode == 'training':
# Decode.
y_pred = decode_detections(y_pred,
confidence_thresh=decoding_confidence_thresh,
iou_threshold=decoding_iou_threshold,
top_k=decoding_top_k,
input_coords=decoding_pred_coords,
normalize_coords=decoding_normalize_coords,
img_height=img_height,
img_width=img_width,
border_pixels=decoding_border_pixels)
else:
# Filter out the all-zeros dummy elements of `y_pred`.
y_pred_filtered = []
for i in range(len(y_pred)):
y_pred_filtered.append(y_pred[i][y_pred[i,:,0] != 0])
y_pred = y_pred_filtered
# Convert the predicted box coordinates for the original images.
y_pred = apply_inverse_transforms(y_pred, batch_inverse_transforms)
# Iterate over all batch items.
for k, batch_item in enumerate(y_pred):
image_id = batch_image_ids[k]
for box in batch_item:
class_id = int(box[class_id_pred])
# Round the box coordinates to reduce the required memory.
if round_confidences:
confidence = round(box[conf_pred], round_confidences)
else:
confidence = box[conf_pred]
xmin = round(box[xmin_pred], 1)
ymin = round(box[ymin_pred], 1)
xmax = round(box[xmax_pred], 1)
ymax = round(box[ymax_pred], 1)
prediction = (image_id, confidence, xmin, ymin, xmax, ymax)
# Append the predicted box to the results list for its class.
results[class_id].append(prediction)
self.prediction_results = results
if ret:
return results
def write_predictions_to_txt(self,
classes=None,
out_file_prefix='comp3_det_test_',
verbose=True):
'''
Writes the predictions for all classes to separate text files according to the Pascal VOC results format.
Arguments:
classes (list, optional): `None` or a list of strings containing the class names of all classes in the dataset,
including some arbitrary name for the background class. This list will be used to name the output text files.
The ordering of the names in the list represents the ordering of the classes as they are predicted by the model,
i.e. the element with index 3 in this list should correspond to the class with class ID 3 in the model's predictions.
If `None`, the output text files will be named by their class IDs.
out_file_prefix (str, optional): A prefix for the output text file names. The suffix to each output text file name will
be the respective class name followed by the `.txt` file extension. This string is also how you specify the directory
in which the results are to be saved.
verbose (bool, optional): If `True`, will print out the progress during runtime.
Returns:
None.
'''
if self.prediction_results is None:
raise ValueError("There are no prediction results. You must run `predict_on_dataset()` before calling this method.")
# We generate a separate results file for each class.
for class_id in range(1, self.n_classes + 1):
if verbose:
print("Writing results file for class {}/{}.".format(class_id, self.n_classes))
if classes is None:
class_suffix = '{:04d}'.format(class_id)
else:
class_suffix = classes[class_id]
results_file = open('{}{}.txt'.format(out_file_prefix, class_suffix), 'w')
for prediction in self.prediction_results[class_id]:
prediction_list = list(prediction)
prediction_list[0] = '{:06d}'.format(int(prediction_list[0]))
prediction_list[1] = round(prediction_list[1], 4)
prediction_txt = ' '.join(map(str, prediction_list)) + '\n'
results_file.write(prediction_txt)
results_file.close()
if verbose:
print("All results files saved.")
def get_num_gt_per_class(self,
ignore_neutral_boxes=True,
verbose=True,
ret=False):
'''
Counts the number of ground truth boxes for each class across the dataset.
Arguments:
ignore_neutral_boxes (bool, optional): In case the data generator provides annotations indicating whether a ground truth
bounding box is supposed to either count or be neutral for the evaluation, this argument decides what to do with these
annotations. If `True`, only non-neutral ground truth boxes will be counted, otherwise all ground truth boxes will
be counted.
verbose (bool, optional): If `True`, will print out the progress during runtime.
ret (bool, optional): If `True`, returns the list of counts.
Returns:
None by default. Optionally, a list containing a count of the number of ground truth boxes for each class across the
entire dataset.
'''
if self.data_generator.labels is None:
raise ValueError("Computing the number of ground truth boxes per class not possible, no ground truth given.")
num_gt_per_class = np.zeros(shape=(self.n_classes+1), dtype=np.int)
class_id_index = self.gt_format['class_id']
ground_truth = self.data_generator.labels
if verbose:
print('Computing the number of positive ground truth boxes per class.')
tr = trange(len(ground_truth), file=sys.stdout)
else:
tr = range(len(ground_truth))
# Iterate over the ground truth for all images in the dataset.
for i in tr:
boxes = np.asarray(ground_truth[i])
# Iterate over all ground truth boxes for the current image.
for j in range(boxes.shape[0]):
if ignore_neutral_boxes and not (self.data_generator.eval_neutral is None):
if not self.data_generator.eval_neutral[i][j]:
# If this box is not supposed to be evaluation-neutral,
# increment the counter for the respective class ID.
class_id = boxes[j, class_id_index]
num_gt_per_class[class_id] += 1
else:
# If there is no such thing as evaluation-neutral boxes for
# our dataset, always increment the counter for the respective
# class ID.
class_id = boxes[j, class_id_index]
num_gt_per_class[class_id] += 1
self.num_gt_per_class = num_gt_per_class
if ret:
return num_gt_per_class
def match_predictions(self,
ignore_neutral_boxes=True,
matching_iou_threshold=0.5,
border_pixels='include',
sorting_algorithm='quicksort',
verbose=True,
ret=False):
'''
Matches predictions to ground truth boxes.
Note that `predict_on_dataset()` must be called before calling this method.
Arguments:
ignore_neutral_boxes (bool, optional): In case the data generator provides annotations indicating whether a ground truth
bounding box is supposed to either count or be neutral for the evaluation, this argument decides what to do with these
annotations. If `False`, even boxes that are annotated as neutral will be counted into the evaluation. If `True`,
neutral boxes will be ignored for the evaluation. An example for evaluation-neutrality are the ground truth boxes
annotated as "difficult" in the Pascal VOC datasets, which are usually treated as neutral for the evaluation.
matching_iou_threshold (float, optional): A prediction will be considered a true positive if it has a Jaccard overlap
of at least `matching_iou_threshold` with any ground truth bounding box of the same class.
border_pixels (str, optional): How to treat the border pixels of the bounding boxes.
Can be 'include', 'exclude', or 'half'. If 'include', the border pixels belong
to the boxes. If 'exclude', the border pixels do not belong to the boxes.
If 'half', then one of each of the two horizontal and vertical borders belongs
to the boxes, but not the other.
sorting_algorithm (str, optional): Which sorting algorithm the matching algorithm should use. This argument accepts
any valid sorting algorithm for Numpy's `argsort()` function. You will usually want to choose between 'quicksort'
(fastest and most memory efficient, but not stable) and 'mergesort' (slightly slower and less memory efficient, but stable).
The official Matlab evaluation algorithm uses a stable sorting algorithm, so this algorithm is only guaranteed
to behave identically if you choose 'mergesort' as the sorting algorithm, but it will almost always behave identically
even if you choose 'quicksort' (but no guarantees).
verbose (bool, optional): If `True`, will print out the progress during runtime.
ret (bool, optional): If `True`, returns the true and false positives.
Returns:
None by default. Optionally, four nested lists containing the true positives, false positives, cumulative true positives,
and cumulative false positives for each class.
'''
if self.data_generator.labels is None:
raise ValueError("Matching predictions to ground truth boxes not possible, no ground truth given.")
if self.prediction_results is None:
raise ValueError("There are no prediction results. You must run `predict_on_dataset()` before calling this method.")
class_id_gt = self.gt_format['class_id']
xmin_gt = self.gt_format['xmin']
ymin_gt = self.gt_format['ymin']
xmax_gt = self.gt_format['xmax']
ymax_gt = self.gt_format['ymax']
# Convert the ground truth to a more efficient format for what we need
# to do, which is access ground truth by image ID repeatedly.
ground_truth = {}
eval_neutral_available = not (self.data_generator.eval_neutral is None) # Whether or not we have annotations to decide whether ground truth boxes should be neutral or not.
for i in range(len(self.data_generator.image_ids)):
image_id = str(self.data_generator.image_ids[i])
labels = self.data_generator.labels[i]
if ignore_neutral_boxes and eval_neutral_available:
ground_truth[image_id] = (np.asarray(labels), np.asarray(self.data_generator.eval_neutral[i]))
else:
ground_truth[image_id] = np.asarray(labels)
true_positives = [[]] # The true positives for each class, sorted by descending confidence.
false_positives = [[]] # The false positives for each class, sorted by descending confidence.
cumulative_true_positives = [[]]
cumulative_false_positives = [[]]
# Iterate over all classes.
for class_id in range(1, self.n_classes + 1):
predictions = self.prediction_results[class_id]
# Store the matching results in these lists:
true_pos = np.zeros(len(predictions), dtype=np.int) # 1 for every prediction that is a true positive, 0 otherwise
false_pos = np.zeros(len(predictions), dtype=np.int) # 1 for every prediction that is a false positive, 0 otherwise
# In case there are no predictions at all for this class, we're done here.
if len(predictions) == 0:
print("No predictions for class {}/{}".format(class_id, self.n_classes))
true_positives.append(true_pos)
false_positives.append(false_pos)
continue
# Convert the predictions list for this class into a structured array so that we can sort it by confidence.
# Get the number of characters needed to store the image ID strings in the structured array.
num_chars_per_image_id = len(str(predictions[0][0])) + 6 # Keep a few characters buffer in case some image IDs are longer than others.
# Create the data type for the structured array.
preds_data_type = np.dtype([('image_id', 'U{}'.format(num_chars_per_image_id)),
('confidence', 'f4'),
('xmin', 'f4'),
('ymin', 'f4'),
('xmax', 'f4'),
('ymax', 'f4')])
# Create the structured array
predictions = np.array(predictions, dtype=preds_data_type)
# Sort the detections by decreasing confidence.
descending_indices = np.argsort(-predictions['confidence'], kind=sorting_algorithm)
predictions_sorted = predictions[descending_indices]
if verbose:
tr = trange(len(predictions), file=sys.stdout)
tr.set_description("Matching predictions to ground truth, class {}/{}.".format(class_id, self.n_classes))
else:
tr = range(len(predictions))
# Keep track of which ground truth boxes were already matched to a detection.
gt_matched = {}
# Iterate over all predictions.
for i in tr:
prediction = predictions_sorted[i]
image_id = prediction['image_id']
pred_box = np.asarray(list(prediction[['xmin', 'ymin', 'xmax', 'ymax']])) # Convert the structured array element to a regular array.
# Get the relevant ground truth boxes for this prediction,
# i.e. all ground truth boxes that match the prediction's
# image ID and class ID.
# The ground truth could either be a tuple with `(ground_truth_boxes, eval_neutral_boxes)`
# or only `ground_truth_boxes`.
if ignore_neutral_boxes and eval_neutral_available:
gt, eval_neutral = ground_truth[image_id]
else:
gt = ground_truth[image_id]
gt = np.asarray(gt)
class_mask = gt[:,class_id_gt] == class_id
gt = gt[class_mask]
if ignore_neutral_boxes and eval_neutral_available:
eval_neutral = eval_neutral[class_mask]
if gt.size == 0:
# If the image doesn't contain any objects of this class,
# the prediction becomes a false positive.
false_pos[i] = 1
continue
# Compute the IoU of this prediction with all ground truth boxes of the same class.
overlaps = iou(boxes1=gt[:,[xmin_gt, ymin_gt, xmax_gt, ymax_gt]],
boxes2=pred_box,
coords='corners',
mode='element-wise',
border_pixels=border_pixels)
# For each detection, match the ground truth box with the highest overlap.
# It's possible that the same ground truth box will be matched to multiple
# detections.
gt_match_index = np.argmax(overlaps)
gt_match_overlap = overlaps[gt_match_index]
if gt_match_overlap < matching_iou_threshold:
# False positive, IoU threshold violated:
# Those predictions whose matched overlap is below the threshold become
# false positives.
false_pos[i] = 1
else:
if not (ignore_neutral_boxes and eval_neutral_available) or (eval_neutral[gt_match_index] == False):
# If this is not a ground truth that is supposed to be evaluation-neutral
# (i.e. should be skipped for the evaluation) or if we don't even have the
# concept of neutral boxes.
if not (image_id in gt_matched):
# True positive:
# If the matched ground truth box for this prediction hasn't been matched to a
# different prediction already, we have a true positive.
true_pos[i] = 1
gt_matched[image_id] = np.zeros(shape=(gt.shape[0]), dtype=np.bool)
gt_matched[image_id][gt_match_index] = True
elif not gt_matched[image_id][gt_match_index]:
# True positive:
# If the matched ground truth box for this prediction hasn't been matched to a
# different prediction already, we have a true positive.
true_pos[i] = 1
gt_matched[image_id][gt_match_index] = True
else:
# False positive, duplicate detection:
# If the matched ground truth box for this prediction has already been matched
# to a different prediction previously, it is a duplicate detection for an
# already detected object, which counts as a false positive.
false_pos[i] = 1
true_positives.append(true_pos)
false_positives.append(false_pos)
cumulative_true_pos = np.cumsum(true_pos) # Cumulative sums of the true positives
cumulative_false_pos = np.cumsum(false_pos) # Cumulative sums of the false positives
cumulative_true_positives.append(cumulative_true_pos)
cumulative_false_positives.append(cumulative_false_pos)
self.true_positives = true_positives
self.false_positives = false_positives
self.cumulative_true_positives = cumulative_true_positives
self.cumulative_false_positives = cumulative_false_positives
if ret:
return true_positives, false_positives, cumulative_true_positives, cumulative_false_positives
def compute_precision_recall(self, verbose=True, ret=False):
'''
Computes the precisions and recalls for all classes.
Note that `match_predictions()` must be called before calling this method.
Arguments:
verbose (bool, optional): If `True`, will print out the progress during runtime.
ret (bool, optional): If `True`, returns the precisions and recalls.
Returns:
None by default. Optionally, two nested lists containing the cumulative precisions and recalls for each class.
'''
if (self.cumulative_true_positives is None) or (self.cumulative_false_positives is None):
raise ValueError("True and false positives not available. You must run `match_predictions()` before you call this method.")
if (self.num_gt_per_class is None):
raise ValueError("Number of ground truth boxes per class not available. You must run `get_num_gt_per_class()` before you call this method.")
cumulative_precisions = [[]]
cumulative_recalls = [[]]
# Iterate over all classes.
for class_id in range(1, self.n_classes + 1):
if verbose:
print("Computing precisions and recalls, class {}/{}".format(class_id, self.n_classes))
tp = self.cumulative_true_positives[class_id]
fp = self.cumulative_false_positives[class_id]
cumulative_precision = np.where(tp + fp > 0, tp / (tp + fp), 0) # 1D array with shape `(num_predictions,)`
cumulative_recall = tp / self.num_gt_per_class[class_id] # 1D array with shape `(num_predictions,)`
cumulative_precisions.append(cumulative_precision)
cumulative_recalls.append(cumulative_recall)
self.cumulative_precisions = cumulative_precisions
self.cumulative_recalls = cumulative_recalls
if ret:
return cumulative_precisions, cumulative_recalls
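# Worked micro-example (made-up numbers, for illustration only): with 3 ground truth boxes
# of a class and confidence-sorted match outcomes [TP, FP, TP], the cumulative quantities are
#     tp = np.cumsum([1, 0, 1])              -> [1, 1, 2]
#     fp = np.cumsum([0, 1, 0])              -> [0, 1, 1]
#     cumulative_precision = tp / (tp + fp)  -> [1.0, 0.5, 0.667]
#     cumulative_recall = tp / 3             -> [0.333, 0.333, 0.667]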
def compute_average_precisions(self, mode='sample', num_recall_points=11, verbose=True, ret=False):
'''
Computes the average precision for each class.
Can compute the Pascal-VOC-style average precision in both the pre-2010 (k-point sampling)
and post-2010 (integration) algorithm versions.
Note that `compute_precision_recall()` must be called before calling this method.
Arguments:
mode (str, optional): Can be either 'sample' or 'integrate'. In the case of 'sample', the average precision will be computed
according to the Pascal VOC formula that was used up until VOC 2009, where the precision will be sampled for `num_recall_points`
recall values. In the case of 'integrate', the average precision will be computed according to the Pascal VOC formula that
was used from VOC 2010 onward, where the average precision will be computed by numerically integrating over the whole
precision-recall curve instead of sampling individual points from it. 'integrate' mode is basically just the limit case
of 'sample' mode as the number of sample points increases. For details, see the references below.
num_recall_points (int, optional): Only relevant if mode is 'sample'. The number of points to sample from the precision-recall-curve
to compute the average precisions. In other words, this is the number of equidistant recall values for which the resulting
precision will be computed. 11 points is the value used in the official Pascal VOC pre-2010 detection evaluation algorithm.
verbose (bool, optional): If `True`, will print out the progress during runtime.
ret (bool, optional): If `True`, returns the average precisions.
Returns:
None by default. Optionally, a list containing average precision for each class.
References:
http://host.robots.ox.ac.uk/pascal/VOC/voc2012/htmldoc/devkit_doc.html#sec:ap
'''
if (self.cumulative_precisions is None) or (self.cumulative_recalls is None):
raise ValueError("Precisions and recalls not available. You must run `compute_precision_recall()` before you call this method.")
if not (mode in {'sample', 'integrate'}):
raise ValueError("`mode` can be either 'sample' or 'integrate', but received '{}'".format(mode))
average_precisions = [0.0]
# Iterate over all classes.
for class_id in range(1, self.n_classes + 1):
if verbose:
print("Computing average precision, class {}/{}".format(class_id, self.n_classes))
cumulative_precision = self.cumulative_precisions[class_id]
cumulative_recall = self.cumulative_recalls[class_id]
average_precision = 0.0
if mode == 'sample':
for t in np.linspace(start=0, stop=1, num=num_recall_points, endpoint=True):
cum_prec_recall_greater_t = cumulative_precision[cumulative_recall >= t]
if cum_prec_recall_greater_t.size == 0:
precision = 0.0
else:
precision = np.amax(cum_prec_recall_greater_t)
average_precision += precision
average_precision /= num_recall_points
elif mode == 'integrate':
# We will compute the precision at all unique recall values.
unique_recalls, unique_recall_indices, unique_recall_counts = np.unique(cumulative_recall, return_index=True, return_counts=True)
# Store the maximal precision for each recall value and the absolute difference
# between any two unique recall values in the lists below. The products of these
# two numbers constitute the rectangular areas whose sum will be our numerical
# integral.
maximal_precisions = np.zeros_like(unique_recalls)
recall_deltas = np.zeros_like(unique_recalls)
# Iterate over all unique recall values in reverse order. This saves a lot of computation:
# For each unique recall value `r`, we want to get the maximal precision value obtained
# for any recall value `r* >= r`. Once we know the maximal precision for the last `k` recall
# values after a given iteration, then in the next iteration, in order to compute the maximal
# precisions for the last `l > k` recall values, we only need to compute the maximal precision
# for `l - k` recall values and then take the maximum between that and the previously computed
# maximum instead of computing the maximum over all `l` values.
# We skip the very last recall value, since the precision between the last recall value and
# recall 1.0 is defined to be zero.
for i in range(len(unique_recalls)-2, -1, -1):
begin = unique_recall_indices[i]
end = unique_recall_indices[i + 1]
# When computing the maximal precisions, use the maximum of the previous iteration to
# avoid unnecessary repeated computation over the same precision values.
# The maximal precisions are the heights of the rectangle areas of our integral under
# the precision-recall curve.
maximal_precisions[i] = np.maximum(np.amax(cumulative_precision[begin:end]), maximal_precisions[i + 1])
# The differences between two adjacent recall values are the widths of our rectangle areas.
recall_deltas[i] = unique_recalls[i + 1] - unique_recalls[i]
average_precision = np.sum(maximal_precisions * recall_deltas)
average_precisions.append(average_precision)
self.average_precisions = average_precisions
if ret:
return average_precisions
def compute_mean_average_precision(self, ret=True):
'''
Computes the mean average precision over all classes.
Note that `compute_average_precisions()` must be called before calling this method.
Arguments:
ret (bool, optional): If `True`, returns the mean average precision.
Returns:
A float, the mean average precision, by default. Optionally, None.
'''
if self.average_precisions is None:
raise ValueError("Average precisions not available. You must run `compute_average_precisions()` before you call this method.")
mean_average_precision = np.average(self.average_precisions[1:]) # The first element is for the background class, so skip it.
self.mean_average_precision = mean_average_precision
if ret:
return mean_average_precision
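The four methods above form a fixed pipeline: `match_predictions()` produces the per-class true/false positive lists, `compute_precision_recall()` turns their cumulative sums into precision-recall curves, `compute_average_precisions()` reduces each curve to a single AP value, and `compute_mean_average_precision()` averages those APs over all positive classes. A minimal usage sketch, under the assumption that an `Evaluator` instance named `evaluator` already holds predictions and ground truth and that `match_predictions()` has been run (all variable names are illustrative):
evaluator.compute_precision_recall(verbose=False)
average_precisions = evaluator.compute_average_precisions(mode='sample',
                                                          num_recall_points=11,
                                                          verbose=False,
                                                          ret=True)
mean_ap = evaluator.compute_mean_average_precision(ret=True)
# In 'sample' mode each AP is the mean over the 11 recall thresholds
# t in {0.0, 0.1, ..., 1.0} of the highest precision at any recall >= t.
print('mAP over all positive classes: {:.4f}'.format(mean_ap))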

View File

@@ -0,0 +1,200 @@
'''
A few utilities that are useful when working with the MS COCO datasets.
Copyright (C) 2018 Pierluigi Ferrari
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
'''
import json
from tqdm import trange
from math import ceil
import sys
from data_generator.object_detection_2d_geometric_ops import Resize
from data_generator.object_detection_2d_patch_sampling_ops import RandomPadFixedAR
from data_generator.object_detection_2d_photometric_ops import ConvertTo3Channels
from ssd_encoder_decoder.ssd_output_decoder import decode_detections
from data_generator.object_detection_2d_misc_utils import apply_inverse_transforms
def get_coco_category_maps(annotations_file):
'''
Builds dictionaries that map between MS COCO category IDs, transformed category IDs, and category names.
Unfortunately, the original MS COCO category IDs are not consecutive: the 80 category IDs are spread
across the integers 1 through 90 with some integers skipped. Since we usually use a one-hot
class representation in neural networks, we need to map these non-consecutive original COCO category
IDs (let's call them 'cats') to consecutive category IDs (let's call them 'classes').
Arguments:
annotations_file (str): The filepath to any MS COCO annotations JSON file.
Returns:
1) cats_to_classes: A dictionary that maps between the original (keys) and the transformed category IDs (values).
2) classes_to_cats: A dictionary that maps between the transformed (keys) and the original category IDs (values).
3) cats_to_names: A dictionary that maps between original category IDs (keys) and the respective category names (values).
4) classes_to_names: A list of the category names (values) with their indices representing the transformed IDs.
'''
with open(annotations_file, 'r') as f:
annotations = json.load(f)
cats_to_classes = {}
classes_to_cats = {}
cats_to_names = {}
classes_to_names = []
classes_to_names.append('background') # Need to add the background class first so that the indexing is right.
for i, cat in enumerate(annotations['categories']):
cats_to_classes[cat['id']] = i + 1
classes_to_cats[i + 1] = cat['id']
cats_to_names[cat['id']] = cat['name']
classes_to_names.append(cat['name'])
return cats_to_classes, classes_to_cats, cats_to_names, classes_to_names
def predict_all_to_json(out_file,
model,
img_height,
img_width,
classes_to_cats,
data_generator,
batch_size,
data_generator_mode='resize',
model_mode='training',
confidence_thresh=0.01,
iou_threshold=0.45,
top_k=200,
pred_coords='centroids',
normalize_coords=True):
'''
Runs detection predictions over the whole dataset given a model and saves them in a JSON file
in the MS COCO detection results format.
Arguments:
out_file (str): The file name (full path) under which to save the results JSON file.
model (Keras model): A Keras SSD model object.
img_height (int): The input image height for the model.
img_width (int): The input image width for the model.
classes_to_cats (dict): A dictionary that maps the consecutive class IDs predicted by the model
to the non-consecutive original MS COCO category IDs.
data_generator (DataGenerator): A `DataGenerator` object with the evaluation dataset.
batch_size (int): The batch size for the evaluation.
data_generator_mode (str, optional): Either of 'resize' or 'pad'. If 'resize', the input images will
be resized (i.e. warped) to `(img_height, img_width)`. This mode does not preserve the aspect ratios of the images.
If 'pad', the input images will be first padded so that they have the aspect ratio defined by `img_height`
and `img_width` and then resized to `(img_height, img_width)`. This mode preserves the aspect ratios of the images.
model_mode (str, optional): The mode in which the model was created, i.e. 'training', 'inference' or 'inference_fast'.
This is needed in order to know whether the model output is already decoded or still needs to be decoded. Refer to
the model documentation for the meaning of the individual modes.
confidence_thresh (float, optional): A float in [0,1), the minimum classification confidence in a specific
positive class in order to be considered for the non-maximum suppression stage for the respective class.
A lower value will result in a larger part of the selection process being done by the non-maximum suppression
stage, while a larger value will result in a larger part of the selection process happening in the confidence
thresholding stage.
iou_threshold (float, optional): A float in [0,1]. All boxes with a Jaccard similarity of greater than `iou_threshold`
with a locally maximal box will be removed from the set of predictions for a given class, where 'maximal' refers
to the box score.
top_k (int, optional): The number of highest scoring predictions to be kept for each batch item after the
non-maximum suppression stage. Defaults to 200, following the paper.
pred_coords (str, optional): The box coordinate format that the model outputs. Can be either 'centroids'
for the format `(cx, cy, w, h)` (box center coordinates, width, and height), 'minmax' for the format
`(xmin, xmax, ymin, ymax)`, or 'corners' for the format `(xmin, ymin, xmax, ymax)`.
normalize_coords (bool, optional): Set to `True` if the model outputs relative coordinates (i.e. coordinates in [0,1])
and you wish to transform these relative coordinates back to absolute coordinates. If the model outputs
relative coordinates, but you do not want to convert them back to absolute coordinates, set this to `False`.
Do not set this to `True` if the model already outputs absolute coordinates, as that would result in incorrect
coordinates. Requires `img_height` and `img_width` if set to `True`.
Returns:
None.
'''
convert_to_3_channels = ConvertTo3Channels()
resize = Resize(height=img_height,width=img_width)
if data_generator_mode == 'resize':
transformations = [convert_to_3_channels,
resize]
elif data_generator_mode == 'pad':
random_pad = RandomPadFixedAR(patch_aspect_ratio=img_width/img_height, clip_boxes=False)
transformations = [convert_to_3_channels,
random_pad,
resize]
else:
raise ValueError("Unexpected argument value: `data_generator_mode` can be either of 'resize' or 'pad', but received '{}'.".format(data_generator_mode))
# Set the generator parameters.
generator = data_generator.generate(batch_size=batch_size,
shuffle=False,
transformations=transformations,
label_encoder=None,
returns={'processed_images',
'image_ids',
'inverse_transform'},
keep_images_without_gt=True)
# Put the results in this list.
results = []
# Compute the number of batches to iterate over the entire dataset.
n_images = data_generator.get_dataset_size()
print("Number of images in the evaluation dataset: {}".format(n_images))
n_batches = int(ceil(n_images / batch_size))
# Loop over all batches.
tr = trange(n_batches, file=sys.stdout)
tr.set_description('Producing results file')
for i in tr:
# Generate batch.
batch_X, batch_image_ids, batch_inverse_transforms = next(generator)
# Predict.
y_pred = model.predict(batch_X)
# If the model was created in 'training' mode, the raw predictions need to
# be decoded and filtered, otherwise that's already taken care of.
if model_mode == 'training':
# Decode.
y_pred = decode_detections(y_pred,
confidence_thresh=confidence_thresh,
iou_threshold=iou_threshold,
top_k=top_k,
input_coords=pred_coords,
normalize_coords=normalize_coords,
img_height=img_height,
img_width=img_width)
else:
# Filter out the all-zeros dummy elements of `y_pred`.
y_pred_filtered = []
for i in range(len(y_pred)):
y_pred_filtered.append(y_pred[i][y_pred[i,:,0] != 0])
y_pred = y_pred_filtered
# Convert the predicted box coordinates for the original images.
y_pred = apply_inverse_transforms(y_pred, batch_inverse_transforms)
# Convert each predicted box into the results format.
for k, batch_item in enumerate(y_pred):
for box in batch_item:
class_id = box[0]
# Transform the consecutive class IDs back to the original COCO category IDs.
cat_id = classes_to_cats[class_id]
# Round the box coordinates to reduce the JSON file size.
xmin = float(round(box[2], 1))
ymin = float(round(box[3], 1))
xmax = float(round(box[4], 1))
ymax = float(round(box[5], 1))
width = xmax - xmin
height = ymax - ymin
bbox = [xmin, ymin, width, height]
result = {}
result['image_id'] = batch_image_ids[k]
result['category_id'] = cat_id
result['score'] = float(round(box[1], 3))
result['bbox'] = bbox
results.append(result)
with open(out_file, 'w') as f:
json.dump(results, f)
print("Prediction results saved in '{}'".format(out_file))

View File

@@ -0,0 +1,119 @@
from keras import backend as K
from keras.models import load_model
from keras.optimizers import Adam
#from scipy.misc import imread
import numpy as np
from matplotlib import pyplot as plt
import argparse
import json
from models.keras_ssd300 import ssd_300
from keras_loss_function.keras_ssd_loss import SSDLoss
from keras_layers.keras_layer_AnchorBoxes import AnchorBoxes
from keras_layers.keras_layer_DecodeDetections import DecodeDetections
from keras_layers.keras_layer_DecodeDetectionsFast import DecodeDetectionsFast
from keras_layers.keras_layer_L2Normalization import L2Normalization
from data_generator.object_detection_2d_data_generator import DataGenerator
from eval_utils.average_precision_evaluator import Evaluator
def _main_(args):
config_path = args.conf
with open(config_path) as config_buffer:
config = json.loads(config_buffer.read())
###############################
# Parse the annotations
###############################
path_imgs_test = config['test']['test_image_folder']
path_anns_test = config['test']['test_annot_folder']
labels = config['model']['labels']
categories = {}
#categories = {"Razor": 1, "Gun": 2, "Knife": 3, "Shuriken": 4} # category 0 is the background
for i in range(len(labels)): categories[labels[i]] = i+1
print('\nTraining on: \t' + str(categories) + '\n')
img_height = config['model']['input'] # Height of the model input images
img_width = config['model']['input'] # Width of the model input images
img_channels = 3 # Number of color channels of the model input images
n_classes = len(labels) # Number of positive classes, e.g. 20 for Pascal VOC, 80 for MS COCO
classes = ['background'] + labels
model_mode = 'training'
# TODO: Set the path to the `.h5` file of the model to be loaded.
model_path = config['train']['saved_weights_name']
# We need to create an SSDLoss object in order to pass that to the model loader.
ssd_loss = SSDLoss(neg_pos_ratio=3, alpha=1.0)
K.clear_session() # Clear previous models from memory.
model = load_model(model_path, custom_objects={'AnchorBoxes': AnchorBoxes,
'L2Normalization': L2Normalization,
'DecodeDetections': DecodeDetections,
'compute_loss': ssd_loss.compute_loss})
test_dataset = DataGenerator()
test_dataset.parse_xml(images_dirs= [config['test']['test_image_folder']],
image_set_filenames=[config['test']['test_image_set_filename']],
annotations_dirs=[config['test']['test_annot_folder']],
classes=classes,
include_classes='all',
exclude_truncated=False,
exclude_difficult=False,
ret=False)
evaluator = Evaluator(model=model,
n_classes=n_classes,
data_generator=test_dataset,
model_mode=model_mode)
results = evaluator(img_height=img_height,
img_width=img_width,
batch_size=4,
data_generator_mode='resize',
round_confidences=False,
matching_iou_threshold=0.5,
border_pixels='include',
sorting_algorithm='quicksort',
average_precision_mode='sample',
num_recall_points=11,
ignore_neutral_boxes=True,
return_precisions=True,
return_recalls=True,
return_average_precisions=True,
verbose=True)
mean_average_precision, average_precisions, precisions, recalls = results
total_instances = []
precisions = []
for i in range(1, len(average_precisions)):
print('{:.0f} instances of class'.format(len(recalls[i])),
classes[i], 'with average precision: {:.4f}'.format(average_precisions[i]))
total_instances.append(len(recalls[i]))
precisions.append(average_precisions[i])
if sum(total_instances) == 0:
print('No test instances found.')
return
print('mAP using the weighted average of precisions among classes: {:.4f}'.format(sum([a * b for a, b in zip(total_instances, precisions)]) / sum(total_instances)))
print('mAP: {:.4f}'.format(sum(precisions) / sum(x > 0 for x in total_instances)))
for i in range(1, len(average_precisions)):
print("{:<14}{:<6}{}".format(classes[i], 'AP', round(average_precisions[i], 3)))
print()
print("{:<14}{:<6}{}".format('','mAP', round(mean_average_precision, 3)))
if __name__ == '__main__':
argparser = argparse.ArgumentParser(description='train and evaluate ssd model on any dataset')
argparser.add_argument('-c', '--conf', help='path to configuration file')
args = argparser.parse_args()
_main_(args)
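The script takes all of its settings from the JSON configuration file passed via `-c`/`--conf`. The full schema is not shown here, but from the keys accessed above (`config['model']['labels']`, `config['model']['input']`, `config['train']['saved_weights_name']`, and the `config['test']` entries) a minimal configuration could look like the sketch below; every concrete value is a placeholder:
import json

# Illustrative only: key names follow the accesses in the script above, values are made up.
config = {
    'model': {
        'labels': ['Razor', 'Gun', 'Knife', 'Shuriken'],  # positive classes; background is implicit
        'input': 300                                      # model input height and width in pixels
    },
    'train': {
        'saved_weights_name': 'ssd300_trained.h5'         # hypothetical path to the trained model
    },
    'test': {
        'test_image_folder': 'data/test/images/',
        'test_annot_folder': 'data/test/annotations/',
        'test_image_set_filename': 'data/test/test.txt'
    }
}

with open('config.json', 'w') as f:
    json.dump(config, f, indent=4)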

Binary image files not shown.

View File

@@ -0,0 +1,278 @@
'''
A custom Keras layer to generate anchor boxes.
Copyright (C) 2018 Pierluigi Ferrari
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
'''
from __future__ import division
import numpy as np
import keras.backend as K
from keras.engine.topology import InputSpec
from keras.engine.topology import Layer
from bounding_box_utils.bounding_box_utils import convert_coordinates
class AnchorBoxes(Layer):
'''
A Keras layer to create an output tensor containing anchor box coordinates
and variances based on the input tensor and the passed arguments.
A set of 2D anchor boxes of different aspect ratios is created for each spatial unit of
the input tensor. The number of anchor boxes created per unit depends on the arguments
`aspect_ratios` and `two_boxes_for_ar1`, in the default case it is 4. The boxes
are parameterized by the coordinate tuple `(xmin, xmax, ymin, ymax)`.
The logic implemented by this layer is identical to the logic in the module
`ssd_box_encode_decode_utils.py`.
The purpose of having this layer in the network is to make the model self-sufficient
at inference time. Since the model is predicting offsets to the anchor boxes
(rather than predicting absolute box coordinates directly), one needs to know the anchor
box coordinates in order to construct the final prediction boxes from the predicted offsets.
If the model's output tensor did not contain the anchor box coordinates, the necessary
information to convert the predicted offsets back to absolute coordinates would be missing
in the model output. The reason why it is necessary to predict offsets to the anchor boxes
rather than to predict absolute box coordinates directly is explained in `README.md`.
Input shape:
4D tensor of shape `(batch, channels, height, width)` if `dim_ordering = 'th'`
or `(batch, height, width, channels)` if `dim_ordering = 'tf'`.
Output shape:
5D tensor of shape `(batch, height, width, n_boxes, 8)`. The last axis contains
the four anchor box coordinates and the four variance values for each box.
'''
def __init__(self,
img_height,
img_width,
this_scale,
next_scale,
aspect_ratios=[0.5, 1.0, 2.0],
two_boxes_for_ar1=True,
this_steps=None,
this_offsets=None,
clip_boxes=False,
variances=[0.1, 0.1, 0.2, 0.2],
coords='centroids',
normalize_coords=False,
**kwargs):
'''
All arguments need to be set to the same values as in the box encoding process, otherwise the behavior is undefined.
Some of these arguments are explained in more detail in the documentation of the `SSDBoxEncoder` class.
Arguments:
img_height (int): The height of the input images.
img_width (int): The width of the input images.
this_scale (float): A float in [0, 1], the scaling factor for the size of the generated anchor boxes
as a fraction of the shorter side of the input image.
next_scale (float): A float in [0, 1], the next larger scaling factor. Only relevant if
`self.two_boxes_for_ar1 == True`.
aspect_ratios (list, optional): The list of aspect ratios for which default boxes are to be
generated for this layer.
two_boxes_for_ar1 (bool, optional): Only relevant if `aspect_ratios` contains 1.
If `True`, two default boxes will be generated for aspect ratio 1. The first will be generated
using the scaling factor for the respective layer, the second one will be generated using the
geometric mean of that scaling factor and the next bigger scaling factor.
clip_boxes (bool, optional): If `True`, clips the anchor box coordinates to stay within image boundaries.
variances (list, optional): A list of 4 floats >0. The anchor box offset for each coordinate will be divided by
its respective variance value.
coords (str, optional): The box coordinate format to be used internally in the model (i.e. this is not the input format
of the ground truth labels). Can be either 'centroids' for the format `(cx, cy, w, h)` (box center coordinates, width, and height),
'corners' for the format `(xmin, ymin, xmax, ymax)`, or 'minmax' for the format `(xmin, xmax, ymin, ymax)`.
normalize_coords (bool, optional): Set to `True` if the model uses relative instead of absolute coordinates,
i.e. if the model predicts box coordinates within [0,1] instead of absolute coordinates.
'''
if K.backend() != 'tensorflow':
raise TypeError("This layer only supports TensorFlow at the moment, but you are using the {} backend.".format(K.backend()))
if (this_scale < 0) or (next_scale < 0) or (this_scale > 1):
raise ValueError("`this_scale` must be in [0, 1] and `next_scale` must be >0, but `this_scale` == {}, `next_scale` == {}".format(this_scale, next_scale))
if len(variances) != 4:
raise ValueError("4 variance values must be pased, but {} values were received.".format(len(variances)))
variances = np.array(variances)
if np.any(variances <= 0):
raise ValueError("All variances must be >0, but the variances given are {}".format(variances))
self.img_height = img_height
self.img_width = img_width
self.this_scale = this_scale
self.next_scale = next_scale
self.aspect_ratios = aspect_ratios
self.two_boxes_for_ar1 = two_boxes_for_ar1
self.this_steps = this_steps
self.this_offsets = this_offsets
self.clip_boxes = clip_boxes
self.variances = variances
self.coords = coords
self.normalize_coords = normalize_coords
# Compute the number of boxes per cell
if (1 in aspect_ratios) and two_boxes_for_ar1:
self.n_boxes = len(aspect_ratios) + 1
else:
self.n_boxes = len(aspect_ratios)
super(AnchorBoxes, self).__init__(**kwargs)
def build(self, input_shape):
self.input_spec = [InputSpec(shape=input_shape)]
super(AnchorBoxes, self).build(input_shape)
def call(self, x, mask=None):
'''
Return an anchor box tensor based on the shape of the input tensor.
The logic implemented here is identical to the logic in the module `ssd_box_encode_decode_utils.py`.
Note that this tensor does not participate in any graph computations at runtime. It is being created
as a constant once during graph creation and is just being output along with the rest of the model output
during runtime. Because of this, all logic is implemented as Numpy array operations and it is sufficient
to convert the resulting Numpy array into a Keras tensor at the very end before outputting it.
Arguments:
x (tensor): 4D tensor of shape `(batch, channels, height, width)` if `dim_ordering = 'th'`
or `(batch, height, width, channels)` if `dim_ordering = 'tf'`. The input for this
layer must be the output of the localization predictor layer.
'''
# Compute box width and height for each aspect ratio
# The shorter side of the image will be used to compute `w` and `h` using `scale` and `aspect_ratios`.
size = min(self.img_height, self.img_width)
# Compute the box widths and heights for all aspect ratios
wh_list = []
for ar in self.aspect_ratios:
if (ar == 1):
# Compute the regular anchor box for aspect ratio 1.
box_height = box_width = self.this_scale * size
wh_list.append((box_width, box_height))
if self.two_boxes_for_ar1:
# Compute one slightly larger version using the geometric mean of this scale value and the next.
box_height = box_width = np.sqrt(self.this_scale * self.next_scale) * size
wh_list.append((box_width, box_height))
else:
box_height = self.this_scale * size / np.sqrt(ar)
box_width = self.this_scale * size * np.sqrt(ar)
wh_list.append((box_width, box_height))
wh_list = np.array(wh_list)
# We need the shape of the input tensor
if K.image_dim_ordering() == 'tf':
batch_size, feature_map_height, feature_map_width, feature_map_channels = x._keras_shape
else: # Not yet relevant since TensorFlow is the only supported backend right now, but it can't harm to have this in here for the future
batch_size, feature_map_channels, feature_map_height, feature_map_width = x._keras_shape
# Compute the grid of box center points. They are identical for all aspect ratios.
# Compute the step sizes, i.e. how far apart the anchor box center points will be vertically and horizontally.
if (self.this_steps is None):
step_height = self.img_height / feature_map_height
step_width = self.img_width / feature_map_width
else:
if isinstance(self.this_steps, (list, tuple)) and (len(self.this_steps) == 2):
step_height = self.this_steps[0]
step_width = self.this_steps[1]
elif isinstance(self.this_steps, (int, float)):
step_height = self.this_steps
step_width = self.this_steps
# Compute the offsets, i.e. at what pixel values the first anchor box center point will be from the top and from the left of the image.
if (self.this_offsets is None):
offset_height = 0.5
offset_width = 0.5
else:
if isinstance(self.this_offsets, (list, tuple)) and (len(self.this_offsets) == 2):
offset_height = self.this_offsets[0]
offset_width = self.this_offsets[1]
elif isinstance(self.this_offsets, (int, float)):
offset_height = self.this_offsets
offset_width = self.this_offsets
# Now that we have the offsets and step sizes, compute the grid of anchor box center points.
cy = np.linspace(offset_height * step_height, (offset_height + feature_map_height - 1) * step_height, feature_map_height)
cx = np.linspace(offset_width * step_width, (offset_width + feature_map_width - 1) * step_width, feature_map_width)
cx_grid, cy_grid = np.meshgrid(cx, cy)
cx_grid = np.expand_dims(cx_grid, -1) # This is necessary for np.tile() to do what we want further down
cy_grid = np.expand_dims(cy_grid, -1) # This is necessary for np.tile() to do what we want further down
# Create a 4D tensor template of shape `(feature_map_height, feature_map_width, n_boxes, 4)`
# where the last dimension will contain `(cx, cy, w, h)`
boxes_tensor = np.zeros((feature_map_height, feature_map_width, self.n_boxes, 4))
boxes_tensor[:, :, :, 0] = np.tile(cx_grid, (1, 1, self.n_boxes)) # Set cx
boxes_tensor[:, :, :, 1] = np.tile(cy_grid, (1, 1, self.n_boxes)) # Set cy
boxes_tensor[:, :, :, 2] = wh_list[:, 0] # Set w
boxes_tensor[:, :, :, 3] = wh_list[:, 1] # Set h
# Convert `(cx, cy, w, h)` to `(xmin, ymin, xmax, ymax)`
boxes_tensor = convert_coordinates(boxes_tensor, start_index=0, conversion='centroids2corners')
# If `clip_boxes` is enabled, clip the coordinates to lie within the image boundaries
if self.clip_boxes:
x_coords = boxes_tensor[:,:,:,[0, 2]]
x_coords[x_coords >= self.img_width] = self.img_width - 1
x_coords[x_coords < 0] = 0
boxes_tensor[:,:,:,[0, 2]] = x_coords
y_coords = boxes_tensor[:,:,:,[1, 3]]
y_coords[y_coords >= self.img_height] = self.img_height - 1
y_coords[y_coords < 0] = 0
boxes_tensor[:,:,:,[1, 3]] = y_coords
# If `normalize_coords` is enabled, normalize the coordinates to be within [0,1]
if self.normalize_coords:
boxes_tensor[:, :, :, [0, 2]] /= self.img_width
boxes_tensor[:, :, :, [1, 3]] /= self.img_height
# TODO: Implement box limiting directly for `(cx, cy, w, h)` so that we don't have to unnecessarily convert back and forth.
if self.coords == 'centroids':
# Convert `(xmin, ymin, xmax, ymax)` back to `(cx, cy, w, h)`.
boxes_tensor = convert_coordinates(boxes_tensor, start_index=0, conversion='corners2centroids', border_pixels='half')
elif self.coords == 'minmax':
# Convert `(xmin, ymin, xmax, ymax)` to `(xmin, xmax, ymin, ymax)`.
boxes_tensor = convert_coordinates(boxes_tensor, start_index=0, conversion='corners2minmax', border_pixels='half')
# Create a tensor to contain the variances and append it to `boxes_tensor`. This tensor has the same shape
# as `boxes_tensor` and simply contains the same 4 variance values for every position in the last axis.
variances_tensor = np.zeros_like(boxes_tensor) # Has shape `(feature_map_height, feature_map_width, n_boxes, 4)`
variances_tensor += self.variances # Long live broadcasting
# Now `boxes_tensor` becomes a tensor of shape `(feature_map_height, feature_map_width, n_boxes, 8)`
boxes_tensor = np.concatenate((boxes_tensor, variances_tensor), axis=-1)
# Now prepend one dimension to `boxes_tensor` to account for the batch size and tile it along the batch dimension.
# The result will be a 5D tensor of shape `(batch_size, feature_map_height, feature_map_width, n_boxes, 8)`
boxes_tensor = np.expand_dims(boxes_tensor, axis=0)
boxes_tensor = K.tile(K.constant(boxes_tensor, dtype='float32'), (K.shape(x)[0], 1, 1, 1, 1))
return boxes_tensor
def compute_output_shape(self, input_shape):
if K.image_dim_ordering() == 'tf':
batch_size, feature_map_height, feature_map_width, feature_map_channels = input_shape
else: # Not yet relevant since TensorFlow is the only supported backend right now, but it can't harm to have this in here for the future
batch_size, feature_map_channels, feature_map_height, feature_map_width = input_shape
return (batch_size, feature_map_height, feature_map_width, self.n_boxes, 8)
def get_config(self):
config = {
'img_height': self.img_height,
'img_width': self.img_width,
'this_scale': self.this_scale,
'next_scale': self.next_scale,
'aspect_ratios': list(self.aspect_ratios),
'two_boxes_for_ar1': self.two_boxes_for_ar1,
'clip_boxes': self.clip_boxes,
'variances': list(self.variances),
'coords': self.coords,
'normalize_coords': self.normalize_coords
}
base_config = super(AnchorBoxes, self).get_config()
return dict(list(base_config.items()) + list(config.items()))
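Since the anchor geometry in `call()` is plain NumPy, it can be reproduced outside the layer. The following is a rough, self-contained sketch of how the box widths/heights and the grid of center points come about for a single predictor layer; the scales, aspect ratios, and feature map size are arbitrary example values, not taken from any particular SSD configuration:
import numpy as np

img_height, img_width = 300, 300
this_scale, next_scale = 0.2, 0.37              # example scales
aspect_ratios = [0.5, 1.0, 2.0]
feature_map_height, feature_map_width = 38, 38  # example feature map size

size = min(img_height, img_width)
wh_list = []
for ar in aspect_ratios:
    if ar == 1:
        wh_list.append((this_scale * size, this_scale * size))
        # The second box for aspect ratio 1 uses the geometric mean of the two scales.
        side = np.sqrt(this_scale * next_scale) * size
        wh_list.append((side, side))
    else:
        wh_list.append((this_scale * size * np.sqrt(ar), this_scale * size / np.sqrt(ar)))
wh_list = np.array(wh_list)  # shape (n_boxes, 2) = (4, 2): one (width, height) pair per box

# The box center points form a regular grid, offset by half a step by default.
step_height = img_height / feature_map_height
step_width = img_width / feature_map_width
cy = np.linspace(0.5 * step_height, (0.5 + feature_map_height - 1) * step_height, feature_map_height)
cx = np.linspace(0.5 * step_width, (0.5 + feature_map_width - 1) * step_width, feature_map_width)
print(wh_list)
print(cx[:3], cy[:3])  # first few center coordinates along each axis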

View File

@@ -0,0 +1,283 @@
'''
A custom Keras layer to decode the raw SSD prediction output. Corresponds to the
`DetectionOutput` layer type in the original Caffe implementation of SSD.
Copyright (C) 2018 Pierluigi Ferrari
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
'''
from __future__ import division
import numpy as np
import tensorflow as tf
import keras.backend as K
from keras.engine.topology import InputSpec
from keras.engine.topology import Layer
class DecodeDetections(Layer):
'''
A Keras layer to decode the raw SSD prediction output.
Input shape:
3D tensor of shape `(batch_size, n_boxes, n_classes + 12)`.
Output shape:
3D tensor of shape `(batch_size, top_k, 6)`.
'''
def __init__(self,
confidence_thresh=0.01,
iou_threshold=0.45,
top_k=200,
nms_max_output_size=400,
coords='centroids',
normalize_coords=True,
img_height=None,
img_width=None,
**kwargs):
'''
All default argument values follow the Caffe implementation.
Arguments:
confidence_thresh (float, optional): A float in [0,1), the minimum classification confidence in a specific
positive class in order to be considered for the non-maximum suppression stage for the respective class.
A lower value will result in a larger part of the selection process being done by the non-maximum suppression
stage, while a larger value will result in a larger part of the selection process happening in the confidence
thresholding stage.
iou_threshold (float, optional): A float in [0,1]. All boxes with a Jaccard similarity of greater than `iou_threshold`
with a locally maximal box will be removed from the set of predictions for a given class, where 'maximal' refers
to the box score.
top_k (int, optional): The number of highest scoring predictions to be kept for each batch item after the
non-maximum suppression stage.
nms_max_output_size (int, optional): The maximum number of predictions that will be left after performing non-maximum
suppression.
coords (str, optional): The box coordinate format that the model outputs. Must be 'centroids'
i.e. the format `(cx, cy, w, h)` (box center coordinates, width, and height). Other coordinate formats are
currently not supported.
normalize_coords (bool, optional): Set to `True` if the model outputs relative coordinates (i.e. coordinates in [0,1])
and you wish to transform these relative coordinates back to absolute coordinates. If the model outputs
relative coordinates, but you do not want to convert them back to absolute coordinates, set this to `False`.
Do not set this to `True` if the model already outputs absolute coordinates, as that would result in incorrect
coordinates. Requires `img_height` and `img_width` if set to `True`.
img_height (int, optional): The height of the input images. Only needed if `normalize_coords` is `True`.
img_width (int, optional): The width of the input images. Only needed if `normalize_coords` is `True`.
'''
if K.backend() != 'tensorflow':
raise TypeError("This layer only supports TensorFlow at the moment, but you are using the {} backend.".format(K.backend()))
if normalize_coords and ((img_height is None) or (img_width is None)):
raise ValueError("If relative box coordinates are supposed to be converted to absolute coordinates, the decoder needs the image size in order to decode the predictions, but `img_height == {}` and `img_width == {}`".format(img_height, img_width))
if coords != 'centroids':
raise ValueError("The DetectionOutput layer currently only supports the 'centroids' coordinate format.")
# We need these members for the config.
self.confidence_thresh = confidence_thresh
self.iou_threshold = iou_threshold
self.top_k = top_k
self.normalize_coords = normalize_coords
self.img_height = img_height
self.img_width = img_width
self.coords = coords
self.nms_max_output_size = nms_max_output_size
# We need these members for TensorFlow.
self.tf_confidence_thresh = tf.constant(self.confidence_thresh, name='confidence_thresh')
self.tf_iou_threshold = tf.constant(self.iou_threshold, name='iou_threshold')
self.tf_top_k = tf.constant(self.top_k, name='top_k')
self.tf_normalize_coords = tf.constant(self.normalize_coords, name='normalize_coords')
self.tf_img_height = tf.constant(self.img_height, dtype=tf.float32, name='img_height')
self.tf_img_width = tf.constant(self.img_width, dtype=tf.float32, name='img_width')
self.tf_nms_max_output_size = tf.constant(self.nms_max_output_size, name='nms_max_output_size')
super(DecodeDetections, self).__init__(**kwargs)
def build(self, input_shape):
self.input_spec = [InputSpec(shape=input_shape)]
super(DecodeDetections, self).build(input_shape)
def call(self, y_pred, mask=None):
'''
Returns:
3D tensor of shape `(batch_size, top_k, 6)`. The second axis is zero-padded
to always yield `top_k` predictions per batch item. The last axis contains
the coordinates for each predicted box in the format
`[class_id, confidence, xmin, ymin, xmax, ymax]`.
'''
#####################################################################################
# 1. Convert the box coordinates from predicted anchor box offsets to predicted
# absolute coordinates
#####################################################################################
# Convert anchor box offsets to image offsets.
cx = y_pred[...,-12] * y_pred[...,-4] * y_pred[...,-6] + y_pred[...,-8] # cx = cx_pred * cx_variance * w_anchor + cx_anchor
cy = y_pred[...,-11] * y_pred[...,-3] * y_pred[...,-5] + y_pred[...,-7] # cy = cy_pred * cy_variance * h_anchor + cy_anchor
w = tf.exp(y_pred[...,-10] * y_pred[...,-2]) * y_pred[...,-6] # w = exp(w_pred * variance_w) * w_anchor
h = tf.exp(y_pred[...,-9] * y_pred[...,-1]) * y_pred[...,-5] # h = exp(h_pred * variance_h) * h_anchor
# Convert 'centroids' to 'corners'.
xmin = cx - 0.5 * w
ymin = cy - 0.5 * h
xmax = cx + 0.5 * w
ymax = cy + 0.5 * h
# If the model predicts box coordinates relative to the image dimensions and they are supposed
# to be converted back to absolute coordinates, do that.
def normalized_coords():
xmin1 = tf.expand_dims(xmin * self.tf_img_width, axis=-1)
ymin1 = tf.expand_dims(ymin * self.tf_img_height, axis=-1)
xmax1 = tf.expand_dims(xmax * self.tf_img_width, axis=-1)
ymax1 = tf.expand_dims(ymax * self.tf_img_height, axis=-1)
return xmin1, ymin1, xmax1, ymax1
def non_normalized_coords():
return tf.expand_dims(xmin, axis=-1), tf.expand_dims(ymin, axis=-1), tf.expand_dims(xmax, axis=-1), tf.expand_dims(ymax, axis=-1)
xmin, ymin, xmax, ymax = tf.cond(self.tf_normalize_coords, normalized_coords, non_normalized_coords)
# Concatenate the one-hot class confidences and the converted box coordinates to form the decoded predictions tensor.
y_pred = tf.concat(values=[y_pred[...,:-12], xmin, ymin, xmax, ymax], axis=-1)
#####################################################################################
# 2. Perform confidence thresholding, per-class non-maximum suppression, and
# top-k filtering.
#####################################################################################
batch_size = tf.shape(y_pred)[0] # Output dtype: tf.int32
n_boxes = tf.shape(y_pred)[1]
n_classes = y_pred.shape[2] - 4
class_indices = tf.range(1, n_classes)
# Create a function that filters the predictions for the given batch item. Specifically, it performs:
# - confidence thresholding
# - non-maximum suppression (NMS)
# - top-k filtering
def filter_predictions(batch_item):
# Create a function that filters the predictions for one single class.
def filter_single_class(index):
# From a tensor of shape (n_boxes, n_classes + 4 coordinates) extract
# a tensor of shape (n_boxes, 1 + 4 coordinates) that contains the
# confidence values for just one class, determined by `index`.
confidences = tf.expand_dims(batch_item[..., index], axis=-1)
class_id = tf.fill(dims=tf.shape(confidences), value=tf.to_float(index))
box_coordinates = batch_item[...,-4:]
single_class = tf.concat([class_id, confidences, box_coordinates], axis=-1)
# Apply confidence thresholding with respect to the class defined by `index`.
threshold_met = single_class[:,1] > self.tf_confidence_thresh
single_class = tf.boolean_mask(tensor=single_class,
mask=threshold_met)
# If any boxes made the threshold, perform NMS.
def perform_nms():
scores = single_class[...,1]
# `tf.image.non_max_suppression()` needs the box coordinates in the format `(ymin, xmin, ymax, xmax)`.
xmin = tf.expand_dims(single_class[...,-4], axis=-1)
ymin = tf.expand_dims(single_class[...,-3], axis=-1)
xmax = tf.expand_dims(single_class[...,-2], axis=-1)
ymax = tf.expand_dims(single_class[...,-1], axis=-1)
boxes = tf.concat(values=[ymin, xmin, ymax, xmax], axis=-1)
maxima_indices = tf.image.non_max_suppression(boxes=boxes,
scores=scores,
max_output_size=self.tf_nms_max_output_size,
iou_threshold=self.iou_threshold,
name='non_maximum_suppresion')
maxima = tf.gather(params=single_class,
indices=maxima_indices,
axis=0)
return maxima
def no_confident_predictions():
return tf.constant(value=0.0, shape=(1,6))
single_class_nms = tf.cond(tf.equal(tf.size(single_class), 0), no_confident_predictions, perform_nms)
# Make sure `single_class` is exactly `self.nms_max_output_size` elements long.
padded_single_class = tf.pad(tensor=single_class_nms,
paddings=[[0, self.tf_nms_max_output_size - tf.shape(single_class_nms)[0]], [0, 0]],
mode='CONSTANT',
constant_values=0.0)
return padded_single_class
# Iterate `filter_single_class()` over all class indices.
filtered_single_classes = tf.map_fn(fn=lambda i: filter_single_class(i),
elems=tf.range(1,n_classes),
dtype=tf.float32,
parallel_iterations=128,
back_prop=False,
swap_memory=False,
infer_shape=True,
name='loop_over_classes')
# Concatenate the filtered results for all individual classes to one tensor.
filtered_predictions = tf.reshape(tensor=filtered_single_classes, shape=(-1,6))
# Perform top-k filtering for this batch item or pad it in case there are
# fewer than `self.top_k` boxes left at this point. Either way, produce a
# tensor of length `self.top_k`. By the time we return the final results tensor
# for the whole batch, all batch items must have the same number of predicted
# boxes so that the tensor dimensions are homogeneous. If fewer than `self.top_k`
# predictions are left after the filtering process above, we pad the missing
# predictions with zeros as dummy entries.
def top_k():
return tf.gather(params=filtered_predictions,
indices=tf.nn.top_k(filtered_predictions[:, 1], k=self.tf_top_k, sorted=True).indices,
axis=0)
def pad_and_top_k():
padded_predictions = tf.pad(tensor=filtered_predictions,
paddings=[[0, self.tf_top_k - tf.shape(filtered_predictions)[0]], [0, 0]],
mode='CONSTANT',
constant_values=0.0)
return tf.gather(params=padded_predictions,
indices=tf.nn.top_k(padded_predictions[:, 1], k=self.tf_top_k, sorted=True).indices,
axis=0)
top_k_boxes = tf.cond(tf.greater_equal(tf.shape(filtered_predictions)[0], self.tf_top_k), top_k, pad_and_top_k)
return top_k_boxes
# Iterate `filter_predictions()` over all batch items.
output_tensor = tf.map_fn(fn=lambda x: filter_predictions(x),
elems=y_pred,
dtype=None,
parallel_iterations=128,
back_prop=False,
swap_memory=False,
infer_shape=True,
name='loop_over_batch')
return output_tensor
def compute_output_shape(self, input_shape):
batch_size, n_boxes, last_axis = input_shape
return (batch_size, self.tf_top_k, 6) # Last axis: (class_ID, confidence, 4 box coordinates)
def get_config(self):
config = {
'confidence_thresh': self.confidence_thresh,
'iou_threshold': self.iou_threshold,
'top_k': self.top_k,
'nms_max_output_size': self.nms_max_output_size,
'coords': self.coords,
'normalize_coords': self.normalize_coords,
'img_height': self.img_height,
'img_width': self.img_width,
}
base_config = super(DecodeDetections, self).get_config()
return dict(list(base_config.items()) + list(config.items()))
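The first step of `call()` simply inverts the SSD offset encoding. A small NumPy sketch of those four formulas for a single anchor box; all concrete numbers are invented for illustration:
import numpy as np

# Anchor box in centroids format and the four variances (illustrative values).
cx_anchor, cy_anchor, w_anchor, h_anchor = 150.0, 150.0, 60.0, 120.0
var_cx, var_cy, var_w, var_h = 0.1, 0.1, 0.2, 0.2

# Raw predicted offsets for that anchor (illustrative values).
cx_pred, cy_pred, w_pred, h_pred = 0.5, -0.3, 0.2, 0.1

# Same decoding as in DecodeDetections.call():
cx = cx_pred * var_cx * w_anchor + cx_anchor  # cx = cx_pred * cx_variance * w_anchor + cx_anchor
cy = cy_pred * var_cy * h_anchor + cy_anchor  # cy = cy_pred * cy_variance * h_anchor + cy_anchor
w = np.exp(w_pred * var_w) * w_anchor         # w = exp(w_pred * variance_w) * w_anchor
h = np.exp(h_pred * var_h) * h_anchor         # h = exp(h_pred * variance_h) * h_anchor

# Convert 'centroids' to 'corners'.
xmin, ymin = cx - 0.5 * w, cy - 0.5 * h
xmax, ymax = cx + 0.5 * w, cy + 0.5 * h
print(xmin, ymin, xmax, ymax)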

View File

@@ -0,0 +1,266 @@
'''
A custom Keras layer to decode the raw SSD prediction output. This is a modified
and more efficient version of the `DetectionOutput` layer type in the original Caffe
implementation of SSD. For a faithful replication of the original layer, please
refer to the `DecodeDetections` layer.
Copyright (C) 2018 Pierluigi Ferrari
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
'''
from __future__ import division
import numpy as np
import tensorflow as tf
import keras.backend as K
from keras.engine.topology import InputSpec
from keras.engine.topology import Layer
class DecodeDetectionsFast(Layer):
'''
A Keras layer to decode the raw SSD prediction output.
Input shape:
3D tensor of shape `(batch_size, n_boxes, n_classes + 12)`.
Output shape:
3D tensor of shape `(batch_size, top_k, 6)`.
'''
def __init__(self,
confidence_thresh=0.01,
iou_threshold=0.45,
top_k=200,
nms_max_output_size=400,
coords='centroids',
normalize_coords=True,
img_height=None,
img_width=None,
**kwargs):
'''
All default argument values follow the Caffe implementation.
Arguments:
confidence_thresh (float, optional): A float in [0,1), the minimum classification confidence in a specific
positive class in order to be considered for the non-maximum suppression stage for the respective class.
A lower value will result in a larger part of the selection process being done by the non-maximum suppression
stage, while a larger value will result in a larger part of the selection process happening in the confidence
thresholding stage.
iou_threshold (float, optional): A float in [0,1]. All boxes with a Jaccard similarity of greater than `iou_threshold`
with a locally maximal box will be removed from the set of predictions for a given class, where 'maximal' refers
to the box score.
top_k (int, optional): The number of highest scoring predictions to be kept for each batch item after the
non-maximum suppression stage.
nms_max_output_size (int, optional): The maximum number of predictions that will be left after performing non-maximum
suppression.
coords (str, optional): The box coordinate format that the model outputs. Must be 'centroids'
i.e. the format `(cx, cy, w, h)` (box center coordinates, width, and height). Other coordinate formats are
currently not supported.
normalize_coords (bool, optional): Set to `True` if the model outputs relative coordinates (i.e. coordinates in [0,1])
and you wish to transform these relative coordinates back to absolute coordinates. If the model outputs
relative coordinates, but you do not want to convert them back to absolute coordinates, set this to `False`.
Do not set this to `True` if the model already outputs absolute coordinates, as that would result in incorrect
coordinates. Requires `img_height` and `img_width` if set to `True`.
img_height (int, optional): The height of the input images. Only needed if `normalize_coords` is `True`.
img_width (int, optional): The width of the input images. Only needed if `normalize_coords` is `True`.
'''
if K.backend() != 'tensorflow':
raise TypeError("This layer only supports TensorFlow at the moment, but you are using the {} backend.".format(K.backend()))
if normalize_coords and ((img_height is None) or (img_width is None)):
raise ValueError("If relative box coordinates are supposed to be converted to absolute coordinates, the decoder needs the image size in order to decode the predictions, but `img_height == {}` and `img_width == {}`".format(img_height, img_width))
if coords != 'centroids':
raise ValueError("The DetectionOutput layer currently only supports the 'centroids' coordinate format.")
# We need these members for the config.
self.confidence_thresh = confidence_thresh
self.iou_threshold = iou_threshold
self.top_k = top_k
self.normalize_coords = normalize_coords
self.img_height = img_height
self.img_width = img_width
self.coords = coords
self.nms_max_output_size = nms_max_output_size
# We need these members for TensorFlow.
self.tf_confidence_thresh = tf.constant(self.confidence_thresh, name='confidence_thresh')
self.tf_iou_threshold = tf.constant(self.iou_threshold, name='iou_threshold')
self.tf_top_k = tf.constant(self.top_k, name='top_k')
self.tf_normalize_coords = tf.constant(self.normalize_coords, name='normalize_coords')
self.tf_img_height = tf.constant(self.img_height, dtype=tf.float32, name='img_height')
self.tf_img_width = tf.constant(self.img_width, dtype=tf.float32, name='img_width')
self.tf_nms_max_output_size = tf.constant(self.nms_max_output_size, name='nms_max_output_size')
super(DecodeDetectionsFast, self).__init__(**kwargs)
def build(self, input_shape):
self.input_spec = [InputSpec(shape=input_shape)]
super(DecodeDetectionsFast, self).build(input_shape)
def call(self, y_pred, mask=None):
'''
Returns:
3D tensor of shape `(batch_size, top_k, 6)`. The second axis is zero-padded
to always yield `top_k` predictions per batch item. The last axis contains
the coordinates for each predicted box in the format
`[class_id, confidence, xmin, ymin, xmax, ymax]`.
'''
#####################################################################################
# 1. Convert the box coordinates from predicted anchor box offsets to predicted
# absolute coordinates
#####################################################################################
# Extract the predicted class IDs as the indices of the highest confidence values.
class_ids = tf.expand_dims(tf.to_float(tf.argmax(y_pred[...,:-12], axis=-1)), axis=-1)
# Extract the confidences of the maximal classes.
confidences = tf.reduce_max(y_pred[...,:-12], axis=-1, keep_dims=True)
# Convert anchor box offsets to image offsets.
cx = y_pred[...,-12] * y_pred[...,-4] * y_pred[...,-6] + y_pred[...,-8] # cx = cx_pred * cx_variance * w_anchor + cx_anchor
cy = y_pred[...,-11] * y_pred[...,-3] * y_pred[...,-5] + y_pred[...,-7] # cy = cy_pred * cy_variance * h_anchor + cy_anchor
w = tf.exp(y_pred[...,-10] * y_pred[...,-2]) * y_pred[...,-6] # w = exp(w_pred * variance_w) * w_anchor
h = tf.exp(y_pred[...,-9] * y_pred[...,-1]) * y_pred[...,-5] # h = exp(h_pred * variance_h) * h_anchor
# Convert 'centroids' to 'corners'.
xmin = cx - 0.5 * w
ymin = cy - 0.5 * h
xmax = cx + 0.5 * w
ymax = cy + 0.5 * h
# If the model predicts box coordinates relative to the image dimensions and they are supposed
# to be converted back to absolute coordinates, do that.
def normalized_coords():
xmin1 = tf.expand_dims(xmin * self.tf_img_width, axis=-1)
ymin1 = tf.expand_dims(ymin * self.tf_img_height, axis=-1)
xmax1 = tf.expand_dims(xmax * self.tf_img_width, axis=-1)
ymax1 = tf.expand_dims(ymax * self.tf_img_height, axis=-1)
return xmin1, ymin1, xmax1, ymax1
def non_normalized_coords():
return tf.expand_dims(xmin, axis=-1), tf.expand_dims(ymin, axis=-1), tf.expand_dims(xmax, axis=-1), tf.expand_dims(ymax, axis=-1)
xmin, ymin, xmax, ymax = tf.cond(self.tf_normalize_coords, normalized_coords, non_normalized_coords)
# Concatenate the one-hot class confidences and the converted box coordinates to form the decoded predictions tensor.
y_pred = tf.concat(values=[class_ids, confidences, xmin, ymin, xmax, ymax], axis=-1)
#####################################################################################
# 2. Perform confidence thresholding, non-maximum suppression, and top-k filtering.
#####################################################################################
batch_size = tf.shape(y_pred)[0] # Output dtype: tf.int32
n_boxes = tf.shape(y_pred)[1]
n_classes = y_pred.shape[2] - 4
class_indices = tf.range(1, n_classes)
# Create a function that filters the predictions for the given batch item. Specifically, it performs:
# - confidence thresholding
# - non-maximum suppression (NMS)
# - top-k filtering
def filter_predictions(batch_item):
# Keep only the non-background boxes.
positive_boxes = tf.not_equal(batch_item[...,0], 0.0)
predictions = tf.boolean_mask(tensor=batch_item,
mask=positive_boxes)
def perform_confidence_thresholding():
# Apply confidence thresholding.
threshold_met = predictions[:,1] > self.tf_confidence_thresh
return tf.boolean_mask(tensor=predictions,
mask=threshold_met)
def no_positive_boxes():
return tf.constant(value=0.0, shape=(1,6))
# If there are any positive predictions, perform confidence thresholding.
predictions_conf_thresh = tf.cond(tf.equal(tf.size(predictions), 0), no_positive_boxes, perform_confidence_thresholding)
def perform_nms():
scores = predictions_conf_thresh[...,1]
# `tf.image.non_max_suppression()` needs the box coordinates in the format `(ymin, xmin, ymax, xmax)`.
xmin = tf.expand_dims(predictions_conf_thresh[...,-4], axis=-1)
ymin = tf.expand_dims(predictions_conf_thresh[...,-3], axis=-1)
xmax = tf.expand_dims(predictions_conf_thresh[...,-2], axis=-1)
ymax = tf.expand_dims(predictions_conf_thresh[...,-1], axis=-1)
boxes = tf.concat(values=[ymin, xmin, ymax, xmax], axis=-1)
maxima_indices = tf.image.non_max_suppression(boxes=boxes,
scores=scores,
max_output_size=self.tf_nms_max_output_size,
iou_threshold=self.iou_threshold,
name='non_maximum_suppresion')
maxima = tf.gather(params=predictions_conf_thresh,
indices=maxima_indices,
axis=0)
return maxima
def no_confident_predictions():
return tf.constant(value=0.0, shape=(1,6))
# If any boxes made the threshold, perform NMS.
predictions_nms = tf.cond(tf.equal(tf.size(predictions_conf_thresh), 0), no_confident_predictions, perform_nms)
# Perform top-k filtering for this batch item or pad it in case there are
# fewer than `self.top_k` boxes left at this point. Either way, produce a
# tensor of length `self.top_k`. By the time we return the final results tensor
# for the whole batch, all batch items must have the same number of predicted
# boxes so that the tensor dimensions are homogeneous. If fewer than `self.top_k`
# predictions are left after the filtering process above, we pad the missing
# predictions with zeros as dummy entries.
def top_k():
return tf.gather(params=predictions_nms,
indices=tf.nn.top_k(predictions_nms[:, 1], k=self.tf_top_k, sorted=True).indices,
axis=0)
def pad_and_top_k():
padded_predictions = tf.pad(tensor=predictions_nms,
paddings=[[0, self.tf_top_k - tf.shape(predictions_nms)[0]], [0, 0]],
mode='CONSTANT',
constant_values=0.0)
return tf.gather(params=padded_predictions,
indices=tf.nn.top_k(padded_predictions[:, 1], k=self.tf_top_k, sorted=True).indices,
axis=0)
top_k_boxes = tf.cond(tf.greater_equal(tf.shape(predictions_nms)[0], self.tf_top_k), top_k, pad_and_top_k)
return top_k_boxes
# Iterate `filter_predictions()` over all batch items.
output_tensor = tf.map_fn(fn=lambda x: filter_predictions(x),
elems=y_pred,
dtype=None,
parallel_iterations=128,
back_prop=False,
swap_memory=False,
infer_shape=True,
name='loop_over_batch')
return output_tensor
def compute_output_shape(self, input_shape):
batch_size, n_boxes, last_axis = input_shape
return (batch_size, self.tf_top_k, 6) # Last axis: (class_ID, confidence, 4 box coordinates)
def get_config(self):
config = {
'confidence_thresh': self.confidence_thresh,
'iou_threshold': self.iou_threshold,
'top_k': self.top_k,
'nms_max_output_size': self.nms_max_output_size,
'coords': self.coords,
'normalize_coords': self.normalize_coords,
'img_height': self.img_height,
'img_width': self.img_width,
}
base_config = super(DecodeDetectionsFast, self).get_config()
return dict(list(base_config.items()) + list(config.items()))
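The decoded output per batch item is a `(top_k, 6)` tensor of `(class_id, confidence, xmin, ymin, xmax, ymax)` rows, zero-padded up to length `top_k`. A minimal post-processing sketch for stripping the dummy rows after prediction (`model` and `batch_images` are illustrative placeholders for a model that ends in this layer and a batch of preprocessed images):

import numpy as np

y_decoded = model.predict(batch_images)  # Shape: (batch_size, top_k, 6); dummy rows are all zeros.
# Keep only the rows whose class ID is non-zero, i.e. drop the zero-padding.
detections_per_image = [image_preds[image_preds[:, 0] != 0] for image_preds in y_decoded]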

View File

@@ -0,0 +1,70 @@
'''
A custom Keras layer to perform L2-normalization.
Copyright (C) 2018 Pierluigi Ferrari
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
'''
from __future__ import division
import numpy as np
import keras.backend as K
from keras.engine.topology import InputSpec
from keras.engine.topology import Layer
class L2Normalization(Layer):
'''
Performs L2 normalization on the input tensor with a learnable scaling parameter
as described in the paper "Parsenet: Looking Wider to See Better" (see references)
and as used in the original SSD model.
Arguments:
gamma_init (int): The initial scaling parameter. Defaults to 20 following the
SSD paper.
Input shape:
4D tensor of shape `(batch, channels, height, width)` if `dim_ordering = 'th'`
or `(batch, height, width, channels)` if `dim_ordering = 'tf'`.
Returns:
The scaled tensor. Same shape as the input tensor.
References:
http://cs.unc.edu/~wliu/papers/parsenet.pdf
'''
def __init__(self, gamma_init=20, **kwargs):
if K.image_dim_ordering() == 'tf':
self.axis = 3
else:
self.axis = 1
self.gamma_init = gamma_init
super(L2Normalization, self).__init__(**kwargs)
def build(self, input_shape):
self.input_spec = [InputSpec(shape=input_shape)]
gamma = self.gamma_init * np.ones((input_shape[self.axis],))
self.gamma = K.variable(gamma, name='{}_gamma'.format(self.name))
self.trainable_weights = [self.gamma]
super(L2Normalization, self).build(input_shape)
def call(self, x, mask=None):
output = K.l2_normalize(x, self.axis)
return output * self.gamma
def get_config(self):
config = {
'gamma_init': self.gamma_init
}
base_config = super(L2Normalization, self).get_config()
return dict(list(base_config.items()) + list(config.items()))
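A minimal sketch of how this layer might be attached to a feature map, in the spirit of the `conv4_3` normalization in SSD (the input size and layer names below are illustrative assumptions, not taken from this repository's model definitions):

from keras.layers import Input, Conv2D
from keras.models import Model

inputs = Input(shape=(300, 300, 3))
feature_map = Conv2D(512, (3, 3), padding='same', activation='relu', name='conv4_3')(inputs)
# L2-normalize the feature map along the channel axis and scale it with a learnable
# per-channel gamma, initialized to 20.
feature_map_norm = L2Normalization(gamma_init=20, name='conv4_3_norm')(feature_map)
model = Model(inputs=inputs, outputs=feature_map_norm)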

Binary file not shown.

View File

@@ -0,0 +1,211 @@
'''
The Keras-compatible loss function for the SSD model. Currently supports TensorFlow only.
Copyright (C) 2018 Pierluigi Ferrari
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
'''
from __future__ import division
import tensorflow as tf
class SSDLoss:
'''
The SSD loss, see https://arxiv.org/abs/1512.02325.
'''
def __init__(self,
neg_pos_ratio=3,
n_neg_min=0,
alpha=1.0):
'''
Arguments:
neg_pos_ratio (int, optional): The maximum ratio of negative (i.e. background)
to positive ground truth boxes to include in the loss computation.
There are no actual background ground truth boxes of course, but `y_true`
contains anchor boxes labeled with the background class. Since
the number of background boxes in `y_true` will usually exceed
the number of positive boxes by far, it is necessary to balance
their influence on the loss. Defaults to 3 following the paper.
n_neg_min (int, optional): The minimum number of negative ground truth boxes to
enter the loss computation *per batch*. This argument can be used to make
sure that the model learns from a minimum number of negatives in batches
in which there are very few, or even none at all, positive ground truth
boxes. It defaults to 0 and if used, it should be set to a value that
stands in reasonable proportion to the batch size used for training.
alpha (float, optional): A factor to weight the localization loss in the
computation of the total loss. Defaults to 1.0 following the paper.
'''
self.neg_pos_ratio = neg_pos_ratio
self.n_neg_min = n_neg_min
self.alpha = alpha
def smooth_L1_loss(self, y_true, y_pred):
'''
Compute smooth L1 loss, see references.
Arguments:
y_true (nD tensor): A TensorFlow tensor of any shape containing the ground truth data.
In this context, the expected tensor has shape `(batch_size, #boxes, 4)` and
contains the ground truth bounding box coordinates, where the last dimension
contains `(xmin, xmax, ymin, ymax)`.
y_pred (nD tensor): A TensorFlow tensor of identical structure to `y_true` containing
the predicted data, in this context the predicted bounding box coordinates.
Returns:
The smooth L1 loss, an (n-1)-dimensional TensorFlow tensor. In this context, a 2D tensor
of shape `(batch, n_boxes_total)`.
References:
https://arxiv.org/abs/1504.08083
'''
absolute_loss = tf.abs(y_true - y_pred)
square_loss = 0.5 * (y_true - y_pred)**2
l1_loss = tf.where(tf.less(absolute_loss, 1.0), square_loss, absolute_loss - 0.5)
return tf.reduce_sum(l1_loss, axis=-1)
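# For reference, the piecewise form implemented above is, with x = y_true - y_pred element-wise:
#   0.5 * x**2    if |x| < 1
#   |x| - 0.5     otherwise,
# summed over the last axis.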
def log_loss(self, y_true, y_pred):
'''
Compute the softmax log loss.
Arguments:
y_true (nD tensor): A TensorFlow tensor of any shape containing the ground truth data.
In this context, the expected tensor has shape (batch_size, #boxes, #classes)
and contains the ground truth bounding box categories.
y_pred (nD tensor): A TensorFlow tensor of identical structure to `y_true` containing
the predicted data, in this context the predicted bounding box categories.
Returns:
The softmax log loss, an (n-1)-dimensional TensorFlow tensor. In this context, a 2D tensor
of shape `(batch, n_boxes_total)`.
'''
# Make sure that `y_pred` doesn't contain any zeros (which would break the log function)
y_pred = tf.maximum(y_pred, 1e-15)
# Compute the log loss
log_loss = -tf.reduce_sum(y_true * tf.log(y_pred), axis=-1)
return log_loss
def compute_loss(self, y_true, y_pred):
'''
Compute the loss of the SSD model prediction against the ground truth.
Arguments:
y_true (array): A Numpy array of shape `(batch_size, #boxes, #classes + 12)`,
where `#boxes` is the total number of boxes that the model predicts
per image. Be careful to make sure that the index of each given
box in `y_true` is the same as the index for the corresponding
box in `y_pred`. The last axis must have length `#classes + 12` and contain
`[classes one-hot encoded, 4 ground truth box coordinate offsets, 8 arbitrary entries]`
in this order, including the background class. The last eight entries of the
last axis are not used by this function and therefore their contents are
irrelevant, they only exist so that `y_true` has the same shape as `y_pred`,
where the last four entries of the last axis contain the anchor box
coordinates, which are needed during inference. Important: Boxes that
you want the cost function to ignore need to have a one-hot
class vector of all zeros.
y_pred (Keras tensor): The model prediction. The shape is identical
to that of `y_true`, i.e. `(batch_size, #boxes, #classes + 12)`.
The last axis must contain entries in the format
`[classes one-hot encoded, 4 predicted box coordinate offsets, 8 arbitrary entries]`.
Returns:
A scalar, the total multitask loss for classification and localization.
'''
self.neg_pos_ratio = tf.constant(self.neg_pos_ratio)
self.n_neg_min = tf.constant(self.n_neg_min)
self.alpha = tf.constant(self.alpha)
batch_size = tf.shape(y_pred)[0] # Output dtype: tf.int32
n_boxes = tf.shape(y_pred)[1] # Output dtype: tf.int32, note that `n_boxes` in this context denotes the total number of boxes per image, not the number of boxes per cell.
# 1: Compute the losses for class and box predictions for every box.
classification_loss = tf.to_float(self.log_loss(y_true[:,:,:-12], y_pred[:,:,:-12])) # Output shape: (batch_size, n_boxes)
localization_loss = tf.to_float(self.smooth_L1_loss(y_true[:,:,-12:-8], y_pred[:,:,-12:-8])) # Output shape: (batch_size, n_boxes)
# 2: Compute the classification losses for the positive and negative targets.
# Create masks for the positive and negative ground truth classes.
negatives = y_true[:,:,0] # Tensor of shape (batch_size, n_boxes)
positives = tf.to_float(tf.reduce_max(y_true[:,:,1:-12], axis=-1)) # Tensor of shape (batch_size, n_boxes)
# Count the number of positive boxes (classes 1 to n) in y_true across the whole batch.
n_positive = tf.reduce_sum(positives)
# Now mask all negative boxes and sum up the losses for the positive boxes PER batch item
# (Keras loss functions must output one scalar loss value PER batch item, rather than just
# one scalar for the entire batch, that's why we're not summing across all axes).
pos_class_loss = tf.reduce_sum(classification_loss * positives, axis=-1) # Tensor of shape (batch_size,)
# Compute the classification loss for the negative default boxes (if there are any).
# First, compute the classification loss for all negative boxes.
neg_class_loss_all = classification_loss * negatives # Tensor of shape (batch_size, n_boxes)
n_neg_losses = tf.count_nonzero(neg_class_loss_all, dtype=tf.int32) # The number of non-zero loss entries in `neg_class_loss_all`
# What's the point of `n_neg_losses`? For the next step, which will be to compute which negative boxes enter the classification
# loss, we don't just want to know how many negative ground truth boxes there are, but for how many of those there actually is
# a positive (i.e. non-zero) loss. This is necessary because `tf.nn.top_k()` in the function below will pick the top k boxes with
# the highest losses no matter what, even if it receives a vector where all losses are zero. In the unlikely event that all negative
# classification losses ARE actually zero though, this behavior might lead to `tf.nn.top_k()` returning the indices of positive
# boxes, leading to an incorrect negative classification loss computation, and hence an incorrect overall loss computation.
# We therefore need to make sure that `n_negative_keep`, which assumes the role of the `k` argument in `tf.nn.top_k()`,
# is at most the number of negative boxes for which there is a positive classification loss.
# Compute the number of negative examples we want to account for in the loss.
# We'll keep at most `self.neg_pos_ratio` times the number of positives in `y_true`, but at least `self.n_neg_min` (unless `n_neg_losses` is smaller).
n_negative_keep = tf.minimum(tf.maximum(self.neg_pos_ratio * tf.to_int32(n_positive), self.n_neg_min), n_neg_losses)
# In the unlikely case when either (1) there are no negative ground truth boxes at all
# or (2) the classification loss for all negative boxes is zero, return zero as the `neg_class_loss`.
def f1():
return tf.zeros([batch_size])
# Otherwise compute the negative loss.
def f2():
# Now we'll identify the top-k (where k == `n_negative_keep`) boxes with the highest confidence loss that
# belong to the background class in the ground truth data. Note that this doesn't necessarily mean that the model
# predicted the wrong class for those boxes, it just means that the loss for those boxes is the highest.
# To do this, we reshape `neg_class_loss_all` to 1D...
neg_class_loss_all_1D = tf.reshape(neg_class_loss_all, [-1]) # Tensor of shape (batch_size * n_boxes,)
# ...and then we get the indices for the `n_negative_keep` boxes with the highest loss out of those...
values, indices = tf.nn.top_k(neg_class_loss_all_1D,
k=n_negative_keep,
sorted=False) # We don't need them sorted.
# ...and with these indices we'll create a mask...
negatives_keep = tf.scatter_nd(indices=tf.expand_dims(indices, axis=1),
updates=tf.ones_like(indices, dtype=tf.int32),
shape=tf.shape(neg_class_loss_all_1D)) # Tensor of shape (batch_size * n_boxes,)
negatives_keep = tf.to_float(tf.reshape(negatives_keep, [batch_size, n_boxes])) # Tensor of shape (batch_size, n_boxes)
# ...and use it to keep only those boxes and mask all other classification losses
neg_class_loss = tf.reduce_sum(classification_loss * negatives_keep, axis=-1) # Tensor of shape (batch_size,)
return neg_class_loss
neg_class_loss = tf.cond(tf.equal(n_neg_losses, tf.constant(0)), f1, f2)
class_loss = pos_class_loss + neg_class_loss # Tensor of shape (batch_size,)
# 3: Compute the localization loss for the positive targets.
# We don't compute a localization loss for negative predicted boxes (obviously: there are no ground truth boxes they would correspond to).
loc_loss = tf.reduce_sum(localization_loss * positives, axis=-1) # Tensor of shape (batch_size,)
# 4: Compute the total loss.
total_loss = (class_loss + self.alpha * loc_loss) / tf.maximum(1.0, n_positive) # In case `n_positive == 0`
# Keras divides the loss by the batch size, which is undesirable in our case because the relevant
# quantity to average the loss over is the number of positive boxes in the batch (by which we're
# already dividing in the line above), not the batch size. So in order to revert Keras' averaging
# over the batch size, we have to multiply by it.
total_loss = total_loss * tf.to_float(batch_size)
return total_loss
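A minimal sketch of plugging this loss into a Keras model (the optimizer settings are illustrative, and `model` is assumed to be an SSD model whose output has the shape `(batch_size, #boxes, #classes + 12)` described above):

from keras.optimizers import SGD

ssd_loss = SSDLoss(neg_pos_ratio=3, n_neg_min=0, alpha=1.0)
# Keras will call `compute_loss(y_true, y_pred)` on each batch during training.
model.compile(optimizer=SGD(lr=0.001, momentum=0.9), loss=ssd_loss.compute_loss)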

398
ssd_keras-master/log.csv Normal file
View File

@@ -0,0 +1,398 @@
epoch,loss,val_loss
0,20.277508449554443,18.43082230275991
0,7.1915742305224075,6.3664290333280755
1,6.165657311318146,5.740384768223276
2,5.619835971168131,5.055156362981212
3,5.369787324809428,4.892946821913427
4,5.132327414380266,4.604732026761892
5,5.0042940591924046,5.3367882135449625
6,4.817910179700142,4.068967317586043
7,4.781344171415022,4.1435956740622615
0,4.711332150380216,3.9899120714713114
1,4.565739538037641,3.8868639851346307
2,4.5467505189074835,4.518684427008337
3,4.446662645487534,3.6808233204909735
4,4.384432893490333,3.9689779205224953
5,4.338269265632533,3.632280271783167
6,4.232728542852971,3.5473593521848015
7,4.24265526459042,3.675496294182174
8,4.198724102928925,3.8537149546584306
9,4.149862735920051,3.242039015268793
10,4.086929438042281,3.2605271822092483
11,4.080999140106535,3.4492100918293
12,4.051774456474609,3.4600228681856273
13,4.047840290988972,3.3476012737167125
14,4.004921493658581,3.253551969005137
15,3.980387693464584,3.1475591296809062
16,3.963608845837807,3.439066876367647
17,3.9319142337899247,3.227249937106152
18,3.9267380777162972,3.1900331236391652
19,3.88652819715875,3.4196941712194557
20,3.891526617915775,3.2850187503561683
21,3.8810042729401117,3.178472664550859
22,3.845480888771085,3.1901466000080108
23,3.8601040031731895,3.313066850292439
24,3.833493468026303,3.1854778224959666
25,3.846581113314497,3.3080512863762523
26,3.7922811536337204,3.180536364243955
27,3.796920354224469,3.230332650749051
28,3.77545154190615,3.0828077124941107
29,3.7792578078768884,3.088481028274614
30,3.799028399090284,3.040389155903641
31,3.7627443589116534,3.5030247830858037
32,3.7849467292635994,3.430109476434941
33,3.7617942428636812,3.076172747368715
34,3.69820216201435,3.0636714777776173
35,3.7106582014515714,3.1230773734316535
36,3.7186973696831696,3.2554605943086194
37,3.73952524356691,3.322282877552266
38,3.7127712956183494,3.1296832254833107
39,3.743936704607518,2.9867530497969415
40,3.6970554582580717,3.1789982390160465
41,3.675703220584339,3.181230039730364
42,3.6800708012869046,2.9879249061492024
43,3.6996077292521954,2.909196970170858
44,3.6874807460968984,3.0076349156608386
0,5.301829584752811,3.880189075713255
1,4.237945272776155,3.4012826725658107
2,3.967824310967916,3.3003988957161807
3,3.811831304539573,4.506175486311621
4,3.738135123222297,3.005752024164005
5,3.6835490122348116,3.1180185431728558
6,3.641814648534126,3.1133831091559663
7,3.62354097024078,2.9153411725467566
8,3.6078073618738418,3.366496021869231
9,3.5808043392312707,2.916329422021399
10,3.558084192422922,3.240870023990164
11,3.543319643807441,3.101310088415535
12,3.5348065467450422,3.1885890494803992
13,3.5254843150241024,3.081872761176557
14,3.532030360467253,3.140263960093868
15,3.502065125688435,2.9188486405051486
0,5.393483727917134,3.9214612931864603
1,4.453533147422382,3.54439757006509
2,4.242004539589649,3.4558666370109634
3,4.149720608257483,3.3706597017755313
4,4.070950873899982,3.365175929799372
5,4.036489350869629,3.2572230537570253
6,3.9832210350476616,3.244548196865588
7,3.9685796719093296,3.186030752999442
8,3.9331882864690937,3.1441271540583395
9,3.9003949890268834,3.15957534821666
10,3.869919748394882,3.108536195487392
11,3.8835768105307977,3.199020989184477
12,3.859471693538142,3.1070882893095213
13,3.831863939446656,3.0932964833172
14,3.837622393809239,3.06195428563624
15,3.840526257394262,3.1301962093431124
16,3.806362711088162,3.022412871852213
17,3.7940347837546606,3.0083183234078543
0,3.583134603498268,2.899540112699781
1,3.5758411770470437,2.8950582454885754
2,3.552036837656122,3.1472871506943996
3,3.549508606013809,3.1411996200133343
4,3.536157440913235,2.8483173701836138
0,3.792046067396762,3.302008090408481
1,3.7974349303578765,3.048314040972262
2,3.774148417477707,3.0506933468458604
3,3.770723174057516,2.995274811155942
4,3.7590958569089965,2.9874864899868867
5,3.756503374499173,2.9651731917809467
6,3.7571751423402726,3.040447744666314
7,3.7372909294386574,2.9443872574884065
8,3.735867359026198,3.062159067051751
9,3.739980499226432,3.0531008007575053
10,3.720392432353275,3.0481183667328895
11,3.7236807229637274,3.054165705065338
12,3.738948112931461,2.9650489200134666
13,3.725341242685843,3.0501689881937843
14,3.717509788593112,2.906775436851443
15,3.7082020335921086,2.8870353331614513
16,3.7062981972233824,2.949263564859118
17,3.696328950256506,2.898153125753208
18,3.694157539690318,2.9727682158411763
19,3.6942759911594907,2.936929242586603
20,3.688544740044238,2.9775942695262483
21,3.685629239042291,3.008222121559844
22,3.6829704219200865,2.963830918925149
23,3.687984784862804,2.9122491843116527
24,3.6889919993408293,2.920590233389212
25,3.6744058603034384,3.156900228261948
26,3.678084367126283,2.9845731163024904
27,3.6721215580784996,2.864051256106824
28,3.6761251689142522,2.9282362472281163
29,3.670322015591761,2.916268544233575
30,3.6746563016959897,2.93533816483556
31,3.68822167135601,3.1514753251659626
32,3.673546909383746,2.9957992443989734
33,3.6622909284476606,2.9045850520109644
34,3.6821291304096255,2.846033621831816
35,3.6746097554417245,2.946126491439586
36,3.6777817256709335,2.895232820973104
37,3.6627973835325474,2.8787644779925445
38,3.656972606852946,2.910535082330509
39,3.6593537592859406,2.9323528041158404
40,3.6681329787519275,2.8517043751113267
41,3.651395040133656,2.896727514206147
42,3.6524612816863242,2.9148669353918155
43,3.700482932643524,2.8747942129324895
44,3.6568134403923254,2.9226642251501276
45,3.6591445048575175,3.0673245993195746
46,3.667327405720154,2.923382118916025
47,3.666819140911102,2.8185447474280183
48,3.6595909573863317,2.8957767840307587
49,3.6621842928973347,2.8596115811990233
50,3.6531581074987307,2.8383416561447845
51,3.6436384217574638,2.8588484636618166
52,3.653585070944212,2.9522264416363773
53,3.6567384985471123,2.869638296122454
54,3.642349756571655,2.9472995919840677
55,3.6503870386117447,2.907968194436054
56,3.6526537667205465,2.97333710225261
0,3.527430093659335,2.9499444086697637
1,3.518461085146007,2.7458512868321674
2,3.497357860100584,2.750564945644262
3,3.505551182790645,2.934770630172321
4,3.5010816638422906,2.97169387428128
5,3.4826972408190953,2.9497152687700425
6,3.482018192963834,2.769856216822352
7,3.483267085125539,3.1796632373089695
8,3.4743894621160525,2.8287636573582278
9,3.476382410010232,2.7349996355723363
10,3.4668052105305263,3.3569845280476978
11,3.464654709591174,2.956850599500598
12,3.469194402020336,2.9216046754559692
13,3.4582013188650005,2.766444167117683
14,3.4473263411923116,2.574555371805113
15,3.440576969020528,2.961924656763369
16,3.440138483498881,2.5872522278221286
17,3.4416094603475087,2.94846072853828
18,3.431826221428119,2.8613412939285747
19,3.426628774009799,2.703764006556297
20,3.418753977854678,2.794771757904364
21,3.422769560654093,2.614489094541997
22,3.4175067899222435,3.1974430852641866
23,3.4211014375507407,2.886513024483408
0,7.251550525006652,6.727467104464161
1,6.5805361751809714,6.240762918117095
2,6.426467575960234,6.261519120201773
3,6.382249749480933,6.350825642858233
4,6.346947526265681,6.231373463066257
5,6.340085077037663,6.2704033127123
6,6.330737964940444,6.153748500079525
7,6.277891544860601,6.231535480581984
8,6.251803111067415,6.1931941764208736
9,6.227573697961121,6.1739476390760775
10,6.275304704672471,6.228755848821328
11,6.2669886230789125,6.165582716902908
12,6.26808624998182,6.197941570963178
13,6.244138752343505,6.2477391894982786
14,6.226582288056612,6.1598160204838734
15,6.240093465811573,6.190422949109759
16,6.226800398378633,6.153926610241131
17,6.2345370232274755,6.151287238792498
18,6.169805461343378,6.16343318907582
19,6.222987758064456,6.15657544605586
20,6.226215513777733,6.116615871896549
21,6.1693542201332745,6.131834213733673
22,6.179976459885388,6.121268307749106
23,6.176489705658332,6.11803888705312
24,6.173434678544104,6.119943920398245
25,6.18218496430628,6.11959202433119
26,6.18542869777903,6.127611294644219
27,6.211702589542791,6.17060423977521
28,6.180811053234339,6.125907942786509
29,6.218389826829731,6.138787643155273
30,6.196371093794331,6.146416762896947
31,6.194937759243325,6.137278949557518
32,6.19617432137914,6.117240555602677
33,6.183772964755073,6.095507074229571
34,6.194296064190194,6.1405050451901495
35,6.1723592400137335,6.105950779063361
36,6.174787776814774,6.0992349732652
37,6.179421859563514,6.131365781170981
38,6.206091299973801,6.113531703194793
39,6.196451108533144,6.130458958051642
40,6.175900908626988,6.109128736549494
41,6.194291211727261,6.113921468866114
42,6.174372794448212,6.096925194433757
43,6.190038766126334,6.118585459577794
44,6.198391766541452,6.129865546883369
45,6.167305888028443,6.117949859998664
46,6.191963202090934,6.125024612679773
47,6.200261698535457,6.133790407521384
48,6.168389475045353,6.152726840948572
49,6.1765353440582755,6.131584920445267
50,6.2232111624162645,6.099260960938979
51,6.197849844019115,6.152765401674777
52,6.204225545700453,6.084189028399331
53,6.184185870978422,6.141087080483534
54,6.17963873746153,6.105953947792248
55,6.161203468418494,6.137251816039183
56,6.196522537269443,6.117956788977798
57,6.171350480107404,6.110516171017472
58,6.190189365617,6.14508030217521
59,6.173887720591202,6.144299315992667
60,6.172330596484616,6.110945329617481
61,6.208208630987257,6.116943371952797
62,6.118527266681567,6.113059694937297
63,6.16644262248762,6.105053848952664
64,6.198519827031158,6.110208214083497
65,6.159561837555282,6.1219949465138574
66,6.154563741190173,6.131545047346426
67,6.154950792051479,6.12576116649472
68,6.166894907002337,6.118150112434309
69,6.207425719556958,6.139764730662716
70,6.158974324407429,6.1250802998153535
71,6.177289243172854,6.1134030548650395
72,6.177155931112543,6.133135798464016
73,6.204698515486717,6.099255515945201
74,6.142999435611628,6.093320363468053
75,6.1286770001064985,6.118610319020797
76,6.195084075715206,6.1120577796624636
77,6.188022490739077,6.125176494997375
78,6.173007156901806,6.131874611621
79,6.169041640301794,6.136714840640827
80,6.158187964781932,6.088659851551056
81,6.124840645731054,6.073482194530721
82,6.11550829057917,6.0627874065175344
83,6.142901296035387,6.071128609764333
84,6.134051843394339,6.059427362023568
85,6.131704535519704,6.076201498459796
86,6.131018532524816,6.066524108332031
87,6.129356051648408,6.077126537366789
88,6.119815099205821,6.066800057766389
89,6.135372443350591,6.076053387072622
90,6.143837644260377,6.0712920576455645
91,6.128566016821563,6.073624474102137
92,6.127937101376616,6.072794317153035
93,6.107040068831481,6.070746099608285
94,6.114805160044693,6.065637336336836
95,6.0848008258445185,6.077359912711747
96,6.120510688652285,6.07685818613792
97,6.121718007607199,6.077014412150091
98,6.138589511550031,6.0753674102559385
99,6.14711362022683,6.072682453807519
0,5.459651564954003,4.123017392645077
1,4.404505207740189,3.7492737925782498
2,4.218900287955977,4.037354177358199
3,4.113248738984554,3.6995143434952715
4,4.0643949411929805,3.5690866557919247
5,4.025469528097724,3.5282344099940084
6,4.0457330943727445,3.557335000719343
7,3.9765755680961963,3.4300471677585525
8,3.9456495689745363,3.430577338277077
9,3.915462172458488,3.672232191513996
10,3.908008457826215,3.3030753601813805
11,3.882619246215927,3.3868675887827973
12,3.8698588563729848,3.4117591986364246
13,3.8457697521586476,3.333784429394469
14,3.829451695427381,3.344102716445923
15,3.829135890411407,3.4384743941560085
16,3.8185325401698296,3.2764437681314895
17,3.811728405532498,3.3070398575919016
18,3.8193986879577944,3.3324410565045417
19,3.8045236736306745,3.2836114091289286
20,3.7939121548769945,3.187099374848969
21,3.7803091051688136,3.2528416243377998
22,3.7838520857264712,3.217187368042615
23,3.7684858936724477,3.1406039195158044
24,3.7643722954691268,3.263700286222964
25,3.764850974855544,3.3180053218530148
26,3.762447085700412,3.216772758328185
27,3.7541283817721425,3.2951196048697646
28,3.7537426449527898,3.2274136846892687
29,3.747767803558582,3.1421889299275922
30,3.7473671009458767,3.2217603358443903
31,3.7368718368898564,3.1999600636229224
32,3.7414575453173877,3.2511923570049053
33,3.7405168471323598,3.144407963266178
34,3.7356711041752884,3.186742841662193
35,3.7488686152503843,3.2660124095605343
36,3.732873413276759,3.1449162584421586
37,3.721955984815495,3.2085911212648663
38,3.7279464378143787,3.1323465240244963
39,3.7387744518253365,3.342787105404601
40,3.7224520102824785,3.129538404892902
41,3.729480919002857,3.1475857473879443
42,3.7171482909510605,3.2321544336786077
43,3.7299101766625076,3.268986857667261
44,3.7153538607347905,3.2292985235914893
45,3.710317350398387,3.193034345860384
46,3.7254012536669867,3.2186355395219763
0,5.233777807004288,3.754912186447455
1,4.155797876627342,3.3218673182506953
2,3.9582901459886544,3.3553187786802954
3,3.854809833242829,3.0148061607321917
4,3.774991210410646,2.9839462475873986
5,3.7435560775237327,2.944469572962547
6,3.690073798276478,2.9661303657414964
7,3.688818962637007,3.0197564858806376
8,3.6428677247337298,2.9265284273575762
9,3.6327320056962416,2.859319869936729
10,3.5973561518366832,2.8247028306065776
0,3.5895758565068245,2.878430740298057
1,3.57021487172842,2.891466063382674
2,3.5890834237337113,2.810688891021573
3,3.5526506499171258,2.8535458037318016
4,3.557103377425671,2.819768620607804
5,3.5469331957697867,2.8853546847129357
6,3.5281683020591736,2.8020037130433684
7,3.5180265937447546,2.760811321102843
8,3.502203633582592,2.810385329772015
9,3.4997954434514047,2.799852768936936
10,3.4855113650679588,3.1717292711686116
11,3.4743440980196,2.7339165886080994
12,3.478519773006439,2.7316556148139797
13,3.470843297624588,2.7262015390396117
14,3.468432496213913,2.74120696826857
15,3.4654084459781647,2.9365515257387744
16,3.4587265821099282,2.7102422758024565
17,3.442611147677898,2.7034222851967327
18,3.451740815258026,2.759180706374499
19,3.4330322801709174,2.8289515509897347
20,3.433631085264683,2.7586020536811984
21,3.42998209284544,2.7625791699545723
22,3.438184221959114,2.709267522266933
23,3.443043806731701,2.69548738421226
24,3.443766810965538,2.8324346519003107
25,3.419732032930851,2.6618910677578986
26,3.4263635221004485,2.670782587868827
27,3.4174978087067602,2.6961302435154817
28,3.4232222255468367,2.6739595896857127
29,3.418972552371025,2.657863086875604
30,3.4127560631990432,2.6398553523238824
31,3.4195866104125976,2.6925482973760486
32,3.426458643066883,2.678463907339135
33,3.421615189695358,2.887123303413391
34,3.409783862519264,2.6411300960852175
35,3.4169762951254845,2.7320803192683627
36,3.405258295547962,2.6527417176110406
37,3.4024106793880464,2.6393382494790214
38,3.4015627622365954,2.7767718032914765
39,3.4063776480317114,2.685369894845145
40,3.393300777006149,2.6538100697069753
41,3.4112252692580225,2.6793857584194263
42,3.4120474227547644,2.726417605049756
43,3.3938982912421225,2.6654360306019687
44,3.4041917283177376,2.743247573035104
45,3.408869186782837,2.637516905920846
46,3.3951859700441362,2.6712587169725066
47,3.4072072798848154,2.649881097248622
48,3.3960764342546463,2.700681756953804
49,3.3881560341000556,2.6843594738901877
50,3.389593525660038,2.6199345495262922
51,3.382925266957283,2.6239259885281934
52,3.3866692927956583,2.6355316166001925
53,3.3969139186263084,2.72334972177233
54,3.3867322647333147,2.7168657478021117
55,3.3895327091932295,2.738639141296854
56,3.3796878326773645,2.638875687462943
57,3.3830816565036774,2.6640367179014244
58,3.382064008331299,2.682919617380415
59,3.3827162971138955,2.7199460838278946
60,3.3851185901761056,2.6911930183488497
61,3.3796840319156645,2.6435422468185426
62,3.3814005301952363,2.710524764060974
63,3.3771395704865457,2.6270531114266844
64,3.4597128042459486,2.7650137408898803

View File

View File

@@ -0,0 +1,177 @@
'''
Utilities that are useful to sub- or up-sample weights tensors.
Copyright (C) 2018 Pierluigi Ferrari
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
'''
import numpy as np
def sample_tensors(weights_list, sampling_instructions, axes=None, init=None, mean=0.0, stddev=0.005):
'''
Can sub-sample and/or up-sample individual dimensions of the tensors in the given list
of input tensors.
It is possible to sub-sample some dimensions and up-sample other dimensions at the same time.
The tensors in the list will be sampled consistently, i.e. for any given dimension that
corresponds among all tensors in the list, the same elements will be picked for every tensor
along that dimension.
For dimensions that are being sub-sampled, you can either provide a list of the indices
that should be picked, or you can provide the number of elements to be sub-sampled, in which
case the elements will be chosen at random.
For dimensions that are being up-sampled, "filler" elements will be inserted at random
positions along the respective dimension. These filler elements will be initialized either
with zero or from a normal distribution with selectable mean and standard deviation.
Arguments:
weights_list (list): A list of Numpy arrays. Each array represents one of the tensors
to be sampled. The tensor with the greatest number of dimensions must be the first
element in the list. For example, in the case of the weights of a 2D convolutional
layer, the kernel must be the first element in the list and the bias the second,
not the other way around. For all tensors in the list after the first tensor, the
lengths of each of their axes must be identical to the length of some axis of the
first tensor.
sampling_instructions (list): A list that contains the sampling instructions for each
dimension of the first tensor. If the first tensor has `n` dimensions, then this
must be a list of length `n`. That means, sampling instructions for every dimension
of the first tensor must still be given even if not all dimensions should be changed.
The elements of this list can be either lists of integers or integers. If the sampling
instruction for a given dimension is a list of integers, then these integers represent
the indices of the elements of that dimension that will be sub-sampled. If the sampling
instruction for a given dimension is an integer, then that number of elements will be
sampled along said dimension. If the integer is greater than the number of elements
of the input tensors in that dimension, that dimension will be up-sampled. If the integer
is smaller than the number of elements of the input tensors in that dimension, that
dimension will be sub-sampled. If the integer is equal to the number of elements
of the input tensors in that dimension, that dimension will remain the same.
axes (list, optional): Only relevant if `weights_list` contains more than one tensor.
This list contains a list for each additional tensor in `weights_list` beyond the first.
Each of these lists contains integers that determine to which axes of the first tensor
the axes of the respective tensor correspond. For example, let the first tensor be a
4D tensor and the second tensor in the list be a 2D tensor. If the first element of
`axes` is the list `[2,3]`, then that means that the two axes of the second tensor
correspond to the last two axes of the first tensor, in the same order. The point of
this list is for the program to know, if a given dimension of the first tensor is to
be sub- or up-sampled, which dimensions of the other tensors in the list must be
sub- or up-sampled accordingly.
init (list, optional): Only relevant for up-sampling. Must be `None` or a list of strings
that determines for each tensor in `weights_list` how the newly inserted values should
be initialized. The possible values are 'gaussian' for initialization from a normal
distribution with the selected mean and standard deviation (see the following two arguments),
or 'zeros' for zero-initialization. If `None`, all initializations default to
'gaussian'.
mean (float, optional): Only relevant for up-sampling. The mean of the values that will
be inserted into the tensors at random in the case of up-sampling.
stddev (float, optional): Only relevant for up-sampling. The standard deviation of the
values that will be inserted into the tensors at random in the case of up-sampling.
Returns:
A list containing the sampled tensors in the same order in which they were given.
'''
first_tensor = weights_list[0]
if (not isinstance(sampling_instructions, (list, tuple))) or (len(sampling_instructions) != first_tensor.ndim):
raise ValueError("The sampling instructions must be a list whose length is the number of dimensions of the first tensor in `weights_list`.")
if (not init is None) and len(init) != len(weights_list):
raise ValueError("`init` must either be `None` or a list of strings that has the same length as `weights_list`.")
up_sample = [] # Store the dimensions along which we need to up-sample.
out_shape = [] # Store the shape of the output tensor here.
# Store two stages of the new (sub-sampled and/or up-sampled) weights tensors in the following two lists.
subsampled_weights_list = [] # Tensors after sub-sampling, but before up-sampling (if any).
upsampled_weights_list = [] # Sub-sampled tensors after up-sampling (if any), i.e. final output tensors.
# Create the slicing arrays from the sampling instructions.
sampling_slices = []
for i, sampling_inst in enumerate(sampling_instructions):
if isinstance(sampling_inst, (list, tuple)):
amax = np.amax(np.array(sampling_inst))
if amax >= first_tensor.shape[i]:
raise ValueError("The sample instructions for dimension {} contain index {}, which is greater than the length of that dimension.".format(i, amax))
sampling_slices.append(np.array(sampling_inst))
out_shape.append(len(sampling_inst))
elif isinstance(sampling_inst, int):
out_shape.append(sampling_inst)
if sampling_inst == first_tensor.shape[i]:
# Nothing to sample here, we're keeping the original number of elements along this axis.
sampling_slice = np.arange(sampling_inst)
sampling_slices.append(sampling_slice)
elif sampling_inst < first_tensor.shape[i]:
# We want to SUB-sample this dimension. Randomly pick `sampling_inst` many elements from it.
sampling_slice1 = np.array([0]) # We will always sample class 0, the background class.
# Sample the rest of the classes.
sampling_slice2 = np.sort(np.random.choice(np.arange(1, first_tensor.shape[i]), sampling_inst - 1, replace=False))
sampling_slice = np.concatenate([sampling_slice1, sampling_slice2])
sampling_slices.append(sampling_slice)
else:
# We want to UP-sample. Pick all elements from this dimension.
sampling_slice = np.arange(first_tensor.shape[i])
sampling_slices.append(sampling_slice)
up_sample.append(i)
else:
raise ValueError("Each element of the sampling instructions must be either an integer or a list/tuple of integers, but received `{}`".format(type(sampling_inst)))
# Process the first tensor.
subsampled_first_tensor = np.copy(first_tensor[np.ix_(*sampling_slices)])
subsampled_weights_list.append(subsampled_first_tensor)
# Process the other tensors.
if len(weights_list) > 1:
for j in range(1, len(weights_list)):
this_sampling_slices = [sampling_slices[i] for i in axes[j-1]] # Get the sampling slices for this tensor.
subsampled_weights_list.append(np.copy(weights_list[j][np.ix_(*this_sampling_slices)]))
if up_sample:
# Take care of the dimensions that are to be up-sampled.
out_shape = np.array(out_shape)
# Process the first tensor.
if init is None or init[0] == 'gaussian':
upsampled_first_tensor = np.random.normal(loc=mean, scale=stddev, size=out_shape)
elif init[0] == 'zeros':
upsampled_first_tensor = np.zeros(out_shape)
else:
raise ValueError("Valid initializations are 'gaussian' and 'zeros', but received '{}'.".format(init[0]))
# Pick the indices of the elements in `upsampled_first_tensor` that should be occupied by `subsampled_first_tensor`.
up_sample_slices = [np.arange(k) for k in subsampled_first_tensor.shape]
for i in up_sample:
# Randomly select across which indices of this dimension to scatter the elements of `subsampled_first_tensor` in this dimension.
up_sample_slice1 = np.array([0])
up_sample_slice2 = np.sort(np.random.choice(np.arange(1, upsampled_first_tensor.shape[i]), subsampled_first_tensor.shape[i] - 1, replace=False))
up_sample_slices[i] = np.concatenate([up_sample_slice1, up_sample_slice2])
upsampled_first_tensor[np.ix_(*up_sample_slices)] = subsampled_first_tensor
upsampled_weights_list.append(upsampled_first_tensor)
# Process the other tensors
if len(weights_list) > 1:
for j in range(1, len(weights_list)):
if init is None or init[j] == 'gaussian':
upsampled_tensor = np.random.normal(loc=mean, scale=stddev, size=out_shape[axes[j-1]])
elif init[j] == 'zeros':
upsampled_tensor = np.zeros(out_shape[axes[j-1]])
else:
raise ValueError("Valid initializations are 'gaussian' and 'zeros', but received '{}'.".format(init[j]))
this_up_sample_slices = [up_sample_slices[i] for i in axes[j-1]] # Get the up-sampling slices for this tensor.
upsampled_tensor[np.ix_(*this_up_sample_slices)] = subsampled_weights_list[j]
upsampled_weights_list.append(upsampled_tensor)
return upsampled_weights_list
else:
return subsampled_weights_list
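A minimal usage sketch (the shapes are illustrative stand-ins for trained weights): sub-sample a classifier's kernel and bias from 81 output channels down to 21, with the background channel at index 0 always kept, as the function does by construction:

import numpy as np

kernel = np.random.normal(size=(3, 3, 512, 81))  # Stand-in for a trained classifier kernel.
bias = np.random.normal(size=(81,))              # Stand-in for the corresponding bias.
sampled_kernel, sampled_bias = sample_tensors(
    weights_list=[kernel, bias],
    sampling_instructions=[3, 3, 512, 21],  # Keep axes 0-2 unchanged, sub-sample the last axis to 21 elements.
    axes=[[3]])                             # The bias' single axis corresponds to the kernel's axis 3.
print(sampled_kernel.shape, sampled_bias.shape)  # (3, 3, 512, 21) (21,)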

View File

Binary file not shown.

Some files were not shown because too many files have changed in this diff.