ssd_keras-master/.gitignore
@@ -0,0 +1,4 @@
*.jpg
*.jpeg
*.weights
*.h5

ssd_keras-master/CONTRIBUTING.md
@@ -0,0 +1,22 @@
# Contributing Guidelines

---

Contributions to this repository are welcome, but before you create a pull request, consider the following guidelines:

1. The to-do list in the README of this repository defines the main topics for which contributions are welcome. If you want to contribute, ideally contribute to one of the topics listed there.
2. If you'd like to contribute features that are not mentioned on the to-do list in the README, make sure to explain why your proposed change adds value, i.e. what relevant use case it solves. The benefit of any new feature will be weighed against the cost of maintaining it, and your contribution will be accepted or rejected based on this trade-off.
3. One pull request should be about one specific feature or improvement, i.e. it should not contain multiple unrelated changes. If you want to contribute multiple features and/or improvements, create a separate pull request for every individual feature or improvement.
4. When you create a pull request, make sure to explain properly
    * why your proposed change adds value, i.e. what problem or use case it solves,
    * all the API changes it will introduce, if any, and
    * all behavioral changes in any existing parts of the project it will introduce, if any.
5. This should go without saying, but you are responsible for updating any parts of the code or the tutorial notebooks that are affected by the changes you introduce.
6. Any submitted code must conform to the coding standards and style of this repository. There is no formal guide for coding standards and style, but here are a few things to note:
    * Any new modules, classes or functions must provide proper docstrings unless they are trivial. These docstrings must have sections for Arguments, Returns, and Raises (if applicable). For every argument of a function, the docstring must explain precisely what the argument does, what data type it expects, whether or not it is optional, and any requirements for the range of values it expects. The same goes for the returns. Use existing docstrings as templates (see the sketch after this list).
    * Naming:
        * `ClassNames` consist of capitalized words without underscores.
        * `module_names.py` consist of lower-case words connected with underscores.
        * `function_names` consist of lower-case words connected with underscores.
        * `variable_names` consist of lower-case words connected with underscores.
    * All module, class, function, and variable names must be descriptive in order to meet the goal that all code should always be as self-explanatory as possible. A longer, descriptive name is always preferable to a shorter, non-descriptive name. Abbreviations are generally to be avoided unless the full words would really make the name too long.
    * More in-line comments are better than fewer in-line comments, and all comments should be precise and succinct.
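
As a point of reference, here is a minimal sketch of the expected docstring format. The function and all of its names are hypothetical; they only illustrate the Arguments/Returns/Raises structure:

```python
def clip_boxes_to_image(boxes, img_height, img_width):
    '''
    Clips box coordinates so that they lie within the image boundaries.

    Arguments:
        boxes (array): A 2D Numpy array of shape `(n, 4)` that contains the
            `(xmin, ymin, xmax, ymax)` coordinates of `n` boxes.
        img_height (int): The height of the image in pixels. Must be positive.
        img_width (int): The width of the image in pixels. Must be positive.

    Returns:
        A 2D Numpy array of the same shape as `boxes` containing the clipped
        coordinates.

    Raises:
        ValueError: If `boxes` does not contain exactly four coordinates per box.
    '''
```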

ssd_keras-master/ISSUE_TEMPLATE.md
@@ -0,0 +1,29 @@
### If you open a GitHub issue, here is the policy:

Your issue must be about one of the following:

1. a bug,
2. a feature request,
3. a documentation issue, or
4. a question that is **specific to this SSD implementation**.

You will only get help if you adhere to the following guidelines:

* Before you open an issue, search the open **and closed** issues first. Your problem/question might already have been solved/answered before.
* If you're getting unexpected behavior from code I wrote, open an issue and I'll try to help. If you're getting unexpected behavior from code **you** wrote, you'll have to fix it yourself. E.g. if you made a ton of changes to the code or the tutorials and now it doesn't work anymore, that's your own problem. I don't want to spend my time debugging your code.
* Make sure you're using the latest master. If you're 30 commits behind and have a problem, the only answer you'll likely get is to pull the latest master and try again.
* Read the documentation. All of it. If the answer to your problem/question can be found in the documentation, you might not get an answer, because, seriously, you could really have figured this out yourself.
* If you're asking a question, it must be specific to this SSD implementation. General deep learning or object detection questions will likely get closed without an answer. E.g. a question like "How do I get the mAP of an SSD for my own dataset?" has nothing to do with this particular SSD implementation, because computing the mAP works the same way for any object detection model. You should ask such a question in an appropriate forum or on the [Data Science section of StackOverflow](https://datascience.stackexchange.com/) instead.
* If you get an error:
    * Provide the full stack trace of the error you're getting, not just the error message itself.
    * Make sure any code you post is properly formatted as such.
    * Provide any useful information about your environment, e.g.:
        * Operating System
        * Which commit of this repository you're on
        * Keras version
        * TensorFlow version
    * Provide a minimal reproducible example, i.e. post code and explain clearly how you ended up with this error.
* Provide any useful information about your specific use case and parameters:
    * What model are you trying to use/train?
    * Describe the dataset you're using.
    * List the values of any parameters you changed that might be relevant.

ssd_keras-master/LICENSE.txt
@@ -0,0 +1,176 @@
Copyright 2018 Pierluigi Ferrari.

Apache License
Version 2.0, January 2004
http://www.apache.org/licenses/

TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION

1. Definitions.

"License" shall mean the terms and conditions for use, reproduction,
and distribution as defined by Sections 1 through 9 of this document.

"Licensor" shall mean the copyright owner or entity authorized by
the copyright owner that is granting the License.

"Legal Entity" shall mean the union of the acting entity and all
other entities that control, are controlled by, or are under common
control with that entity. For the purposes of this definition,
"control" means (i) the power, direct or indirect, to cause the
direction or management of such entity, whether by contract or
otherwise, or (ii) ownership of fifty percent (50%) or more of the
outstanding shares, or (iii) beneficial ownership of such entity.

"You" (or "Your") shall mean an individual or Legal Entity
exercising permissions granted by this License.

"Source" form shall mean the preferred form for making modifications,
including but not limited to software source code, documentation
source, and configuration files.

"Object" form shall mean any form resulting from mechanical
transformation or translation of a Source form, including but
not limited to compiled object code, generated documentation,
and conversions to other media types.

"Work" shall mean the work of authorship, whether in Source or
Object form, made available under the License, as indicated by a
copyright notice that is included in or attached to the work
(an example is provided in the Appendix below).

"Derivative Works" shall mean any work, whether in Source or Object
form, that is based on (or derived from) the Work and for which the
editorial revisions, annotations, elaborations, or other modifications
represent, as a whole, an original work of authorship. For the purposes
of this License, Derivative Works shall not include works that remain
separable from, or merely link (or bind by name) to the interfaces of,
the Work and Derivative Works thereof.

"Contribution" shall mean any work of authorship, including
the original version of the Work and any modifications or additions
to that Work or Derivative Works thereof, that is intentionally
submitted to Licensor for inclusion in the Work by the copyright owner
or by an individual or Legal Entity authorized to submit on behalf of
the copyright owner. For the purposes of this definition, "submitted"
means any form of electronic, verbal, or written communication sent
to the Licensor or its representatives, including but not limited to
communication on electronic mailing lists, source code control systems,
and issue tracking systems that are managed by, or on behalf of, the
Licensor for the purpose of discussing and improving the Work, but
excluding communication that is conspicuously marked or otherwise
designated in writing by the copyright owner as "Not a Contribution."

"Contributor" shall mean Licensor and any individual or Legal Entity
on behalf of whom a Contribution has been received by Licensor and
subsequently incorporated within the Work.

2. Grant of Copyright License. Subject to the terms and conditions of
this License, each Contributor hereby grants to You a perpetual,
worldwide, non-exclusive, no-charge, royalty-free, irrevocable
copyright license to reproduce, prepare Derivative Works of,
publicly display, publicly perform, sublicense, and distribute the
Work and such Derivative Works in Source or Object form.

3. Grant of Patent License. Subject to the terms and conditions of
this License, each Contributor hereby grants to You a perpetual,
worldwide, non-exclusive, no-charge, royalty-free, irrevocable
(except as stated in this section) patent license to make, have made,
use, offer to sell, sell, import, and otherwise transfer the Work,
where such license applies only to those patent claims licensable
by such Contributor that are necessarily infringed by their
Contribution(s) alone or by combination of their Contribution(s)
with the Work to which such Contribution(s) was submitted. If You
institute patent litigation against any entity (including a
cross-claim or counterclaim in a lawsuit) alleging that the Work
or a Contribution incorporated within the Work constitutes direct
or contributory patent infringement, then any patent licenses
granted to You under this License for that Work shall terminate
as of the date such litigation is filed.

4. Redistribution. You may reproduce and distribute copies of the
Work or Derivative Works thereof in any medium, with or without
modifications, and in Source or Object form, provided that You
meet the following conditions:

(a) You must give any other recipients of the Work or
Derivative Works a copy of this License; and

(b) You must cause any modified files to carry prominent notices
stating that You changed the files; and

(c) You must retain, in the Source form of any Derivative Works
that You distribute, all copyright, patent, trademark, and
attribution notices from the Source form of the Work,
excluding those notices that do not pertain to any part of
the Derivative Works; and

(d) If the Work includes a "NOTICE" text file as part of its
distribution, then any Derivative Works that You distribute must
include a readable copy of the attribution notices contained
within such NOTICE file, excluding those notices that do not
pertain to any part of the Derivative Works, in at least one
of the following places: within a NOTICE text file distributed
as part of the Derivative Works; within the Source form or
documentation, if provided along with the Derivative Works; or,
within a display generated by the Derivative Works, if and
wherever such third-party notices normally appear. The contents
of the NOTICE file are for informational purposes only and
do not modify the License. You may add Your own attribution
notices within Derivative Works that You distribute, alongside
or as an addendum to the NOTICE text from the Work, provided
that such additional attribution notices cannot be construed
as modifying the License.

You may add Your own copyright statement to Your modifications and
may provide additional or different license terms and conditions
for use, reproduction, or distribution of Your modifications, or
for any such Derivative Works as a whole, provided Your use,
reproduction, and distribution of the Work otherwise complies with
the conditions stated in this License.

5. Submission of Contributions. Unless You explicitly state otherwise,
any Contribution intentionally submitted for inclusion in the Work
by You to the Licensor shall be under the terms and conditions of
this License, without any additional terms or conditions.
Notwithstanding the above, nothing herein shall supersede or modify
the terms of any separate license agreement you may have executed
with Licensor regarding such Contributions.

6. Trademarks. This License does not grant permission to use the trade
names, trademarks, service marks, or product names of the Licensor,
except as required for reasonable and customary use in describing the
origin of the Work and reproducing the content of the NOTICE file.

7. Disclaimer of Warranty. Unless required by applicable law or
agreed to in writing, Licensor provides the Work (and each
Contributor provides its Contributions) on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
implied, including, without limitation, any warranties or conditions
of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A
PARTICULAR PURPOSE. You are solely responsible for determining the
appropriateness of using or redistributing the Work and assume any
risks associated with Your exercise of permissions under this License.

8. Limitation of Liability. In no event and under no legal theory,
whether in tort (including negligence), contract, or otherwise,
unless required by applicable law (such as deliberate and grossly
negligent acts) or agreed to in writing, shall any Contributor be
liable to You for damages, including any direct, indirect, special,
incidental, or consequential damages of any character arising as a
result of this License or out of the use or inability to use the
Work (including but not limited to damages for loss of goodwill,
work stoppage, computer failure or malfunction, or any and all
other commercial damages or losses), even if such Contributor
has been advised of the possibility of such damages.

9. Accepting Warranty or Additional Liability. While redistributing
the Work or Derivative Works thereof, You may choose to offer,
and charge a fee for, acceptance of support, warranty, indemnity,
or other liability obligations and/or rights consistent with this
License. However, in accepting such obligations, You may act only
on Your own behalf and on Your sole responsibility, not on behalf
of any other Contributor, and only if You agree to indemnify,
defend, and hold each Contributor harmless for any liability
incurred by, or claims asserted against, such Contributor by reason
of your accepting any such warranty or additional liability.

ssd_keras-master/Prueba_trainingssd300.py
@@ -0,0 +1,283 @@
#!/usr/bin/env python3
# -*- coding: utf-8 -*-
"""
Created on Thu May 16 16:09:31 2019

@author: dlsaavedra
"""

from keras.optimizers import Adam, SGD
from keras.callbacks import ModelCheckpoint, LearningRateScheduler, TerminateOnNaN, CSVLogger
from keras import backend as K
from keras.models import load_model
from math import ceil
import numpy as np
from matplotlib import pyplot as plt

from models.keras_ssd512 import ssd_512
from models.keras_ssd300 import ssd_300
from keras_loss_function.keras_ssd_loss import SSDLoss
from keras_layers.keras_layer_AnchorBoxes import AnchorBoxes
from keras_layers.keras_layer_DecodeDetections import DecodeDetections
from keras_layers.keras_layer_DecodeDetectionsFast import DecodeDetectionsFast
from keras_layers.keras_layer_L2Normalization import L2Normalization

from ssd_encoder_decoder.ssd_input_encoder import SSDInputEncoder
from ssd_encoder_decoder.ssd_output_decoder import decode_detections, decode_detections_fast

from data_generator.object_detection_2d_data_generator import DataGenerator
from data_generator.object_detection_2d_geometric_ops import Resize
from data_generator.object_detection_2d_photometric_ops import ConvertTo3Channels
from data_generator.data_augmentation_chain_original_ssd import SSDDataAugmentation
from data_generator.object_detection_2d_misc_utils import apply_inverse_transforms

#%%
img_height = 300 # Height of the model input images
img_width = 300 # Width of the model input images
img_channels = 3 # Number of color channels of the model input images
mean_color = [123, 117, 104] # The per-channel mean of the images in the dataset. Do not change this value if you're using any of the pre-trained weights.
swap_channels = [2, 1, 0] # The color channel order in the original SSD is BGR, so we'll have the model reverse the color channel order of the input images.
n_classes = 20 # Number of positive classes, e.g. 20 for Pascal VOC, 80 for MS COCO
scales_pascal = [0.1, 0.2, 0.37, 0.54, 0.71, 0.88, 1.05] # The anchor box scaling factors used in the original SSD300 for the Pascal VOC datasets
scales_coco = [0.07, 0.15, 0.33, 0.51, 0.69, 0.87, 1.05] # The anchor box scaling factors used in the original SSD300 for the MS COCO datasets
scales = scales_pascal
aspect_ratios = [[1.0, 2.0, 0.5],
                 [1.0, 2.0, 0.5, 3.0, 1.0/3.0],
                 [1.0, 2.0, 0.5, 3.0, 1.0/3.0],
                 [1.0, 2.0, 0.5, 3.0, 1.0/3.0],
                 [1.0, 2.0, 0.5],
                 [1.0, 2.0, 0.5]] # The anchor box aspect ratios used in the original SSD300; the order matters
two_boxes_for_ar1 = True
steps = [8, 16, 32, 64, 100, 300] # The space between two adjacent anchor box center points for each predictor layer.
offsets = [0.5, 0.5, 0.5, 0.5, 0.5, 0.5] # The offsets of the first anchor box center points from the top and left borders of the image as a fraction of the step size for each predictor layer.
clip_boxes = False # Whether or not to clip the anchor boxes to lie entirely within the image boundaries
variances = [0.1, 0.1, 0.2, 0.2] # The variances by which the encoded target coordinates are divided as in the original implementation
normalize_coords = True

K.clear_session() # Clear previous models from memory.

model = ssd_300(image_size=(img_height, img_width, img_channels),
                n_classes=n_classes,
                mode='training',
                l2_regularization=0.0005,
                scales=scales,
                aspect_ratios_per_layer=aspect_ratios,
                two_boxes_for_ar1=two_boxes_for_ar1,
                steps=steps,
                offsets=offsets,
                clip_boxes=clip_boxes,
                variances=variances,
                normalize_coords=normalize_coords,
                subtract_mean=mean_color,
                swap_channels=swap_channels)

#%%
# 2: Load some weights into the model.

# TODO: Set the path to the weights you want to load.
#weights_path = 'VGG_VOC0712Plus_SSD_300x300_ft_iter_160000.h5'
weights_path = 'VGG_ILSVRC_16_layers_fc_reduced.h5'

model.load_weights(weights_path, by_name=True)

# 3: Instantiate an optimizer and the SSD loss function and compile the model.
# If you want to follow the original Caffe implementation, use the preset SGD
# optimizer, otherwise I'd recommend the commented-out Adam optimizer.

#adam = Adam(lr=0.001, beta_1=0.9, beta_2=0.999, epsilon=1e-08, decay=0.0)
sgd = SGD(lr=0.001, momentum=0.9, decay=0.0, nesterov=False)

ssd_loss = SSDLoss(neg_pos_ratio=3, alpha=1.0)

model.compile(optimizer=sgd, loss=ssd_loss.compute_loss)
model.summary()

#%%
# 1: Instantiate two `DataGenerator` objects: One for training, one for validation.

# Optional: If you have enough memory, consider loading the images into memory for the reasons explained above.

train_dataset = DataGenerator(load_images_into_memory=False, hdf5_dataset_path=None)
val_dataset = DataGenerator(load_images_into_memory=False, hdf5_dataset_path=None)

# 2: Parse the image and label lists for the training and validation datasets. This can take a while.

# TODO: Set the paths to the datasets here.

# The directories that contain the images.
VOC_2007_images_dir = '../VOCdevkit/VOC2007/JPEGImages/'
VOC_2012_images_dir = '../VOCdevkit/VOC2012/JPEGImages/'

# The directories that contain the annotations.
VOC_2007_annotations_dir = '../VOCdevkit/VOC2007/Annotations/'
VOC_2012_annotations_dir = '../VOCdevkit/VOC2012/Annotations/'

# The paths to the image sets.
VOC_2007_train_image_set_filename = '../VOCdevkit/VOC2007/ImageSets/Main/train.txt'
VOC_2012_train_image_set_filename = '../VOCdevkit/VOC2012/ImageSets/Main/train.txt'
VOC_2007_val_image_set_filename = '../VOCdevkit/VOC2007/ImageSets/Main/val.txt'
VOC_2012_val_image_set_filename = '../VOCdevkit/VOC2012/ImageSets/Main/val.txt'
VOC_2007_trainval_image_set_filename = '../VOCdevkit/VOC2007/ImageSets/Main/trainval.txt'
VOC_2012_trainval_image_set_filename = '../VOCdevkit/VOC2012/ImageSets/Main/trainval.txt'
VOC_2007_test_image_set_filename = '../VOCdevkit/VOC2007/ImageSets/Main/test.txt'

# The XML parser needs to know what object class names to look for and in which order to map them to integers.
classes = ['background',
           'aeroplane', 'bicycle', 'bird', 'boat',
           'bottle', 'bus', 'car', 'cat',
           'chair', 'cow', 'diningtable', 'dog',
           'horse', 'motorbike', 'person', 'pottedplant',
           'sheep', 'sofa', 'train', 'tvmonitor']

# Override the Pascal VOC classes above with the classes of this experiment's dataset.
classes = ['background', 'Gun', 'Knife', 'Razor', 'Shuriken']

# Note: Both generators parse the same image set below, i.e. the validation
# dataset is identical to the training dataset here.
train_dataset.parse_xml(images_dirs=['/home/dlsaavedra/Desktop/Tesis/8.-Object_Detection/Experimento_3/Training/images'],
                        image_set_filenames=['/home/dlsaavedra/Desktop/Tesis/8.-Object_Detection/Experimento_3/Training/train.txt'],
                        annotations_dirs=['/home/dlsaavedra/Desktop/Tesis/8.-Object_Detection/Experimento_3/Training/anns'],
                        classes=classes,
                        include_classes='all',
                        exclude_truncated=False,
                        exclude_difficult=False,
                        ret=False)

val_dataset.parse_xml(images_dirs=['/home/dlsaavedra/Desktop/Tesis/8.-Object_Detection/Experimento_3/Training/images'],
                      image_set_filenames=['/home/dlsaavedra/Desktop/Tesis/8.-Object_Detection/Experimento_3/Training/train.txt'],
                      annotations_dirs=['/home/dlsaavedra/Desktop/Tesis/8.-Object_Detection/Experimento_3/Training/anns'],
                      classes=classes,
                      include_classes='all',
                      exclude_truncated=False,
                      exclude_difficult=False,
                      ret=False)

#train_dataset.parse_xml(images_dirs=[VOC_2012_images_dir],
#                        image_set_filenames=[VOC_2012_trainval_image_set_filename],
#                        annotations_dirs=[VOC_2012_annotations_dir],
#                        classes=classes,
#                        include_classes='all',
#                        exclude_truncated=False,
#                        exclude_difficult=False,
#                        ret=False)
#
#val_dataset.parse_xml(images_dirs=[VOC_2012_images_dir],
#                      image_set_filenames=[VOC_2012_trainval_image_set_filename],
#                      annotations_dirs=[VOC_2012_annotations_dir],
#                      classes=classes,
#                      include_classes='all',
#                      exclude_truncated=False,
#                      exclude_difficult=True,
#                      ret=False)

#%%
# 3: Set the batch size.

batch_size = 32 # Change the batch size if you like, or if you run into GPU memory issues.

# 4: Set the image transformations for pre-processing and data augmentation options.

# For the training generator:
ssd_data_augmentation = SSDDataAugmentation(img_height=img_height,
                                            img_width=img_width,
                                            background=mean_color)

# For the validation generator:
convert_to_3_channels = ConvertTo3Channels()
resize = Resize(height=img_height, width=img_width)

# 5: Instantiate an encoder that can encode ground truth labels into the format needed by the SSD loss function.

# The encoder constructor needs the spatial dimensions of the model's predictor layers to create the anchor boxes.
predictor_sizes = [model.get_layer('conv4_3_norm_mbox_conf').output_shape[1:3],
                   model.get_layer('fc7_mbox_conf').output_shape[1:3],
                   model.get_layer('conv6_2_mbox_conf').output_shape[1:3],
                   model.get_layer('conv7_2_mbox_conf').output_shape[1:3],
                   model.get_layer('conv8_2_mbox_conf').output_shape[1:3],
                   model.get_layer('conv9_2_mbox_conf').output_shape[1:3]]

ssd_input_encoder = SSDInputEncoder(img_height=img_height,
                                    img_width=img_width,
                                    n_classes=n_classes,
                                    predictor_sizes=predictor_sizes,
                                    scales=scales,
                                    aspect_ratios_per_layer=aspect_ratios,
                                    two_boxes_for_ar1=two_boxes_for_ar1,
                                    steps=steps,
                                    offsets=offsets,
                                    clip_boxes=clip_boxes,
                                    variances=variances,
                                    matching_type='multi',
                                    pos_iou_threshold=0.5,
                                    neg_iou_limit=0.5,
                                    normalize_coords=normalize_coords)

# 6: Create the generator handles that will be passed to Keras' `fit_generator()` function.

train_generator = train_dataset.generate(batch_size=batch_size,
                                         shuffle=True,
                                         transformations=[ssd_data_augmentation],
                                         label_encoder=ssd_input_encoder,
                                         returns={'processed_images',
                                                  'encoded_labels'},
                                         keep_images_without_gt=False)

val_generator = val_dataset.generate(batch_size=batch_size,
                                     shuffle=False,
                                     transformations=[convert_to_3_channels,
                                                      resize],
                                     label_encoder=ssd_input_encoder,
                                     returns={'processed_images',
                                              'encoded_labels'},
                                     keep_images_without_gt=False)

# Get the number of samples in the training and validation datasets.
train_dataset_size = train_dataset.get_dataset_size()
val_dataset_size = val_dataset.get_dataset_size()

print("Number of images in the training dataset:\t{:>6}".format(train_dataset_size))
print("Number of images in the validation dataset:\t{:>6}".format(val_dataset_size))

#%%
# Learning rate schedule following the original SSD300 training: 0.001 for the
# first 80 epochs, then decay by a factor of 10 at epochs 80 and 100.
def lr_schedule(epoch):
    if epoch < 80:
        return 0.001
    elif epoch < 100:
        return 0.0001
    else:
        return 0.00001

# Define model callbacks.

# TODO: Set the filepath under which you want to save the model.
model_checkpoint = ModelCheckpoint(filepath='ssd300_pascal_07+12_epoch-{epoch:02d}_loss-{loss:.4f}_val_loss-{val_loss:.4f}.h5',
                                   monitor='val_loss',
                                   verbose=1,
                                   save_best_only=True,
                                   save_weights_only=False,
                                   mode='auto',
                                   period=1)
#model_checkpoint.best =

csv_logger = CSVLogger(filename='ssd300_pascal_07+12_training_log.csv',
                       separator=',',
                       append=True)

learning_rate_scheduler = LearningRateScheduler(schedule=lr_schedule,
                                                verbose=1)

terminate_on_nan = TerminateOnNaN()

callbacks = [model_checkpoint,
             csv_logger,
             learning_rate_scheduler,
             terminate_on_nan]

#%%
initial_epoch = 0
final_epoch = 120
steps_per_epoch = 1000

history = model.fit_generator(generator=train_generator,
                              steps_per_epoch=steps_per_epoch,
                              epochs=final_epoch,
                              callbacks=callbacks,
                              validation_data=val_generator,
                              validation_steps=ceil(val_dataset_size/batch_size),
                              initial_epoch=initial_epoch)
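
# A sketch of how training could be resumed from one of the checkpoints saved
# above, instead of starting from the VGG-16 weights (the checkpoint filename
# below is hypothetical). The model has to be loaded with its custom objects:
#
# model = load_model('ssd300_pascal_07+12_epoch-10_loss-5.1234_val_loss-5.5678.h5',
#                    custom_objects={'AnchorBoxes': AnchorBoxes,
#                                    'L2Normalization': L2Normalization,
#                                    'compute_loss': ssd_loss.compute_loss})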

ssd_keras-master/README.md
@@ -0,0 +1,266 @@
## SSD: Single-Shot MultiBox Detector implementation in Keras
---
### Contents

1. [Overview](#overview)
2. [Performance](#performance)
3. [Examples](#examples)
4. [Dependencies](#dependencies)
5. [How to use it](#how-to-use-it)
6. [Download the convolutionalized VGG-16 weights](#download-the-convolutionalized-vgg-16-weights)
7. [Download the original trained model weights](#download-the-original-trained-model-weights)
8. [How to fine-tune one of the trained models on your own dataset](#how-to-fine-tune-one-of-the-trained-models-on-your-own-dataset)
9. [ToDo](#todo)
10. [Important notes](#important-notes)
11. [Terminology](#terminology)

### Overview

This is a Keras port of the SSD model architecture introduced by Wei Liu et al. in the paper [SSD: Single Shot MultiBox Detector](https://arxiv.org/abs/1512.02325).

Ports of the trained weights of all the original models are provided below. This implementation is accurate, meaning that both the ported weights and models trained from scratch produce the same mAP values as the respective models of the original Caffe implementation (see the performance section below).

The main goal of this project is to create an SSD implementation that is well documented for those who are interested in a low-level understanding of the model. The provided tutorials, documentation and detailed comments hopefully make it a bit easier to dig into the code and adapt or build upon the model than with most other implementations out there (Keras or otherwise) that provide little to no documentation and comments.

The repository currently provides the following network architectures:
* SSD300: [`keras_ssd300.py`](models/keras_ssd300.py)
* SSD512: [`keras_ssd512.py`](models/keras_ssd512.py)
* SSD7: [`keras_ssd7.py`](models/keras_ssd7.py) - a smaller 7-layer version that can be trained from scratch relatively quickly even on a mid-tier GPU, yet is capable enough for less complex object detection tasks and testing. You're obviously not going to get state-of-the-art results with that one, but it's fast.

If you would like to use one of the provided trained models for transfer learning (i.e. fine-tune one of the trained models on your own dataset), there is a [Jupyter notebook tutorial](weight_sampling_tutorial.ipynb) that helps you sub-sample the trained weights so that they are compatible with your dataset; see further below.

If you would like to build an SSD with your own base network architecture, you can use [`keras_ssd7.py`](models/keras_ssd7.py) as a template; it provides documentation and comments to help you.

### Performance

Here are the mAP evaluation results of the ported weights, and below that the evaluation results of a model trained from scratch using this implementation. All models were evaluated using the official Pascal VOC test server (for 2012 `test`) or the official Pascal VOC Matlab evaluation script (for 2007 `test`). In all cases the results match (or slightly surpass) those of the original Caffe models. Download links to all ported weights are available further below.

<table width="70%">
  <tr>
    <td></td>
    <td colspan=3 align=center>Mean Average Precision</td>
  </tr>
  <tr>
    <td>evaluated on</td>
    <td colspan=2 align=center>VOC2007 test</td>
    <td align=center>VOC2012 test</td>
  </tr>
  <tr>
    <td>trained on<br>IoU rule</td>
    <td align=center width="25%">07+12<br>0.5</td>
    <td align=center width="25%">07+12+COCO<br>0.5</td>
    <td align=center width="25%">07++12+COCO<br>0.5</td>
  </tr>
  <tr>
    <td><b>SSD300</td>
    <td align=center><b>77.5</td>
    <td align=center><b>81.2</td>
    <td align=center><b>79.4</td>
  </tr>
  <tr>
    <td><b>SSD512</td>
    <td align=center><b>79.8</td>
    <td align=center><b>83.2</td>
    <td align=center><b>82.3</td>
  </tr>
</table>

Training an SSD300 from scratch to convergence on Pascal VOC 2007 `trainval` and 2012 `trainval` produces the same mAP on Pascal VOC 2007 `test` as the original Caffe SSD300 "07+12" model. You can find a summary of the training [here](training_summaries/ssd300_pascal_07+12_training_summary.md).

<table width="95%">
  <tr>
    <td></td>
    <td colspan=3 align=center>Mean Average Precision</td>
  </tr>
  <tr>
    <td></td>
    <td align=center>Original Caffe Model</td>
    <td align=center>Ported Weights</td>
    <td align=center>Trained from Scratch</td>
  </tr>
  <tr>
    <td><b>SSD300 "07+12"</td>
    <td align=center width="26%"><b>0.772</td>
    <td align=center width="26%"><b>0.775</td>
    <td align=center width="26%"><b><a href="https://drive.google.com/file/d/1-MYYaZbIHNPtI2zzklgVBAjssbP06BeA/view">0.771</a></td>
  </tr>
</table>

The models achieve the following average number of frames per second (FPS) on Pascal VOC on an NVIDIA GeForce GTX 1070 mobile (i.e. the laptop version) and cuDNN v6. There are two things to note here. First, note that the benchmark prediction speeds of the original Caffe implementation were achieved using a TitanX GPU and cuDNN v4. Second, the paper says they measured the prediction speed at batch size 8, which I think isn't a meaningful way of measuring the speed. The whole point of measuring the speed of a detection model is to know how many individual sequential images the model can process per second; measuring the prediction speed on batches of images and then deducing the time spent on each individual image in the batch defeats the purpose. For the sake of comparability, below you find the prediction speed for the original Caffe SSD implementation and the prediction speed for this implementation under the same conditions, i.e. at batch size 8. In addition you find the prediction speed for this implementation at batch size 1, which in my opinion is the more meaningful number.

<table>
  <tr>
    <td></td>
    <td colspan=3 align=center>Frames per Second</td>
  </tr>
  <tr>
    <td></td>
    <td align=center>Original Caffe Implementation</td>
    <td colspan=2 align=center>This Implementation</td>
  </tr>
  <tr>
    <td width="14%">Batch Size</td>
    <td width="27%" align=center>8</td>
    <td width="27%" align=center>8</td>
    <td width="27%" align=center>1</td>
  </tr>
  <tr>
    <td><b>SSD300</td>
    <td align=center><b>46</td>
    <td align=center><b>49</td>
    <td align=center><b>39</td>
  </tr>
  <tr>
    <td><b>SSD512</td>
    <td align=center><b>19</td>
    <td align=center><b>25</td>
    <td align=center><b>20</td>
  </tr>
  <tr>
    <td><b>SSD7</td>
    <td align=center><b>-</td>
    <td align=center><b>216</td>
    <td align=center><b>127</td>
  </tr>
</table>

### Examples

Below are some prediction examples of the fully trained original SSD300 "07+12" model (i.e. trained on Pascal VOC2007 `trainval` and VOC2012 `trainval`). The predictions were made on Pascal VOC2007 `test`.

| | |
|---|---|
|  |  |
|  |  |

Here are some prediction examples of an SSD7 (i.e. the small 7-layer version) partially trained on two road traffic datasets released by [Udacity](https://github.com/udacity/self-driving-car/tree/master/annotations) with roughly 20,000 images in total and 5 object categories (more info in [`ssd7_training.ipynb`](ssd7_training.ipynb)). The predictions you see below were made after 10,000 training steps at batch size 32. Admittedly, cars are comparatively easy objects to detect, and I picked a few of the better examples, but it is nonetheless remarkable what such a small model can do after only 10,000 training iterations.

| | |
|---|---|
|  |  |
|  |  |

### Dependencies

* Python 3.x
* Numpy
* TensorFlow 1.x
* Keras 2.x
* OpenCV
* Beautiful Soup 4.x

The Theano and CNTK backends are currently not supported.

Python 2 compatibility: This implementation seems to work with Python 2.7, but I don't provide any support for it. It's 2018 and nobody should be using Python 2 anymore.

### How to use it

This repository provides Jupyter notebook tutorials that explain training, inference and evaluation, and there are a bunch of explanations in the subsequent sections that complement the notebooks.

How to use a trained model for inference:
* [`ssd300_inference.ipynb`](ssd300_inference.ipynb)
* [`ssd512_inference.ipynb`](ssd512_inference.ipynb)

How to train a model:
* [`ssd300_training.ipynb`](ssd300_training.ipynb)
* [`ssd7_training.ipynb`](ssd7_training.ipynb)

How to use one of the provided trained models for transfer learning on your own dataset:
* [Read below](#how-to-fine-tune-one-of-the-trained-models-on-your-own-dataset)

How to evaluate a trained model:
* In general: [`ssd300_evaluation.ipynb`](ssd300_evaluation.ipynb)
* On MS COCO: [`ssd300_evaluation_COCO.ipynb`](ssd300_evaluation_COCO.ipynb)

How to use the data generator:
* The data generator used here has its own repository with a detailed tutorial [here](https://github.com/pierluigiferrari/data_generator_object_detection_2d)

#### Training details

The general training setup is laid out and explained in [`ssd7_training.ipynb`](ssd7_training.ipynb) and in [`ssd300_training.ipynb`](ssd300_training.ipynb). The setup and explanations are similar in both notebooks for the most part, so it doesn't matter which one you look at to understand the general training setup, but the parameters in [`ssd300_training.ipynb`](ssd300_training.ipynb) are preset to copy the setup of the original Caffe implementation for training on Pascal VOC, while the parameters in [`ssd7_training.ipynb`](ssd7_training.ipynb) are preset to train on the [Udacity traffic datasets](https://github.com/udacity/self-driving-car/tree/master/annotations).

To train the original SSD300 model on Pascal VOC:

1. Download the datasets:
```bash
wget http://host.robots.ox.ac.uk/pascal/VOC/voc2012/VOCtrainval_11-May-2012.tar
wget http://host.robots.ox.ac.uk/pascal/VOC/voc2007/VOCtrainval_06-Nov-2007.tar
wget http://host.robots.ox.ac.uk/pascal/VOC/voc2007/VOCtest_06-Nov-2007.tar
```
2. Download the weights for the convolutionalized VGG-16 or for one of the trained original models provided below.
3. Set the file paths for the datasets and model weights accordingly in [`ssd300_training.ipynb`](ssd300_training.ipynb) and execute the cells.

The procedure for training SSD512 is of course the same. It is imperative that you load the pre-trained VGG-16 weights when attempting to train an SSD300 or SSD512 from scratch, otherwise the training will probably fail. Here is a summary of a full training of the SSD300 "07+12" model for comparison with your own training:

* [SSD300 Pascal VOC "07+12" training summary](training_summaries/ssd300_pascal_07+12_training_summary.md)

#### Encoding and decoding boxes

The [`ssd_encoder_decoder`](ssd_encoder_decoder) sub-package contains all functions and classes related to encoding and decoding boxes. Encoding boxes means converting ground truth labels into the target format that the loss function needs during training. It is this encoding process in which the matching of ground truth boxes to anchor boxes (the paper calls them default boxes and in the original C++ code they are called priors - all the same thing) happens. Decoding boxes means converting raw model output back to the input label format, which entails various conversion and filtering processes such as non-maximum suppression (NMS).

In order to train the model, you need to create an instance of `SSDInputEncoder` that needs to be passed to the data generator. The data generator does the rest, so you don't usually need to call any of `SSDInputEncoder`'s methods manually.

Models can be created in 'training' or 'inference' mode. In 'training' mode, the model outputs the raw prediction tensor that still needs to be post-processed with coordinate conversion, confidence thresholding, non-maximum suppression, etc. The functions `decode_detections()` and `decode_detections_fast()` are responsible for that. The former follows the original Caffe implementation, which entails performing NMS per object class, while the latter performs NMS globally across all object classes and is thus more efficient, but also behaves slightly differently. Read the documentation for details about both functions. If a model is created in 'inference' mode, its last layer is the `DecodeDetections` layer, which performs all the post-processing that `decode_detections()` does, but in TensorFlow. That means the output of the model is already the post-processed output. In order to be trainable, a model must be created in 'training' mode. The trained weights can then later be loaded into a model that was created in 'inference' mode.
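
For illustration, here is a minimal sketch of decoding the raw output of a model created in 'training' mode with `decode_detections()`. The variables `model`, `batch_images`, `img_height`, and `img_width` are assumed to be defined as in the training notebooks, and the threshold values are just examples:

```python
from ssd_encoder_decoder.ssd_output_decoder import decode_detections

y_pred = model.predict(batch_images)  # The raw, encoded prediction tensor.

# Apply confidence thresholding, convert the predicted offsets back to
# absolute coordinates, and perform per-class NMS:
y_pred_decoded = decode_detections(y_pred,
                                   confidence_thresh=0.5,
                                   iou_threshold=0.45,
                                   top_k=200,
                                   normalize_coords=True,
                                   img_height=img_height,
                                   img_width=img_width)
```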

A note on the anchor box offset coordinates used internally by the model: This may or may not be obvious to you, but it is important to understand that it is not possible for the model to predict absolute coordinates for the predicted bounding boxes. In order to be able to predict absolute box coordinates, the convolutional layers responsible for localization would need to produce different output values for the same object instance at different locations within the input image. This isn't possible, of course: For a given input to the filter of a convolutional layer, the filter will produce the same output regardless of the spatial position within the image because of the shared weights. This is the reason why the model predicts offsets to anchor boxes instead of absolute coordinates, and why during training, absolute ground truth coordinates are converted to anchor box offsets in the encoding process. The fact that the model predicts offsets to anchor box coordinates is in turn the reason why the model contains anchor box layers that do nothing but output the anchor box coordinates so that the model's output tensor can include those. If the model's output tensor did not contain the anchor box coordinates, the information to convert the predicted offsets back to absolute coordinates would be missing in the model output.
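
As an illustration, here is a rough sketch of that encoding for the 'centroids' coordinate format, using the variances `[0.1, 0.1, 0.2, 0.2]` from the tutorials. The actual implementation lives in `SSDInputEncoder`; the stand-alone function below is hypothetical:

```python
import numpy as np

def encode_offsets(gt_box, anchor_box, variances=(0.1, 0.1, 0.2, 0.2)):
    '''Convert one ground truth box (cx, cy, w, h) into offsets relative
    to a matched anchor box (cx_a, cy_a, w_a, h_a), scaled by the variances.'''
    cx, cy, w, h = gt_box
    cx_a, cy_a, w_a, h_a = anchor_box
    return np.array([(cx - cx_a) / w_a / variances[0],  # center offsets are
                     (cy - cy_a) / h_a / variances[1],  # relative to the anchor size
                     np.log(w / w_a) / variances[2],    # width and height offsets
                     np.log(h / h_a) / variances[3]])   # are log-scale ratios
```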

#### Using a different base network architecture

If you want to build a different base network architecture, you could use [`keras_ssd7.py`](models/keras_ssd7.py) as a template. It provides documentation and comments to help you turn it into a different base network. Put together the base network you want and add a predictor layer on top of each network layer from which you would like to make predictions. Create two predictor heads for each, one for localization, one for classification. Create an anchor box layer for each predictor layer and set the respective localization head's output as the input for the anchor box layer. The structure of all tensor reshaping and concatenation operations remains the same; you just have to make sure to include all of your predictor and anchor box layers, of course. A sketch of one such predictor pair follows below.
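
To make that concrete, here is a rough sketch of one predictor pair plus its anchor box layer, under the assumption that `feature_map`, `img_height`, `img_width`, and `n_classes` (including the background class) are already defined, and with example values for the scales and aspect ratios:

```python
from keras.layers import Conv2D
from keras_layers.keras_layer_AnchorBoxes import AnchorBoxes

n_boxes = 4  # e.g. aspect ratios [1.0, 2.0, 0.5] plus a second box for aspect ratio 1

# One localization head and one classification head on top of `feature_map`:
loc = Conv2D(n_boxes * 4, (3, 3), padding='same', name='my_mbox_loc')(feature_map)
conf = Conv2D(n_boxes * n_classes, (3, 3), padding='same', name='my_mbox_conf')(feature_map)

# The anchor box layer takes the localization head's output as its input and
# does nothing but output the anchor box coordinates for this predictor layer:
anchors = AnchorBoxes(img_height, img_width,
                      this_scale=0.2, next_scale=0.37,
                      aspect_ratios=[1.0, 2.0, 0.5],
                      two_boxes_for_ar1=True,
                      variances=[0.1, 0.1, 0.2, 0.2],
                      name='my_mbox_priorbox')(loc)
```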

### Download the convolutionalized VGG-16 weights

In order to train an SSD300 or SSD512 from scratch, download the weights of the fully convolutionalized VGG-16 model trained to convergence on ImageNet classification here:

[`VGG_ILSVRC_16_layers_fc_reduced.h5`](https://drive.google.com/open?id=1sBmajn6vOE7qJ8GnxUJt4fGPuffVUZox).

As with all other weights files below, this is a direct port of the corresponding `.caffemodel` file that is provided in the repository of the original Caffe implementation.

### Download the original trained model weights

Here are the ported weights for all the original trained models. The filenames correspond to their respective `.caffemodel` counterparts. The asterisks and footnotes refer to those in the README of the [original Caffe implementation](https://github.com/weiliu89/caffe/tree/ssd#models).

1. PASCAL VOC models:

    * 07+12: [SSD300*](https://drive.google.com/open?id=121-kCXaOHOkJE_Kf5lKcJvC_5q1fYb_q), [SSD512*](https://drive.google.com/open?id=19NIa0baRCFYT3iRxQkOKCD7CpN6BFO8p)
    * 07++12: [SSD300*](https://drive.google.com/open?id=1M99knPZ4DpY9tI60iZqxXsAxX2bYWDvZ), [SSD512*](https://drive.google.com/open?id=18nFnqv9fG5Rh_fx6vUtOoQHOLySt4fEx)
    * COCO[1]: [SSD300*](https://drive.google.com/open?id=17G1J4zEpFwiOzgBmq886ci4P3YaIz8bY), [SSD512*](https://drive.google.com/open?id=1wGc368WyXSHZOv4iow2tri9LnB0vm9X-)
    * 07+12+COCO: [SSD300*](https://drive.google.com/open?id=1vtNI6kSnv7fkozl7WxyhGyReB6JvDM41), [SSD512*](https://drive.google.com/open?id=14mELuzm0OvXnwjb0mzAiG-Ake9_NP_LQ)
    * 07++12+COCO: [SSD300*](https://drive.google.com/open?id=1fyDDUcIOSjeiP08vl1WCndcFdtboFXua), [SSD512*](https://drive.google.com/open?id=1a-64b6y6xsQr5puUsHX_wxI1orQDercM)

2. COCO models:

    * trainval35k: [SSD300*](https://drive.google.com/open?id=1vmEF7FUsWfHquXyCqO17UaXOPpRbwsdj), [SSD512*](https://drive.google.com/open?id=1IJWZKmjkcFMlvaz2gYukzFx4d6mH3py5)

3. ILSVRC models:

    * trainval1: [SSD300*](https://drive.google.com/open?id=1VWkj1oQS2RUhyJXckx3OaDYs5fx2mMCq), [SSD500](https://drive.google.com/open?id=1LcBPsd9CJbuBw4KiSuE1o1fMA-Pz2Zvw)

### How to fine-tune one of the trained models on your own dataset

If you want to fine-tune one of the provided trained models on your own dataset, chances are your dataset doesn't have the same number of classes as the trained model. The following tutorial explains how to deal with this problem:

[`weight_sampling_tutorial.ipynb`](weight_sampling_tutorial.ipynb)

### ToDo

The following things are on the to-do list, ranked by priority. Contributions are welcome, but please read the [contributing guidelines](CONTRIBUTING.md).

1. Add model definitions and trained weights for SSDs based on other base networks such as MobileNet, InceptionResNetV2, or DenseNet.
2. Add support for the Theano and CNTK backends. Requires porting the custom layers and the loss function from TensorFlow to the abstract Keras backend.

Currently in the works:

* A new [Focal Loss](https://arxiv.org/abs/1708.02002) loss function.

### Important notes

* All trained models that were trained on MS COCO use the smaller anchor box scaling factors provided in all of the Jupyter notebooks. In particular, note that the '07+12+COCO' and '07++12+COCO' models use the smaller scaling factors.

### Terminology

* "Anchor boxes": The paper calls them "default boxes", in the original C++ code they are called "prior boxes" or "priors", and the Faster R-CNN paper calls them "anchor boxes". All terms mean the same thing, but I slightly prefer the name "anchor boxes" because I find it to be the most descriptive of these names. I call them "prior boxes" or "priors" in `keras_ssd300.py` and `keras_ssd512.py` to stay consistent with the original Caffe implementation, but everywhere else I use the name "anchor boxes" or "anchors".
* "Labels": For the purpose of this project, datasets consist of "images" and "labels". Everything that belongs to the annotations of a given image is the "labels" of that image: not just object category labels, but also bounding box coordinates. "Labels" is just shorter than "annotations". I also use the terms "labels" and "targets" more or less interchangeably throughout the documentation, although "targets" means labels specifically in the context of training.
* "Predictor layer": The "predictor layers" or "predictors" are all the last convolution layers of the network, i.e. all convolution layers that do not feed into any subsequent convolution layers.

ssd_keras-master/__init__.py
(empty file)

ssd_keras-master/bounding_box_utils/__init__.py
(empty file)

ssd_keras-master/bounding_box_utils/__init__.pyc
(binary file)

ssd_keras-master/bounding_box_utils/bounding_box_utils.py
@@ -0,0 +1,383 @@
'''
Includes:
* Function to compute the IoU similarity for axis-aligned, rectangular, 2D bounding boxes
* Function for coordinate conversion for axis-aligned, rectangular, 2D bounding boxes

Copyright (C) 2018 Pierluigi Ferrari

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
'''

from __future__ import division
import numpy as np

def convert_coordinates(tensor, start_index, conversion, border_pixels='half'):
    '''
    Convert coordinates for axis-aligned 2D boxes between two coordinate formats.

    Creates a copy of `tensor`, i.e. does not operate in place. Currently there are
    three supported coordinate formats that can be converted from and to each other:
        1) (xmin, xmax, ymin, ymax) - the 'minmax' format
        2) (xmin, ymin, xmax, ymax) - the 'corners' format
        3) (cx, cy, w, h) - the 'centroids' format

    Arguments:
        tensor (array): A Numpy nD array containing the four consecutive coordinates
            to be converted somewhere in the last axis.
        start_index (int): The index of the first coordinate in the last axis of `tensor`.
        conversion (str): The conversion direction. Can be 'minmax2centroids',
            'centroids2minmax', 'corners2centroids', 'centroids2corners', 'minmax2corners',
            or 'corners2minmax'.
        border_pixels (str, optional): How to treat the border pixels of the bounding boxes.
            Can be 'include', 'exclude', or 'half'. If 'include', the border pixels belong
            to the boxes. If 'exclude', the border pixels do not belong to the boxes.
            If 'half', then one of each of the two horizontal and vertical borders belongs
            to the boxes, but not the other.

    Returns:
        A Numpy nD array, a copy of the input tensor with the converted coordinates
        in place of the original coordinates and the unaltered elements of the original
        tensor elsewhere.
    '''
    if border_pixels == 'half':
        d = 0
    elif border_pixels == 'include':
        d = 1
    elif border_pixels == 'exclude':
        d = -1
    else:
        raise ValueError("`border_pixels` must be one of 'half', 'include', and 'exclude', but got '{}'.".format(border_pixels))

    ind = start_index
    tensor1 = np.copy(tensor).astype(np.float)
    if conversion == 'minmax2centroids':
        tensor1[..., ind] = (tensor[..., ind] + tensor[..., ind+1]) / 2.0 # Set cx
        tensor1[..., ind+1] = (tensor[..., ind+2] + tensor[..., ind+3]) / 2.0 # Set cy
        tensor1[..., ind+2] = tensor[..., ind+1] - tensor[..., ind] + d # Set w
        tensor1[..., ind+3] = tensor[..., ind+3] - tensor[..., ind+2] + d # Set h
    elif conversion == 'centroids2minmax':
        tensor1[..., ind] = tensor[..., ind] - tensor[..., ind+2] / 2.0 # Set xmin
        tensor1[..., ind+1] = tensor[..., ind] + tensor[..., ind+2] / 2.0 # Set xmax
        tensor1[..., ind+2] = tensor[..., ind+1] - tensor[..., ind+3] / 2.0 # Set ymin
        tensor1[..., ind+3] = tensor[..., ind+1] + tensor[..., ind+3] / 2.0 # Set ymax
    elif conversion == 'corners2centroids':
        tensor1[..., ind] = (tensor[..., ind] + tensor[..., ind+2]) / 2.0 # Set cx
        tensor1[..., ind+1] = (tensor[..., ind+1] + tensor[..., ind+3]) / 2.0 # Set cy
        tensor1[..., ind+2] = tensor[..., ind+2] - tensor[..., ind] + d # Set w
        tensor1[..., ind+3] = tensor[..., ind+3] - tensor[..., ind+1] + d # Set h
    elif conversion == 'centroids2corners':
        tensor1[..., ind] = tensor[..., ind] - tensor[..., ind+2] / 2.0 # Set xmin
        tensor1[..., ind+1] = tensor[..., ind+1] - tensor[..., ind+3] / 2.0 # Set ymin
        tensor1[..., ind+2] = tensor[..., ind] + tensor[..., ind+2] / 2.0 # Set xmax
        tensor1[..., ind+3] = tensor[..., ind+1] + tensor[..., ind+3] / 2.0 # Set ymax
    elif (conversion == 'minmax2corners') or (conversion == 'corners2minmax'):
        tensor1[..., ind+1] = tensor[..., ind+2]
        tensor1[..., ind+2] = tensor[..., ind+1]
    else:
        raise ValueError("Unexpected conversion value. Supported values are 'minmax2centroids', 'centroids2minmax', 'corners2centroids', 'centroids2corners', 'minmax2corners', and 'corners2minmax'.")

    return tensor1
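
# Example usage (a sketch): convert a single box from the 'corners' format
# to the 'centroids' format with the default border_pixels='half':
#
# >>> box = np.array([20.0, 30.0, 60.0, 110.0])  # (xmin, ymin, xmax, ymax)
# >>> convert_coordinates(box, start_index=0, conversion='corners2centroids')
# array([40., 70., 40., 80.])  # (cx, cy, w, h)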

def convert_coordinates2(tensor, start_index, conversion):
    '''
    A matrix multiplication implementation of `convert_coordinates()`.
    Supports only conversion between the 'centroids' and 'minmax' formats.

    This function is marginally slower on average than `convert_coordinates()`,
    probably because it involves more (unnecessary) arithmetic operations (unnecessary
    because the two matrices are sparse).

    For details please refer to the documentation of `convert_coordinates()`.
    '''
    ind = start_index
    tensor1 = np.copy(tensor).astype(np.float)
    if conversion == 'minmax2centroids':
        M = np.array([[0.5, 0. , -1., 0.],
                      [0.5, 0. , 1., 0.],
                      [0. , 0.5, 0., -1.],
                      [0. , 0.5, 0., 1.]])
        tensor1[..., ind:ind+4] = np.dot(tensor1[..., ind:ind+4], M)
    elif conversion == 'centroids2minmax':
        M = np.array([[ 1. , 1. , 0. , 0. ],
                      [ 0. , 0. , 1. , 1. ],
                      [-0.5, 0.5, 0. , 0. ],
                      [ 0. , 0. , -0.5, 0.5]]) # The multiplicative inverse of the matrix above
        tensor1[..., ind:ind+4] = np.dot(tensor1[..., ind:ind+4], M)
    else:
        raise ValueError("Unexpected conversion value. Supported values are 'minmax2centroids' and 'centroids2minmax'.")

    return tensor1

def intersection_area(boxes1, boxes2, coords='centroids', mode='outer_product', border_pixels='half'):
    '''
    Computes the intersection areas of two sets of axis-aligned 2D rectangular boxes.

    Let `boxes1` and `boxes2` contain `m` and `n` boxes, respectively.

    In 'outer_product' mode, returns an `(m,n)` matrix with the intersection areas for all possible
    combinations of the boxes in `boxes1` and `boxes2`.

    In 'element-wise' mode, `m` and `n` must be broadcast-compatible. Refer to the explanation
    of the `mode` argument for details.

    Arguments:
        boxes1 (array): Either a 1D Numpy array of shape `(4, )` containing the coordinates for one box in the
            format specified by `coords` or a 2D Numpy array of shape `(m, 4)` containing the coordinates for `m` boxes.
            If `mode` is set to 'element_wise', the shape must be broadcast-compatible with `boxes2`.
        boxes2 (array): Either a 1D Numpy array of shape `(4, )` containing the coordinates for one box in the
            format specified by `coords` or a 2D Numpy array of shape `(n, 4)` containing the coordinates for `n` boxes.
            If `mode` is set to 'element_wise', the shape must be broadcast-compatible with `boxes1`.
        coords (str, optional): The coordinate format in the input arrays. Can be either 'centroids' for the format
            `(cx, cy, w, h)`, 'minmax' for the format `(xmin, xmax, ymin, ymax)`, or 'corners' for the format
            `(xmin, ymin, xmax, ymax)`.
        mode (str, optional): Can be one of 'outer_product' and 'element-wise'. In 'outer_product' mode, returns an
            `(m,n)` matrix with the intersection areas for all possible combinations of the `m` boxes in `boxes1` with the
            `n` boxes in `boxes2`. In 'element-wise' mode, returns a 1D array and the shapes of `boxes1` and `boxes2`
            must be broadcast-compatible. If both `boxes1` and `boxes2` have `m` boxes, then this returns an array of
            length `m` where the i-th position contains the intersection area of `boxes1[i]` with `boxes2[i]`.
        border_pixels (str, optional): How to treat the border pixels of the bounding boxes.
            Can be 'include', 'exclude', or 'half'. If 'include', the border pixels belong
            to the boxes. If 'exclude', the border pixels do not belong to the boxes.
            If 'half', then one of each of the two horizontal and vertical borders belongs
            to the boxes, but not the other.

    Returns:
        A 1D or 2D Numpy array (refer to the `mode` argument for details) of dtype float containing values with
        the intersection areas of the boxes in `boxes1` and `boxes2`.
    '''

    # Make sure the boxes have the right shapes.
    if boxes1.ndim > 2: raise ValueError("boxes1 must have rank either 1 or 2, but has rank {}.".format(boxes1.ndim))
    if boxes2.ndim > 2: raise ValueError("boxes2 must have rank either 1 or 2, but has rank {}.".format(boxes2.ndim))

    if boxes1.ndim == 1: boxes1 = np.expand_dims(boxes1, axis=0)
    if boxes2.ndim == 1: boxes2 = np.expand_dims(boxes2, axis=0)

    if not (boxes1.shape[1] == boxes2.shape[1] == 4): raise ValueError("All boxes must consist of 4 coordinates, but the boxes in `boxes1` and `boxes2` have {} and {} coordinates, respectively.".format(boxes1.shape[1], boxes2.shape[1]))
    if not mode in {'outer_product', 'element-wise'}: raise ValueError("`mode` must be one of 'outer_product' and 'element-wise', but got '{}'.".format(mode))

    # Convert the coordinates if necessary.
    if coords == 'centroids':
        boxes1 = convert_coordinates(boxes1, start_index=0, conversion='centroids2corners')
        boxes2 = convert_coordinates(boxes2, start_index=0, conversion='centroids2corners')
        coords = 'corners'
    elif not (coords in {'minmax', 'corners'}):
        raise ValueError("Unexpected value for `coords`. Supported values are 'minmax', 'corners' and 'centroids'.")
||||
|
||||
m = boxes1.shape[0] # The number of boxes in `boxes1`
|
||||
n = boxes2.shape[0] # The number of boxes in `boxes2`
|
||||
|
||||
# Set the correct coordinate indices for the respective formats.
|
||||
if coords == 'corners':
|
||||
xmin = 0
|
||||
ymin = 1
|
||||
xmax = 2
|
||||
ymax = 3
|
||||
elif coords == 'minmax':
|
||||
xmin = 0
|
||||
xmax = 1
|
||||
ymin = 2
|
||||
ymax = 3
|
||||
|
||||
if border_pixels == 'half':
|
||||
d = 0
|
||||
elif border_pixels == 'include':
|
||||
d = 1 # If border pixels are supposed to belong to the bounding boxes, we have to add one pixel to any difference `xmax - xmin` or `ymax - ymin`.
|
||||
elif border_pixels == 'exclude':
|
||||
d = -1 # If border pixels are not supposed to belong to the bounding boxes, we have to subtract one pixel from any difference `xmax - xmin` or `ymax - ymin`.
|
||||
|
||||
# Compute the intersection areas.
|
||||
|
||||
if mode == 'outer_product':
|
||||
|
||||
# For all possible box combinations, get the greater xmin and ymin values.
|
||||
# This is a tensor of shape (m,n,2).
|
||||
min_xy = np.maximum(np.tile(np.expand_dims(boxes1[:,[xmin,ymin]], axis=1), reps=(1, n, 1)),
|
||||
np.tile(np.expand_dims(boxes2[:,[xmin,ymin]], axis=0), reps=(m, 1, 1)))
|
||||
|
||||
# For all possible box combinations, get the smaller xmax and ymax values.
|
||||
# This is a tensor of shape (m,n,2).
|
||||
max_xy = np.minimum(np.tile(np.expand_dims(boxes1[:,[xmax,ymax]], axis=1), reps=(1, n, 1)),
|
||||
np.tile(np.expand_dims(boxes2[:,[xmax,ymax]], axis=0), reps=(m, 1, 1)))
|
||||
|
||||
# Compute the side lengths of the intersection rectangles.
|
||||
side_lengths = np.maximum(0, max_xy - min_xy + d)
|
||||
|
||||
return side_lengths[:,:,0] * side_lengths[:,:,1]
|
||||
|
||||
elif mode == 'element-wise':
|
||||
|
||||
min_xy = np.maximum(boxes1[:,[xmin,ymin]], boxes2[:,[xmin,ymin]])
|
||||
max_xy = np.minimum(boxes1[:,[xmax,ymax]], boxes2[:,[xmax,ymax]])
|
||||
|
||||
# Compute the side lengths of the intersection rectangles.
|
||||
side_lengths = np.maximum(0, max_xy - min_xy + d)
|
||||
|
||||
return side_lengths[:,0] * side_lengths[:,1]
|
||||
|
||||
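
# A minimal usage sketch for `intersection_area()` (an illustration added for
# clarity, not part of the original module; the boxes are hypothetical and given
# in 'corners' format).
def _demo_intersection_area():
    boxes1 = np.array([[0., 0., 10., 10.],
                       [5., 5., 15., 15.]])
    boxes2 = np.array([[5., 5., 10., 10.]])
    # 'outer_product' mode: a (2,1) matrix of all pairwise intersection areas,
    # here [[25.], [25.]].
    print(intersection_area(boxes1, boxes2, coords='corners', mode='outer_product'))
    # With border_pixels='include', each side length grows by one pixel, so the
    # first pair yields (10 - 5 + 1) * (10 - 5 + 1) = 36 instead of 25.
    print(intersection_area(boxes1, boxes2, coords='corners', border_pixels='include'))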

def intersection_area_(boxes1, boxes2, coords='corners', mode='outer_product', border_pixels='half'):
    '''
    The same as `intersection_area()` but for internal use, i.e. without all the safety checks.
    '''

    m = boxes1.shape[0] # The number of boxes in `boxes1`
    n = boxes2.shape[0] # The number of boxes in `boxes2`

    # Set the correct coordinate indices for the respective formats.
    if coords == 'corners':
        xmin = 0
        ymin = 1
        xmax = 2
        ymax = 3
    elif coords == 'minmax':
        xmin = 0
        xmax = 1
        ymin = 2
        ymax = 3

    if border_pixels == 'half':
        d = 0
    elif border_pixels == 'include':
        d = 1 # If border pixels are supposed to belong to the bounding boxes, we have to add one pixel to any difference `xmax - xmin` or `ymax - ymin`.
    elif border_pixels == 'exclude':
        d = -1 # If border pixels are not supposed to belong to the bounding boxes, we have to subtract one pixel from any difference `xmax - xmin` or `ymax - ymin`.

    # Compute the intersection areas.

    if mode == 'outer_product':

        # For all possible box combinations, get the greater xmin and ymin values.
        # This is a tensor of shape (m,n,2).
        min_xy = np.maximum(np.tile(np.expand_dims(boxes1[:,[xmin,ymin]], axis=1), reps=(1, n, 1)),
                            np.tile(np.expand_dims(boxes2[:,[xmin,ymin]], axis=0), reps=(m, 1, 1)))

        # For all possible box combinations, get the smaller xmax and ymax values.
        # This is a tensor of shape (m,n,2).
        max_xy = np.minimum(np.tile(np.expand_dims(boxes1[:,[xmax,ymax]], axis=1), reps=(1, n, 1)),
                            np.tile(np.expand_dims(boxes2[:,[xmax,ymax]], axis=0), reps=(m, 1, 1)))

        # Compute the side lengths of the intersection rectangles.
        side_lengths = np.maximum(0, max_xy - min_xy + d)

        return side_lengths[:,:,0] * side_lengths[:,:,1]

    elif mode == 'element-wise':

        min_xy = np.maximum(boxes1[:,[xmin,ymin]], boxes2[:,[xmin,ymin]])
        max_xy = np.minimum(boxes1[:,[xmax,ymax]], boxes2[:,[xmax,ymax]])

        # Compute the side lengths of the intersection rectangles.
        side_lengths = np.maximum(0, max_xy - min_xy + d)

        return side_lengths[:,0] * side_lengths[:,1]

def iou(boxes1, boxes2, coords='centroids', mode='outer_product', border_pixels='half'):
    '''
    Computes the intersection-over-union similarity (also known as Jaccard similarity)
    of two sets of axis-aligned 2D rectangular boxes.

    Let `boxes1` and `boxes2` contain `m` and `n` boxes, respectively.

    In 'outer_product' mode, returns an `(m,n)` matrix with the IoUs for all possible
    combinations of the boxes in `boxes1` and `boxes2`.

    In 'element-wise' mode, `m` and `n` must be broadcast-compatible. Refer to the explanation
    of the `mode` argument for details.

    Arguments:
        boxes1 (array): Either a 1D Numpy array of shape `(4, )` containing the coordinates for one box in the
            format specified by `coords` or a 2D Numpy array of shape `(m, 4)` containing the coordinates for `m` boxes.
            If `mode` is set to 'element-wise', the shape must be broadcast-compatible with `boxes2`.
        boxes2 (array): Either a 1D Numpy array of shape `(4, )` containing the coordinates for one box in the
            format specified by `coords` or a 2D Numpy array of shape `(n, 4)` containing the coordinates for `n` boxes.
            If `mode` is set to 'element-wise', the shape must be broadcast-compatible with `boxes1`.
        coords (str, optional): The coordinate format in the input arrays. Can be either 'centroids' for the format
            `(cx, cy, w, h)`, 'minmax' for the format `(xmin, xmax, ymin, ymax)`, or 'corners' for the format
            `(xmin, ymin, xmax, ymax)`.
        mode (str, optional): Can be one of 'outer_product' and 'element-wise'. In 'outer_product' mode, returns an
            `(m,n)` matrix with the IoU overlaps for all possible combinations of the `m` boxes in `boxes1` with the
            `n` boxes in `boxes2`. In 'element-wise' mode, returns a 1D array and the shapes of `boxes1` and `boxes2`
            must be broadcast-compatible. If both `boxes1` and `boxes2` have `m` boxes, then this returns an array of
            length `m` where the i-th position contains the IoU overlap of `boxes1[i]` with `boxes2[i]`.
        border_pixels (str, optional): How to treat the border pixels of the bounding boxes.
            Can be 'include', 'exclude', or 'half'. If 'include', the border pixels belong
            to the boxes. If 'exclude', the border pixels do not belong to the boxes.
            If 'half', then one of each of the two horizontal and vertical borders belongs
            to the boxes, but not the other.

    Returns:
        A 1D or 2D Numpy array (refer to the `mode` argument for details) of dtype float containing values in [0,1],
        the Jaccard similarity of the boxes in `boxes1` and `boxes2`. 0 means there is no overlap between two given
        boxes, 1 means their coordinates are identical.
    '''

    # Make sure the boxes have the right shapes.
    if boxes1.ndim > 2: raise ValueError("boxes1 must have rank either 1 or 2, but has rank {}.".format(boxes1.ndim))
    if boxes2.ndim > 2: raise ValueError("boxes2 must have rank either 1 or 2, but has rank {}.".format(boxes2.ndim))

    if boxes1.ndim == 1: boxes1 = np.expand_dims(boxes1, axis=0)
    if boxes2.ndim == 1: boxes2 = np.expand_dims(boxes2, axis=0)

    if not (boxes1.shape[1] == boxes2.shape[1] == 4): raise ValueError("All boxes must consist of 4 coordinates, but the boxes in `boxes1` and `boxes2` have {} and {} coordinates, respectively.".format(boxes1.shape[1], boxes2.shape[1]))
    if not mode in {'outer_product', 'element-wise'}: raise ValueError("`mode` must be one of 'outer_product' and 'element-wise', but got '{}'.".format(mode))

    # Convert the coordinates if necessary.
    if coords == 'centroids':
        boxes1 = convert_coordinates(boxes1, start_index=0, conversion='centroids2corners')
        boxes2 = convert_coordinates(boxes2, start_index=0, conversion='centroids2corners')
        coords = 'corners'
    elif not (coords in {'minmax', 'corners'}):
        raise ValueError("Unexpected value for `coords`. Supported values are 'minmax', 'corners' and 'centroids'.")

    # Compute the IoU.

    # Compute the intersection areas.

    intersection_areas = intersection_area_(boxes1, boxes2, coords=coords, mode=mode, border_pixels=border_pixels)

    m = boxes1.shape[0] # The number of boxes in `boxes1`
    n = boxes2.shape[0] # The number of boxes in `boxes2`

    # Compute the union areas.

    # Set the correct coordinate indices for the respective formats.
    if coords == 'corners':
        xmin = 0
        ymin = 1
        xmax = 2
        ymax = 3
    elif coords == 'minmax':
        xmin = 0
        xmax = 1
        ymin = 2
        ymax = 3

    if border_pixels == 'half':
        d = 0
    elif border_pixels == 'include':
        d = 1 # If border pixels are supposed to belong to the bounding boxes, we have to add one pixel to any difference `xmax - xmin` or `ymax - ymin`.
    elif border_pixels == 'exclude':
        d = -1 # If border pixels are not supposed to belong to the bounding boxes, we have to subtract one pixel from any difference `xmax - xmin` or `ymax - ymin`.

    if mode == 'outer_product':

        boxes1_areas = np.tile(np.expand_dims((boxes1[:,xmax] - boxes1[:,xmin] + d) * (boxes1[:,ymax] - boxes1[:,ymin] + d), axis=1), reps=(1,n))
        boxes2_areas = np.tile(np.expand_dims((boxes2[:,xmax] - boxes2[:,xmin] + d) * (boxes2[:,ymax] - boxes2[:,ymin] + d), axis=0), reps=(m,1))

    elif mode == 'element-wise':

        boxes1_areas = (boxes1[:,xmax] - boxes1[:,xmin] + d) * (boxes1[:,ymax] - boxes1[:,ymin] + d)
        boxes2_areas = (boxes2[:,xmax] - boxes2[:,xmin] + d) * (boxes2[:,ymax] - boxes2[:,ymin] + d)

    union_areas = boxes1_areas + boxes2_areas - intersection_areas

    return intersection_areas / union_areas
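
# A minimal usage sketch for `iou()` (an illustration added for clarity, not part
# of the original module; the boxes are hypothetical).
def _demo_iou():
    # Ground truth boxes and predicted boxes in 'corners' format.
    gt = np.array([[0., 0., 10., 10.]])
    pred = np.array([[0., 0., 10., 10.],
                     [5., 5., 15., 15.]])
    # Returns a (1,2) IoU matrix: the identical box gives 1.0, the half-shifted
    # box gives 25 / (100 + 100 - 25) = 1/7.
    print(iou(gt, pred, coords='corners', mode='outer_product'))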
BIN
ssd_keras-master/bounding_box_utils/bounding_box_utils.pyc
Normal file
33
ssd_keras-master/config.json
Normal file
@@ -0,0 +1,33 @@
{
    "model" : {
        "backend": "ssd512",
        "input": 512,
        "labels": ["Gun", "Knife", "Razor", "Shuriken"]
    },

    "train": {
        "train_image_folder": "/home/dlsaavedra/Desktop/Tesis/8.-Object_Detection/Experimento_3/Training/images",
        "train_annot_folder": "/home/dlsaavedra/Desktop/Tesis/8.-Object_Detection/Experimento_3/Training/anns",
        "train_image_set_filename": "/home/dlsaavedra/Desktop/Tesis/8.-Object_Detection/Experimento_3/Training/train.txt",

        "train_times": 1,
        "batch_size": 16,
        "learning_rate": 1e-4,
        "nb_epochs": 50,
        "warmup_epochs": 3,
        "saved_weights_name": "experimento_3_ssd512.h5",
        "debug": false
    },

    "valid": {
        "valid_image_folder": "/home/dlsaavedra/Desktop/Tesis/8.-Object_Detection/Experimento_3/Training/images",
        "valid_annot_folder": "/home/dlsaavedra/Desktop/Tesis/8.-Object_Detection/Experimento_3/Training/anns",
        "valid_image_set_filename": "/home/dlsaavedra/Desktop/Tesis/8.-Object_Detection/Experimento_3/Training/train.txt",
        "valid_times": 1
    },
    "test": {
        "test_image_folder": "/home/dlsaavedra/Desktop/Tesis/8.-Object_Detection/Experimento_3/Baggages/Testing/images",
        "test_annot_folder": "/home/dlsaavedra/Desktop/Tesis/8.-Object_Detection/Experimento_3/Baggages/Testing/anns",
        "test_image_set_filename": "/home/dlsaavedra/Desktop/Tesis/8.-Object_Detection/Experimento_3/Baggages/Testing/test.txt"
    }
}
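A training or evaluation script would typically consume such a config file along the lines of the following sketch (illustrative only; the variable names and the argument handling are assumptions, not taken from the repository's actual scripts):

import json

# Load the experiment configuration (the path is hypothetical).
with open('config.json') as config_buffer:
    config = json.loads(config_buffer.read())

backend = config['model']['backend']        # e.g. 'ssd512'
input_size = config['model']['input']       # e.g. 512
labels = config['model']['labels']          # ['Gun', 'Knife', 'Razor', 'Shuriken']
batch_size = config['train']['batch_size']  # e.g. 16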
33
ssd_keras-master/config_300.json
Normal file
@@ -0,0 +1,33 @@
{
    "model" : {
        "backend": "ssd300",
        "input": 300,
        "labels": ["Gun", "Knife", "Razor", "Shuriken"]
    },

    "train": {
        "train_image_folder": "../Experimento_5/Training/images/",
        "train_annot_folder": "../Experimento_5/Training/anns/",
        "train_image_set_filename": "../Experimento_5/Training/train_no_original.txt",

        "train_times": 1,
        "batch_size": 8,
        "learning_rate": 1e-4,
        "nb_epochs": 100,
        "warmup_epochs": 3,
        "saved_weights_name": "../Experimento_5/Resultados_ssd/ssd300/experimento_5_ssd300.h5",
        "debug": false
    },

    "valid": {
        "valid_image_folder": "../Experimento_5/Training/images/",
        "valid_annot_folder": "../Experimento_5/Training/anns/",
        "valid_image_set_filename": "../Experimento_5/Training/train_no_original.txt",
        "valid_times": 1
    },
    "test": {
        "test_image_folder": "../Experimento_3/Baggages/Testing_3/images/",
        "test_annot_folder": "../Experimento_3/Baggages/Testing_3/anns/",
        "test_image_set_filename": "../Experimento_3/Baggages/Testing_3/test.txt"
    }
}
33
ssd_keras-master/config_512.json
Normal file
@@ -0,0 +1,33 @@
{
    "model" : {
        "backend": "ssd512",
        "input": 512,
        "labels": ["Gun", "Knife", "Razor", "Shuriken"]
    },

    "train": {
        "train_image_folder": "../Experimento_3/Training/images/",
        "train_annot_folder": "../Experimento_3/Training/anns/",
        "train_image_set_filename": "../Experimento_3/Training/train_no_original.txt",

        "train_times": 1,
        "batch_size": 1,
        "learning_rate": 1e-4,
        "nb_epochs": 100,
        "warmup_epochs": 3,
        "saved_weights_name": "../Experimento_3/Resultados_ssd/ssd512/experimento_3_ssd512.h5",
        "debug": false
    },

    "valid": {
        "valid_image_folder": "../Experimento_3/Training/images/",
        "valid_annot_folder": "../Experimento_3/Training/anns/",
        "valid_image_set_filename": "../Experimento_3/Training/train_no_original.txt",
        "valid_times": 1
    },
    "test": {
        "test_image_folder": "../Experimento_3/Baggages/Testing_small/images/",
        "test_annot_folder": "../Experimento_3/Baggages/Testing_small/anns/",
        "test_image_set_filename": "../Experimento_3/Baggages/Testing_small/test.txt"
    }
}
33
ssd_keras-master/config_7.json
Normal file
@@ -0,0 +1,33 @@
{
    "model" : {
        "backend": "ssd7",
        "input": 448,
        "labels": ["Gun", "Knife", "Razor", "Shuriken"]
    },

    "train": {
        "train_image_folder": "../Experimento_3/Training/images/",
        "train_annot_folder": "../Experimento_3/Training/anns/",
        "train_image_set_filename": "../Experimento_3/Training/train.txt",

        "train_times": 1,
        "batch_size": 8,
        "learning_rate": 1e-4,
        "nb_epochs": 100,
        "warmup_epochs": 3,
        "saved_weights_name": "../Experimento_3/Resultados_ssd/ssd7/experimento_3_ssd7.h5",
        "debug": false
    },

    "valid": {
        "valid_image_folder": "../Experimento_3/Training/images/",
        "valid_annot_folder": "../Experimento_3/Training/anns/",
        "valid_image_set_filename": "../Experimento_3/Training/train.txt",
        "valid_times": 1
    },
    "test": {
        "test_image_folder": "../Experimento_3/Baggages/Testing_678/images/",
        "test_annot_folder": "../Experimento_3/Baggages/Testing_678/anns/",
        "test_image_set_filename": "../Experimento_3/Baggages/Testing_678/test.txt"
    }
}
0
ssd_keras-master/data_generator/__init__.py
Normal file
BIN
ssd_keras-master/data_generator/__init__.pyc
Normal file
@@ -0,0 +1,183 @@
'''
A data augmentation pipeline for constant-size images.

Copyright (C) 2018 Pierluigi Ferrari

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
'''

from __future__ import division
import numpy as np

from data_generator.object_detection_2d_photometric_ops import ConvertColor, ConvertDataType, ConvertTo3Channels, RandomBrightness, RandomContrast, RandomHue, RandomSaturation
from data_generator.object_detection_2d_geometric_ops import RandomFlip, RandomTranslate, RandomScale
from data_generator.object_detection_2d_image_boxes_validation_utils import BoundGenerator, BoxFilter, ImageValidator

class DataAugmentationConstantInputSize:
    '''
    Applies a chain of photometric and geometric image transformations. For documentation, please refer
    to the documentation of the individual transformations involved.

    Important: This augmentation chain is suitable for constant-size images only.
    '''

    def __init__(self,
                 random_brightness=(-48, 48, 0.5),
                 random_contrast=(0.5, 1.8, 0.5),
                 random_saturation=(0.5, 1.8, 0.5),
                 random_hue=(18, 0.5),
                 random_flip=0.5,
                 random_translate=((0.03,0.5), (0.03,0.5), 0.5),
                 random_scale=(0.5, 2.0, 0.5),
                 n_trials_max=3,
                 clip_boxes=True,
                 overlap_criterion='area',
                 bounds_box_filter=(0.3, 1.0),
                 bounds_validator=(0.5, 1.0),
                 n_boxes_min=1,
                 background=(0,0,0),
                 labels_format={'class_id': 0, 'xmin': 1, 'ymin': 2, 'xmax': 3, 'ymax': 4}):

        if (random_scale[0] >= 1) or (random_scale[1] <= 1):
            raise ValueError("This sequence of transformations only makes sense if the minimum scaling factor is <1 and the maximum scaling factor is >1.")

        self.n_trials_max = n_trials_max
        self.clip_boxes = clip_boxes
        self.overlap_criterion = overlap_criterion
        self.bounds_box_filter = bounds_box_filter
        self.bounds_validator = bounds_validator
        self.n_boxes_min = n_boxes_min
        self.background = background
        self.labels_format = labels_format

        # Determines which boxes are kept in an image after the transformations have been applied.
        self.box_filter = BoxFilter(check_overlap=True,
                                    check_min_area=True,
                                    check_degenerate=True,
                                    overlap_criterion=self.overlap_criterion,
                                    overlap_bounds=self.bounds_box_filter,
                                    min_area=16,
                                    labels_format=self.labels_format)

        # Determines whether the result of the transformations is a valid training image.
        self.image_validator = ImageValidator(overlap_criterion=self.overlap_criterion,
                                              bounds=self.bounds_validator,
                                              n_boxes_min=self.n_boxes_min,
                                              labels_format=self.labels_format)

        # Utility distortions
        self.convert_RGB_to_HSV = ConvertColor(current='RGB', to='HSV')
        self.convert_HSV_to_RGB = ConvertColor(current='HSV', to='RGB')
        self.convert_to_float32 = ConvertDataType(to='float32')
        self.convert_to_uint8 = ConvertDataType(to='uint8')
        self.convert_to_3_channels = ConvertTo3Channels() # Make sure all images end up having 3 channels.

        # Photometric transformations
        self.random_brightness = RandomBrightness(lower=random_brightness[0], upper=random_brightness[1], prob=random_brightness[2])
        self.random_contrast = RandomContrast(lower=random_contrast[0], upper=random_contrast[1], prob=random_contrast[2])
        self.random_saturation = RandomSaturation(lower=random_saturation[0], upper=random_saturation[1], prob=random_saturation[2])
        self.random_hue = RandomHue(max_delta=random_hue[0], prob=random_hue[1])

        # Geometric transformations
        self.random_flip = RandomFlip(dim='horizontal', prob=random_flip, labels_format=self.labels_format)
        self.random_translate = RandomTranslate(dy_minmax=random_translate[0],
                                                dx_minmax=random_translate[1],
                                                prob=random_translate[2],
                                                clip_boxes=self.clip_boxes,
                                                box_filter=self.box_filter,
                                                image_validator=self.image_validator,
                                                n_trials_max=self.n_trials_max,
                                                background=self.background,
                                                labels_format=self.labels_format)
        self.random_zoom_in = RandomScale(min_factor=1.0,
                                          max_factor=random_scale[1],
                                          prob=random_scale[2],
                                          clip_boxes=self.clip_boxes,
                                          box_filter=self.box_filter,
                                          image_validator=self.image_validator,
                                          n_trials_max=self.n_trials_max,
                                          background=self.background,
                                          labels_format=self.labels_format)
        self.random_zoom_out = RandomScale(min_factor=random_scale[0],
                                           max_factor=1.0,
                                           prob=random_scale[2],
                                           clip_boxes=self.clip_boxes,
                                           box_filter=self.box_filter,
                                           image_validator=self.image_validator,
                                           n_trials_max=self.n_trials_max,
                                           background=self.background,
                                           labels_format=self.labels_format)

        # If we zoom in, do translation before scaling.
        self.sequence1 = [self.convert_to_3_channels,
                          self.convert_to_float32,
                          self.random_brightness,
                          self.random_contrast,
                          self.convert_to_uint8,
                          self.convert_RGB_to_HSV,
                          self.convert_to_float32,
                          self.random_saturation,
                          self.random_hue,
                          self.convert_to_uint8,
                          self.convert_HSV_to_RGB,
                          self.random_translate,
                          self.random_zoom_in,
                          self.random_flip]

        # If we zoom out, do scaling before translation.
        self.sequence2 = [self.convert_to_3_channels,
                          self.convert_to_float32,
                          self.random_brightness,
                          self.convert_to_uint8,
                          self.convert_RGB_to_HSV,
                          self.convert_to_float32,
                          self.random_saturation,
                          self.random_hue,
                          self.convert_to_uint8,
                          self.convert_HSV_to_RGB,
                          self.convert_to_float32,
                          self.random_contrast,
                          self.convert_to_uint8,
                          self.random_zoom_out,
                          self.random_translate,
                          self.random_flip]

    def __call__(self, image, labels=None):

        self.random_translate.labels_format = self.labels_format
        self.random_zoom_in.labels_format = self.labels_format
        self.random_zoom_out.labels_format = self.labels_format
        self.random_flip.labels_format = self.labels_format

        # Choose sequence 1 with probability 0.5.
        if np.random.choice(2):

            if not (labels is None):
                for transform in self.sequence1:
                    image, labels = transform(image, labels)
                return image, labels
            else:
                for transform in self.sequence1:
                    image = transform(image)
                return image
        # Choose sequence 2 with probability 0.5.
        else:

            if not (labels is None):
                for transform in self.sequence2:
                    image, labels = transform(image, labels)
                return image, labels
            else:
                for transform in self.sequence2:
                    image = transform(image)
                return image
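
# A minimal usage sketch (an illustration added for clarity, not part of the
# original module). Assumes a 3-channel uint8 image and labels in the default
# format (class_id, xmin, ymin, xmax, ymax); the values are hypothetical.
def _demo_data_augmentation_constant_input_size():
    augmenter = DataAugmentationConstantInputSize()
    image = np.random.randint(0, 256, size=(300, 300, 3), dtype=np.uint8)
    labels = np.array([[1, 50, 50, 150, 150]])
    # Each call applies one randomly chosen photometric/geometric sequence.
    augmented_image, augmented_labels = augmenter(image, labels)
    print(augmented_image.shape, augmented_labels)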
@@ -0,0 +1,280 @@
'''
The data augmentation operations of the original SSD implementation.

Copyright (C) 2018 Pierluigi Ferrari

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
'''

from __future__ import division
import numpy as np
import cv2
import inspect

from data_generator.object_detection_2d_photometric_ops import ConvertColor, ConvertDataType, ConvertTo3Channels, RandomBrightness, RandomContrast, RandomHue, RandomSaturation, RandomChannelSwap
from data_generator.object_detection_2d_patch_sampling_ops import PatchCoordinateGenerator, RandomPatch, RandomPatchInf
from data_generator.object_detection_2d_geometric_ops import ResizeRandomInterp, RandomFlip
from data_generator.object_detection_2d_image_boxes_validation_utils import BoundGenerator, BoxFilter, ImageValidator

class SSDRandomCrop:
    '''
    Performs the same random crops as defined by the `batch_sampler` instructions
    of the original Caffe implementation of SSD. A description of this random cropping
    strategy can also be found in the data augmentation section of the paper:
    https://arxiv.org/abs/1512.02325
    '''

    def __init__(self, labels_format={'class_id': 0, 'xmin': 1, 'ymin': 2, 'xmax': 3, 'ymax': 4}):
        '''
        Arguments:
            labels_format (dict, optional): A dictionary that defines which index in the last axis of the labels
                of an image contains which bounding box coordinate. The dictionary maps at least the keywords
                'xmin', 'ymin', 'xmax', and 'ymax' to their respective indices within last axis of the labels array.
        '''

        self.labels_format = labels_format

        # This randomly samples one of the lower IoU bounds defined
        # by the `sample_space` every time it is called.
        self.bound_generator = BoundGenerator(sample_space=((None, None),
                                                            (0.1, None),
                                                            (0.3, None),
                                                            (0.5, None),
                                                            (0.7, None),
                                                            (0.9, None)),
                                              weights=None)

        # Produces coordinates for candidate patches such that the height
        # and width of the patches are between 0.3 and 1.0 of the height
        # and width of the respective image and the aspect ratio of the
        # patches is between 0.5 and 2.0.
        self.patch_coord_generator = PatchCoordinateGenerator(must_match='h_w',
                                                              min_scale=0.3,
                                                              max_scale=1.0,
                                                              scale_uniformly=False,
                                                              min_aspect_ratio = 0.5,
                                                              max_aspect_ratio = 2.0)

        # Filters out boxes whose center point does not lie within the
        # chosen patches.
        self.box_filter = BoxFilter(check_overlap=True,
                                    check_min_area=False,
                                    check_degenerate=False,
                                    overlap_criterion='center_point',
                                    labels_format=self.labels_format)

        # Determines whether a given patch is considered a valid patch.
        # Defines a patch to be valid if at least one ground truth bounding box
        # (n_boxes_min == 1) has an IoU overlap with the patch that
        # meets the requirements defined by `bound_generator`.
        self.image_validator = ImageValidator(overlap_criterion='iou',
                                              n_boxes_min=1,
                                              labels_format=self.labels_format,
                                              border_pixels='half')

        # Performs crops according to the parameters set in the objects above.
        # Runs until either a valid patch is found or the original input image
        # is returned unaltered. Runs a maximum of 50 trials to find a valid
        # patch for each new sampled IoU threshold. Every 50 trials, the original
        # image is returned as is with probability (1 - prob) = 0.143.
        self.random_crop = RandomPatchInf(patch_coord_generator=self.patch_coord_generator,
                                          box_filter=self.box_filter,
                                          image_validator=self.image_validator,
                                          bound_generator=self.bound_generator,
                                          n_trials_max=50,
                                          clip_boxes=True,
                                          prob=0.857,
                                          labels_format=self.labels_format)

    def __call__(self, image, labels=None, return_inverter=False):
        self.random_crop.labels_format = self.labels_format
        return self.random_crop(image, labels, return_inverter)

class SSDExpand:
    '''
    Performs the random image expansion as defined by the `train_transform_param` instructions
    of the original Caffe implementation of SSD. A description of this expansion strategy
    can also be found in section 3.6 ("Data Augmentation for Small Object Accuracy") of the paper:
    https://arxiv.org/abs/1512.02325
    '''

    def __init__(self, background=(123, 117, 104), labels_format={'class_id': 0, 'xmin': 1, 'ymin': 2, 'xmax': 3, 'ymax': 4}):
        '''
        Arguments:
            background (list/tuple, optional): A 3-tuple specifying the RGB color value of the
                background pixels of the translated images.
            labels_format (dict, optional): A dictionary that defines which index in the last axis of the labels
                of an image contains which bounding box coordinate. The dictionary maps at least the keywords
                'xmin', 'ymin', 'xmax', and 'ymax' to their respective indices within last axis of the labels array.
        '''

        self.labels_format = labels_format

        # Generate coordinates for patches that are between 1.0 and 4.0 times
        # the size of the input image in both spatial dimensions.
        self.patch_coord_generator = PatchCoordinateGenerator(must_match='h_w',
                                                              min_scale=1.0,
                                                              max_scale=4.0,
                                                              scale_uniformly=True)

        # With probability 0.5, place the input image randomly on a canvas filled with
        # mean color values according to the parameters set above. With probability 0.5,
        # return the input image unaltered.
        self.expand = RandomPatch(patch_coord_generator=self.patch_coord_generator,
                                  box_filter=None,
                                  image_validator=None,
                                  n_trials_max=1,
                                  clip_boxes=False,
                                  prob=0.5,
                                  background=background,
                                  labels_format=self.labels_format)

    def __call__(self, image, labels=None, return_inverter=False):
        self.expand.labels_format = self.labels_format
        return self.expand(image, labels, return_inverter)

class SSDPhotometricDistortions:
    '''
    Performs the photometric distortions defined by the `train_transform_param` instructions
    of the original Caffe implementation of SSD.
    '''

    def __init__(self):

        self.convert_RGB_to_HSV = ConvertColor(current='RGB', to='HSV')
        self.convert_HSV_to_RGB = ConvertColor(current='HSV', to='RGB')
        self.convert_to_float32 = ConvertDataType(to='float32')
        self.convert_to_uint8 = ConvertDataType(to='uint8')
        self.convert_to_3_channels = ConvertTo3Channels()
        self.random_brightness = RandomBrightness(lower=-32, upper=32, prob=0.5)
        self.random_contrast = RandomContrast(lower=0.5, upper=1.5, prob=0.5)
        self.random_saturation = RandomSaturation(lower=0.5, upper=1.5, prob=0.5)
        self.random_hue = RandomHue(max_delta=18, prob=0.5)
        self.random_channel_swap = RandomChannelSwap(prob=0.0)

        self.sequence1 = [self.convert_to_3_channels,
                          self.convert_to_float32,
                          self.random_brightness,
                          self.random_contrast,
                          self.convert_to_uint8,
                          self.convert_RGB_to_HSV,
                          self.convert_to_float32,
                          self.random_saturation,
                          self.random_hue,
                          self.convert_to_uint8,
                          self.convert_HSV_to_RGB,
                          self.random_channel_swap]

        self.sequence2 = [self.convert_to_3_channels,
                          self.convert_to_float32,
                          self.random_brightness,
                          self.convert_to_uint8,
                          self.convert_RGB_to_HSV,
                          self.convert_to_float32,
                          self.random_saturation,
                          self.random_hue,
                          self.convert_to_uint8,
                          self.convert_HSV_to_RGB,
                          self.convert_to_float32,
                          self.random_contrast,
                          self.convert_to_uint8,
                          self.random_channel_swap]

    def __call__(self, image, labels):

        # Choose sequence 1 with probability 0.5.
        if np.random.choice(2):

            for transform in self.sequence1:
                image, labels = transform(image, labels)
            return image, labels
        # Choose sequence 2 with probability 0.5.
        else:

            for transform in self.sequence2:
                image, labels = transform(image, labels)
            return image, labels

class SSDDataAugmentation:
    '''
    Reproduces the data augmentation pipeline used in the training of the original
    Caffe implementation of SSD.
    '''

    def __init__(self,
                 img_height=300,
                 img_width=300,
                 background=(123, 117, 104),
                 labels_format={'class_id': 0, 'xmin': 1, 'ymin': 2, 'xmax': 3, 'ymax': 4}):
        '''
        Arguments:
            img_height (int): The desired height of the output images in pixels.
            img_width (int): The desired width of the output images in pixels.
            background (list/tuple, optional): A 3-tuple specifying the RGB color value of the
                background pixels of the translated images.
            labels_format (dict, optional): A dictionary that defines which index in the last axis of the labels
                of an image contains which bounding box coordinate. The dictionary maps at least the keywords
                'xmin', 'ymin', 'xmax', and 'ymax' to their respective indices within last axis of the labels array.
        '''

        self.labels_format = labels_format

        self.photometric_distortions = SSDPhotometricDistortions()
        self.expand = SSDExpand(background=background, labels_format=self.labels_format)
        self.random_crop = SSDRandomCrop(labels_format=self.labels_format)
        self.random_flip = RandomFlip(dim='horizontal', prob=0.5, labels_format=self.labels_format)

        # This box filter makes sure that the resized images don't contain any degenerate boxes.
        # Resizing the images could lead the boxes to become smaller. For boxes that are already
        # pretty small, that might result in boxes with height and/or width zero, which we obviously
        # cannot allow.
        self.box_filter = BoxFilter(check_overlap=False,
                                    check_min_area=False,
                                    check_degenerate=True,
                                    labels_format=self.labels_format)

        self.resize = ResizeRandomInterp(height=img_height,
                                         width=img_width,
                                         interpolation_modes=[cv2.INTER_NEAREST,
                                                              cv2.INTER_LINEAR,
                                                              cv2.INTER_CUBIC,
                                                              cv2.INTER_AREA,
                                                              cv2.INTER_LANCZOS4],
                                         box_filter=self.box_filter,
                                         labels_format=self.labels_format)

        self.sequence = [self.photometric_distortions,
                         self.expand,
                         self.random_crop,
                         self.random_flip,
                         self.resize]

    def __call__(self, image, labels, return_inverter=False):
        self.expand.labels_format = self.labels_format
        self.random_crop.labels_format = self.labels_format
        self.random_flip.labels_format = self.labels_format
        self.resize.labels_format = self.labels_format

        inverters = []

        for transform in self.sequence:
            if return_inverter and ('return_inverter' in inspect.signature(transform).parameters):
                image, labels, inverter = transform(image, labels, return_inverter=True)
                inverters.append(inverter)
            else:
                image, labels = transform(image, labels)

        if return_inverter:
            return image, labels, inverters[::-1]
        else:
            return image, labels
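
# A minimal usage sketch (an illustration added for clarity, not part of the
# original module). Runs one pass of the original SSD training augmentation on
# a hypothetical random image.
def _demo_ssd_data_augmentation():
    augmenter = SSDDataAugmentation(img_height=300, img_width=300)
    image = np.random.randint(0, 256, size=(480, 640, 3), dtype=np.uint8)
    labels = np.array([[1, 100, 100, 300, 300]])  # (class_id, xmin, ymin, xmax, ymax)
    image, labels = augmenter(image, labels)
    print(image.shape)  # (300, 300, 3) after the final resize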
@@ -0,0 +1,157 @@
'''
A data augmentation pipeline for datasets in bird's eye view, i.e. where there is
no "up" or "down" in the images.

Copyright (C) 2018 Pierluigi Ferrari

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
'''

from __future__ import division
import numpy as np

from data_generator.object_detection_2d_photometric_ops import ConvertColor, ConvertDataType, ConvertTo3Channels, RandomBrightness, RandomContrast, RandomHue, RandomSaturation
from data_generator.object_detection_2d_geometric_ops import Resize, RandomFlip, RandomRotate
from data_generator.object_detection_2d_patch_sampling_ops import PatchCoordinateGenerator, RandomPatch
from data_generator.object_detection_2d_image_boxes_validation_utils import BoxFilter, ImageValidator

class DataAugmentationSatellite:
    '''
    A data augmentation pipeline for datasets in bird's eye view, i.e. where there is
    no "up" or "down" in the images.

    Applies a chain of photometric and geometric image transformations. For documentation, please refer
    to the documentation of the individual transformations involved.
    '''

    def __init__(self,
                 resize_height,
                 resize_width,
                 random_brightness=(-48, 48, 0.5),
                 random_contrast=(0.5, 1.8, 0.5),
                 random_saturation=(0.5, 1.8, 0.5),
                 random_hue=(18, 0.5),
                 random_flip=0.5,
                 random_rotate=([90, 180, 270], 0.5),
                 min_scale=0.3,
                 max_scale=2.0,
                 min_aspect_ratio = 0.8,
                 max_aspect_ratio = 1.25,
                 n_trials_max=3,
                 clip_boxes=True,
                 overlap_criterion='area',
                 bounds_box_filter=(0.3, 1.0),
                 bounds_validator=(0.5, 1.0),
                 n_boxes_min=1,
                 background=(0,0,0),
                 labels_format={'class_id': 0, 'xmin': 1, 'ymin': 2, 'xmax': 3, 'ymax': 4}):

        self.n_trials_max = n_trials_max
        self.clip_boxes = clip_boxes
        self.overlap_criterion = overlap_criterion
        self.bounds_box_filter = bounds_box_filter
        self.bounds_validator = bounds_validator
        self.n_boxes_min = n_boxes_min
        self.background = background
        self.labels_format = labels_format

        # Determines which boxes are kept in an image after the transformations have been applied.
        self.box_filter_patch = BoxFilter(check_overlap=True,
                                          check_min_area=False,
                                          check_degenerate=False,
                                          overlap_criterion=self.overlap_criterion,
                                          overlap_bounds=self.bounds_box_filter,
                                          labels_format=self.labels_format)

        self.box_filter_resize = BoxFilter(check_overlap=False,
                                           check_min_area=True,
                                           check_degenerate=True,
                                           min_area=16,
                                           labels_format=self.labels_format)

        # Determines whether the result of the transformations is a valid training image.
        self.image_validator = ImageValidator(overlap_criterion=self.overlap_criterion,
                                              bounds=self.bounds_validator,
                                              n_boxes_min=self.n_boxes_min,
                                              labels_format=self.labels_format)

        # Utility transformations
        self.convert_to_3_channels = ConvertTo3Channels() # Make sure all images end up having 3 channels.
        self.convert_RGB_to_HSV = ConvertColor(current='RGB', to='HSV')
        self.convert_HSV_to_RGB = ConvertColor(current='HSV', to='RGB')
        self.convert_to_float32 = ConvertDataType(to='float32')
        self.convert_to_uint8 = ConvertDataType(to='uint8')
        self.resize = Resize(height=resize_height,
                             width=resize_width,
                             box_filter=self.box_filter_resize,
                             labels_format=self.labels_format)

        # Photometric transformations
        self.random_brightness = RandomBrightness(lower=random_brightness[0], upper=random_brightness[1], prob=random_brightness[2])
        self.random_contrast = RandomContrast(lower=random_contrast[0], upper=random_contrast[1], prob=random_contrast[2])
        self.random_saturation = RandomSaturation(lower=random_saturation[0], upper=random_saturation[1], prob=random_saturation[2])
        self.random_hue = RandomHue(max_delta=random_hue[0], prob=random_hue[1])

        # Geometric transformations
        self.random_horizontal_flip = RandomFlip(dim='horizontal', prob=random_flip, labels_format=self.labels_format)
        self.random_vertical_flip = RandomFlip(dim='vertical', prob=random_flip, labels_format=self.labels_format)
        self.random_rotate = RandomRotate(angles=random_rotate[0], prob=random_rotate[1], labels_format=self.labels_format)
        self.patch_coord_generator = PatchCoordinateGenerator(must_match='w_ar',
                                                              min_scale=min_scale,
                                                              max_scale=max_scale,
                                                              scale_uniformly=False,
                                                              min_aspect_ratio = min_aspect_ratio,
                                                              max_aspect_ratio = max_aspect_ratio)
        self.random_patch = RandomPatch(patch_coord_generator=self.patch_coord_generator,
                                        box_filter=self.box_filter_patch,
                                        image_validator=self.image_validator,
                                        n_trials_max=self.n_trials_max,
                                        clip_boxes=self.clip_boxes,
                                        prob=1.0,
                                        can_fail=False,
                                        labels_format=self.labels_format)

        # Define the processing chain.
        self.transformations = [self.convert_to_3_channels,
                                self.convert_to_float32,
                                self.random_brightness,
                                self.random_contrast,
                                self.convert_to_uint8,
                                self.convert_RGB_to_HSV,
                                self.convert_to_float32,
                                self.random_saturation,
                                self.random_hue,
                                self.convert_to_uint8,
                                self.convert_HSV_to_RGB,
                                self.random_horizontal_flip,
                                self.random_vertical_flip,
                                self.random_rotate,
                                self.random_patch,
                                self.resize]

    def __call__(self, image, labels=None):

        self.random_patch.labels_format = self.labels_format
        self.random_horizontal_flip.labels_format = self.labels_format
        self.random_vertical_flip.labels_format = self.labels_format
        self.random_rotate.labels_format = self.labels_format
        self.resize.labels_format = self.labels_format

        if not (labels is None):
            for transform in self.transformations:
                image, labels = transform(image, labels)
            return image, labels
        else:
            for transform in self.transformations:
                image = transform(image)
            return image
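
# A minimal usage sketch (an illustration added for clarity, not part of the
# original module). Because bird's-eye-view imagery has no canonical "up", this
# chain may flip vertically and rotate by 90/180/270 degrees in addition to the
# usual transforms. The input values are hypothetical.
def _demo_data_augmentation_satellite():
    augmenter = DataAugmentationSatellite(resize_height=512, resize_width=512)
    image = np.random.randint(0, 256, size=(600, 600, 3), dtype=np.uint8)
    labels = np.array([[2, 30, 30, 120, 90]])  # (class_id, xmin, ymin, xmax, ymax)
    image, labels = augmenter(image, labels)
    print(image.shape)  # (512, 512, 3)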
@@ -0,0 +1,152 @@
'''
A data augmentation pipeline suitable for variable-size images that produces effects
that are similar (but not identical) to those of the original SSD data augmentation
pipeline while being faster.

Copyright (C) 2018 Pierluigi Ferrari

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
'''

from __future__ import division
import numpy as np

from data_generator.object_detection_2d_photometric_ops import ConvertColor, ConvertDataType, ConvertTo3Channels, RandomBrightness, RandomContrast, RandomHue, RandomSaturation
from data_generator.object_detection_2d_geometric_ops import Resize, RandomFlip
from data_generator.object_detection_2d_patch_sampling_ops import PatchCoordinateGenerator, RandomPatch
from data_generator.object_detection_2d_image_boxes_validation_utils import BoxFilter, ImageValidator

class DataAugmentationVariableInputSize:
    '''
    A data augmentation pipeline suitable for variable-size images that produces effects
    that are similar (but not identical!) to those of the original SSD data augmentation
    pipeline while being faster.

    Applies a chain of photometric and geometric image transformations. For documentation, please refer
    to the documentation of the individual transformations involved.
    '''

    def __init__(self,
                 resize_height,
                 resize_width,
                 random_brightness=(-48, 48, 0.5),
                 random_contrast=(0.5, 1.8, 0.5),
                 random_saturation=(0.5, 1.8, 0.5),
                 random_hue=(18, 0.5),
                 random_flip=0.5,
                 min_scale=0.3,
                 max_scale=2.0,
                 min_aspect_ratio = 0.5,
                 max_aspect_ratio = 2.0,
                 n_trials_max=3,
                 clip_boxes=True,
                 overlap_criterion='area',
                 bounds_box_filter=(0.3, 1.0),
                 bounds_validator=(0.5, 1.0),
                 n_boxes_min=1,
                 background=(0,0,0),
                 labels_format={'class_id': 0, 'xmin': 1, 'ymin': 2, 'xmax': 3, 'ymax': 4}):

        self.n_trials_max = n_trials_max
        self.clip_boxes = clip_boxes
        self.overlap_criterion = overlap_criterion
        self.bounds_box_filter = bounds_box_filter
        self.bounds_validator = bounds_validator
        self.n_boxes_min = n_boxes_min
        self.background = background
        self.labels_format = labels_format

        # Determines which boxes are kept in an image after the transformations have been applied.
        self.box_filter_patch = BoxFilter(check_overlap=True,
                                          check_min_area=False,
                                          check_degenerate=False,
                                          overlap_criterion=self.overlap_criterion,
                                          overlap_bounds=self.bounds_box_filter,
                                          labels_format=self.labels_format)

        self.box_filter_resize = BoxFilter(check_overlap=False,
                                           check_min_area=True,
                                           check_degenerate=True,
                                           min_area=16,
                                           labels_format=self.labels_format)

        # Determines whether the result of the transformations is a valid training image.
        self.image_validator = ImageValidator(overlap_criterion=self.overlap_criterion,
                                              bounds=self.bounds_validator,
                                              n_boxes_min=self.n_boxes_min,
                                              labels_format=self.labels_format)

        # Utility transformations
        self.convert_to_3_channels = ConvertTo3Channels() # Make sure all images end up having 3 channels.
        self.convert_RGB_to_HSV = ConvertColor(current='RGB', to='HSV')
        self.convert_HSV_to_RGB = ConvertColor(current='HSV', to='RGB')
        self.convert_to_float32 = ConvertDataType(to='float32')
        self.convert_to_uint8 = ConvertDataType(to='uint8')
        self.resize = Resize(height=resize_height,
                             width=resize_width,
                             box_filter=self.box_filter_resize,
                             labels_format=self.labels_format)

        # Photometric transformations
        self.random_brightness = RandomBrightness(lower=random_brightness[0], upper=random_brightness[1], prob=random_brightness[2])
        self.random_contrast = RandomContrast(lower=random_contrast[0], upper=random_contrast[1], prob=random_contrast[2])
        self.random_saturation = RandomSaturation(lower=random_saturation[0], upper=random_saturation[1], prob=random_saturation[2])
        self.random_hue = RandomHue(max_delta=random_hue[0], prob=random_hue[1])

        # Geometric transformations
        self.random_flip = RandomFlip(dim='horizontal', prob=random_flip, labels_format=self.labels_format)
        self.patch_coord_generator = PatchCoordinateGenerator(must_match='w_ar',
                                                              min_scale=min_scale,
                                                              max_scale=max_scale,
                                                              scale_uniformly=False,
                                                              min_aspect_ratio = min_aspect_ratio,
                                                              max_aspect_ratio = max_aspect_ratio)
        self.random_patch = RandomPatch(patch_coord_generator=self.patch_coord_generator,
                                        box_filter=self.box_filter_patch,
                                        image_validator=self.image_validator,
                                        n_trials_max=self.n_trials_max,
                                        clip_boxes=self.clip_boxes,
                                        prob=1.0,
                                        can_fail=False,
                                        labels_format=self.labels_format)

        # Define the processing chain
        self.transformations = [self.convert_to_3_channels,
                                self.convert_to_float32,
                                self.random_brightness,
                                self.random_contrast,
                                self.convert_to_uint8,
                                self.convert_RGB_to_HSV,
                                self.convert_to_float32,
                                self.random_saturation,
                                self.random_hue,
                                self.convert_to_uint8,
                                self.convert_HSV_to_RGB,
                                self.random_patch,
                                self.random_flip,
                                self.resize]

    def __call__(self, image, labels=None):

        self.random_patch.labels_format = self.labels_format
        self.random_flip.labels_format = self.labels_format
        self.resize.labels_format = self.labels_format

        if not (labels is None):
            for transform in self.transformations:
                image, labels = transform(image, labels)
            return image, labels
        else:
            for transform in self.transformations:
                image = transform(image)
            return image
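
# A minimal usage sketch (an illustration added for clarity, not part of the
# original module). Unlike the constant-input-size chain, this one accepts
# images of any size and resizes to a fixed output size at the end.
def _demo_data_augmentation_variable_input_size():
    augmenter = DataAugmentationVariableInputSize(resize_height=300, resize_width=300)
    image = np.random.randint(0, 256, size=(375, 500, 3), dtype=np.uint8)
    labels = np.array([[1, 40, 60, 200, 220]])  # (class_id, xmin, ymin, xmax, ymax)
    image, labels = augmenter(image, labels)
    print(image.shape)  # (300, 300, 3)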
@@ -0,0 +1,779 @@
|
||||
'''
|
||||
Various geometric image transformations for 2D object detection, both deterministic
|
||||
and probabilistic.
|
||||
|
||||
Copyright (C) 2018 Pierluigi Ferrari
|
||||
|
||||
Licensed under the Apache License, Version 2.0 (the "License");
|
||||
you may not use this file except in compliance with the License.
|
||||
You may obtain a copy of the License at
|
||||
|
||||
http://www.apache.org/licenses/LICENSE-2.0
|
||||
|
||||
Unless required by applicable law or agreed to in writing, software
|
||||
distributed under the License is distributed on an "AS IS" BASIS,
|
||||
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
|
||||
See the License for the specific language governing permissions and
|
||||
limitations under the License.
|
||||
'''
|
||||
|
||||
from __future__ import division
|
||||
import numpy as np
|
||||
import cv2
|
||||
import random
|
||||
|
||||
from data_generator.object_detection_2d_image_boxes_validation_utils import BoxFilter, ImageValidator
|
||||
|
||||
class Resize:
    '''
    Resizes images to a specified height and width in pixels.
    '''

    def __init__(self,
                 height,
                 width,
                 interpolation_mode=cv2.INTER_LINEAR,
                 box_filter=None,
                 labels_format={'class_id': 0, 'xmin': 1, 'ymin': 2, 'xmax': 3, 'ymax': 4}):
        '''
        Arguments:
            height (int): The desired height of the output images in pixels.
            width (int): The desired width of the output images in pixels.
            interpolation_mode (int, optional): An integer that denotes a valid
                OpenCV interpolation mode. For example, integers 0 through 5 are
                valid interpolation modes.
            box_filter (BoxFilter, optional): Only relevant if ground truth bounding boxes are given.
                A `BoxFilter` object to filter out bounding boxes that don't meet the given criteria
                after the transformation. Refer to the `BoxFilter` documentation for details. If `None`,
                the validity of the bounding boxes is not checked.
            labels_format (dict, optional): A dictionary that defines which index in the last axis of the labels
                of an image contains which bounding box coordinate. The dictionary maps at least the keywords
                'xmin', 'ymin', 'xmax', and 'ymax' to their respective indices within the last axis of the labels array.
        '''
        if not (isinstance(box_filter, BoxFilter) or box_filter is None):
            raise ValueError("`box_filter` must be either `None` or a `BoxFilter` object.")
        self.out_height = height
        self.out_width = width
        self.interpolation_mode = interpolation_mode
        self.box_filter = box_filter
        self.labels_format = labels_format

    def __call__(self, image, labels=None, return_inverter=False):

        img_height, img_width = image.shape[:2]

        xmin = self.labels_format['xmin']
        ymin = self.labels_format['ymin']
        xmax = self.labels_format['xmax']
        ymax = self.labels_format['ymax']

        image = cv2.resize(image,
                           dsize=(self.out_width, self.out_height),
                           interpolation=self.interpolation_mode)

        if return_inverter:
            def inverter(labels):
                labels = np.copy(labels)
                labels[:, [ymin+1, ymax+1]] = np.round(labels[:, [ymin+1, ymax+1]] * (img_height / self.out_height), decimals=0)
                labels[:, [xmin+1, xmax+1]] = np.round(labels[:, [xmin+1, xmax+1]] * (img_width / self.out_width), decimals=0)
                return labels

        if labels is None:
            if return_inverter:
                return image, inverter
            else:
                return image
        else:
            labels = np.copy(labels)
            labels[:, [ymin, ymax]] = np.round(labels[:, [ymin, ymax]] * (self.out_height / img_height), decimals=0)
            labels[:, [xmin, xmax]] = np.round(labels[:, [xmin, xmax]] * (self.out_width / img_width), decimals=0)

            if not (self.box_filter is None):
                self.box_filter.labels_format = self.labels_format
                labels = self.box_filter(labels=labels,
                                         image_height=self.out_height,
                                         image_width=self.out_width)

            if return_inverter:
                return image, labels, inverter
            else:
                return image, labels

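# A minimal usage sketch of `Resize`, assuming a dummy image and a single
# ground truth box in the default labels format; all sizes and coordinates
# below, and the helper name itself, are made up for illustration.
def _demo_resize():
    image = np.zeros((200, 100, 3), dtype=np.uint8)  # (height, width, channels)
    labels = np.array([[1, 10, 20, 60, 180]])        # (class_id, xmin, ymin, xmax, ymax)
    resize = Resize(height=300, width=300)
    image_resized, labels_resized, inverter = resize(image, labels, return_inverter=True)
    # The inverter maps *predictions* back to the original image size. Note
    # the `+1` index offsets in its implementation: it expects arrays of the
    # format (class_id, confidence, xmin, ymin, xmax, ymax).
    predictions = np.array([[1, 0.9, 30, 30, 180, 270]])
    return image_resized, labels_resized, inverter(predictions)
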
class ResizeRandomInterp:
    '''
    Resizes images to a specified height and width in pixels using a randomly
    selected interpolation mode.
    '''

    def __init__(self,
                 height,
                 width,
                 interpolation_modes=[cv2.INTER_NEAREST,
                                      cv2.INTER_LINEAR,
                                      cv2.INTER_CUBIC,
                                      cv2.INTER_AREA,
                                      cv2.INTER_LANCZOS4],
                 box_filter=None,
                 labels_format={'class_id': 0, 'xmin': 1, 'ymin': 2, 'xmax': 3, 'ymax': 4}):
        '''
        Arguments:
            height (int): The desired height of the output image in pixels.
            width (int): The desired width of the output image in pixels.
            interpolation_modes (list/tuple, optional): A list/tuple of integers
                that represent valid OpenCV interpolation modes. For example,
                integers 0 through 5 are valid interpolation modes.
            box_filter (BoxFilter, optional): Only relevant if ground truth bounding boxes are given.
                A `BoxFilter` object to filter out bounding boxes that don't meet the given criteria
                after the transformation. Refer to the `BoxFilter` documentation for details. If `None`,
                the validity of the bounding boxes is not checked.
            labels_format (dict, optional): A dictionary that defines which index in the last axis of the labels
                of an image contains which bounding box coordinate. The dictionary maps at least the keywords
                'xmin', 'ymin', 'xmax', and 'ymax' to their respective indices within the last axis of the labels array.
        '''
        if not (isinstance(interpolation_modes, (list, tuple))):
            raise ValueError("`interpolation_modes` must be a list or tuple.")
        self.height = height
        self.width = width
        self.interpolation_modes = interpolation_modes
        self.box_filter = box_filter
        self.labels_format = labels_format
        self.resize = Resize(height=self.height,
                             width=self.width,
                             box_filter=self.box_filter,
                             labels_format=self.labels_format)

    def __call__(self, image, labels=None, return_inverter=False):
        self.resize.interpolation_mode = np.random.choice(self.interpolation_modes)
        self.resize.labels_format = self.labels_format
        return self.resize(image, labels, return_inverter)

class Flip:
    '''
    Flips images horizontally or vertically.
    '''
    def __init__(self,
                 dim='horizontal',
                 labels_format={'class_id': 0, 'xmin': 1, 'ymin': 2, 'xmax': 3, 'ymax': 4}):
        '''
        Arguments:
            dim (str, optional): Can be either of 'horizontal' and 'vertical'.
                If 'horizontal', images will be flipped horizontally, i.e. along
                the vertical axis. If 'vertical', images will be flipped vertically,
                i.e. along the horizontal axis.
            labels_format (dict, optional): A dictionary that defines which index in the last axis of the labels
                of an image contains which bounding box coordinate. The dictionary maps at least the keywords
                'xmin', 'ymin', 'xmax', and 'ymax' to their respective indices within the last axis of the labels array.
        '''
        if not (dim in {'horizontal', 'vertical'}): raise ValueError("`dim` must be one of 'horizontal' and 'vertical'.")
        self.dim = dim
        self.labels_format = labels_format

    def __call__(self, image, labels=None, return_inverter=False):

        img_height, img_width = image.shape[:2]

        xmin = self.labels_format['xmin']
        ymin = self.labels_format['ymin']
        xmax = self.labels_format['xmax']
        ymax = self.labels_format['ymax']

        if self.dim == 'horizontal':
            image = image[:,::-1]
            if labels is None:
                return image
            else:
                labels = np.copy(labels)
                labels[:, [xmin, xmax]] = img_width - labels[:, [xmax, xmin]]
                return image, labels
        else:
            image = image[::-1]
            if labels is None:
                return image
            else:
                labels = np.copy(labels)
                labels[:, [ymin, ymax]] = img_height - labels[:, [ymax, ymin]]
                return image, labels

class RandomFlip:
    '''
    Randomly flips images horizontally or vertically. The randomness only refers
    to whether or not the image will be flipped.
    '''
    def __init__(self,
                 dim='horizontal',
                 prob=0.5,
                 labels_format={'class_id': 0, 'xmin': 1, 'ymin': 2, 'xmax': 3, 'ymax': 4}):
        '''
        Arguments:
            dim (str, optional): Can be either of 'horizontal' and 'vertical'.
                If 'horizontal', images will be flipped horizontally, i.e. along
                the vertical axis. If 'vertical', images will be flipped vertically,
                i.e. along the horizontal axis.
            prob (float, optional): `(1 - prob)` determines the probability with which the original,
                unaltered image is returned.
            labels_format (dict, optional): A dictionary that defines which index in the last axis of the labels
                of an image contains which bounding box coordinate. The dictionary maps at least the keywords
                'xmin', 'ymin', 'xmax', and 'ymax' to their respective indices within the last axis of the labels array.
        '''
        self.dim = dim
        self.prob = prob
        self.labels_format = labels_format
        self.flip = Flip(dim=self.dim, labels_format=self.labels_format)

    def __call__(self, image, labels=None):
        p = np.random.uniform(0,1)
        if p >= (1.0-self.prob):
            self.flip.labels_format = self.labels_format
            return self.flip(image, labels)
        elif labels is None:
            return image
        else:
            return image, labels

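# A quick sketch of the horizontal flip box arithmetic (the helper and its
# values are made up for illustration): for an image of width W, a box
# (xmin, xmax) maps to (W - xmax, W - xmin), which is exactly what
# `Flip.__call__` computes in vectorized form.
def _demo_random_flip():
    image = np.zeros((4, 10, 3), dtype=np.uint8)
    labels = np.array([[1, 2, 0, 5, 3]])  # (class_id, xmin, ymin, xmax, ymax)
    flip = Flip(dim='horizontal')
    flipped_image, flipped_labels = flip(image, labels)
    # With img_width == 10: new xmin = 10 - 5 = 5, new xmax = 10 - 2 = 8.
    # `RandomFlip(prob=0.5)` applies the same operation with probability 0.5.
    return flipped_image, flipped_labels
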
class Translate:
    '''
    Translates images horizontally and/or vertically.
    '''

    def __init__(self,
                 dy,
                 dx,
                 clip_boxes=True,
                 box_filter=None,
                 background=(0,0,0),
                 labels_format={'class_id': 0, 'xmin': 1, 'ymin': 2, 'xmax': 3, 'ymax': 4}):
        '''
        Arguments:
            dy (float): The fraction of the image height by which to translate images along the
                vertical axis. Positive values translate images downwards, negative values
                translate images upwards.
            dx (float): The fraction of the image width by which to translate images along the
                horizontal axis. Positive values translate images to the right, negative values
                translate images to the left.
            clip_boxes (bool, optional): Only relevant if ground truth bounding boxes are given.
                If `True`, any ground truth bounding boxes will be clipped to lie entirely within the
                image after the translation.
            box_filter (BoxFilter, optional): Only relevant if ground truth bounding boxes are given.
                A `BoxFilter` object to filter out bounding boxes that don't meet the given criteria
                after the transformation. Refer to the `BoxFilter` documentation for details. If `None`,
                the validity of the bounding boxes is not checked.
            background (list/tuple, optional): A 3-tuple specifying the RGB color value of the
                background pixels of the translated images.
            labels_format (dict, optional): A dictionary that defines which index in the last axis of the labels
                of an image contains which bounding box coordinate. The dictionary maps at least the keywords
                'xmin', 'ymin', 'xmax', and 'ymax' to their respective indices within the last axis of the labels array.
        '''

        if not (isinstance(box_filter, BoxFilter) or box_filter is None):
            raise ValueError("`box_filter` must be either `None` or a `BoxFilter` object.")
        self.dy_rel = dy
        self.dx_rel = dx
        self.clip_boxes = clip_boxes
        self.box_filter = box_filter
        self.background = background
        self.labels_format = labels_format

    def __call__(self, image, labels=None):

        img_height, img_width = image.shape[:2]

        # Compute the translation matrix.
        dy_abs = int(round(img_height * self.dy_rel))
        dx_abs = int(round(img_width * self.dx_rel))
        M = np.float32([[1, 0, dx_abs],
                        [0, 1, dy_abs]])

        # Translate the image.
        image = cv2.warpAffine(image,
                               M=M,
                               dsize=(img_width, img_height),
                               borderMode=cv2.BORDER_CONSTANT,
                               borderValue=self.background)

        if labels is None:
            return image
        else:
            xmin = self.labels_format['xmin']
            ymin = self.labels_format['ymin']
            xmax = self.labels_format['xmax']
            ymax = self.labels_format['ymax']

            labels = np.copy(labels)
            # Translate the box coordinates to the translated image's coordinate system.
            labels[:,[xmin,xmax]] += dx_abs
            labels[:,[ymin,ymax]] += dy_abs

            # Compute all valid boxes for this patch.
            if not (self.box_filter is None):
                self.box_filter.labels_format = self.labels_format
                labels = self.box_filter(labels=labels,
                                         image_height=img_height,
                                         image_width=img_width)

            if self.clip_boxes:
                labels[:,[ymin,ymax]] = np.clip(labels[:,[ymin,ymax]], a_min=0, a_max=img_height-1)
                labels[:,[xmin,xmax]] = np.clip(labels[:,[xmin,xmax]], a_min=0, a_max=img_width-1)

            return image, labels

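# A sketch of the affine matrix that `Translate` builds (the helper and its
# values are made up for illustration): `cv2.warpAffine` with
# M = [[1, 0, dx_abs], [0, 1, dy_abs]] shifts every pixel by
# (dx_abs, dy_abs), so the box coordinates only need the same constant
# offsets added to them.
def _demo_translate():
    image = np.zeros((100, 200, 3), dtype=np.uint8)
    labels = np.array([[1, 20, 30, 80, 90]])
    translate = Translate(dy=0.1, dx=-0.05)  # 10% down, 5% to the left
    translated_image, translated_labels = translate(image, labels)
    # dy_abs = round(100 * 0.1) = 10 and dx_abs = round(200 * -0.05) = -10,
    # so the box becomes (xmin, ymin, xmax, ymax) = (10, 40, 70, 100),
    # clipped to (10, 40, 70, 99) because `clip_boxes=True` by default.
    return translated_image, translated_labels
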
class RandomTranslate:
    '''
    Randomly translates images horizontally and/or vertically.
    '''

    def __init__(self,
                 dy_minmax=(0.03,0.3),
                 dx_minmax=(0.03,0.3),
                 prob=0.5,
                 clip_boxes=True,
                 box_filter=None,
                 image_validator=None,
                 n_trials_max=3,
                 background=(0,0,0),
                 labels_format={'class_id': 0, 'xmin': 1, 'ymin': 2, 'xmax': 3, 'ymax': 4}):
        '''
        Arguments:
            dy_minmax (list/tuple, optional): A 2-tuple `(min, max)` of non-negative floats that
                determines the minimum and maximum relative translation of images along the vertical
                axis both upward and downward. That is, images will be randomly translated by at least
                `min` and at most `max` either upward or downward. For example, if `dy_minmax == (0.05,0.3)`,
                an image of size `(100,100)` will be translated by at least 5 and at most 30 pixels
                either upward or downward. The translation direction is chosen randomly.
            dx_minmax (list/tuple, optional): A 2-tuple `(min, max)` of non-negative floats that
                determines the minimum and maximum relative translation of images along the horizontal
                axis both to the left and right. That is, images will be randomly translated by at least
                `min` and at most `max` either left or right. For example, if `dx_minmax == (0.05,0.3)`,
                an image of size `(100,100)` will be translated by at least 5 and at most 30 pixels
                either left or right. The translation direction is chosen randomly.
            prob (float, optional): `(1 - prob)` determines the probability with which the original,
                unaltered image is returned.
            clip_boxes (bool, optional): Only relevant if ground truth bounding boxes are given.
                If `True`, any ground truth bounding boxes will be clipped to lie entirely within the
                image after the translation.
            box_filter (BoxFilter, optional): Only relevant if ground truth bounding boxes are given.
                A `BoxFilter` object to filter out bounding boxes that don't meet the given criteria
                after the transformation. Refer to the `BoxFilter` documentation for details. If `None`,
                the validity of the bounding boxes is not checked.
            image_validator (ImageValidator, optional): Only relevant if ground truth bounding boxes are given.
                An `ImageValidator` object to determine whether a translated image is valid. If `None`,
                any outcome is valid.
            n_trials_max (int, optional): Only relevant if ground truth bounding boxes are given.
                Determines the maximal number of trials to produce a valid image. If no valid image could
                be produced in `n_trials_max` trials, returns the unaltered input image.
            background (list/tuple, optional): A 3-tuple specifying the RGB color value of the
                background pixels of the translated images.
            labels_format (dict, optional): A dictionary that defines which index in the last axis of the labels
                of an image contains which bounding box coordinate. The dictionary maps at least the keywords
                'xmin', 'ymin', 'xmax', and 'ymax' to their respective indices within the last axis of the labels array.
        '''
        if dy_minmax[0] > dy_minmax[1]:
            raise ValueError("It must be `dy_minmax[0] <= dy_minmax[1]`.")
        if dx_minmax[0] > dx_minmax[1]:
            raise ValueError("It must be `dx_minmax[0] <= dx_minmax[1]`.")
        if dy_minmax[0] < 0 or dx_minmax[0] < 0:
            raise ValueError("It must be `dy_minmax[0] >= 0` and `dx_minmax[0] >= 0`.")
        if not (isinstance(image_validator, ImageValidator) or image_validator is None):
            raise ValueError("`image_validator` must be either `None` or an `ImageValidator` object.")
        self.dy_minmax = dy_minmax
        self.dx_minmax = dx_minmax
        self.prob = prob
        self.clip_boxes = clip_boxes
        self.box_filter = box_filter
        self.image_validator = image_validator
        self.n_trials_max = n_trials_max
        self.background = background
        self.labels_format = labels_format
        self.translate = Translate(dy=0,
                                   dx=0,
                                   clip_boxes=self.clip_boxes,
                                   box_filter=self.box_filter,
                                   background=self.background,
                                   labels_format=self.labels_format)

    def __call__(self, image, labels=None):

        p = np.random.uniform(0,1)
        if p >= (1.0-self.prob):

            img_height, img_width = image.shape[:2]

            xmin = self.labels_format['xmin']
            ymin = self.labels_format['ymin']
            xmax = self.labels_format['xmax']
            ymax = self.labels_format['ymax']

            # Override the preset labels format.
            if not self.image_validator is None:
                self.image_validator.labels_format = self.labels_format
            self.translate.labels_format = self.labels_format

            for _ in range(max(1, self.n_trials_max)):

                # Pick the relative amount by which to translate.
                dy_abs = np.random.uniform(self.dy_minmax[0], self.dy_minmax[1])
                dx_abs = np.random.uniform(self.dx_minmax[0], self.dx_minmax[1])
                # Pick the direction in which to translate.
                dy = np.random.choice([-dy_abs, dy_abs])
                dx = np.random.choice([-dx_abs, dx_abs])
                self.translate.dy_rel = dy
                self.translate.dx_rel = dx

                if (labels is None) or (self.image_validator is None):
                    # We either don't have any boxes or if we do, we will accept any outcome as valid.
                    return self.translate(image, labels)
                else:
                    # Translate the box coordinates to the translated image's coordinate system.
                    new_labels = np.copy(labels)
                    new_labels[:, [ymin, ymax]] += int(round(img_height * dy))
                    new_labels[:, [xmin, xmax]] += int(round(img_width * dx))

                    # Check if the patch is valid.
                    if self.image_validator(labels=new_labels,
                                            image_height=img_height,
                                            image_width=img_width):
                        return self.translate(image, labels)

            # If all attempts failed, return the unaltered input image.
            if labels is None:
                return image

            else:
                return image, labels

        elif labels is None:
            return image

        else:
            return image, labels

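# A minimal usage sketch of `RandomTranslate` combined with an
# `ImageValidator` (the helper and its values are made up for illustration).
# With this configuration, a random translation is accepted only if at least
# one box center remains inside the image; otherwise up to `n_trials_max`
# translations are tried before the input is returned unaltered.
def _demo_random_translate():
    image = np.zeros((100, 100, 3), dtype=np.uint8)
    labels = np.array([[1, 10, 10, 50, 50]])
    random_translate = RandomTranslate(dy_minmax=(0.03, 0.3),
                                       dx_minmax=(0.03, 0.3),
                                       prob=1.0,
                                       image_validator=ImageValidator(overlap_criterion='center_point',
                                                                      n_boxes_min=1),
                                       n_trials_max=3)
    return random_translate(image, labels)
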
class Scale:
    '''
    Scales images, i.e. zooms in or out.
    '''

    def __init__(self,
                 factor,
                 clip_boxes=True,
                 box_filter=None,
                 background=(0,0,0),
                 labels_format={'class_id': 0, 'xmin': 1, 'ymin': 2, 'xmax': 3, 'ymax': 4}):
        '''
        Arguments:
            factor (float): The fraction of the image size by which to scale images. Must be positive.
            clip_boxes (bool, optional): Only relevant if ground truth bounding boxes are given.
                If `True`, any ground truth bounding boxes will be clipped to lie entirely within the
                image after the scaling.
            box_filter (BoxFilter, optional): Only relevant if ground truth bounding boxes are given.
                A `BoxFilter` object to filter out bounding boxes that don't meet the given criteria
                after the transformation. Refer to the `BoxFilter` documentation for details. If `None`,
                the validity of the bounding boxes is not checked.
            background (list/tuple, optional): A 3-tuple specifying the RGB color value of the potential
                background pixels of the scaled images.
            labels_format (dict, optional): A dictionary that defines which index in the last axis of the labels
                of an image contains which bounding box coordinate. The dictionary maps at least the keywords
                'xmin', 'ymin', 'xmax', and 'ymax' to their respective indices within the last axis of the labels array.
        '''

        if factor <= 0:
            raise ValueError("It must be `factor > 0`.")
        if not (isinstance(box_filter, BoxFilter) or box_filter is None):
            raise ValueError("`box_filter` must be either `None` or a `BoxFilter` object.")
        self.factor = factor
        self.clip_boxes = clip_boxes
        self.box_filter = box_filter
        self.background = background
        self.labels_format = labels_format

    def __call__(self, image, labels=None):

        img_height, img_width = image.shape[:2]

        # Compute the rotation matrix.
        M = cv2.getRotationMatrix2D(center=(img_width / 2, img_height / 2),
                                    angle=0,
                                    scale=self.factor)

        # Scale the image.
        image = cv2.warpAffine(image,
                               M=M,
                               dsize=(img_width, img_height),
                               borderMode=cv2.BORDER_CONSTANT,
                               borderValue=self.background)

        if labels is None:
            return image
        else:
            xmin = self.labels_format['xmin']
            ymin = self.labels_format['ymin']
            xmax = self.labels_format['xmax']
            ymax = self.labels_format['ymax']

            labels = np.copy(labels)
            # Scale the bounding boxes accordingly.
            # Transform two opposite corner points of the rectangular boxes using the rotation matrix `M`.
            toplefts = np.array([labels[:,xmin], labels[:,ymin], np.ones(labels.shape[0])])
            bottomrights = np.array([labels[:,xmax], labels[:,ymax], np.ones(labels.shape[0])])
            new_toplefts = (np.dot(M, toplefts)).T
            new_bottomrights = (np.dot(M, bottomrights)).T
            labels[:,[xmin,ymin]] = np.round(new_toplefts, decimals=0).astype(int)
            labels[:,[xmax,ymax]] = np.round(new_bottomrights, decimals=0).astype(int)

            # Compute all valid boxes for this patch.
            if not (self.box_filter is None):
                self.box_filter.labels_format = self.labels_format
                labels = self.box_filter(labels=labels,
                                         image_height=img_height,
                                         image_width=img_width)

            if self.clip_boxes:
                labels[:,[ymin,ymax]] = np.clip(labels[:,[ymin,ymax]], a_min=0, a_max=img_height-1)
                labels[:,[xmin,xmax]] = np.clip(labels[:,[xmin,xmax]], a_min=0, a_max=img_width-1)

            return image, labels

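# A worked sketch of the corner-point transform in `Scale` (the helper and
# its values are made up for illustration): with angle 0,
# `cv2.getRotationMatrix2D` about the image center c with scale s yields
# M = [[s, 0, (1 - s) * cx], [0, s, (1 - s) * cy]], so a corner point p maps
# to s * p + (1 - s) * c, i.e. it moves towards (s < 1) or away from (s > 1)
# the image center.
def _demo_scale():
    image = np.zeros((100, 100, 3), dtype=np.uint8)
    labels = np.array([[1, 40, 40, 60, 60]])
    scale = Scale(factor=0.5)
    scaled_image, scaled_labels = scale(image, labels)
    # With s = 0.5 and center (50, 50): 40 -> 0.5 * 40 + 0.5 * 50 = 45 and
    # 60 -> 0.5 * 60 + 0.5 * 50 = 55, so the box becomes (45, 45, 55, 55).
    return scaled_image, scaled_labels
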
class RandomScale:
    '''
    Randomly scales images.
    '''

    def __init__(self,
                 min_factor=0.5,
                 max_factor=1.5,
                 prob=0.5,
                 clip_boxes=True,
                 box_filter=None,
                 image_validator=None,
                 n_trials_max=3,
                 background=(0,0,0),
                 labels_format={'class_id': 0, 'xmin': 1, 'ymin': 2, 'xmax': 3, 'ymax': 4}):
        '''
        Arguments:
            min_factor (float, optional): The minimum fraction of the image size by which to scale images.
                Must be positive.
            max_factor (float, optional): The maximum fraction of the image size by which to scale images.
                Must be positive.
            prob (float, optional): `(1 - prob)` determines the probability with which the original,
                unaltered image is returned.
            clip_boxes (bool, optional): Only relevant if ground truth bounding boxes are given.
                If `True`, any ground truth bounding boxes will be clipped to lie entirely within the
                image after the scaling.
            box_filter (BoxFilter, optional): Only relevant if ground truth bounding boxes are given.
                A `BoxFilter` object to filter out bounding boxes that don't meet the given criteria
                after the transformation. Refer to the `BoxFilter` documentation for details. If `None`,
                the validity of the bounding boxes is not checked.
            image_validator (ImageValidator, optional): Only relevant if ground truth bounding boxes are given.
                An `ImageValidator` object to determine whether a scaled image is valid. If `None`,
                any outcome is valid.
            n_trials_max (int, optional): Only relevant if ground truth bounding boxes are given.
                Determines the maximal number of trials to produce a valid image. If no valid image could
                be produced in `n_trials_max` trials, returns the unaltered input image.
            background (list/tuple, optional): A 3-tuple specifying the RGB color value of the potential
                background pixels of the scaled images.
            labels_format (dict, optional): A dictionary that defines which index in the last axis of the labels
                of an image contains which bounding box coordinate. The dictionary maps at least the keywords
                'xmin', 'ymin', 'xmax', and 'ymax' to their respective indices within the last axis of the labels array.
        '''

        if not (0 < min_factor <= max_factor):
            raise ValueError("It must be `0 < min_factor <= max_factor`.")
        if not (isinstance(image_validator, ImageValidator) or image_validator is None):
            raise ValueError("`image_validator` must be either `None` or an `ImageValidator` object.")
        self.min_factor = min_factor
        self.max_factor = max_factor
        self.prob = prob
        self.clip_boxes = clip_boxes
        self.box_filter = box_filter
        self.image_validator = image_validator
        self.n_trials_max = n_trials_max
        self.background = background
        self.labels_format = labels_format
        self.scale = Scale(factor=1.0,
                           clip_boxes=self.clip_boxes,
                           box_filter=self.box_filter,
                           background=self.background,
                           labels_format=self.labels_format)

    def __call__(self, image, labels=None):

        p = np.random.uniform(0,1)
        if p >= (1.0-self.prob):

            img_height, img_width = image.shape[:2]

            xmin = self.labels_format['xmin']
            ymin = self.labels_format['ymin']
            xmax = self.labels_format['xmax']
            ymax = self.labels_format['ymax']

            # Override the preset labels format.
            if not self.image_validator is None:
                self.image_validator.labels_format = self.labels_format
            self.scale.labels_format = self.labels_format

            for _ in range(max(1, self.n_trials_max)):

                # Pick a scaling factor.
                factor = np.random.uniform(self.min_factor, self.max_factor)
                self.scale.factor = factor

                if (labels is None) or (self.image_validator is None):
                    # We either don't have any boxes or if we do, we will accept any outcome as valid.
                    return self.scale(image, labels)
                else:
                    # Scale the bounding boxes accordingly.
                    # Transform two opposite corner points of the rectangular boxes using the rotation matrix `M`.
                    toplefts = np.array([labels[:,xmin], labels[:,ymin], np.ones(labels.shape[0])])
                    bottomrights = np.array([labels[:,xmax], labels[:,ymax], np.ones(labels.shape[0])])

                    # Compute the rotation matrix.
                    M = cv2.getRotationMatrix2D(center=(img_width / 2, img_height / 2),
                                                angle=0,
                                                scale=factor)

                    new_toplefts = (np.dot(M, toplefts)).T
                    new_bottomrights = (np.dot(M, bottomrights)).T

                    new_labels = np.copy(labels)
                    new_labels[:,[xmin,ymin]] = np.around(new_toplefts, decimals=0).astype(int)
                    new_labels[:,[xmax,ymax]] = np.around(new_bottomrights, decimals=0).astype(int)

                    # Check if the patch is valid.
                    if self.image_validator(labels=new_labels,
                                            image_height=img_height,
                                            image_width=img_width):
                        return self.scale(image, labels)

            # If all attempts failed, return the unaltered input image.
            if labels is None:
                return image

            else:
                return image, labels

        elif labels is None:
            return image

        else:
            return image, labels

class Rotate:
    '''
    Rotates images counter-clockwise by 90, 180, or 270 degrees.
    '''

    def __init__(self,
                 angle,
                 labels_format={'class_id': 0, 'xmin': 1, 'ymin': 2, 'xmax': 3, 'ymax': 4}):
        '''
        Arguments:
            angle (int): The angle in degrees by which to rotate the images counter-clockwise.
                Only 90, 180, and 270 are valid values.
            labels_format (dict, optional): A dictionary that defines which index in the last axis of the labels
                of an image contains which bounding box coordinate. The dictionary maps at least the keywords
                'xmin', 'ymin', 'xmax', and 'ymax' to their respective indices within the last axis of the labels array.
        '''

        if not angle in {90, 180, 270}:
            raise ValueError("`angle` must be in the set {90, 180, 270}.")
        self.angle = angle
        self.labels_format = labels_format

    def __call__(self, image, labels=None):

        img_height, img_width = image.shape[:2]

        # Compute the rotation matrix.
        M = cv2.getRotationMatrix2D(center=(img_width / 2, img_height / 2),
                                    angle=self.angle,
                                    scale=1)

        # Get the sine and cosine from the rotation matrix.
        cos_angle = np.abs(M[0, 0])
        sin_angle = np.abs(M[0, 1])

        # Compute the new bounding dimensions of the image.
        img_width_new = int(img_height * sin_angle + img_width * cos_angle)
        img_height_new = int(img_height * cos_angle + img_width * sin_angle)

        # Adjust the rotation matrix to take into account the translation.
        M[1, 2] += (img_height_new - img_height) / 2
        M[0, 2] += (img_width_new - img_width) / 2

        # Rotate the image.
        image = cv2.warpAffine(image,
                               M=M,
                               dsize=(img_width_new, img_height_new))

        if labels is None:
            return image
        else:
            xmin = self.labels_format['xmin']
            ymin = self.labels_format['ymin']
            xmax = self.labels_format['xmax']
            ymax = self.labels_format['ymax']

            labels = np.copy(labels)
            # Rotate the bounding boxes accordingly.
            # Transform two opposite corner points of the rectangular boxes using the rotation matrix `M`.
            toplefts = np.array([labels[:,xmin], labels[:,ymin], np.ones(labels.shape[0])])
            bottomrights = np.array([labels[:,xmax], labels[:,ymax], np.ones(labels.shape[0])])
            new_toplefts = (np.dot(M, toplefts)).T
            new_bottomrights = (np.dot(M, bottomrights)).T
            labels[:,[xmin,ymin]] = np.round(new_toplefts, decimals=0).astype(int)
            labels[:,[xmax,ymax]] = np.round(new_bottomrights, decimals=0).astype(int)

            if self.angle == 90:
                # ymin and ymax were switched by the rotation.
                labels[:,[ymax,ymin]] = labels[:,[ymin,ymax]]
            elif self.angle == 180:
                # ymin and ymax were switched by the rotation,
                # and also xmin and xmax were switched.
                labels[:,[ymax,ymin]] = labels[:,[ymin,ymax]]
                labels[:,[xmax,xmin]] = labels[:,[xmin,xmax]]
            elif self.angle == 270:
                # xmin and xmax were switched by the rotation.
                labels[:,[xmax,xmin]] = labels[:,[xmin,xmax]]

            return image, labels

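# A quick sketch of the canvas-size computation in `Rotate` (the helper and
# its values are made up for illustration): rotating a (H, W) image by 90 or
# 270 degrees gives |cos| = 0 and |sin| = 1, so the new canvas is (W, H);
# for 180 degrees it stays (H, W).
def _demo_rotate():
    image = np.zeros((100, 200, 3), dtype=np.uint8)
    labels = np.array([[1, 20, 30, 80, 90]])
    rotate = Rotate(angle=90)
    rotated_image, rotated_labels = rotate(image, labels)
    # The rotated image has shape (200, 100, 3), and ymin/ymax were swapped
    # back into the correct order after the corner-point transform.
    return rotated_image, rotated_labels
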
class RandomRotate:
    '''
    Randomly rotates images counter-clockwise.
    '''

    def __init__(self,
                 angles=[90, 180, 270],
                 prob=0.5,
                 labels_format={'class_id': 0, 'xmin': 1, 'ymin': 2, 'xmax': 3, 'ymax': 4}):
        '''
        Arguments:
            angles (list): The list of angles in degrees from which one is randomly selected to rotate
                the images counter-clockwise. Only 90, 180, and 270 are valid values.
            prob (float, optional): `(1 - prob)` determines the probability with which the original,
                unaltered image is returned.
            labels_format (dict, optional): A dictionary that defines which index in the last axis of the labels
                of an image contains which bounding box coordinate. The dictionary maps at least the keywords
                'xmin', 'ymin', 'xmax', and 'ymax' to their respective indices within the last axis of the labels array.
        '''
        for angle in angles:
            if not angle in {90, 180, 270}:
                raise ValueError("`angles` can only contain the values 90, 180, and 270.")
        self.angles = angles
        self.prob = prob
        self.labels_format = labels_format
        self.rotate = Rotate(angle=90, labels_format=self.labels_format)

    def __call__(self, image, labels=None):

        p = np.random.uniform(0,1)
        if p >= (1.0-self.prob):
            # Pick a rotation angle.
            self.rotate.angle = random.choice(self.angles)
            self.rotate.labels_format = self.labels_format
            return self.rotate(image, labels)

        elif labels is None:
            return image

        else:
            return image, labels

@@ -0,0 +1,322 @@
'''
Utilities for 2D object detection related to answering the following questions:
1. Given an image size and bounding boxes, which bounding boxes meet certain
   requirements with respect to the image size?
2. Given an image size and bounding boxes, is an image of that size valid with
   respect to the bounding boxes according to certain requirements?

Copyright (C) 2018 Pierluigi Ferrari

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
'''

from __future__ import division
import numpy as np

from bounding_box_utils.bounding_box_utils import iou

class BoundGenerator:
    '''
    Generates pairs of floating point values that represent lower and upper bounds
    from a given sample space.
    '''
    def __init__(self,
                 sample_space=((0.1, None),
                               (0.3, None),
                               (0.5, None),
                               (0.7, None),
                               (0.9, None),
                               (None, None)),
                 weights=None):
        '''
        Arguments:
            sample_space (list or tuple): A list, tuple, or array-like object of shape
                `(n, 2)` that contains `n` samples to choose from, where each sample
                is a 2-tuple of scalars and/or `None` values.
            weights (list or tuple, optional): A list or tuple representing the distribution
                over the sample space. If `None`, a uniform distribution will be assumed.
        '''

        if (not (weights is None)) and len(weights) != len(sample_space):
            raise ValueError("`weights` must either be `None` for a uniform distribution or have the same length as `sample_space`.")

        self.sample_space = []
        for bound_pair in sample_space:
            if len(bound_pair) != 2:
                raise ValueError("All elements of the sample space must be 2-tuples.")
            bound_pair = list(bound_pair)
            if bound_pair[0] is None: bound_pair[0] = 0.0
            if bound_pair[1] is None: bound_pair[1] = 1.0
            if bound_pair[0] > bound_pair[1]:
                raise ValueError("For all sample space elements, the lower bound cannot be greater than the upper bound.")
            self.sample_space.append(bound_pair)

        self.sample_space_size = len(self.sample_space)

        if weights is None:
            self.weights = [1.0/self.sample_space_size] * self.sample_space_size
        else:
            self.weights = weights

    def __call__(self):
        '''
        Returns:
            An item of the sample space, i.e. a 2-tuple of scalars.
        '''
        i = np.random.choice(self.sample_space_size, p=self.weights)
        return self.sample_space[i]

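# A minimal usage sketch (the helper and its values are made up for
# illustration): draw a random (lower, upper) bound pair, e.g. to randomize
# the overlap requirements of a `BoxFilter` between crops. `None` entries in
# the sample space default to 0.0 and 1.0.
def _demo_bound_generator():
    bound_generator = BoundGenerator(sample_space=((0.3, None), (0.7, None)),
                                     weights=(0.5, 0.5))
    lower, upper = bound_generator()  # e.g. (0.3, 1.0) or (0.7, 1.0)
    return lower, upper
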
class BoxFilter:
    '''
    Returns all bounding boxes that are valid with respect to the defined criteria.
    '''

    def __init__(self,
                 check_overlap=True,
                 check_min_area=True,
                 check_degenerate=True,
                 overlap_criterion='center_point',
                 overlap_bounds=(0.3, 1.0),
                 min_area=16,
                 labels_format={'class_id': 0, 'xmin': 1, 'ymin': 2, 'xmax': 3, 'ymax': 4},
                 border_pixels='half'):
        '''
        Arguments:
            check_overlap (bool, optional): Whether or not to enforce the overlap requirements defined by
                `overlap_criterion` and `overlap_bounds`. Sometimes you might want to use the box filter only
                to enforce a certain minimum area for all boxes (see next argument), in such cases you can
                turn the overlap requirements off.
            check_min_area (bool, optional): Whether or not to enforce the minimum area requirement defined
                by `min_area`. If `True`, any boxes that have an area (in pixels) that is smaller than `min_area`
                will be removed from the labels of an image. Bounding boxes below a certain area aren't useful
                training examples. An object that takes up only, say, 5 pixels in an image is probably not
                recognizable anymore, neither for a human, nor for an object detection model. It makes sense
                to remove such boxes.
            check_degenerate (bool, optional): Whether or not to check for and remove degenerate bounding boxes.
                Degenerate bounding boxes are boxes that have `xmax <= xmin` and/or `ymax <= ymin`. In particular,
                boxes with a width and/or height of zero are degenerate. It is obviously important to filter out
                such boxes, so you should only set this option to `False` if you are certain that degenerate
                boxes are not possible in your data and processing chain.
            overlap_criterion (str, optional): Can be either of 'center_point', 'iou', or 'area'. Determines
                which boxes are considered valid with respect to a given image. If set to 'center_point',
                a given bounding box is considered valid if its center point lies within the image.
                If set to 'area', a given bounding box is considered valid if the quotient of its intersection
                area with the image and its own area is within the given `overlap_bounds`. If set to 'iou', a given
                bounding box is considered valid if its IoU with the image is within the given `overlap_bounds`.
            overlap_bounds (list or BoundGenerator, optional): Only relevant if `overlap_criterion` is 'area' or 'iou'.
                Determines the lower and upper bounds for `overlap_criterion`. Can be either a 2-tuple of scalars
                representing a lower bound and an upper bound, or a `BoundGenerator` object, which provides
                the possibility to generate bounds randomly.
            min_area (int, optional): Only relevant if `check_min_area` is `True`. Defines the minimum area in
                pixels that a bounding box must have in order to be valid. Boxes with an area smaller than this
                will be removed.
            labels_format (dict, optional): A dictionary that defines which index in the last axis of the labels
                of an image contains which bounding box coordinate. The dictionary maps at least the keywords
                'xmin', 'ymin', 'xmax', and 'ymax' to their respective indices within the last axis of the labels array.
            border_pixels (str, optional): How to treat the border pixels of the bounding boxes.
                Can be 'include', 'exclude', or 'half'. If 'include', the border pixels belong
                to the boxes. If 'exclude', the border pixels do not belong to the boxes.
                If 'half', then one of each of the two horizontal and vertical borders belongs
                to the boxes, but not the other.
        '''
        if not isinstance(overlap_bounds, (list, tuple, BoundGenerator)):
            raise ValueError("`overlap_bounds` must be either a 2-tuple of scalars or a `BoundGenerator` object.")
        if isinstance(overlap_bounds, (list, tuple)) and (overlap_bounds[0] > overlap_bounds[1]):
            raise ValueError("The lower bound must not be greater than the upper bound.")
        if not (overlap_criterion in {'iou', 'area', 'center_point'}):
            raise ValueError("`overlap_criterion` must be one of 'iou', 'area', or 'center_point'.")
        self.overlap_criterion = overlap_criterion
        self.overlap_bounds = overlap_bounds
        self.min_area = min_area
        self.check_overlap = check_overlap
        self.check_min_area = check_min_area
        self.check_degenerate = check_degenerate
        self.labels_format = labels_format
        self.border_pixels = border_pixels

    def __call__(self,
                 labels,
                 image_height=None,
                 image_width=None):
        '''
        Arguments:
            labels (array): The labels to be filtered. This is an array with shape `(m,n)`, where
                `m` is the number of bounding boxes and `n` is the number of elements that defines
                each bounding box (box coordinates, class ID, etc.). The box coordinates are expected
                to be in the image's coordinate system.
            image_height (int): Only relevant if `check_overlap == True`. The height of the image
                (in pixels) to compare the box coordinates to.
            image_width (int): Only relevant if `check_overlap == True`. The width of the image
                (in pixels) to compare the box coordinates to.

        Returns:
            An array containing the labels of all boxes that are valid.
        '''

        labels = np.copy(labels)

        xmin = self.labels_format['xmin']
        ymin = self.labels_format['ymin']
        xmax = self.labels_format['xmax']
        ymax = self.labels_format['ymax']

        # Record the boxes that pass all checks here.
        requirements_met = np.ones(shape=labels.shape[0], dtype=bool)

        if self.check_degenerate:

            non_degenerate = (labels[:,xmax] > labels[:,xmin]) * (labels[:,ymax] > labels[:,ymin])
            requirements_met *= non_degenerate

        if self.check_min_area:

            min_area_met = (labels[:,xmax] - labels[:,xmin]) * (labels[:,ymax] - labels[:,ymin]) >= self.min_area
            requirements_met *= min_area_met

        if self.check_overlap:

            # Get the lower and upper bounds.
            if isinstance(self.overlap_bounds, BoundGenerator):
                lower, upper = self.overlap_bounds()
            else:
                lower, upper = self.overlap_bounds

            # Compute which boxes are valid.

            if self.overlap_criterion == 'iou':
                # Compute the patch coordinates.
                image_coords = np.array([0, 0, image_width, image_height])
                # Compute the IoU between the patch and all of the ground truth boxes.
                image_boxes_iou = iou(image_coords, labels[:, [xmin, ymin, xmax, ymax]], coords='corners', mode='element-wise', border_pixels=self.border_pixels)
                requirements_met *= (image_boxes_iou > lower) * (image_boxes_iou <= upper)

            elif self.overlap_criterion == 'area':
                if self.border_pixels == 'half':
                    d = 0
                elif self.border_pixels == 'include':
                    d = 1 # If border pixels are supposed to belong to the bounding boxes, we have to add one pixel to any difference `xmax - xmin` or `ymax - ymin`.
                elif self.border_pixels == 'exclude':
                    d = -1 # If border pixels are not supposed to belong to the bounding boxes, we have to subtract one pixel from any difference `xmax - xmin` or `ymax - ymin`.
                # Compute the areas of the boxes.
                box_areas = (labels[:,xmax] - labels[:,xmin] + d) * (labels[:,ymax] - labels[:,ymin] + d)
                # Compute the intersection area between the patch and all of the ground truth boxes.
                clipped_boxes = np.copy(labels)
                clipped_boxes[:,[ymin,ymax]] = np.clip(labels[:,[ymin,ymax]], a_min=0, a_max=image_height-1)
                clipped_boxes[:,[xmin,xmax]] = np.clip(labels[:,[xmin,xmax]], a_min=0, a_max=image_width-1)
                intersection_areas = (clipped_boxes[:,xmax] - clipped_boxes[:,xmin] + d) * (clipped_boxes[:,ymax] - clipped_boxes[:,ymin] + d) # `+ d` to account for the chosen border pixel convention.
                # Check which boxes meet the overlap requirements.
                if lower == 0.0:
                    mask_lower = intersection_areas > lower * box_areas # If `lower == 0`, we want to make sure that boxes with area 0 don't count, hence the ">" sign instead of the ">=" sign.
                else:
                    mask_lower = intersection_areas >= lower * box_areas # Especially for the case `lower == 1` we want the ">=" sign, otherwise no boxes would count at all.
                mask_upper = intersection_areas <= upper * box_areas
                requirements_met *= mask_lower * mask_upper

            elif self.overlap_criterion == 'center_point':
                # Compute the center points of the boxes.
                cy = (labels[:,ymin] + labels[:,ymax]) / 2
                cx = (labels[:,xmin] + labels[:,xmax]) / 2
                # Check which of the boxes have center points within the cropped patch and remove those that don't.
                requirements_met *= (cy >= 0.0) * (cy <= image_height-1) * (cx >= 0.0) * (cx <= image_width-1)

        return labels[requirements_met]

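# A minimal usage sketch of `BoxFilter` (the helper and its values are made
# up for illustration): the first box below is degenerate (xmax <= xmin),
# the second is smaller than `min_area`, and the third has its center point
# inside the image, so only the third one survives the filtering.
def _demo_box_filter():
    labels = np.array([[1, 50, 10, 50, 40],    # degenerate: zero width
                       [2, 10, 10, 12, 12],    # area 4 < min_area
                       [3, 20, 20, 80, 80]])   # valid
    box_filter = BoxFilter(overlap_criterion='center_point', min_area=16)
    return box_filter(labels, image_height=100, image_width=100)
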
class ImageValidator:
    '''
    Returns `True` if a given minimum number of bounding boxes meets given overlap
    requirements with an image of a given height and width.
    '''

    def __init__(self,
                 overlap_criterion='center_point',
                 bounds=(0.3, 1.0),
                 n_boxes_min=1,
                 labels_format={'class_id': 0, 'xmin': 1, 'ymin': 2, 'xmax': 3, 'ymax': 4},
                 border_pixels='half'):
        '''
        Arguments:
            overlap_criterion (str, optional): Can be either of 'center_point', 'iou', or 'area'. Determines
                which boxes are considered valid with respect to a given image. If set to 'center_point',
                a given bounding box is considered valid if its center point lies within the image.
                If set to 'area', a given bounding box is considered valid if the quotient of its intersection
                area with the image and its own area is within `lower` and `upper`. If set to 'iou', a given
                bounding box is considered valid if its IoU with the image is within `lower` and `upper`.
            bounds (list or BoundGenerator, optional): Only relevant if `overlap_criterion` is 'area' or 'iou'.
                Determines the lower and upper bounds for `overlap_criterion`. Can be either a 2-tuple of scalars
                representing a lower bound and an upper bound, or a `BoundGenerator` object, which provides
                the possibility to generate bounds randomly.
            n_boxes_min (int or str, optional): Either a positive integer or the string 'all'.
                Determines the minimum number of boxes that must meet the `overlap_criterion` with respect to
                an image of the given height and width in order for the image to be a valid image.
                If set to 'all', an image is considered valid if all given boxes meet the `overlap_criterion`.
            labels_format (dict, optional): A dictionary that defines which index in the last axis of the labels
                of an image contains which bounding box coordinate. The dictionary maps at least the keywords
                'xmin', 'ymin', 'xmax', and 'ymax' to their respective indices within the last axis of the labels array.
            border_pixels (str, optional): How to treat the border pixels of the bounding boxes.
                Can be 'include', 'exclude', or 'half'. If 'include', the border pixels belong
                to the boxes. If 'exclude', the border pixels do not belong to the boxes.
                If 'half', then one of each of the two horizontal and vertical borders belongs
                to the boxes, but not the other.
        '''
        if not ((isinstance(n_boxes_min, int) and n_boxes_min > 0) or n_boxes_min == 'all'):
            raise ValueError("`n_boxes_min` must be a positive integer or 'all'.")
        self.overlap_criterion = overlap_criterion
        self.bounds = bounds
        self.n_boxes_min = n_boxes_min
        self.labels_format = labels_format
        self.border_pixels = border_pixels
        self.box_filter = BoxFilter(check_overlap=True,
                                    check_min_area=False,
                                    check_degenerate=False,
                                    overlap_criterion=self.overlap_criterion,
                                    overlap_bounds=self.bounds,
                                    labels_format=self.labels_format,
                                    border_pixels=self.border_pixels)

    def __call__(self,
                 labels,
                 image_height,
                 image_width):
        '''
        Arguments:
            labels (array): The labels to be tested. The box coordinates are expected
                to be in the image's coordinate system.
            image_height (int): The height of the image to compare the box coordinates to.
            image_width (int): The width of the image to compare the box coordinates to.

        Returns:
            A boolean indicating whether an image of the given height and width is
            valid with respect to the given bounding boxes.
        '''

        self.box_filter.overlap_bounds = self.bounds
        self.box_filter.labels_format = self.labels_format

        # Get all boxes that meet the overlap requirements.
        valid_labels = self.box_filter(labels=labels,
                                       image_height=image_height,
                                       image_width=image_width)

        # Check whether enough boxes meet the requirements.
        if isinstance(self.n_boxes_min, int):
            # The image is valid if at least `self.n_boxes_min` ground truth boxes meet the requirements.
            if len(valid_labels) >= self.n_boxes_min:
                return True
            else:
                return False
        elif self.n_boxes_min == 'all':
            # The image is valid if all ground truth boxes meet the requirements.
            if len(valid_labels) == len(labels):
                return True
            else:
                return False

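# A minimal usage sketch of `ImageValidator` (the helper and its values are
# made up for illustration): the image is valid because the single box's
# center point (50, 50) lies inside the 100x100 image and `n_boxes_min=1`
# is met.
def _demo_image_validator():
    labels = np.array([[1, 20, 20, 80, 80]])
    image_validator = ImageValidator(overlap_criterion='center_point', n_boxes_min=1)
    return image_validator(labels, image_height=100, image_width=100)  # True
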
@@ -0,0 +1,73 @@
'''
Miscellaneous data generator utilities.

Copyright (C) 2018 Pierluigi Ferrari

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
'''

from __future__ import division
import numpy as np

def apply_inverse_transforms(y_pred_decoded, inverse_transforms):
    '''
    Takes a list or Numpy array of decoded predictions and applies a given list of
    transforms to them. The list of inverse transforms would usually contain the
    inverter functions that some of the image transformations that come with this
    data generator return. This function would normally be used to transform predictions
    that were made on a transformed image back to the original image.

    Arguments:
        y_pred_decoded (list or array): Either a list of length `batch_size` that
            contains Numpy arrays that contain the predictions for each batch item
            or a Numpy array. If this is a list of Numpy arrays, the arrays would
            usually have the shape `(num_predictions, 6)`, where `num_predictions`
            is different for each batch item. If this is a Numpy array, it would
            usually have the shape `(batch_size, num_predictions, 6)`. The last axis
            would usually contain the class ID, confidence score, and four bounding
            box coordinates for each prediction.
        inverse_transforms (list): A nested list of length `batch_size` that contains
            for each batch item a list of functions that take one argument (one element
            of `y_pred_decoded` if it is a list or one slice along the first axis of
            `y_pred_decoded` if it is an array) and return an output of the same shape
            and data type.

    Returns:
        The transformed predictions, which have the same structure as `y_pred_decoded`.
    '''

    if isinstance(y_pred_decoded, list):

        y_pred_decoded_inv = []

        for i in range(len(y_pred_decoded)):
            y_pred_decoded_inv.append(np.copy(y_pred_decoded[i]))
            if y_pred_decoded_inv[i].size > 0: # If there are any predictions for this batch item.
                for inverter in inverse_transforms[i]:
                    if not (inverter is None):
                        y_pred_decoded_inv[i] = inverter(y_pred_decoded_inv[i])

    elif isinstance(y_pred_decoded, np.ndarray):

        y_pred_decoded_inv = np.copy(y_pred_decoded)

        for i in range(len(y_pred_decoded)):
            if y_pred_decoded_inv[i].size > 0: # If there are any predictions for this batch item.
                for inverter in inverse_transforms[i]:
                    if not (inverter is None):
                        y_pred_decoded_inv[i] = inverter(y_pred_decoded_inv[i])

    else:
        raise ValueError("`y_pred_decoded` must be either a list or a Numpy array.")

    return y_pred_decoded_inv

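# A minimal usage sketch (the helper and its values are made up for
# illustration): `inverse_transforms` is a nested list with one list of
# inverter functions per batch item, e.g. the inverters returned by
# `Resize(..., return_inverter=True)`. The identity lambda below merely
# stands in for such an inverter.
def _demo_apply_inverse_transforms():
    y_pred_decoded = [np.array([[1, 0.9, 30, 30, 180, 270]])]  # one batch item
    inverse_transforms = [[lambda y: y]]                        # one inverter per item
    return apply_inverse_transforms(y_pred_decoded, inverse_transforms)
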
@@ -0,0 +1,881 @@
|
||||
'''
|
||||
Various patch sampling operations for data augmentation in 2D object detection.
|
||||
|
||||
Copyright (C) 2018 Pierluigi Ferrari
|
||||
|
||||
Licensed under the Apache License, Version 2.0 (the "License");
|
||||
you may not use this file except in compliance with the License.
|
||||
You may obtain a copy of the License at
|
||||
|
||||
http://www.apache.org/licenses/LICENSE-2.0
|
||||
|
||||
Unless required by applicable law or agreed to in writing, software
|
||||
distributed under the License is distributed on an "AS IS" BASIS,
|
||||
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
|
||||
See the License for the specific language governing permissions and
|
||||
limitations under the License.
|
||||
'''
|
||||
|
||||
from __future__ import division
|
||||
import numpy as np
|
||||
|
||||
from data_generator.object_detection_2d_image_boxes_validation_utils import BoundGenerator, BoxFilter, ImageValidator
|
||||
|
||||
class PatchCoordinateGenerator:
|
||||
'''
|
||||
Generates random patch coordinates that meet specified requirements.
|
||||
'''
|
||||
|
||||
def __init__(self,
|
||||
img_height=None,
|
||||
img_width=None,
|
||||
must_match='h_w',
|
||||
min_scale=0.3,
|
||||
max_scale=1.0,
|
||||
scale_uniformly=False,
|
||||
min_aspect_ratio = 0.5,
|
||||
max_aspect_ratio = 2.0,
|
||||
patch_ymin=None,
|
||||
patch_xmin=None,
|
||||
patch_height=None,
|
||||
patch_width=None,
|
||||
patch_aspect_ratio=None):
|
||||
'''
|
||||
Arguments:
|
||||
img_height (int): The height of the image for which the patch coordinates
|
||||
shall be generated. Doesn't have to be known upon construction.
|
||||
img_width (int): The width of the image for which the patch coordinates
|
||||
shall be generated. Doesn't have to be known upon construction.
|
||||
must_match (str, optional): Can be either of 'h_w', 'h_ar', and 'w_ar'.
|
||||
Specifies which two of the three quantities height, width, and aspect
|
||||
ratio determine the shape of the generated patch. The respective third
|
||||
quantity will be computed from the other two. For example,
|
||||
if `must_match == 'h_w'`, then the patch's height and width will be
|
||||
set to lie within [min_scale, max_scale] of the image size or to
|
||||
`patch_height` and/or `patch_width`, if given. The patch's aspect ratio
|
||||
is the dependent variable in this case, it will be computed from the
|
||||
height and width. Any given values for `patch_aspect_ratio`,
|
||||
`min_aspect_ratio`, or `max_aspect_ratio` will be ignored.
|
||||
min_scale (float, optional): The minimum size of a dimension of the patch
|
||||
as a fraction of the respective dimension of the image. Can be greater
|
||||
than 1. For example, if the image width is 200 and `min_scale == 0.5`,
|
||||
then the width of the generated patch will be at least 100. If `min_scale == 1.5`,
|
||||
the width of the generated patch will be at least 300.
|
||||
max_scale (float, optional): The maximum size of a dimension of the patch
|
                as a fraction of the respective dimension of the image. Can be greater
                than 1. For example, if the image width is 200 and `max_scale == 1.0`,
                then the width of the generated patch will be at most 200. If `max_scale == 1.5`,
                the width of the generated patch will be at most 300. Must be greater than
                `min_scale`.
            scale_uniformly (bool, optional): If `True` and if `must_match == 'h_w'`,
                the patch height and width will be scaled uniformly, otherwise they will
                be scaled independently.
            min_aspect_ratio (float, optional): Determines the minimum aspect ratio
                for the generated patches.
            max_aspect_ratio (float, optional): Determines the maximum aspect ratio
                for the generated patches.
            patch_ymin (int, optional): `None` or the vertical coordinate of the top left
                corner of the generated patches. If this is not `None`, the position of the
                patches along the vertical axis is fixed. If this is `None`, then the
                vertical position of generated patches will be chosen randomly such that
                the overlap of a patch and the image along the vertical dimension is
                always maximal.
            patch_xmin (int, optional): `None` or the horizontal coordinate of the top left
                corner of the generated patches. If this is not `None`, the position of the
                patches along the horizontal axis is fixed. If this is `None`, then the
                horizontal position of generated patches will be chosen randomly such that
                the overlap of a patch and the image along the horizontal dimension is
                always maximal.
            patch_height (int, optional): `None` or the fixed height of the generated patches.
            patch_width (int, optional): `None` or the fixed width of the generated patches.
            patch_aspect_ratio (float, optional): `None` or the fixed aspect ratio of the
                generated patches.
        '''

        if not (must_match in {'h_w', 'h_ar', 'w_ar'}):
            raise ValueError("`must_match` must be one of 'h_w', 'h_ar', or 'w_ar'.")
        if min_scale >= max_scale:
            raise ValueError("It must be `min_scale < max_scale`.")
        if min_aspect_ratio >= max_aspect_ratio:
            raise ValueError("It must be `min_aspect_ratio < max_aspect_ratio`.")
        if scale_uniformly and not ((patch_height is None) and (patch_width is None)):
            raise ValueError("If `scale_uniformly == True`, `patch_height` and `patch_width` must both be `None`.")
        self.img_height = img_height
        self.img_width = img_width
        self.must_match = must_match
        self.min_scale = min_scale
        self.max_scale = max_scale
        self.scale_uniformly = scale_uniformly
        self.min_aspect_ratio = min_aspect_ratio
        self.max_aspect_ratio = max_aspect_ratio
        self.patch_ymin = patch_ymin
        self.patch_xmin = patch_xmin
        self.patch_height = patch_height
        self.patch_width = patch_width
        self.patch_aspect_ratio = patch_aspect_ratio

    def __call__(self):
        '''
        Returns:
            A 4-tuple `(ymin, xmin, height, width)` that represents the coordinates
            of the generated patch.
        '''

        # Get the patch height and width.

        if self.must_match == 'h_w': # The aspect ratio is the dependent variable.
            if not self.scale_uniformly:
                # Get the height.
                if self.patch_height is None:
                    patch_height = int(np.random.uniform(self.min_scale, self.max_scale) * self.img_height)
                else:
                    patch_height = self.patch_height
                # Get the width.
                if self.patch_width is None:
                    patch_width = int(np.random.uniform(self.min_scale, self.max_scale) * self.img_width)
                else:
                    patch_width = self.patch_width
            else:
                scaling_factor = np.random.uniform(self.min_scale, self.max_scale)
                patch_height = int(scaling_factor * self.img_height)
                patch_width = int(scaling_factor * self.img_width)

        elif self.must_match == 'h_ar': # The width is the dependent variable.
            # Get the height.
            if self.patch_height is None:
                patch_height = int(np.random.uniform(self.min_scale, self.max_scale) * self.img_height)
            else:
                patch_height = self.patch_height
            # Get the aspect ratio.
            if self.patch_aspect_ratio is None:
                patch_aspect_ratio = np.random.uniform(self.min_aspect_ratio, self.max_aspect_ratio)
            else:
                patch_aspect_ratio = self.patch_aspect_ratio
            # Get the width.
            patch_width = int(patch_height * patch_aspect_ratio)

        elif self.must_match == 'w_ar': # The height is the dependent variable.
            # Get the width.
            if self.patch_width is None:
                patch_width = int(np.random.uniform(self.min_scale, self.max_scale) * self.img_width)
            else:
                patch_width = self.patch_width
            # Get the aspect ratio.
            if self.patch_aspect_ratio is None:
                patch_aspect_ratio = np.random.uniform(self.min_aspect_ratio, self.max_aspect_ratio)
            else:
                patch_aspect_ratio = self.patch_aspect_ratio
            # Get the height.
            patch_height = int(patch_width / patch_aspect_ratio)

        # Get the top left corner coordinates of the patch.

        if self.patch_ymin is None:
            # Compute how much room we have along the vertical axis to place the patch.
            # A negative number here means that we want to sample a patch that is larger than the original image
            # in the vertical dimension, in which case the patch will be placed such that it fully contains the
            # image in the vertical dimension.
            y_range = self.img_height - patch_height
            # Select a random top left corner for the sample position from the possible positions.
            if y_range >= 0: patch_ymin = np.random.randint(0, y_range + 1) # There are y_range + 1 possible positions for the crop in the vertical dimension.
            else: patch_ymin = np.random.randint(y_range, 1) # The possible positions for the image on the background canvas in the vertical dimension.
        else:
            patch_ymin = self.patch_ymin

        if self.patch_xmin is None:
            # Compute how much room we have along the horizontal axis to place the patch.
            # A negative number here means that we want to sample a patch that is larger than the original image
            # in the horizontal dimension, in which case the patch will be placed such that it fully contains the
            # image in the horizontal dimension.
            x_range = self.img_width - patch_width
            # Select a random top left corner for the sample position from the possible positions.
            if x_range >= 0: patch_xmin = np.random.randint(0, x_range + 1) # There are x_range + 1 possible positions for the crop in the horizontal dimension.
            else: patch_xmin = np.random.randint(x_range, 1) # The possible positions for the image on the background canvas in the horizontal dimension.
        else:
            patch_xmin = self.patch_xmin

        return (patch_ymin, patch_xmin, patch_height, patch_width)
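
To make the generator's contract concrete, here is a brief usage sketch (added for illustration, not part of the original module; the image size and all generator settings are invented):

import numpy as np

coord_gen = PatchCoordinateGenerator(img_height=300, img_width=450,
                                     must_match='h_ar',
                                     min_scale=0.5, max_scale=1.0,
                                     min_aspect_ratio=0.5, max_aspect_ratio=2.0)
# The height is sampled from [0.5, 1.0) of the image height, the aspect ratio
# from [0.5, 2.0), and the width follows from the two; the top left corner is
# chosen randomly such that patch and image overlap maximally.
ymin, xmin, height, width = coord_gen()
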
class CropPad:
    '''
    Crops and/or pads an image deterministically.

    Depending on the given output patch size and the position (top left corner) relative
    to the input image, the image will be cropped and/or padded along one or both spatial
    dimensions.

    For example, if the output patch lies entirely within the input image, this will result
    in a regular crop. If the input image lies entirely within the output patch, this will
    result in the image being padded in every direction. All other cases are mixed cases
    where the image might be cropped in some directions and padded in others.

    The output patch can be arbitrary in both size and position as long as it overlaps
    with the input image.
    '''

    def __init__(self,
                 patch_ymin,
                 patch_xmin,
                 patch_height,
                 patch_width,
                 clip_boxes=True,
                 box_filter=None,
                 background=(0,0,0),
                 labels_format={'class_id': 0, 'xmin': 1, 'ymin': 2, 'xmax': 3, 'ymax': 4}):
        '''
        Arguments:
            patch_ymin (int, optional): The vertical coordinate of the top left corner of the output
                patch relative to the image coordinate system. Can be negative (i.e. lie outside the image)
                as long as the resulting patch still overlaps with the image.
            patch_xmin (int, optional): The horizontal coordinate of the top left corner of the output
                patch relative to the image coordinate system. Can be negative (i.e. lie outside the image)
                as long as the resulting patch still overlaps with the image.
            patch_height (int): The height of the patch to be sampled from the image. Can be greater
                than the height of the input image.
            patch_width (int): The width of the patch to be sampled from the image. Can be greater
                than the width of the input image.
            clip_boxes (bool, optional): Only relevant if ground truth bounding boxes are given.
                If `True`, any ground truth bounding boxes will be clipped to lie entirely within the
                sampled patch.
            box_filter (BoxFilter, optional): Only relevant if ground truth bounding boxes are given.
                A `BoxFilter` object to filter out bounding boxes that don't meet the given criteria
                after the transformation. Refer to the `BoxFilter` documentation for details. If `None`,
                the validity of the bounding boxes is not checked.
            background (list/tuple, optional): A 3-tuple specifying the RGB color value of the potential
                background pixels of the scaled images. In the case of single-channel images,
                the first element of `background` will be used as the background pixel value.
            labels_format (dict, optional): A dictionary that defines which index in the last axis of the labels
                of an image contains which bounding box coordinate. The dictionary maps at least the keywords
                'xmin', 'ymin', 'xmax', and 'ymax' to their respective indices within the last axis of the labels array.
        '''
        #if (patch_height <= 0) or (patch_width <= 0):
        #    raise ValueError("Patch height and width must both be positive.")
        #if (patch_ymin + patch_height < 0) or (patch_xmin + patch_width < 0):
        #    raise ValueError("A patch with the given coordinates cannot overlap with an input image.")
        if not (isinstance(box_filter, BoxFilter) or box_filter is None):
            raise ValueError("`box_filter` must be either `None` or a `BoxFilter` object.")
        self.patch_height = patch_height
        self.patch_width = patch_width
        self.patch_ymin = patch_ymin
        self.patch_xmin = patch_xmin
        self.clip_boxes = clip_boxes
        self.box_filter = box_filter
        self.background = background
        self.labels_format = labels_format

    def __call__(self, image, labels=None, return_inverter=False):

        img_height, img_width = image.shape[:2]

        if (self.patch_ymin > img_height) or (self.patch_xmin > img_width):
            raise ValueError("The given patch doesn't overlap with the input image.")

        if not (labels is None):
            labels = np.copy(labels)

        xmin = self.labels_format['xmin']
        ymin = self.labels_format['ymin']
        xmax = self.labels_format['xmax']
        ymax = self.labels_format['ymax']

        # Top left corner of the patch relative to the image coordinate system:
        patch_ymin = self.patch_ymin
        patch_xmin = self.patch_xmin

        # Create a canvas of the size of the patch we want to end up with.
        if image.ndim == 3:
            canvas = np.zeros(shape=(self.patch_height, self.patch_width, 3), dtype=np.uint8)
            canvas[:, :] = self.background
        elif image.ndim == 2:
            canvas = np.zeros(shape=(self.patch_height, self.patch_width), dtype=np.uint8)
            canvas[:, :] = self.background[0]

        # Perform the crop.
        if patch_ymin < 0 and patch_xmin < 0: # Pad the image at the top and on the left.
            image_crop_height = min(img_height, self.patch_height + patch_ymin) # The number of pixels of the image that will end up on the canvas in the vertical direction.
            image_crop_width = min(img_width, self.patch_width + patch_xmin) # The number of pixels of the image that will end up on the canvas in the horizontal direction.
            canvas[-patch_ymin:-patch_ymin + image_crop_height, -patch_xmin:-patch_xmin + image_crop_width] = image[:image_crop_height, :image_crop_width]

        elif patch_ymin < 0 and patch_xmin >= 0: # Pad the image at the top and crop it on the left.
            image_crop_height = min(img_height, self.patch_height + patch_ymin) # The number of pixels of the image that will end up on the canvas in the vertical direction.
            image_crop_width = min(self.patch_width, img_width - patch_xmin) # The number of pixels of the image that will end up on the canvas in the horizontal direction.
            canvas[-patch_ymin:-patch_ymin + image_crop_height, :image_crop_width] = image[:image_crop_height, patch_xmin:patch_xmin + image_crop_width]

        elif patch_ymin >= 0 and patch_xmin < 0: # Crop the image at the top and pad it on the left.
            image_crop_height = min(self.patch_height, img_height - patch_ymin) # The number of pixels of the image that will end up on the canvas in the vertical direction.
            image_crop_width = min(img_width, self.patch_width + patch_xmin) # The number of pixels of the image that will end up on the canvas in the horizontal direction.
            canvas[:image_crop_height, -patch_xmin:-patch_xmin + image_crop_width] = image[patch_ymin:patch_ymin + image_crop_height, :image_crop_width]

        elif patch_ymin >= 0 and patch_xmin >= 0: # Crop the image at the top and on the left.
            image_crop_height = min(self.patch_height, img_height - patch_ymin) # The number of pixels of the image that will end up on the canvas in the vertical direction.
            image_crop_width = min(self.patch_width, img_width - patch_xmin) # The number of pixels of the image that will end up on the canvas in the horizontal direction.
            canvas[:image_crop_height, :image_crop_width] = image[patch_ymin:patch_ymin + image_crop_height, patch_xmin:patch_xmin + image_crop_width]

        image = canvas

        if return_inverter:
            def inverter(labels):
                labels = np.copy(labels)
                labels[:, [ymin+1, ymax+1]] += patch_ymin
                labels[:, [xmin+1, xmax+1]] += patch_xmin
                return labels

        if not (labels is None):

            # Translate the box coordinates to the patch's coordinate system.
            labels[:, [ymin, ymax]] -= patch_ymin
            labels[:, [xmin, xmax]] -= patch_xmin

            # Compute all valid boxes for this patch.
            if not (self.box_filter is None):
                self.box_filter.labels_format = self.labels_format
                labels = self.box_filter(labels=labels,
                                         image_height=self.patch_height,
                                         image_width=self.patch_width)

            if self.clip_boxes:
                labels[:, [ymin, ymax]] = np.clip(labels[:, [ymin, ymax]], a_min=0, a_max=self.patch_height-1)
                labels[:, [xmin, xmax]] = np.clip(labels[:, [xmin, xmax]], a_min=0, a_max=self.patch_width-1)

            if return_inverter:
                return image, labels, inverter
            else:
                return image, labels

        else:
            if return_inverter:
                return image, inverter
            else:
                return image
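
A worked example of the mixed crop/pad behavior described above (illustrative only; the toy image and numbers are invented):

import numpy as np

# Place a 4x4 patch whose top left corner lies 1 pixel above and 1 pixel left
# of the image origin, so the result is padded at the top/left and cropped at
# the bottom/right.
image = np.arange(5 * 5 * 3, dtype=np.uint8).reshape(5, 5, 3)
crop_pad = CropPad(patch_ymin=-1, patch_xmin=-1, patch_height=4, patch_width=4,
                   background=(0, 0, 0))
patch = crop_pad(image)
assert patch.shape == (4, 4, 3)  # Row 0 and column 0 contain background pixels.
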
class Crop:
    '''
    Crops off the specified numbers of pixels from the borders of images.

    This is just a convenience interface for `CropPad`.
    '''

    def __init__(self,
                 crop_top,
                 crop_bottom,
                 crop_left,
                 crop_right,
                 clip_boxes=True,
                 box_filter=None,
                 labels_format={'class_id': 0, 'xmin': 1, 'ymin': 2, 'xmax': 3, 'ymax': 4}):
        self.crop_top = crop_top
        self.crop_bottom = crop_bottom
        self.crop_left = crop_left
        self.crop_right = crop_right
        self.clip_boxes = clip_boxes
        self.box_filter = box_filter
        self.labels_format = labels_format
        self.crop = CropPad(patch_ymin=self.crop_top,
                            patch_xmin=self.crop_left,
                            patch_height=None,
                            patch_width=None,
                            clip_boxes=self.clip_boxes,
                            box_filter=self.box_filter,
                            labels_format=self.labels_format)

    def __call__(self, image, labels=None, return_inverter=False):

        img_height, img_width = image.shape[:2]

        self.crop.patch_height = img_height - self.crop_top - self.crop_bottom
        self.crop.patch_width = img_width - self.crop_left - self.crop_right
        self.crop.labels_format = self.labels_format

        return self.crop(image, labels, return_inverter)

class Pad:
    '''
    Pads images by the specified numbers of pixels on each side.

    This is just a convenience interface for `CropPad`.
    '''

    def __init__(self,
                 pad_top,
                 pad_bottom,
                 pad_left,
                 pad_right,
                 background=(0,0,0),
                 labels_format={'class_id': 0, 'xmin': 1, 'ymin': 2, 'xmax': 3, 'ymax': 4}):
        self.pad_top = pad_top
        self.pad_bottom = pad_bottom
        self.pad_left = pad_left
        self.pad_right = pad_right
        self.background = background
        self.labels_format = labels_format
        self.pad = CropPad(patch_ymin=-self.pad_top,
                           patch_xmin=-self.pad_left,
                           patch_height=None,
                           patch_width=None,
                           clip_boxes=False,
                           box_filter=None,
                           background=self.background,
                           labels_format=self.labels_format)

    def __call__(self, image, labels=None, return_inverter=False):

        img_height, img_width = image.shape[:2]

        self.pad.patch_height = img_height + self.pad_top + self.pad_bottom
        self.pad.patch_width = img_width + self.pad_left + self.pad_right
        self.pad.labels_format = self.labels_format

        return self.pad(image, labels, return_inverter)
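
A quick illustration of the two convenience wrappers (invented numbers, for illustration only):

import numpy as np

image = np.zeros((100, 200, 3), dtype=np.uint8)
cropped = Crop(crop_top=10, crop_bottom=0, crop_left=0, crop_right=0)(image)
padded = Pad(pad_top=0, pad_bottom=0, pad_left=10, pad_right=0)(image)
assert cropped.shape == (90, 200, 3)   # Equivalent to CropPad with patch_ymin=10.
assert padded.shape == (100, 210, 3)   # Equivalent to CropPad with patch_xmin=-10.
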
class RandomPatch:
    '''
    Randomly samples a patch from an image. The randomness refers to whatever
    randomness may be introduced by the patch coordinate generator, the box filter,
    and the patch validator.

    Input images may be cropped and/or padded along either or both of the two
    spatial dimensions as necessary in order to obtain the required patch.

    As opposed to `RandomPatchInf`, it is possible for this transform to fail to produce
    an output image at all, in which case it will return `None`. This is useful, because
    if this transform is used to generate patches of a fixed size or aspect ratio, then
    the caller needs to be able to rely on the output image satisfying the set size or
    aspect ratio. It might therefore not be an option to return the unaltered input image
    as other random transforms do when they fail to produce a valid transformed image.
    '''

    def __init__(self,
                 patch_coord_generator,
                 box_filter=None,
                 image_validator=None,
                 n_trials_max=3,
                 clip_boxes=True,
                 prob=1.0,
                 background=(0,0,0),
                 can_fail=False,
                 labels_format={'class_id': 0, 'xmin': 1, 'ymin': 2, 'xmax': 3, 'ymax': 4}):
        '''
        Arguments:
            patch_coord_generator (PatchCoordinateGenerator): A `PatchCoordinateGenerator` object
                to generate the positions and sizes of the patches to be sampled from the input images.
            box_filter (BoxFilter, optional): Only relevant if ground truth bounding boxes are given.
                A `BoxFilter` object to filter out bounding boxes that don't meet the given criteria
                after the transformation. Refer to the `BoxFilter` documentation for details. If `None`,
                the validity of the bounding boxes is not checked.
            image_validator (ImageValidator, optional): Only relevant if ground truth bounding boxes are given.
                An `ImageValidator` object to determine whether a sampled patch is valid. If `None`,
                any outcome is valid.
            n_trials_max (int, optional): Only relevant if ground truth bounding boxes are given.
                Determines the maximal number of trials to sample a valid patch. If no valid patch could
                be sampled in `n_trials_max` trials, returns one `None` in place of each regular output.
            clip_boxes (bool, optional): Only relevant if ground truth bounding boxes are given.
                If `True`, any ground truth bounding boxes will be clipped to lie entirely within the
                sampled patch.
            prob (float, optional): `(1 - prob)` determines the probability with which the original,
                unaltered image is returned.
            background (list/tuple, optional): A 3-tuple specifying the RGB color value of the potential
                background pixels of the scaled images. In the case of single-channel images,
                the first element of `background` will be used as the background pixel value.
            can_fail (bool, optional): If `True`, will return `None` if no valid patch could be found after
                `n_trials_max` trials. If `False`, will return the unaltered input image in such a case.
            labels_format (dict, optional): A dictionary that defines which index in the last axis of the labels
                of an image contains which bounding box coordinate. The dictionary maps at least the keywords
                'xmin', 'ymin', 'xmax', and 'ymax' to their respective indices within the last axis of the labels array.
        '''
        if not isinstance(patch_coord_generator, PatchCoordinateGenerator):
            raise ValueError("`patch_coord_generator` must be an instance of `PatchCoordinateGenerator`.")
        if not (isinstance(image_validator, ImageValidator) or image_validator is None):
            raise ValueError("`image_validator` must be either `None` or an `ImageValidator` object.")
        self.patch_coord_generator = patch_coord_generator
        self.box_filter = box_filter
        self.image_validator = image_validator
        self.n_trials_max = n_trials_max
        self.clip_boxes = clip_boxes
        self.prob = prob
        self.background = background
        self.can_fail = can_fail
        self.labels_format = labels_format
        self.sample_patch = CropPad(patch_ymin=None,
                                    patch_xmin=None,
                                    patch_height=None,
                                    patch_width=None,
                                    clip_boxes=self.clip_boxes,
                                    box_filter=self.box_filter,
                                    background=self.background,
                                    labels_format=self.labels_format)

    def __call__(self, image, labels=None, return_inverter=False):

        p = np.random.uniform(0,1)
        if p >= (1.0-self.prob):

            img_height, img_width = image.shape[:2]
            self.patch_coord_generator.img_height = img_height
            self.patch_coord_generator.img_width = img_width

            xmin = self.labels_format['xmin']
            ymin = self.labels_format['ymin']
            xmax = self.labels_format['xmax']
            ymax = self.labels_format['ymax']

            # Override the preset labels format.
            if not self.image_validator is None:
                self.image_validator.labels_format = self.labels_format
            self.sample_patch.labels_format = self.labels_format

            for _ in range(max(1, self.n_trials_max)):

                # Generate patch coordinates.
                patch_ymin, patch_xmin, patch_height, patch_width = self.patch_coord_generator()

                self.sample_patch.patch_ymin = patch_ymin
                self.sample_patch.patch_xmin = patch_xmin
                self.sample_patch.patch_height = patch_height
                self.sample_patch.patch_width = patch_width

                if (labels is None) or (self.image_validator is None):
                    # Either we don't have any boxes or, if we do, we accept any outcome as valid.
                    return self.sample_patch(image, labels, return_inverter)
                else:
                    # Translate the box coordinates to the patch's coordinate system.
                    new_labels = np.copy(labels)
                    new_labels[:, [ymin, ymax]] -= patch_ymin
                    new_labels[:, [xmin, xmax]] -= patch_xmin
                    # Check if the patch is valid.
                    if self.image_validator(labels=new_labels,
                                            image_height=patch_height,
                                            image_width=patch_width):
                        return self.sample_patch(image, labels, return_inverter)

            # If we weren't able to sample a valid patch...
            if self.can_fail:
                # ...return `None`.
                if labels is None:
                    if return_inverter:
                        return None, None
                    else:
                        return None
                else:
                    if return_inverter:
                        return None, None, None
                    else:
                        return None, None
            else:
                # ...return the unaltered input image.
                if labels is None:
                    if return_inverter:
                        return image, None
                    else:
                        return image
                else:
                    if return_inverter:
                        return image, labels, None
                    else:
                        return image, labels

        else:
            if return_inverter:
                def inverter(labels):
                    return labels

            if labels is None:
                if return_inverter:
                    return image, inverter
                else:
                    return image
            else:
                if return_inverter:
                    return image, labels, inverter
                else:
                    return image, labels
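
A minimal usage sketch (illustrative; the image size and generator settings are invented):

import numpy as np

image = np.zeros((300, 300, 3), dtype=np.uint8)
coord_gen = PatchCoordinateGenerator(must_match='h_w', min_scale=0.3, max_scale=1.0)
random_patch = RandomPatch(patch_coord_generator=coord_gen, prob=1.0)
patch = random_patch(image)  # No labels and no validator, so the first sampled patch is accepted.
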
class RandomPatchInf:
    '''
    Randomly samples a patch from an image. The randomness refers to whatever
    randomness may be introduced by the patch coordinate generator, the box filter,
    and the patch validator.

    Input images may be cropped and/or padded along either or both of the two
    spatial dimensions as necessary in order to obtain the required patch.

    This operation is very similar to `RandomPatch`, except that:
    1. This operation runs indefinitely until either a valid patch is found or
       the input image is returned unaltered, i.e. it cannot fail.
    2. If a bound generator is given, a new pair of bounds will be generated
       every `n_trials_max` iterations.
    '''

    def __init__(self,
                 patch_coord_generator,
                 box_filter=None,
                 image_validator=None,
                 bound_generator=None,
                 n_trials_max=50,
                 clip_boxes=True,
                 prob=0.857,
                 background=(0,0,0),
                 labels_format={'class_id': 0, 'xmin': 1, 'ymin': 2, 'xmax': 3, 'ymax': 4}):
        '''
        Arguments:
            patch_coord_generator (PatchCoordinateGenerator): A `PatchCoordinateGenerator` object
                to generate the positions and sizes of the patches to be sampled from the input images.
            box_filter (BoxFilter, optional): Only relevant if ground truth bounding boxes are given.
                A `BoxFilter` object to filter out bounding boxes that don't meet the given criteria
                after the transformation. Refer to the `BoxFilter` documentation for details. If `None`,
                the validity of the bounding boxes is not checked.
            image_validator (ImageValidator, optional): Only relevant if ground truth bounding boxes are given.
                An `ImageValidator` object to determine whether a sampled patch is valid. If `None`,
                any outcome is valid.
            bound_generator (BoundGenerator, optional): A `BoundGenerator` object to generate upper and
                lower bound values for the patch validator. Every `n_trials_max` trials, a new pair of
                upper and lower bounds will be generated until a valid patch is found or the original image
                is returned. This bound generator overrides the bound generator of the patch validator.
            n_trials_max (int, optional): Only relevant if ground truth bounding boxes are given.
                The sampler will run indefinitely until either a valid patch is found or the original image
                is returned, but this determines the maximal number of trials to sample a valid patch for each
                selected pair of lower and upper bounds before a new pair is picked.
            clip_boxes (bool, optional): Only relevant if ground truth bounding boxes are given.
                If `True`, any ground truth bounding boxes will be clipped to lie entirely within the
                sampled patch.
            prob (float, optional): `(1 - prob)` determines the probability with which the original,
                unaltered image is returned.
            background (list/tuple, optional): A 3-tuple specifying the RGB color value of the potential
                background pixels of the scaled images. In the case of single-channel images,
                the first element of `background` will be used as the background pixel value.
            labels_format (dict, optional): A dictionary that defines which index in the last axis of the labels
                of an image contains which bounding box coordinate. The dictionary maps at least the keywords
                'xmin', 'ymin', 'xmax', and 'ymax' to their respective indices within the last axis of the labels array.
        '''

        if not isinstance(patch_coord_generator, PatchCoordinateGenerator):
            raise ValueError("`patch_coord_generator` must be an instance of `PatchCoordinateGenerator`.")
        if not (isinstance(image_validator, ImageValidator) or image_validator is None):
            raise ValueError("`image_validator` must be either `None` or an `ImageValidator` object.")
        if not (isinstance(bound_generator, BoundGenerator) or bound_generator is None):
            raise ValueError("`bound_generator` must be either `None` or a `BoundGenerator` object.")
        self.patch_coord_generator = patch_coord_generator
        self.box_filter = box_filter
        self.image_validator = image_validator
        self.bound_generator = bound_generator
        self.n_trials_max = n_trials_max
        self.clip_boxes = clip_boxes
        self.prob = prob
        self.background = background
        self.labels_format = labels_format
        self.sample_patch = CropPad(patch_ymin=None,
                                    patch_xmin=None,
                                    patch_height=None,
                                    patch_width=None,
                                    clip_boxes=self.clip_boxes,
                                    box_filter=self.box_filter,
                                    background=self.background,
                                    labels_format=self.labels_format)

    def __call__(self, image, labels=None, return_inverter=False):

        img_height, img_width = image.shape[:2]
        self.patch_coord_generator.img_height = img_height
        self.patch_coord_generator.img_width = img_width

        xmin = self.labels_format['xmin']
        ymin = self.labels_format['ymin']
        xmax = self.labels_format['xmax']
        ymax = self.labels_format['ymax']

        # Override the preset labels format.
        if not self.image_validator is None:
            self.image_validator.labels_format = self.labels_format
        self.sample_patch.labels_format = self.labels_format

        while True: # Keep going until we either find a valid patch or return the original image.

            p = np.random.uniform(0,1)
            if p >= (1.0-self.prob):

                # In case we have a bound generator, pick a lower and upper bound for the patch validator.
                if not ((self.image_validator is None) or (self.bound_generator is None)):
                    self.image_validator.bounds = self.bound_generator()

                # Use at most `self.n_trials_max` attempts to find a crop
                # that meets our requirements.
                for _ in range(max(1, self.n_trials_max)):

                    # Generate patch coordinates.
                    patch_ymin, patch_xmin, patch_height, patch_width = self.patch_coord_generator()

                    self.sample_patch.patch_ymin = patch_ymin
                    self.sample_patch.patch_xmin = patch_xmin
                    self.sample_patch.patch_height = patch_height
                    self.sample_patch.patch_width = patch_width

                    # Check if the resulting patch meets the aspect ratio requirements.
                    aspect_ratio = patch_width / patch_height
                    if not (self.patch_coord_generator.min_aspect_ratio <= aspect_ratio <= self.patch_coord_generator.max_aspect_ratio):
                        continue

                    if (labels is None) or (self.image_validator is None):
                        # Either we don't have any boxes or, if we do, we accept any outcome as valid.
                        return self.sample_patch(image, labels, return_inverter)
                    else:
                        # Translate the box coordinates to the patch's coordinate system.
                        new_labels = np.copy(labels)
                        new_labels[:, [ymin, ymax]] -= patch_ymin
                        new_labels[:, [xmin, xmax]] -= patch_xmin
                        # Check if the patch contains the minimum number of boxes we require.
                        if self.image_validator(labels=new_labels,
                                                image_height=patch_height,
                                                image_width=patch_width):
                            return self.sample_patch(image, labels, return_inverter)
            else:
                if return_inverter:
                    def inverter(labels):
                        return labels

                if labels is None:
                    if return_inverter:
                        return image, inverter
                    else:
                        return image
                else:
                    if return_inverter:
                        return image, labels, inverter
                    else:
                        return image, labels
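
A usage sketch in the spirit of the SSD-style random crop (illustrative; the image, labels, and all settings are invented):

import numpy as np

image = np.zeros((300, 300, 3), dtype=np.uint8)
labels = np.array([[1, 50, 50, 100, 100]])  # (class_id, xmin, ymin, xmax, ymax)

coord_gen = PatchCoordinateGenerator(must_match='w_ar',
                                     min_scale=0.3, max_scale=1.0,
                                     min_aspect_ratio=0.5, max_aspect_ratio=2.0)
sampler = RandomPatchInf(patch_coord_generator=coord_gen)
# Cannot fail: returns either a valid patch or the original (image, labels) pair.
image_out, labels_out = sampler(image, labels)
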
class RandomMaxCropFixedAR:
    '''
    Crops the largest possible patch of a given fixed aspect ratio
    from an image.

    Since the aspect ratio of the sampled patches is constant, they
    can subsequently be resized to the same size without distortion.
    '''

    def __init__(self,
                 patch_aspect_ratio,
                 box_filter=None,
                 image_validator=None,
                 n_trials_max=3,
                 clip_boxes=True,
                 labels_format={'class_id': 0, 'xmin': 1, 'ymin': 2, 'xmax': 3, 'ymax': 4}):
        '''
        Arguments:
            patch_aspect_ratio (float): The fixed aspect ratio that all sampled patches will have.
            box_filter (BoxFilter, optional): Only relevant if ground truth bounding boxes are given.
                A `BoxFilter` object to filter out bounding boxes that don't meet the given criteria
                after the transformation. Refer to the `BoxFilter` documentation for details. If `None`,
                the validity of the bounding boxes is not checked.
            image_validator (ImageValidator, optional): Only relevant if ground truth bounding boxes are given.
                An `ImageValidator` object to determine whether a sampled patch is valid. If `None`,
                any outcome is valid.
            n_trials_max (int, optional): Only relevant if ground truth bounding boxes are given.
                Determines the maximal number of trials to sample a valid patch. If no valid patch could
                be sampled in `n_trials_max` trials, the unaltered input image is returned (the internal
                `RandomPatch` is constructed with `can_fail=False`).
            clip_boxes (bool, optional): Only relevant if ground truth bounding boxes are given.
                If `True`, any ground truth bounding boxes will be clipped to lie entirely within the
                sampled patch.
            labels_format (dict, optional): A dictionary that defines which index in the last axis of the labels
                of an image contains which bounding box coordinate. The dictionary maps at least the keywords
                'xmin', 'ymin', 'xmax', and 'ymax' to their respective indices within the last axis of the labels array.
        '''

        self.patch_aspect_ratio = patch_aspect_ratio
        self.box_filter = box_filter
        self.image_validator = image_validator
        self.n_trials_max = n_trials_max
        self.clip_boxes = clip_boxes
        self.labels_format = labels_format
        self.random_patch = RandomPatch(patch_coord_generator=PatchCoordinateGenerator(), # Just a dummy object.
                                        box_filter=self.box_filter,
                                        image_validator=self.image_validator,
                                        n_trials_max=self.n_trials_max,
                                        clip_boxes=self.clip_boxes,
                                        prob=1.0,
                                        can_fail=False,
                                        labels_format=self.labels_format)

    def __call__(self, image, labels=None, return_inverter=False):

        img_height, img_width = image.shape[:2]

        # The ratio of the input image aspect ratio and the patch aspect ratio determines the maximal possible crop.
        image_aspect_ratio = img_width / img_height

        if image_aspect_ratio < self.patch_aspect_ratio:
            patch_width = img_width
            patch_height = int(round(patch_width / self.patch_aspect_ratio))
        else:
            patch_height = img_height
            patch_width = int(round(patch_height * self.patch_aspect_ratio))

        # Now that we know the desired height and width for the patch,
        # instantiate an appropriate patch coordinate generator.
        patch_coord_generator = PatchCoordinateGenerator(img_height=img_height,
                                                         img_width=img_width,
                                                         must_match='h_w',
                                                         patch_height=patch_height,
                                                         patch_width=patch_width)

        # The rest of the work is done by `RandomPatch`.
        self.random_patch.patch_coord_generator = patch_coord_generator
        self.random_patch.labels_format = self.labels_format
        return self.random_patch(image, labels, return_inverter)

class RandomPadFixedAR:
    '''
    Adds the minimal possible padding to an image that results in a patch
    of the given fixed aspect ratio that contains the entire image.

    Since the aspect ratio of the resulting images is constant, they
    can subsequently be resized to the same size without distortion.
    '''

    def __init__(self,
                 patch_aspect_ratio,
                 background=(0,0,0),
                 labels_format={'class_id': 0, 'xmin': 1, 'ymin': 2, 'xmax': 3, 'ymax': 4}):
        '''
        Arguments:
            patch_aspect_ratio (float): The fixed aspect ratio that all sampled patches will have.
            background (list/tuple, optional): A 3-tuple specifying the RGB color value of the potential
                background pixels of the scaled images. In the case of single-channel images,
                the first element of `background` will be used as the background pixel value.
            labels_format (dict, optional): A dictionary that defines which index in the last axis of the labels
                of an image contains which bounding box coordinate. The dictionary maps at least the keywords
                'xmin', 'ymin', 'xmax', and 'ymax' to their respective indices within the last axis of the labels array.
        '''

        self.patch_aspect_ratio = patch_aspect_ratio
        self.background = background
        self.labels_format = labels_format
        self.random_patch = RandomPatch(patch_coord_generator=PatchCoordinateGenerator(), # Just a dummy object.
                                        box_filter=None,
                                        image_validator=None,
                                        n_trials_max=1,
                                        clip_boxes=False,
                                        background=self.background,
                                        prob=1.0,
                                        labels_format=self.labels_format)

    def __call__(self, image, labels=None, return_inverter=False):

        img_height, img_width = image.shape[:2]

        if img_width < img_height:
            patch_height = img_height
            patch_width = int(round(patch_height * self.patch_aspect_ratio))
        else:
            patch_width = img_width
            patch_height = int(round(patch_width / self.patch_aspect_ratio))

        # Now that we know the desired height and width for the patch,
        # instantiate an appropriate patch coordinate generator.
        patch_coord_generator = PatchCoordinateGenerator(img_height=img_height,
                                                         img_width=img_width,
                                                         must_match='h_w',
                                                         patch_height=patch_height,
                                                         patch_width=patch_width)

        # The rest of the work is done by `RandomPatch`.
        self.random_patch.patch_coord_generator = patch_coord_generator
        self.random_patch.labels_format = self.labels_format
        return self.random_patch(image, labels, return_inverter)
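
A worked example of the padding arithmetic (illustrative; the image size is invented): for a 480x640 image and `patch_aspect_ratio=1.0`, the wider side wins, so the patch becomes 640x640 and the image is padded vertically:

import numpy as np

image = np.zeros((480, 640, 3), dtype=np.uint8)
pad_to_square = RandomPadFixedAR(patch_aspect_ratio=1.0)
padded = pad_to_square(image)
assert padded.shape[0] == padded.shape[1] == 640  # Square, no image content lost.
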
@@ -0,0 +1,485 @@
'''
Various photometric image transformations, both deterministic and probabilistic.

Copyright (C) 2018 Pierluigi Ferrari

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
'''

from __future__ import division
import numpy as np
import cv2

class ConvertColor:
    '''
    Converts images between RGB, HSV and grayscale color spaces. This is just a wrapper
    around `cv2.cvtColor()`.
    '''
    def __init__(self, current='RGB', to='HSV', keep_3ch=True):
        '''
        Arguments:
            current (str, optional): The current color space of the images. Can be
                one of 'RGB' and 'HSV'.
            to (str, optional): The target color space of the images. Can be one of
                'RGB', 'HSV', and 'GRAY'.
            keep_3ch (bool, optional): Only relevant if `to == 'GRAY'`.
                If `True`, the resulting grayscale images will have three channels.
        '''
        if not ((current in {'RGB', 'HSV'}) and (to in {'RGB', 'HSV', 'GRAY'})):
            raise NotImplementedError
        self.current = current
        self.to = to
        self.keep_3ch = keep_3ch

    def __call__(self, image, labels=None):
        if self.current == 'RGB' and self.to == 'HSV':
            image = cv2.cvtColor(image, cv2.COLOR_RGB2HSV)
        elif self.current == 'RGB' and self.to == 'GRAY':
            image = cv2.cvtColor(image, cv2.COLOR_RGB2GRAY)
            if self.keep_3ch:
                image = np.stack([image] * 3, axis=-1)
        elif self.current == 'HSV' and self.to == 'RGB':
            image = cv2.cvtColor(image, cv2.COLOR_HSV2RGB)
        elif self.current == 'HSV' and self.to == 'GRAY':
            # OpenCV provides no direct HSV-to-grayscale conversion, so go through RGB.
            image = cv2.cvtColor(cv2.cvtColor(image, cv2.COLOR_HSV2RGB), cv2.COLOR_RGB2GRAY)
            if self.keep_3ch:
                image = np.stack([image] * 3, axis=-1)
        if labels is None:
            return image
        else:
            return image, labels

class ConvertDataType:
    '''
    Converts images represented as Numpy arrays between `uint8` and `float32`.
    Serves as a helper for certain photometric distortions. This is just a wrapper
    around `np.ndarray.astype()`.
    '''
    def __init__(self, to='uint8'):
        '''
        Arguments:
            to (string, optional): To which datatype to convert the input images.
                Can be either of 'uint8' and 'float32'.
        '''
        if not (to == 'uint8' or to == 'float32'):
            raise ValueError("`to` can be either of 'uint8' or 'float32'.")
        self.to = to

    def __call__(self, image, labels=None):
        if self.to == 'uint8':
            image = np.round(image, decimals=0).astype(np.uint8)
        else:
            image = image.astype(np.float32)
        if labels is None:
            return image
        else:
            return image, labels
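
Since the photometric ops below make assumptions about color space and dtype, a typical conversion chain looks like this (illustrative sketch, not part of the original module):

import numpy as np

image = np.random.randint(0, 256, size=(100, 100, 3), dtype=np.uint8)  # RGB, uint8

to_hsv = ConvertColor(current='RGB', to='HSV')
to_float = ConvertDataType(to='float32')
back_to_uint8 = ConvertDataType(to='uint8')

hsv = to_hsv(image)          # Convert while still uint8, as cv2.cvtColor expects.
hsv = to_float(hsv)          # Hue/Saturation below expect float input.
result = back_to_uint8(hsv)  # Round and cast back before the next uint8-based op.
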
class ConvertTo3Channels:
    '''
    Converts 1-channel and 4-channel images to 3-channel images. Does nothing to images that
    already have 3 channels. In the case of 4-channel images, the fourth channel will be
    discarded.
    '''
    def __init__(self):
        pass

    def __call__(self, image, labels=None):
        if image.ndim == 2:
            image = np.stack([image] * 3, axis=-1)
        elif image.ndim == 3:
            if image.shape[2] == 1:
                image = np.concatenate([image] * 3, axis=-1)
            elif image.shape[2] == 4:
                image = image[:,:,:3]
        if labels is None:
            return image
        else:
            return image, labels

class Hue:
    '''
    Changes the hue of HSV images.

    Important:
        - Expects HSV input.
        - Expects input array to be of `dtype` `float`.
    '''
    def __init__(self, delta):
        '''
        Arguments:
            delta (int): An integer in the closed interval `[-180, 180]` that determines the hue change, where
                a change by integer `delta` means a change by `2 * delta` degrees. Read up on the HSV color format
                if you need more information.
        '''
        if not (-180 <= delta <= 180): raise ValueError("`delta` must be in the closed interval `[-180, 180]`.")
        self.delta = delta

    def __call__(self, image, labels=None):
        image[:, :, 0] = (image[:, :, 0] + self.delta) % 180.0
        if labels is None:
            return image
        else:
            return image, labels
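
A worked example of the hue arithmetic (illustrative): OpenCV stores hue halved, so `delta=18` shifts the hue by 36 degrees, and the modulo makes it wrap around:

import numpy as np

hsv = np.array([[[175.0, 128.0, 128.0]]])  # A single HSV pixel, float dtype.
shifted = Hue(delta=18)(hsv.copy())
assert shifted[0, 0, 0] == 13.0  # (175 + 18) % 180
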
class RandomHue:
    '''
    Randomly changes the hue of HSV images.

    Important:
        - Expects HSV input.
        - Expects input array to be of `dtype` `float`.
    '''
    def __init__(self, max_delta=18, prob=0.5):
        '''
        Arguments:
            max_delta (int): An integer in the closed interval `[0, 180]` that determines the maximal absolute
                hue change.
            prob (float, optional): `(1 - prob)` determines the probability with which the original,
                unaltered image is returned.
        '''
        if not (0 <= max_delta <= 180): raise ValueError("`max_delta` must be in the closed interval `[0, 180]`.")
        self.max_delta = max_delta
        self.prob = prob
        self.change_hue = Hue(delta=0)

    def __call__(self, image, labels=None):
        p = np.random.uniform(0,1)
        if p >= (1.0-self.prob):
            self.change_hue.delta = np.random.uniform(-self.max_delta, self.max_delta)
            return self.change_hue(image, labels)
        elif labels is None:
            return image
        else:
            return image, labels

class Saturation:
    '''
    Changes the saturation of HSV images.

    Important:
        - Expects HSV input.
        - Expects input array to be of `dtype` `float`.
    '''
    def __init__(self, factor):
        '''
        Arguments:
            factor (float): A float greater than zero that determines the saturation change, where
                values less than one result in less saturation and values greater than one result
                in more saturation.
        '''
        if factor <= 0.0: raise ValueError("It must be `factor > 0`.")
        self.factor = factor

    def __call__(self, image, labels=None):
        image[:,:,1] = np.clip(image[:,:,1] * self.factor, 0, 255)
        if labels is None:
            return image
        else:
            return image, labels

class RandomSaturation:
    '''
    Randomly changes the saturation of HSV images.

    Important:
        - Expects HSV input.
        - Expects input array to be of `dtype` `float`.
    '''
    def __init__(self, lower=0.3, upper=2.0, prob=0.5):
        '''
        Arguments:
            lower (float, optional): A float greater than zero, the lower bound for the random
                saturation change.
            upper (float, optional): A float greater than zero, the upper bound for the random
                saturation change. Must be greater than `lower`.
            prob (float, optional): `(1 - prob)` determines the probability with which the original,
                unaltered image is returned.
        '''
        if lower >= upper: raise ValueError("`upper` must be greater than `lower`.")
        self.lower = lower
        self.upper = upper
        self.prob = prob
        self.change_saturation = Saturation(factor=1.0)

    def __call__(self, image, labels=None):
        p = np.random.uniform(0,1)
        if p >= (1.0-self.prob):
            self.change_saturation.factor = np.random.uniform(self.lower, self.upper)
            return self.change_saturation(image, labels)
        elif labels is None:
            return image
        else:
            return image, labels

class Brightness:
    '''
    Changes the brightness of RGB images.

    Important:
        - Expects RGB input.
        - Expects input array to be of `dtype` `float`.
    '''
    def __init__(self, delta):
        '''
        Arguments:
            delta (int): An integer, the amount to add to or subtract from the intensity
                of every pixel.
        '''
        self.delta = delta

    def __call__(self, image, labels=None):
        image = np.clip(image + self.delta, 0, 255)
        if labels is None:
            return image
        else:
            return image, labels

class RandomBrightness:
    '''
    Randomly changes the brightness of RGB images.

    Important:
        - Expects RGB input.
        - Expects input array to be of `dtype` `float`.
    '''
    def __init__(self, lower=-84, upper=84, prob=0.5):
        '''
        Arguments:
            lower (int, optional): An integer, the lower bound for the random brightness change.
            upper (int, optional): An integer, the upper bound for the random brightness change.
                Must be greater than `lower`.
            prob (float, optional): `(1 - prob)` determines the probability with which the original,
                unaltered image is returned.
        '''
        if lower >= upper: raise ValueError("`upper` must be greater than `lower`.")
        self.lower = float(lower)
        self.upper = float(upper)
        self.prob = prob
        self.change_brightness = Brightness(delta=0)

    def __call__(self, image, labels=None):
        p = np.random.uniform(0,1)
        if p >= (1.0-self.prob):
            self.change_brightness.delta = np.random.uniform(self.lower, self.upper)
            return self.change_brightness(image, labels)
        elif labels is None:
            return image
        else:
            return image, labels

class Contrast:
    '''
    Changes the contrast of RGB images.

    Important:
        - Expects RGB input.
        - Expects input array to be of `dtype` `float`.
    '''
    def __init__(self, factor):
        '''
        Arguments:
            factor (float): A float greater than zero that determines the contrast change, where
                values less than one result in less contrast and values greater than one result
                in more contrast.
        '''
        if factor <= 0.0: raise ValueError("It must be `factor > 0`.")
        self.factor = factor

    def __call__(self, image, labels=None):
        image = np.clip(127.5 + self.factor * (image - 127.5), 0, 255)
        if labels is None:
            return image
        else:
            return image, labels

class RandomContrast:
    '''
    Randomly changes the contrast of RGB images.

    Important:
        - Expects RGB input.
        - Expects input array to be of `dtype` `float`.
    '''
    def __init__(self, lower=0.5, upper=1.5, prob=0.5):
        '''
        Arguments:
            lower (float, optional): A float greater than zero, the lower bound for the random
                contrast change.
            upper (float, optional): A float greater than zero, the upper bound for the random
                contrast change. Must be greater than `lower`.
            prob (float, optional): `(1 - prob)` determines the probability with which the original,
                unaltered image is returned.
        '''
        if lower >= upper: raise ValueError("`upper` must be greater than `lower`.")
        self.lower = lower
        self.upper = upper
        self.prob = prob
        self.change_contrast = Contrast(factor=1.0)

    def __call__(self, image, labels=None):
        p = np.random.uniform(0,1)
        if p >= (1.0-self.prob):
            self.change_contrast.factor = np.random.uniform(self.lower, self.upper)
            return self.change_contrast(image, labels)
        elif labels is None:
            return image
        else:
            return image, labels

class Gamma:
    '''
    Changes the gamma value of RGB images.

    Important: Expects RGB input.
    '''
    def __init__(self, gamma):
        '''
        Arguments:
            gamma (float): A float greater than zero that determines the gamma change.
        '''
        if gamma <= 0.0: raise ValueError("It must be `gamma > 0`.")
        self.gamma = gamma
        self.gamma_inv = 1.0 / gamma
        # Build a lookup table mapping the pixel values [0, 255] to
        # their adjusted gamma values.
        self.table = np.array([((i / 255.0) ** self.gamma_inv) * 255 for i in np.arange(0, 256)]).astype("uint8")

    def __call__(self, image, labels=None):
        image = cv2.LUT(image, self.table)
        if labels is None:
            return image
        else:
            return image, labels
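
A worked example of the lookup table (illustrative): with `gamma == 2.0`, intensity 64 maps to `255 * (64 / 255) ** (1 / 2.0) ≈ 127`, i.e. dark pixels brighten:

import numpy as np

table = np.array([((i / 255.0) ** (1.0 / 2.0)) * 255 for i in np.arange(0, 256)]).astype("uint8")
assert table[64] == 127  # 255 * sqrt(64 / 255) = 127.75, truncated by the uint8 cast.
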
class RandomGamma:
    '''
    Randomly changes the gamma value of RGB images.

    Important: Expects RGB input.
    '''
    def __init__(self, lower=0.25, upper=2.0, prob=0.5):
        '''
        Arguments:
            lower (float, optional): A float greater than zero, the lower bound for the random
                gamma change.
            upper (float, optional): A float greater than zero, the upper bound for the random
                gamma change. Must be greater than `lower`.
            prob (float, optional): `(1 - prob)` determines the probability with which the original,
                unaltered image is returned.
        '''
        if lower >= upper: raise ValueError("`upper` must be greater than `lower`.")
        self.lower = lower
        self.upper = upper
        self.prob = prob

    def __call__(self, image, labels=None):
        p = np.random.uniform(0,1)
        if p >= (1.0-self.prob):
            gamma = np.random.uniform(self.lower, self.upper)
            change_gamma = Gamma(gamma=gamma)
            return change_gamma(image, labels)
        elif labels is None:
            return image
        else:
            return image, labels

class HistogramEqualization:
    '''
    Performs histogram equalization on HSV images.

    Important: Expects HSV input.
    '''
    def __init__(self):
        pass

    def __call__(self, image, labels=None):
        image[:,:,2] = cv2.equalizeHist(image[:,:,2])
        if labels is None:
            return image
        else:
            return image, labels

class RandomHistogramEqualization:
    '''
    Randomly performs histogram equalization on HSV images. The randomness only refers
    to whether or not the equalization is performed.

    Important: Expects HSV input.
    '''
    def __init__(self, prob=0.5):
        '''
        Arguments:
            prob (float, optional): `(1 - prob)` determines the probability with which the original,
                unaltered image is returned.
        '''
        self.prob = prob
        self.equalize = HistogramEqualization()

    def __call__(self, image, labels=None):
        p = np.random.uniform(0,1)
        if p >= (1.0-self.prob):
            return self.equalize(image, labels)
        elif labels is None:
            return image
        else:
            return image, labels

class ChannelSwap:
    '''
    Swaps the channels of images.
    '''
    def __init__(self, order):
        '''
        Arguments:
            order (tuple): A tuple of integers that defines the desired channel order
                of the input images after the channel swap.
        '''
        self.order = order

    def __call__(self, image, labels=None):
        image = image[:,:,self.order]
        if labels is None:
            return image
        else:
            return image, labels

class RandomChannelSwap:
    '''
    Randomly swaps the channels of RGB images.

    Important: Expects RGB input.
    '''
    def __init__(self, prob=0.5):
        '''
        Arguments:
            prob (float, optional): `(1 - prob)` determines the probability with which the original,
                unaltered image is returned.
        '''
        self.prob = prob
        # All possible permutations of the three image channels except the original order.
        self.permutations = ((0, 2, 1),
                             (1, 0, 2), (1, 2, 0),
                             (2, 0, 1), (2, 1, 0))
        self.swap_channels = ChannelSwap(order=(0, 1, 2))

    def __call__(self, image, labels=None):
        p = np.random.uniform(0,1)
        if p >= (1.0-self.prob):
            i = np.random.randint(5) # Pick one of the five permutations above; the original channel order is excluded.
            self.swap_channels.order = self.permutations[i]
            return self.swap_channels(image, labels)
        elif labels is None:
            return image
        else:
            return image, labels
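
To show how these ops compose, here is an illustrative photometric chain loosely modeled on SSD-style augmentation (a sketch only; the exact chain used by this repository lives in its data augmentation modules, and all settings here are invented):

import numpy as np

photometric_chain = [ConvertDataType(to='float32'),
                     RandomBrightness(lower=-32, upper=32, prob=0.5),
                     RandomContrast(lower=0.5, upper=1.5, prob=0.5),
                     ConvertDataType(to='uint8'),
                     ConvertColor(current='RGB', to='HSV'),
                     ConvertDataType(to='float32'),
                     RandomSaturation(lower=0.5, upper=1.5, prob=0.5),
                     RandomHue(max_delta=18, prob=0.5),
                     ConvertDataType(to='uint8'),
                     ConvertColor(current='HSV', to='RGB'),
                     RandomChannelSwap(prob=0.5)]

image = np.random.randint(0, 256, size=(100, 100, 3), dtype=np.uint8)
for transform in photometric_chain:
    image = transform(image)  # With no labels, each op returns just the image.
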
0
ssd_keras-master/eval_utils/__init__.py
Normal file
BIN
ssd_keras-master/eval_utils/__pycache__/__init__.cpython-36.pyc
Normal file
906
ssd_keras-master/eval_utils/average_precision_evaluator.py
Normal file
@@ -0,0 +1,906 @@
|
||||
'''
|
||||
An evaluator to compute the Pascal VOC-style mean average precision (both the pre-2010
|
||||
and post-2010 algorithm versions) of a given Keras SSD model on a given dataset.
|
||||
|
||||
Copyright (C) 2018 Pierluigi Ferrari
|
||||
|
||||
Licensed under the Apache License, Version 2.0 (the "License");
|
||||
you may not use this file except in compliance with the License.
|
||||
You may obtain a copy of the License at
|
||||
|
||||
http://www.apache.org/licenses/LICENSE-2.0
|
||||
|
||||
Unless required by applicable law or agreed to in writing, software
|
||||
distributed under the License is distributed on an "AS IS" BASIS,
|
||||
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
|
||||
See the License for the specific language governing permissions and
|
||||
limitations under the License.
|
||||
'''
|
||||
|
||||
from __future__ import division
|
||||
import numpy as np
|
||||
from math import ceil
|
||||
from tqdm import trange
|
||||
import sys
|
||||
import warnings
|
||||
|
||||
from data_generator.object_detection_2d_data_generator import DataGenerator
|
||||
from data_generator.object_detection_2d_geometric_ops import Resize
|
||||
from data_generator.object_detection_2d_patch_sampling_ops import RandomPadFixedAR
|
||||
from data_generator.object_detection_2d_photometric_ops import ConvertTo3Channels
|
||||
from ssd_encoder_decoder.ssd_output_decoder import decode_detections
|
||||
from data_generator.object_detection_2d_misc_utils import apply_inverse_transforms
|
||||
|
||||
from bounding_box_utils.bounding_box_utils import iou
|
||||
|
||||
class Evaluator:
|
||||
'''
|
||||
Computes the mean average precision of the given Keras SSD model on the given dataset.
|
||||
|
||||
Can compute the Pascal-VOC-style average precision in both the pre-2010 (k-point sampling)
|
||||
and post-2010 (integration) algorithm versions.
|
||||
|
||||
Optionally also returns the average precisions, precisions, and recalls.
|
||||
|
||||
The algorithm is identical to the official Pascal VOC pre-2010 detection evaluation algorithm
|
||||
in its default settings, but can be cusomized in a number of ways.
|
||||
'''
|
||||
|
||||
    def __init__(self,
                 model,
                 n_classes,
                 data_generator,
                 model_mode='inference',
                 pred_format={'class_id': 0, 'conf': 1, 'xmin': 2, 'ymin': 3, 'xmax': 4, 'ymax': 5},
                 gt_format={'class_id': 0, 'xmin': 1, 'ymin': 2, 'xmax': 3, 'ymax': 4}):
        '''
        Arguments:
            model (Keras model): A Keras SSD model object.
            n_classes (int): The number of positive classes, e.g. 20 for Pascal VOC, 80 for MS COCO.
            data_generator (DataGenerator): A `DataGenerator` object with the evaluation dataset.
            model_mode (str, optional): The mode in which the model was created, i.e. 'training', 'inference' or 'inference_fast'.
                This is needed in order to know whether the model output is already decoded or still needs to be decoded. Refer to
                the model documentation for the meaning of the individual modes.
            pred_format (dict, optional): A dictionary that defines which index in the last axis of the model's decoded predictions
                contains which bounding box coordinate. The dictionary must map the keywords 'class_id', 'conf' (for the confidence),
                'xmin', 'ymin', 'xmax', and 'ymax' to their respective indices within the last axis.
            gt_format (dict, optional): A dictionary that defines which index of a ground truth bounding box contains which of the five
                items class ID, xmin, ymin, xmax, ymax. The expected keys are 'class_id', 'xmin', 'ymin', 'xmax', and 'ymax'.
        '''

        if not isinstance(data_generator, DataGenerator):
            warnings.warn("`data_generator` is not a `DataGenerator` object, which will cause undefined behavior.")

        self.model = model
        self.data_generator = data_generator
        self.n_classes = n_classes
        self.model_mode = model_mode
        self.pred_format = pred_format
        self.gt_format = gt_format

        # The following lists all contain per-class data, i.e. all lists have the length `n_classes + 1`,
        # where one element is for the background class, i.e. that element is just a dummy entry.
        self.prediction_results = None
        self.num_gt_per_class = None
        self.true_positives = None
        self.false_positives = None
        self.cumulative_true_positives = None
        self.cumulative_false_positives = None
        self.cumulative_precisions = None # "Cumulative" means that the i-th element in each list represents the precision for the first i highest confidence predictions for that class.
        self.cumulative_recalls = None # "Cumulative" means that the i-th element in each list represents the recall for the first i highest confidence predictions for that class.
        self.average_precisions = None
        self.mean_average_precision = None

    def __call__(self,
                 img_height,
                 img_width,
                 batch_size,
                 data_generator_mode='resize',
                 round_confidences=False,
                 matching_iou_threshold=0.5,
                 border_pixels='include',
                 sorting_algorithm='quicksort',
                 average_precision_mode='sample',
                 num_recall_points=11,
                 ignore_neutral_boxes=True,
                 return_precisions=False,
                 return_recalls=False,
                 return_average_precisions=False,
                 verbose=True,
                 decoding_confidence_thresh=0.01,
                 decoding_iou_threshold=0.45,
                 decoding_top_k=200,
                 decoding_pred_coords='centroids',
                 decoding_normalize_coords=True):
        '''
        Computes the mean average precision of the given Keras SSD model on the given dataset.

        Optionally also returns the average precisions, precisions, and recalls.

        All the individual steps of the overall evaluation algorithm can also be called separately
        (check out the other methods of this class), but this runs the overall algorithm all at once.

        Arguments:
            img_height (int): The input image height for the model.
            img_width (int): The input image width for the model.
            batch_size (int): The batch size for the evaluation.
            data_generator_mode (str, optional): Either of 'resize' and 'pad'. If 'resize', the input images will
                be resized (i.e. warped) to `(img_height, img_width)`. This mode does not preserve the aspect ratios of the images.
                If 'pad', the input images will first be padded so that they have the aspect ratio defined by `img_height`
                and `img_width` and then resized to `(img_height, img_width)`. This mode preserves the aspect ratios of the images.
            round_confidences (int, optional): `False` or an integer that is the number of decimals that the prediction
                confidences will be rounded to. If `False`, the confidences will not be rounded.
            matching_iou_threshold (float, optional): A prediction will be considered a true positive if it has a Jaccard overlap
                of at least `matching_iou_threshold` with any ground truth bounding box of the same class.
            border_pixels (str, optional): How to treat the border pixels of the bounding boxes.
                Can be 'include', 'exclude', or 'half'. If 'include', the border pixels belong
                to the boxes. If 'exclude', the border pixels do not belong to the boxes.
                If 'half', then one of each of the two horizontal and vertical borders belongs
                to the boxes, but not the other.
            sorting_algorithm (str, optional): Which sorting algorithm the matching algorithm should use. This argument accepts
                any valid sorting algorithm for Numpy's `argsort()` function. You will usually want to choose between 'quicksort'
                (fastest and most memory efficient, but not stable) and 'mergesort' (slightly slower and less memory efficient, but stable).
                The official Matlab evaluation algorithm uses a stable sorting algorithm, so this algorithm is only guaranteed
                to behave identically if you choose 'mergesort' as the sorting algorithm, but it will almost always behave identically
                even if you choose 'quicksort' (but no guarantees).
            average_precision_mode (str, optional): Can be either 'sample' or 'integrate'. In the case of 'sample', the average precision
                will be computed according to the Pascal VOC formula that was used up until VOC 2009, where the precision will be sampled
                for `num_recall_points` recall values. In the case of 'integrate', the average precision will be computed according to the
                Pascal VOC formula that was used from VOC 2010 onward, where the average precision will be computed by numerically integrating
                over the whole precision-recall curve instead of sampling individual points from it. 'integrate' mode is basically just
                the limit case of 'sample' mode as the number of sample points increases.
            num_recall_points (int, optional): The number of points to sample from the precision-recall curve to compute the average
                precisions. In other words, this is the number of equidistant recall values for which the resulting precision will be
                computed. 11 points is the value used in the official Pascal VOC 2007 detection evaluation algorithm.
            ignore_neutral_boxes (bool, optional): In case the data generator provides annotations indicating whether a ground truth
                bounding box is supposed to either count or be neutral for the evaluation, this argument decides what to do with these
                annotations. If `False`, even boxes that are annotated as neutral will be counted into the evaluation. If `True`,
                neutral boxes will be ignored for the evaluation. An example for evaluation-neutrality are the ground truth boxes
                annotated as "difficult" in the Pascal VOC datasets, which are usually treated as neutral for the evaluation.
            return_precisions (bool, optional): If `True`, returns a nested list containing the cumulative precisions for each class.
            return_recalls (bool, optional): If `True`, returns a nested list containing the cumulative recalls for each class.
            return_average_precisions (bool, optional): If `True`, returns a list containing the average precision for each class.
            verbose (bool, optional): If `True`, will print out the progress during runtime.
            decoding_confidence_thresh (float, optional): Only relevant if the model is in 'training' mode.
                A float in [0,1), the minimum classification confidence in a specific positive class in order to be considered
                for the non-maximum suppression stage for the respective class. A lower value will result in a larger part of the
                selection process being done by the non-maximum suppression stage, while a larger value will result in a larger
                part of the selection process happening in the confidence thresholding stage.
            decoding_iou_threshold (float, optional): Only relevant if the model is in 'training' mode. A float in [0,1].
                All boxes with a Jaccard similarity of greater than `iou_threshold` with a locally maximal box will be removed
                from the set of predictions for a given class, where 'maximal' refers to the box score.
            decoding_top_k (int, optional): Only relevant if the model is in 'training' mode. The number of highest scoring
                predictions to be kept for each batch item after the non-maximum suppression stage.
            decoding_pred_coords (str, optional): Only relevant if the model is in 'training' mode. The box coordinate format
                that the model outputs. Can be either 'centroids' for the format `(cx, cy, w, h)` (box center coordinates, width, and height),
                'minmax' for the format `(xmin, xmax, ymin, ymax)`, or 'corners' for the format `(xmin, ymin, xmax, ymax)`.
            decoding_normalize_coords (bool, optional): Only relevant if the model is in 'training' mode. Set to `True` if the model
                outputs relative coordinates. Do not set this to `True` if the model already outputs absolute coordinates,
                as that would result in incorrect coordinates.

        Returns:
            A float, the mean average precision, plus any optional returns specified in the arguments.
        '''

        #############################################################################################
        # Predict on the entire dataset.
        #############################################################################################

        self.predict_on_dataset(img_height=img_height,
                                img_width=img_width,
                                batch_size=batch_size,
                                data_generator_mode=data_generator_mode,
                                decoding_confidence_thresh=decoding_confidence_thresh,
                                decoding_iou_threshold=decoding_iou_threshold,
                                decoding_top_k=decoding_top_k,
                                decoding_pred_coords=decoding_pred_coords,
                                decoding_normalize_coords=decoding_normalize_coords,
                                decoding_border_pixels=border_pixels,
                                round_confidences=round_confidences,
                                verbose=verbose,
                                ret=False)

        #############################################################################################
        # Get the total number of ground truth boxes for each class.
        #############################################################################################

        self.get_num_gt_per_class(ignore_neutral_boxes=ignore_neutral_boxes,
                                  verbose=False,
                                  ret=False)

        #############################################################################################
        # Match predictions to ground truth boxes for all classes.
        #############################################################################################

        self.match_predictions(ignore_neutral_boxes=ignore_neutral_boxes,
                               matching_iou_threshold=matching_iou_threshold,
                               border_pixels=border_pixels,
                               sorting_algorithm=sorting_algorithm,
                               verbose=verbose,
                               ret=False)

        #############################################################################################
        # Compute the cumulative precision and recall for all classes.
        #############################################################################################

        self.compute_precision_recall(verbose=verbose, ret=False)

        #############################################################################################
        # Compute the average precision for each class.
        #############################################################################################

        self.compute_average_precisions(mode=average_precision_mode,
                                        num_recall_points=num_recall_points,
                                        verbose=verbose,
                                        ret=False)

        #############################################################################################
        # Compute the mean average precision.
        #############################################################################################

        mean_average_precision = self.compute_mean_average_precision(ret=True)

        #############################################################################################

        # Compile the returns.
        if return_precisions or return_recalls or return_average_precisions:
            ret = [mean_average_precision]
            if return_average_precisions:
                ret.append(self.average_precisions)
            if return_precisions:
                ret.append(self.cumulative_precisions)
            if return_recalls:
                ret.append(self.cumulative_recalls)
            return ret
        else:
            return mean_average_precision

    def predict_on_dataset(self,
                           img_height,
                           img_width,
                           batch_size,
                           data_generator_mode='resize',
                           decoding_confidence_thresh=0.01,
                           decoding_iou_threshold=0.45,
                           decoding_top_k=200,
                           decoding_pred_coords='centroids',
                           decoding_normalize_coords=True,
                           decoding_border_pixels='include',
                           round_confidences=False,
                           verbose=True,
                           ret=False):
        '''
        Runs predictions for the given model over the entire dataset given by `data_generator`.

        Arguments:
            img_height (int): The input image height for the model.
            img_width (int): The input image width for the model.
            batch_size (int): The batch size for the evaluation.
            data_generator_mode (str, optional): Either of 'resize' and 'pad'. If 'resize', the input images will
                be resized (i.e. warped) to `(img_height, img_width)`. This mode does not preserve the aspect ratios of the images.
                If 'pad', the input images will first be padded so that they have the aspect ratio defined by `img_height`
                and `img_width` and then resized to `(img_height, img_width)`. This mode preserves the aspect ratios of the images.
            decoding_confidence_thresh (float, optional): Only relevant if the model is in 'training' mode.
                A float in [0,1), the minimum classification confidence in a specific positive class in order to be considered
                for the non-maximum suppression stage for the respective class. A lower value will result in a larger part of the
                selection process being done by the non-maximum suppression stage, while a larger value will result in a larger
                part of the selection process happening in the confidence thresholding stage.
            decoding_iou_threshold (float, optional): Only relevant if the model is in 'training' mode. A float in [0,1].
                All boxes with a Jaccard similarity of greater than `iou_threshold` with a locally maximal box will be removed
                from the set of predictions for a given class, where 'maximal' refers to the box score.
            decoding_top_k (int, optional): Only relevant if the model is in 'training' mode. The number of highest scoring
                predictions to be kept for each batch item after the non-maximum suppression stage.
            decoding_pred_coords (str, optional): Only relevant if the model is in 'training' mode. The box coordinate format
                that the model outputs. Can be either 'centroids' for the format `(cx, cy, w, h)` (box center coordinates, width, and height),
                'minmax' for the format `(xmin, xmax, ymin, ymax)`, or 'corners' for the format `(xmin, ymin, xmax, ymax)`.
            decoding_normalize_coords (bool, optional): Only relevant if the model is in 'training' mode. Set to `True` if the model
                outputs relative coordinates. Do not set this to `True` if the model already outputs absolute coordinates,
                as that would result in incorrect coordinates.
            decoding_border_pixels (str, optional): Only relevant if the model is in 'training' mode. How the decoder treats the
                border pixels of the bounding boxes. Can be 'include', 'exclude', or 'half'.
            round_confidences (int, optional): `False` or an integer that is the number of decimals that the prediction
                confidences will be rounded to. If `False`, the confidences will not be rounded.
            verbose (bool, optional): If `True`, will print out the progress during runtime.
            ret (bool, optional): If `True`, returns the predictions.

        Returns:
            None by default. Optionally, a nested list containing the predictions for each class.
        '''

        class_id_pred = self.pred_format['class_id']
        conf_pred = self.pred_format['conf']
        xmin_pred = self.pred_format['xmin']
        ymin_pred = self.pred_format['ymin']
        xmax_pred = self.pred_format['xmax']
        ymax_pred = self.pred_format['ymax']

        #############################################################################################
        # Configure the data generator for the evaluation.
        #############################################################################################

        convert_to_3_channels = ConvertTo3Channels()
        resize = Resize(height=img_height, width=img_width, labels_format=self.gt_format)
        if data_generator_mode == 'resize':
            transformations = [convert_to_3_channels,
                               resize]
        elif data_generator_mode == 'pad':
            random_pad = RandomPadFixedAR(patch_aspect_ratio=img_width/img_height, labels_format=self.gt_format)
            transformations = [convert_to_3_channels,
                               random_pad,
                               resize]
        else:
            raise ValueError("`data_generator_mode` can be either of 'resize' or 'pad', but received '{}'.".format(data_generator_mode))

        # Set the generator parameters.
        generator = self.data_generator.generate(batch_size=batch_size,
                                                 shuffle=False,
                                                 transformations=transformations,
                                                 label_encoder=None,
                                                 returns={'processed_images',
                                                          'image_ids',
                                                          'evaluation-neutral',
                                                          'inverse_transform',
                                                          'original_labels'},
                                                 keep_images_without_gt=True,
                                                 degenerate_box_handling='remove')

        # If we don't have any real image IDs, generate pseudo-image IDs.
        # This is just to make the evaluator compatible both with datasets that do and don't
        # have image IDs.
        if self.data_generator.image_ids is None:
            self.data_generator.image_ids = list(range(self.data_generator.get_dataset_size()))

        #############################################################################################
        # Predict over all batches of the dataset and store the predictions.
        #############################################################################################

        # We have to generate a separate results list for each class.
        results = [list() for _ in range(self.n_classes + 1)]

        # A dictionary that maps image IDs to ground truth annotations.
        # (Not used further in this method.)
        image_ids_to_labels = {}

        # Compute the number of batches to iterate over the entire dataset.
        n_images = self.data_generator.get_dataset_size()
        n_batches = int(ceil(n_images / batch_size))
        if verbose:
            print("Number of images in the evaluation dataset: {}".format(n_images))
            print()
            tr = trange(n_batches, file=sys.stdout)
            tr.set_description('Producing predictions batch-wise')
        else:
            tr = range(n_batches)

        # Loop over all batches.
        for j in tr:
            # Generate batch.
            batch_X, batch_image_ids, batch_eval_neutral, batch_inverse_transforms, batch_orig_labels = next(generator)
            # Predict.
            y_pred = self.model.predict(batch_X)
            # If the model was created in 'training' mode, the raw predictions need to
            # be decoded and filtered, otherwise that's already taken care of.
            if self.model_mode == 'training':
                # Decode.
                y_pred = decode_detections(y_pred,
                                           confidence_thresh=decoding_confidence_thresh,
                                           iou_threshold=decoding_iou_threshold,
                                           top_k=decoding_top_k,
                                           input_coords=decoding_pred_coords,
                                           normalize_coords=decoding_normalize_coords,
                                           img_height=img_height,
                                           img_width=img_width,
                                           border_pixels=decoding_border_pixels)
            else:
                # Filter out the all-zeros dummy elements of `y_pred`.
                y_pred_filtered = []
                for i in range(len(y_pred)):
                    y_pred_filtered.append(y_pred[i][y_pred[i,:,0] != 0])
                y_pred = y_pred_filtered
            # Convert the predicted box coordinates for the original images.
            y_pred = apply_inverse_transforms(y_pred, batch_inverse_transforms)

            # Iterate over all batch items.
            for k, batch_item in enumerate(y_pred):

                image_id = batch_image_ids[k]

                for box in batch_item:
                    class_id = int(box[class_id_pred])
                    # Round the confidences if requested.
                    if round_confidences:
                        confidence = round(box[conf_pred], round_confidences)
                    else:
                        confidence = box[conf_pred]
                    # Round the box coordinates to reduce the required memory.
                    xmin = round(box[xmin_pred], 1)
                    ymin = round(box[ymin_pred], 1)
                    xmax = round(box[xmax_pred], 1)
                    ymax = round(box[ymax_pred], 1)
                    prediction = (image_id, confidence, xmin, ymin, xmax, ymax)
                    # Append the predicted box to the results list for its class.
                    results[class_id].append(prediction)

        self.prediction_results = results

        if ret:
            return results

    def write_predictions_to_txt(self,
                                 classes=None,
                                 out_file_prefix='comp3_det_test_',
                                 verbose=True):
        '''
        Writes the predictions for all classes to separate text files according to the Pascal VOC results format.

        Arguments:
            classes (list, optional): `None` or a list of strings containing the class names of all classes in the dataset,
                including some arbitrary name for the background class. This list will be used to name the output text files.
                The ordering of the names in the list represents the ordering of the classes as they are predicted by the model,
                i.e. the element with index 3 in this list should correspond to the class with class ID 3 in the model's predictions.
                If `None`, the output text files will be named by their class IDs.
            out_file_prefix (str, optional): A prefix for the output text file names. The suffix to each output text file name will
                be the respective class name followed by the `.txt` file extension. This string is also how you specify the directory
                in which the results are to be saved.
            verbose (bool, optional): If `True`, will print out the progress during runtime.

        Returns:
            None.
        '''

        if self.prediction_results is None:
            raise ValueError("There are no prediction results. You must run `predict_on_dataset()` before calling this method.")

        # We generate a separate results file for each class.
        for class_id in range(1, self.n_classes + 1):

            if verbose:
                print("Writing results file for class {}/{}.".format(class_id, self.n_classes))

            if classes is None:
                class_suffix = '{:04d}'.format(class_id)
            else:
                class_suffix = classes[class_id]

            results_file = open('{}{}.txt'.format(out_file_prefix, class_suffix), 'w')

            for prediction in self.prediction_results[class_id]:

                prediction_list = list(prediction)
                prediction_list[0] = '{:06d}'.format(int(prediction_list[0]))
                prediction_list[1] = round(prediction_list[1], 4)
                prediction_txt = ' '.join(map(str, prediction_list)) + '\n'
                results_file.write(prediction_txt)

            results_file.close()

        if verbose:
            print("All results files saved.")

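    # A usage sketch for the method above (hedged: `evaluator` is an `Evaluator` instance on
    # which `predict_on_dataset()` has already been run; the class list and the output prefix
    # are placeholders):
    #
    #     evaluator.write_predictions_to_txt(classes=['background', 'aeroplane', 'bicycle'],
    #                                        out_file_prefix='results/comp3_det_test_')
    #
    # This writes one 'comp3_det_test_<class name>.txt' file per class in the Pascal VOC
    # detection results format.
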
    def get_num_gt_per_class(self,
                             ignore_neutral_boxes=True,
                             verbose=True,
                             ret=False):
        '''
        Counts the number of ground truth boxes for each class across the dataset.

        Arguments:
            ignore_neutral_boxes (bool, optional): In case the data generator provides annotations indicating whether a ground truth
                bounding box is supposed to either count or be neutral for the evaluation, this argument decides what to do with these
                annotations. If `True`, only non-neutral ground truth boxes will be counted, otherwise all ground truth boxes will
                be counted.
            verbose (bool, optional): If `True`, will print out the progress during runtime.
            ret (bool, optional): If `True`, returns the list of counts.

        Returns:
            None by default. Optionally, a list containing a count of the number of ground truth boxes for each class across the
            entire dataset.
        '''

        if self.data_generator.labels is None:
            raise ValueError("Computing the number of ground truth boxes per class not possible, no ground truth given.")

        num_gt_per_class = np.zeros(shape=(self.n_classes+1), dtype=int)

        class_id_index = self.gt_format['class_id']

        ground_truth = self.data_generator.labels

        if verbose:
            print('Computing the number of positive ground truth boxes per class.')
            tr = trange(len(ground_truth), file=sys.stdout)
        else:
            tr = range(len(ground_truth))

        # Iterate over the ground truth for all images in the dataset.
        for i in tr:

            boxes = np.asarray(ground_truth[i])

            # Iterate over all ground truth boxes for the current image.
            for j in range(boxes.shape[0]):

                if ignore_neutral_boxes and not (self.data_generator.eval_neutral is None):
                    if not self.data_generator.eval_neutral[i][j]:
                        # If this box is not supposed to be evaluation-neutral,
                        # increment the counter for the respective class ID.
                        class_id = boxes[j, class_id_index]
                        num_gt_per_class[class_id] += 1
                else:
                    # If there is no such thing as evaluation-neutral boxes for
                    # our dataset, always increment the counter for the respective
                    # class ID.
                    class_id = boxes[j, class_id_index]
                    num_gt_per_class[class_id] += 1

        self.num_gt_per_class = num_gt_per_class

        if ret:
            return num_gt_per_class

    def match_predictions(self,
                          ignore_neutral_boxes=True,
                          matching_iou_threshold=0.5,
                          border_pixels='include',
                          sorting_algorithm='quicksort',
                          verbose=True,
                          ret=False):
        '''
        Matches predictions to ground truth boxes.

        Note that `predict_on_dataset()` must be called before calling this method.

        Arguments:
            ignore_neutral_boxes (bool, optional): In case the data generator provides annotations indicating whether a ground truth
                bounding box is supposed to either count or be neutral for the evaluation, this argument decides what to do with these
                annotations. If `False`, even boxes that are annotated as neutral will be counted into the evaluation. If `True`,
                neutral boxes will be ignored for the evaluation. An example for evaluation-neutrality are the ground truth boxes
                annotated as "difficult" in the Pascal VOC datasets, which are usually treated as neutral for the evaluation.
            matching_iou_threshold (float, optional): A prediction will be considered a true positive if it has a Jaccard overlap
                of at least `matching_iou_threshold` with any ground truth bounding box of the same class.
            border_pixels (str, optional): How to treat the border pixels of the bounding boxes.
                Can be 'include', 'exclude', or 'half'. If 'include', the border pixels belong
                to the boxes. If 'exclude', the border pixels do not belong to the boxes.
                If 'half', then one of each of the two horizontal and vertical borders belongs
                to the boxes, but not the other.
            sorting_algorithm (str, optional): Which sorting algorithm the matching algorithm should use. This argument accepts
                any valid sorting algorithm for Numpy's `argsort()` function. You will usually want to choose between 'quicksort'
                (fastest and most memory efficient, but not stable) and 'mergesort' (slightly slower and less memory efficient, but stable).
                The official Matlab evaluation algorithm uses a stable sorting algorithm, so this algorithm is only guaranteed
                to behave identically if you choose 'mergesort' as the sorting algorithm, but it will almost always behave identically
                even if you choose 'quicksort' (but no guarantees).
            verbose (bool, optional): If `True`, will print out the progress during runtime.
            ret (bool, optional): If `True`, returns the true and false positives.

        Returns:
            None by default. Optionally, four nested lists containing the true positives, false positives, cumulative true positives,
            and cumulative false positives for each class.
        '''

        if self.data_generator.labels is None:
            raise ValueError("Matching predictions to ground truth boxes not possible, no ground truth given.")

        if self.prediction_results is None:
            raise ValueError("There are no prediction results. You must run `predict_on_dataset()` before calling this method.")

        class_id_gt = self.gt_format['class_id']
        xmin_gt = self.gt_format['xmin']
        ymin_gt = self.gt_format['ymin']
        xmax_gt = self.gt_format['xmax']
        ymax_gt = self.gt_format['ymax']

        # Convert the ground truth to a more efficient format for what we need
        # to do, which is access ground truth by image ID repeatedly.
        ground_truth = {}
        eval_neutral_available = not (self.data_generator.eval_neutral is None) # Whether or not we have annotations to decide whether ground truth boxes should be neutral or not.
        for i in range(len(self.data_generator.image_ids)):
            image_id = str(self.data_generator.image_ids[i])
            labels = self.data_generator.labels[i]
            if ignore_neutral_boxes and eval_neutral_available:
                ground_truth[image_id] = (np.asarray(labels), np.asarray(self.data_generator.eval_neutral[i]))
            else:
                ground_truth[image_id] = np.asarray(labels)

        true_positives = [[]] # The true positives for each class, sorted by descending confidence.
        false_positives = [[]] # The false positives for each class, sorted by descending confidence.
        cumulative_true_positives = [[]]
        cumulative_false_positives = [[]]

        # Iterate over all classes.
        for class_id in range(1, self.n_classes + 1):

            predictions = self.prediction_results[class_id]

            # Store the matching results in these lists:
            true_pos = np.zeros(len(predictions), dtype=int) # 1 for every prediction that is a true positive, 0 otherwise
            false_pos = np.zeros(len(predictions), dtype=int) # 1 for every prediction that is a false positive, 0 otherwise

            # In case there are no predictions at all for this class, we're done here.
            if len(predictions) == 0:
                print("No predictions for class {}/{}".format(class_id, self.n_classes))
                true_positives.append(true_pos)
                false_positives.append(false_pos)
                continue

            # Convert the predictions list for this class into a structured array so that we can sort it by confidence.

            # Get the number of characters needed to store the image ID strings in the structured array.
            num_chars_per_image_id = len(str(predictions[0][0])) + 6 # Keep a few characters buffer in case some image IDs are longer than others.
            # Create the data type for the structured array.
            preds_data_type = np.dtype([('image_id', 'U{}'.format(num_chars_per_image_id)),
                                        ('confidence', 'f4'),
                                        ('xmin', 'f4'),
                                        ('ymin', 'f4'),
                                        ('xmax', 'f4'),
                                        ('ymax', 'f4')])
            # Create the structured array.
            predictions = np.array(predictions, dtype=preds_data_type)

            # Sort the detections by decreasing confidence.
            descending_indices = np.argsort(-predictions['confidence'], kind=sorting_algorithm)
            predictions_sorted = predictions[descending_indices]

            if verbose:
                tr = trange(len(predictions), file=sys.stdout)
                tr.set_description("Matching predictions to ground truth, class {}/{}.".format(class_id, self.n_classes))
            else:
                tr = range(len(predictions))

            # Keep track of which ground truth boxes were already matched to a detection.
            gt_matched = {}

            # Iterate over all predictions.
            for i in tr:

                prediction = predictions_sorted[i]
                image_id = prediction['image_id']
                pred_box = np.asarray(list(prediction[['xmin', 'ymin', 'xmax', 'ymax']])) # Convert the structured array element to a regular array.

                # Get the relevant ground truth boxes for this prediction,
                # i.e. all ground truth boxes that match the prediction's
                # image ID and class ID.

                # The ground truth could either be a tuple with `(ground_truth_boxes, eval_neutral_boxes)`
                # or only `ground_truth_boxes`.
                if ignore_neutral_boxes and eval_neutral_available:
                    gt, eval_neutral = ground_truth[image_id]
                else:
                    gt = ground_truth[image_id]
                gt = np.asarray(gt)
                class_mask = gt[:,class_id_gt] == class_id
                gt = gt[class_mask]
                if ignore_neutral_boxes and eval_neutral_available:
                    eval_neutral = eval_neutral[class_mask]

                if gt.size == 0:
                    # If the image doesn't contain any objects of this class,
                    # the prediction becomes a false positive.
                    false_pos[i] = 1
                    continue

                # Compute the IoU of this prediction with all ground truth boxes of the same class.
                overlaps = iou(boxes1=gt[:,[xmin_gt, ymin_gt, xmax_gt, ymax_gt]],
                               boxes2=pred_box,
                               coords='corners',
                               mode='element-wise',
                               border_pixels=border_pixels)

                # For each detection, match the ground truth box with the highest overlap.
                # It's possible that the same ground truth box will be matched to multiple
                # detections.
                gt_match_index = np.argmax(overlaps)
                gt_match_overlap = overlaps[gt_match_index]

                if gt_match_overlap < matching_iou_threshold:
                    # False positive, IoU threshold violated:
                    # Those predictions whose matched overlap is below the threshold become
                    # false positives.
                    false_pos[i] = 1
                else:
                    if not (ignore_neutral_boxes and eval_neutral_available) or (eval_neutral[gt_match_index] == False):
                        # If this is not a ground truth that is supposed to be evaluation-neutral
                        # (i.e. should be skipped for the evaluation) or if we don't even have the
                        # concept of neutral boxes.
                        if not (image_id in gt_matched):
                            # True positive:
                            # If the matched ground truth box for this prediction hasn't been matched to a
                            # different prediction already, we have a true positive.
                            true_pos[i] = 1
                            gt_matched[image_id] = np.zeros(shape=(gt.shape[0]), dtype=bool)
                            gt_matched[image_id][gt_match_index] = True
                        elif not gt_matched[image_id][gt_match_index]:
                            # True positive:
                            # If the matched ground truth box for this prediction hasn't been matched to a
                            # different prediction already, we have a true positive.
                            true_pos[i] = 1
                            gt_matched[image_id][gt_match_index] = True
                        else:
                            # False positive, duplicate detection:
                            # If the matched ground truth box for this prediction has already been matched
                            # to a different prediction previously, it is a duplicate detection for an
                            # already detected object, which counts as a false positive.
                            false_pos[i] = 1

            true_positives.append(true_pos)
            false_positives.append(false_pos)

            cumulative_true_pos = np.cumsum(true_pos) # Cumulative sums of the true positives
            cumulative_false_pos = np.cumsum(false_pos) # Cumulative sums of the false positives

            cumulative_true_positives.append(cumulative_true_pos)
            cumulative_false_positives.append(cumulative_false_pos)

        self.true_positives = true_positives
        self.false_positives = false_positives
        self.cumulative_true_positives = cumulative_true_positives
        self.cumulative_false_positives = cumulative_false_positives

        if ret:
            return true_positives, false_positives, cumulative_true_positives, cumulative_false_positives

    def compute_precision_recall(self, verbose=True, ret=False):
        '''
        Computes the precisions and recalls for all classes.

        Note that `match_predictions()` must be called before calling this method.

        Arguments:
            verbose (bool, optional): If `True`, will print out the progress during runtime.
            ret (bool, optional): If `True`, returns the precisions and recalls.

        Returns:
            None by default. Optionally, two nested lists containing the cumulative precisions and recalls for each class.
        '''

        if (self.cumulative_true_positives is None) or (self.cumulative_false_positives is None):
            raise ValueError("True and false positives not available. You must run `match_predictions()` before you call this method.")

        if (self.num_gt_per_class is None):
            raise ValueError("Number of ground truth boxes per class not available. You must run `get_num_gt_per_class()` before you call this method.")

        cumulative_precisions = [[]]
        cumulative_recalls = [[]]

        # Iterate over all classes.
        for class_id in range(1, self.n_classes + 1):

            if verbose:
                print("Computing precisions and recalls, class {}/{}".format(class_id, self.n_classes))

            tp = self.cumulative_true_positives[class_id]
            fp = self.cumulative_false_positives[class_id]

            cumulative_precision = np.where(tp + fp > 0, tp / (tp + fp), 0) # 1D array with shape `(num_predictions,)`
            cumulative_recall = tp / self.num_gt_per_class[class_id] # 1D array with shape `(num_predictions,)`

            cumulative_precisions.append(cumulative_precision)
            cumulative_recalls.append(cumulative_recall)

        self.cumulative_precisions = cumulative_precisions
        self.cumulative_recalls = cumulative_recalls

        if ret:
            return cumulative_precisions, cumulative_recalls

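    # A worked numeric illustration of the 11-point sampled AP that the next method computes
    # (a sketch with made-up numbers, not output of this code): suppose a class has cumulative
    # recalls [0.25, 0.25, 0.5, 0.75] and cumulative precisions [1.0, 0.5, 0.67, 0.75]. For each
    # t in {0.0, 0.1, ..., 1.0}, take the maximum precision over all points with recall >= t,
    # or 0 if there is none:
    #     t in {0.0, 0.1, 0.2}  -> 1.0   (the point with recall 0.25 and precision 1.0)
    #     t in {0.3, ..., 0.7}  -> 0.75  (the point with recall 0.75 and precision 0.75)
    #     t in {0.8, 0.9, 1.0}  -> 0.0   (no point reaches these recall values)
    # AP = (3 * 1.0 + 5 * 0.75 + 3 * 0.0) / 11 = 6.75 / 11, i.e. about 0.614.
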
    def compute_average_precisions(self, mode='sample', num_recall_points=11, verbose=True, ret=False):
        '''
        Computes the average precision for each class.

        Can compute the Pascal-VOC-style average precision in both the pre-2010 (k-point sampling)
        and post-2010 (integration) algorithm versions.

        Note that `compute_precision_recall()` must be called before calling this method.

        Arguments:
            mode (str, optional): Can be either 'sample' or 'integrate'. In the case of 'sample', the average precision will be computed
                according to the Pascal VOC formula that was used up until VOC 2009, where the precision will be sampled for `num_recall_points`
                recall values. In the case of 'integrate', the average precision will be computed according to the Pascal VOC formula that
                was used from VOC 2010 onward, where the average precision will be computed by numerically integrating over the whole
                precision-recall curve instead of sampling individual points from it. 'integrate' mode is basically just the limit case
                of 'sample' mode as the number of sample points increases. For details, see the references below.
            num_recall_points (int, optional): Only relevant if mode is 'sample'. The number of points to sample from the precision-recall
                curve to compute the average precisions. In other words, this is the number of equidistant recall values for which the
                resulting precision will be computed. 11 points is the value used in the official Pascal VOC pre-2010 detection evaluation algorithm.
            verbose (bool, optional): If `True`, will print out the progress during runtime.
            ret (bool, optional): If `True`, returns the average precisions.

        Returns:
            None by default. Optionally, a list containing the average precision for each class.

        References:
            http://host.robots.ox.ac.uk/pascal/VOC/voc2012/htmldoc/devkit_doc.html#sec:ap
        '''

        if (self.cumulative_precisions is None) or (self.cumulative_recalls is None):
            raise ValueError("Precisions and recalls not available. You must run `compute_precision_recall()` before you call this method.")

        if not (mode in {'sample', 'integrate'}):
            raise ValueError("`mode` can be either 'sample' or 'integrate', but received '{}'".format(mode))

        average_precisions = [0.0]

        # Iterate over all classes.
        for class_id in range(1, self.n_classes + 1):

            if verbose:
                print("Computing average precision, class {}/{}".format(class_id, self.n_classes))

            cumulative_precision = self.cumulative_precisions[class_id]
            cumulative_recall = self.cumulative_recalls[class_id]
            average_precision = 0.0

            if mode == 'sample':

                for t in np.linspace(start=0, stop=1, num=num_recall_points, endpoint=True):

                    cum_prec_recall_greater_t = cumulative_precision[cumulative_recall >= t]

                    if cum_prec_recall_greater_t.size == 0:
                        precision = 0.0
                    else:
                        precision = np.amax(cum_prec_recall_greater_t)

                    average_precision += precision

                average_precision /= num_recall_points

            elif mode == 'integrate':

                # We will compute the precision at all unique recall values.
                unique_recalls, unique_recall_indices, unique_recall_counts = np.unique(cumulative_recall, return_index=True, return_counts=True)

                # Store the maximal precision for each recall value and the absolute difference
                # between any two unique recall values in the lists below. The products of these
                # two numbers constitute the rectangular areas whose sum will be our numerical
                # integral.
                maximal_precisions = np.zeros_like(unique_recalls)
                recall_deltas = np.zeros_like(unique_recalls)

                # Iterate over all unique recall values in reverse order. This saves a lot of computation:
                # For each unique recall value `r`, we want to get the maximal precision value obtained
                # for any recall value `r* >= r`. Once we know the maximal precision for the last `k` recall
                # values after a given iteration, then in the next iteration, in order to compute the maximal
                # precisions for the last `l > k` recall values, we only need to compute the maximal precision
                # for `l - k` recall values and then take the maximum between that and the previously computed
                # maximum instead of computing the maximum over all `l` values.
                # We skip the very last recall value, since the precision for any recall beyond the last
                # achieved recall value is defined to be zero.
                for i in range(len(unique_recalls)-2, -1, -1):
                    begin = unique_recall_indices[i]
                    end = unique_recall_indices[i + 1]
                    # When computing the maximal precisions, use the maximum of the previous iteration to
                    # avoid unnecessary repeated computation over the same precision values.
                    # The maximal precisions are the heights of the rectangle areas of our integral under
                    # the precision-recall curve.
                    maximal_precisions[i] = np.maximum(np.amax(cumulative_precision[begin:end]), maximal_precisions[i + 1])
                    # The differences between two adjacent recall values are the widths of our rectangle areas.
                    recall_deltas[i] = unique_recalls[i + 1] - unique_recalls[i]

                average_precision = np.sum(maximal_precisions * recall_deltas)

            average_precisions.append(average_precision)

        self.average_precisions = average_precisions

        if ret:
            return average_precisions

    def compute_mean_average_precision(self, ret=True):
        '''
        Computes the mean average precision over all classes.

        Note that `compute_average_precisions()` must be called before calling this method.

        Arguments:
            ret (bool, optional): If `True`, returns the mean average precision.

        Returns:
            A float, the mean average precision, by default. Optionally, None.
        '''

        if self.average_precisions is None:
            raise ValueError("Average precisions not available. You must run `compute_average_precisions()` before you call this method.")

        mean_average_precision = np.average(self.average_precisions[1:]) # The first element is for the background class, so skip it.
        self.mean_average_precision = mean_average_precision

        if ret:
            return mean_average_precision
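
For orientation, a minimal usage sketch of the `Evaluator` above (assumptions: `model` is a trained Keras SSD model created in 'inference' mode and `val_dataset` is a `DataGenerator` that has already parsed an evaluation set; both names are placeholders):

    evaluator = Evaluator(model=model,
                          n_classes=20,
                          data_generator=val_dataset,
                          model_mode='inference')
    mean_ap, average_precisions, precisions, recalls = evaluator(img_height=300,
                                                                 img_width=300,
                                                                 batch_size=8,
                                                                 return_precisions=True,
                                                                 return_recalls=True,
                                                                 return_average_precisions=True)
    print('mAP: {:.4f}'.format(mean_ap))

The optional returns come back in the order mAP, average precisions, precisions, recalls, matching the `__call__` docstring above.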
200
ssd_keras-master/eval_utils/coco_utils.py
Normal file
@@ -0,0 +1,200 @@
'''
A few utilities that are useful when working with the MS COCO datasets.

Copyright (C) 2018 Pierluigi Ferrari

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
'''

import json
from tqdm import trange
from math import ceil
import sys

from data_generator.object_detection_2d_geometric_ops import Resize
from data_generator.object_detection_2d_patch_sampling_ops import RandomPadFixedAR
from data_generator.object_detection_2d_photometric_ops import ConvertTo3Channels
from ssd_encoder_decoder.ssd_output_decoder import decode_detections
from data_generator.object_detection_2d_misc_utils import apply_inverse_transforms

def get_coco_category_maps(annotations_file):
    '''
    Builds dictionaries that map between MS COCO category IDs, transformed category IDs, and category names.

    Unfortunately, the original MS COCO category IDs are not consecutive: The 80 category IDs are spread
    across the integers 1 through 90 with some integers skipped. Since we usually use a one-hot
    class representation in neural networks, we need to map these non-consecutive original COCO category
    IDs (let's call them 'cats') to consecutive category IDs (let's call them 'classes').

    Arguments:
        annotations_file (str): The filepath to any MS COCO annotations JSON file.

    Returns:
        1) cats_to_classes: A dictionary that maps between the original (keys) and the transformed category IDs (values).
        2) classes_to_cats: A dictionary that maps between the transformed (keys) and the original category IDs (values).
        3) cats_to_names: A dictionary that maps between original category IDs (keys) and the respective category names (values).
        4) classes_to_names: A list of the category names (values) with their indices representing the transformed IDs.
    '''
    with open(annotations_file, 'r') as f:
        annotations = json.load(f)
    cats_to_classes = {}
    classes_to_cats = {}
    cats_to_names = {}
    classes_to_names = []
    classes_to_names.append('background') # Need to add the background class first so that the indexing is right.
    for i, cat in enumerate(annotations['categories']):
        cats_to_classes[cat['id']] = i + 1
        classes_to_cats[i + 1] = cat['id']
        cats_to_names[cat['id']] = cat['name']
        classes_to_names.append(cat['name'])

    return cats_to_classes, classes_to_cats, cats_to_names, classes_to_names

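# A quick usage sketch (hedged: the annotations path below is a placeholder for whatever
# MS COCO annotations JSON file you actually have):
#
#     cats_to_classes, classes_to_cats, cats_to_names, classes_to_names = \
#         get_coco_category_maps('annotations/instances_val2017.json')
#
# `cats_to_classes` then maps the scattered COCO category IDs onto consecutive class IDs
# 1 through 80, and `classes_to_cats` is the inverse mapping that `predict_all_to_json()`
# below needs.
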
def predict_all_to_json(out_file,
                        model,
                        img_height,
                        img_width,
                        classes_to_cats,
                        data_generator,
                        batch_size,
                        data_generator_mode='resize',
                        model_mode='training',
                        confidence_thresh=0.01,
                        iou_threshold=0.45,
                        top_k=200,
                        pred_coords='centroids',
                        normalize_coords=True):
    '''
    Runs detection predictions over the whole dataset given a model and saves them in a JSON file
    in the MS COCO detection results format.

    Arguments:
        out_file (str): The file name (full path) under which to save the results JSON file.
        model (Keras model): A Keras SSD model object.
        img_height (int): The input image height for the model.
        img_width (int): The input image width for the model.
        classes_to_cats (dict): A dictionary that maps the consecutive class IDs predicted by the model
            to the non-consecutive original MS COCO category IDs.
        data_generator (DataGenerator): A `DataGenerator` object with the evaluation dataset.
        batch_size (int): The batch size for the evaluation.
        data_generator_mode (str, optional): Either of 'resize' or 'pad'. If 'resize', the input images will
            be resized (i.e. warped) to `(img_height, img_width)`. This mode does not preserve the aspect ratios of the images.
            If 'pad', the input images will first be padded so that they have the aspect ratio defined by `img_height`
            and `img_width` and then resized to `(img_height, img_width)`. This mode preserves the aspect ratios of the images.
        model_mode (str, optional): The mode in which the model was created, i.e. 'training', 'inference' or 'inference_fast'.
            This is needed in order to know whether the model output is already decoded or still needs to be decoded. Refer to
            the model documentation for the meaning of the individual modes.
        confidence_thresh (float, optional): A float in [0,1), the minimum classification confidence in a specific
            positive class in order to be considered for the non-maximum suppression stage for the respective class.
            A lower value will result in a larger part of the selection process being done by the non-maximum suppression
            stage, while a larger value will result in a larger part of the selection process happening in the confidence
            thresholding stage.
        iou_threshold (float, optional): A float in [0,1]. All boxes with a Jaccard similarity of greater than `iou_threshold`
            with a locally maximal box will be removed from the set of predictions for a given class, where 'maximal' refers
            to the box score.
        top_k (int, optional): The number of highest scoring predictions to be kept for each batch item after the
            non-maximum suppression stage. Defaults to 200, following the paper.
        pred_coords (str, optional): The box coordinate format that the model outputs. Can be either 'centroids'
            for the format `(cx, cy, w, h)` (box center coordinates, width, and height), 'minmax' for the format
            `(xmin, xmax, ymin, ymax)`, or 'corners' for the format `(xmin, ymin, xmax, ymax)`.
        normalize_coords (bool, optional): Set to `True` if the model outputs relative coordinates (i.e. coordinates in [0,1])
            and you wish to transform these relative coordinates back to absolute coordinates. If the model outputs
            relative coordinates, but you do not want to convert them back to absolute coordinates, set this to `False`.
            Do not set this to `True` if the model already outputs absolute coordinates, as that would result in incorrect
            coordinates. Requires `img_height` and `img_width` if set to `True`.

    Returns:
        None.
    '''

    convert_to_3_channels = ConvertTo3Channels()
    resize = Resize(height=img_height, width=img_width)
    if data_generator_mode == 'resize':
        transformations = [convert_to_3_channels,
                           resize]
    elif data_generator_mode == 'pad':
        random_pad = RandomPadFixedAR(patch_aspect_ratio=img_width/img_height, clip_boxes=False)
        transformations = [convert_to_3_channels,
                           random_pad,
                           resize]
    else:
        raise ValueError("Unexpected argument value: `data_generator_mode` can be either of 'resize' or 'pad', but received '{}'.".format(data_generator_mode))

    # Set the generator parameters.
    generator = data_generator.generate(batch_size=batch_size,
                                        shuffle=False,
                                        transformations=transformations,
                                        label_encoder=None,
                                        returns={'processed_images',
                                                 'image_ids',
                                                 'inverse_transform'},
                                        keep_images_without_gt=True)
    # Put the results in this list.
    results = []
    # Compute the number of batches to iterate over the entire dataset.
    n_images = data_generator.get_dataset_size()
    print("Number of images in the evaluation dataset: {}".format(n_images))
    n_batches = int(ceil(n_images / batch_size))
    # Loop over all batches.
    tr = trange(n_batches, file=sys.stdout)
    tr.set_description('Producing results file')
    for i in tr:
        # Generate batch.
        batch_X, batch_image_ids, batch_inverse_transforms = next(generator)
        # Predict.
        y_pred = model.predict(batch_X)
        # If the model was created in 'training' mode, the raw predictions need to
        # be decoded and filtered, otherwise that's already taken care of.
        if model_mode == 'training':
            # Decode.
            y_pred = decode_detections(y_pred,
                                       confidence_thresh=confidence_thresh,
                                       iou_threshold=iou_threshold,
                                       top_k=top_k,
                                       input_coords=pred_coords,
                                       normalize_coords=normalize_coords,
                                       img_height=img_height,
                                       img_width=img_width)
        else:
            # Filter out the all-zeros dummy elements of `y_pred`.
            # Use a separate loop variable so that the batch index `i` is not shadowed.
            y_pred_filtered = []
            for j in range(len(y_pred)):
                y_pred_filtered.append(y_pred[j][y_pred[j,:,0] != 0])
            y_pred = y_pred_filtered
        # Convert the predicted box coordinates for the original images.
        y_pred = apply_inverse_transforms(y_pred, batch_inverse_transforms)

        # Convert each predicted box into the results format.
        for k, batch_item in enumerate(y_pred):
            for box in batch_item:
                class_id = int(box[0])
                # Transform the consecutive class IDs back to the original COCO category IDs.
                cat_id = classes_to_cats[class_id]
                # Round the box coordinates to reduce the JSON file size.
                xmin = float(round(box[2], 1))
                ymin = float(round(box[3], 1))
                xmax = float(round(box[4], 1))
                ymax = float(round(box[5], 1))
                width = xmax - xmin
                height = ymax - ymin
                bbox = [xmin, ymin, width, height]
                result = {}
                result['image_id'] = batch_image_ids[k]
                result['category_id'] = cat_id
                result['score'] = float(round(box[1], 3))
                result['bbox'] = bbox
                results.append(result)

    with open(out_file, 'w') as f:
        json.dump(results, f)

    print("Prediction results saved in '{}'".format(out_file))
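
Taken together, the two utilities above would typically be used like this (a sketch; `model` is a trained SSD model, `val_dataset` is a `DataGenerator` for a COCO validation set, and all file paths are placeholders):

    _, classes_to_cats, _, _ = get_coco_category_maps('annotations/instances_val2017.json')
    predict_all_to_json(out_file='detections_val2017_ssd300_results.json',
                        model=model,
                        img_height=300,
                        img_width=300,
                        classes_to_cats=classes_to_cats,
                        data_generator=val_dataset,
                        batch_size=8,
                        model_mode='inference')

The resulting JSON file follows the MS COCO detection results format and can then be scored with the official COCO evaluation tools.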
119
ssd_keras-master/evaluate.py
Normal file
@@ -0,0 +1,119 @@
from keras import backend as K
from keras.models import load_model
from keras.optimizers import Adam
#from scipy.misc import imread
import numpy as np
from matplotlib import pyplot as plt
import argparse
import json

from models.keras_ssd300 import ssd_300
from keras_loss_function.keras_ssd_loss import SSDLoss
from keras_layers.keras_layer_AnchorBoxes import AnchorBoxes
from keras_layers.keras_layer_DecodeDetections import DecodeDetections
from keras_layers.keras_layer_DecodeDetectionsFast import DecodeDetectionsFast
from keras_layers.keras_layer_L2Normalization import L2Normalization
from data_generator.object_detection_2d_data_generator import DataGenerator
from eval_utils.average_precision_evaluator import Evaluator

def _main_(args):
|
||||
|
||||
config_path = args.conf
|
||||
|
||||
with open(config_path) as config_buffer:
|
||||
config = json.loads(config_buffer.read())
|
||||
|
||||
###############################
|
||||
# Parse the annotations
|
||||
###############################
|
||||
path_imgs_test = config['test']['test_image_folder']
|
||||
path_anns_test = config['test']['test_annot_folder']
|
||||
labels = config['model']['labels']
|
||||
categories = {}
|
||||
#categories = {"Razor": 1, "Gun": 2, "Knife": 3, "Shuriken": 4} #la categoría 0 es la background
|
||||
for i in range(len(labels)): categories[labels[i]] = i+1
|
||||
print('\nTraining on: \t' + str(categories) + '\n')
|
||||
|
||||
img_height = config['model']['input'] # Height of the model input images
|
||||
img_width = config['model']['input'] # Width of the model input images
|
||||
img_channels = 3 # Number of color channels of the model input images
|
||||
n_classes = len(labels) # Number of positive classes, e.g. 20 for Pascal VOC, 80 for MS COCO
|
||||
classes = ['background'] + labels
|
||||
|
||||
model_mode = 'training'
|
||||
# TODO: Set the path to the `.h5` file of the model to be loaded.
|
||||
model_path = config['train']['saved_weights_name']
|
||||
|
||||
# We need to create an SSDLoss object in order to pass that to the model loader.
|
||||
ssd_loss = SSDLoss(neg_pos_ratio=3, alpha=1.0)
|
||||
|
||||
K.clear_session() # Clear previous models from memory.
|
||||
|
||||
model = load_model(model_path, custom_objects={'AnchorBoxes': AnchorBoxes,
|
||||
'L2Normalization': L2Normalization,
|
||||
'DecodeDetections': DecodeDetections,
|
||||
'compute_loss': ssd_loss.compute_loss})
|
||||
|
||||
test_dataset = DataGenerator()
|
||||
test_dataset.parse_xml(images_dirs= [config['test']['test_image_folder']],
|
||||
image_set_filenames=[config['test']['test_image_set_filename']],
|
||||
annotations_dirs=[config['test']['test_annot_folder']],
|
||||
classes=classes,
|
||||
include_classes='all',
|
||||
exclude_truncated=False,
|
||||
exclude_difficult=False,
|
||||
ret=False)
|
||||
evaluator = Evaluator(model=model,
|
||||
n_classes=n_classes,
|
||||
data_generator=test_dataset,
|
||||
model_mode=model_mode)
|
||||
|
||||
results = evaluator(img_height=img_height,
|
||||
img_width=img_width,
|
||||
batch_size=4,
|
||||
data_generator_mode='resize',
|
||||
round_confidences=False,
|
||||
matching_iou_threshold=0.5,
|
||||
border_pixels='include',
|
||||
sorting_algorithm='quicksort',
|
||||
average_precision_mode='sample',
|
||||
num_recall_points=11,
|
||||
ignore_neutral_boxes=True,
|
||||
return_precisions=True,
|
||||
return_recalls=True,
|
||||
return_average_precisions=True,
|
||||
verbose=True)
|
||||
|
||||
mean_average_precision, average_precisions, precisions, recalls = results
|
||||
|
||||
total_instances = []
|
||||
precisions = []
|
||||
for i in range(1, len(average_precisions)):
|
||||
print('{:.0f} instances of class'.format(len(recalls[i])),
|
||||
classes[i], 'with average precision: {:.4f}'.format(average_precisions[i]))
|
||||
total_instances.append(len(recalls[i]))
|
||||
precisions.append(average_precisions[i])
|
||||
|
||||
if sum(total_instances) == 0:
|
||||
print('No test instances found.')
|
||||
return
|
||||
|
||||
print('mAP using the weighted average of precisions among classes: {:.4f}'.format(sum([a * b for a, b in zip(total_instances, precisions)]) / sum(total_instances)))
|
||||
print('mAP: {:.4f}'.format(sum(precisions) / sum(x > 0 for x in total_instances)))
|
||||
|
||||
for i in range(1, len(average_precisions)):
|
||||
print("{:<14}{:<6}{}".format(classes[i], 'AP', round(average_precisions[i], 3)))
|
||||
print()
|
||||
print("{:<14}{:<6}{}".format('','mAP', round(mean_average_precision, 3)))
|
||||
|
||||
|
||||
|
||||
|
||||
|
||||
|
||||
if __name__ == '__main__':
|
||||
argparser = argparse.ArgumentParser(description='train and evaluate ssd model on any dataset')
|
||||
argparser.add_argument('-c', '--conf', help='path to configuration file')
|
||||
|
||||
args = argparser.parse_args()
|
||||
_main_(args)
|
||||
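Note: for reference, a minimal sketch of a configuration containing exactly the keys `evaluate.py` reads above; every value is a placeholder, and the label set is borrowed from the commented-out example in the script:

# Hedged sketch: write a minimal config file for evaluate.py. All paths and
# values here are assumptions, not files shipped with this repository.
import json

config = {
    'model': {'input': 300,  # the model input is square, so one side length suffices
              'labels': ['Razor', 'Gun', 'Knife', 'Shuriken']},
    'train': {'saved_weights_name': 'ssd300_weights.h5'},
    'test':  {'test_image_folder': 'data/test/images/',
              'test_annot_folder': 'data/test/annotations/',
              'test_image_set_filename': 'data/test/test.txt'},
}
with open('config.json', 'w') as f:
    json.dump(config, f, indent=4)

# The script would then be invoked as: python evaluate.py -c config.json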
BIN
ssd_keras-master/examples/ssd300_pascalVOC_pred_01.png
Normal file
After Width: | Height: | Size: 346 KiB
BIN
ssd_keras-master/examples/ssd300_pascalVOC_pred_02.png
Normal file
After Width: | Height: | Size: 196 KiB
BIN
ssd_keras-master/examples/ssd300_pascalVOC_pred_03.png
Normal file
After Width: | Height: | Size: 151 KiB
BIN
ssd_keras-master/examples/ssd300_pascalVOC_pred_04.png
Normal file
After Width: | Height: | Size: 273 KiB
BIN
ssd_keras-master/examples/ssd300_pascalVOC_pred_05.png
Normal file
After Width: | Height: | Size: 171 KiB
BIN
ssd_keras-master/examples/ssd300_pascalVOC_pred_06.png
Normal file
After Width: | Height: | Size: 238 KiB
BIN
ssd_keras-master/examples/ssd300_pascalVOC_pred_07.png
Normal file
After Width: | Height: | Size: 209 KiB
BIN
ssd_keras-master/examples/ssd300_pascalVOC_pred_08.png
Normal file
After Width: | Height: | Size: 162 KiB
BIN
ssd_keras-master/examples/ssd300_pascalVOC_pred_09.png
Normal file
After Width: | Height: | Size: 212 KiB
BIN
ssd_keras-master/examples/ssd7_udacity_traffic_pred_01.png
Normal file
After Width: | Height: | Size: 240 KiB
BIN
ssd_keras-master/examples/ssd7_udacity_traffic_pred_02.png
Normal file
After Width: | Height: | Size: 278 KiB
BIN
ssd_keras-master/examples/ssd7_udacity_traffic_pred_03.png
Normal file
After Width: | Height: | Size: 325 KiB
BIN
ssd_keras-master/examples/ssd7_udacity_traffic_pred_04.png
Normal file
After Width: | Height: | Size: 272 KiB
BIN
ssd_keras-master/examples/ssd7_udacity_traffic_pred_05.png
Normal file
After Width: | Height: | Size: 289 KiB
0
ssd_keras-master/keras_layers/__init__.py
Normal file
BIN
ssd_keras-master/keras_layers/__init__.pyc
Normal file
278
ssd_keras-master/keras_layers/keras_layer_AnchorBoxes.py
Normal file
@@ -0,0 +1,278 @@
'''
A custom Keras layer to generate anchor boxes.

Copyright (C) 2018 Pierluigi Ferrari

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
'''

from __future__ import division
import numpy as np
import keras.backend as K
from keras.engine.topology import InputSpec
from keras.engine.topology import Layer

from bounding_box_utils.bounding_box_utils import convert_coordinates

class AnchorBoxes(Layer):
    '''
    A Keras layer to create an output tensor containing anchor box coordinates
    and variances based on the input tensor and the passed arguments.

    A set of 2D anchor boxes of different aspect ratios is created for each spatial unit of
    the input tensor. The number of anchor boxes created per unit depends on the arguments
    `aspect_ratios` and `two_boxes_for_ar1`; in the default case it is 4. The boxes
    are parameterized by the coordinate tuple `(xmin, xmax, ymin, ymax)`.

    The logic implemented by this layer is identical to the logic in the module
    `ssd_box_encode_decode_utils.py`.

    The purpose of having this layer in the network is to make the model self-sufficient
    at inference time. Since the model is predicting offsets to the anchor boxes
    (rather than predicting absolute box coordinates directly), one needs to know the anchor
    box coordinates in order to construct the final prediction boxes from the predicted offsets.
    If the model's output tensor did not contain the anchor box coordinates, the necessary
    information to convert the predicted offsets back to absolute coordinates would be missing
    in the model output. The reason why it is necessary to predict offsets to the anchor boxes
    rather than to predict absolute box coordinates directly is explained in `README.md`.

    Input shape:
        4D tensor of shape `(batch, channels, height, width)` if `dim_ordering = 'th'`
        or `(batch, height, width, channels)` if `dim_ordering = 'tf'`.

    Output shape:
        5D tensor of shape `(batch, height, width, n_boxes, 8)`. The last axis contains
        the four anchor box coordinates and the four variance values for each box.
    '''

    def __init__(self,
                 img_height,
                 img_width,
                 this_scale,
                 next_scale,
                 aspect_ratios=[0.5, 1.0, 2.0],
                 two_boxes_for_ar1=True,
                 this_steps=None,
                 this_offsets=None,
                 clip_boxes=False,
                 variances=[0.1, 0.1, 0.2, 0.2],
                 coords='centroids',
                 normalize_coords=False,
                 **kwargs):
        '''
        All arguments need to be set to the same values as in the box encoding process, otherwise the behavior is undefined.
        Some of these arguments are explained in more detail in the documentation of the `SSDBoxEncoder` class.

        Arguments:
            img_height (int): The height of the input images.
            img_width (int): The width of the input images.
            this_scale (float): A float in [0, 1], the scaling factor for the size of the generated anchor boxes
                as a fraction of the shorter side of the input image.
            next_scale (float): A float in [0, 1], the next larger scaling factor. Only relevant if
                `self.two_boxes_for_ar1 == True`.
            aspect_ratios (list, optional): The list of aspect ratios for which default boxes are to be
                generated for this layer.
            two_boxes_for_ar1 (bool, optional): Only relevant if `aspect_ratios` contains 1.
                If `True`, two default boxes will be generated for aspect ratio 1. The first will be generated
                using the scaling factor for the respective layer, the second one will be generated using the
                geometric mean of said scaling factor and the next bigger scaling factor.
            clip_boxes (bool, optional): If `True`, clips the anchor box coordinates to stay within image boundaries.
            variances (list, optional): A list of 4 floats >0. The anchor box offset for each coordinate will be divided by
                its respective variance value.
            coords (str, optional): The box coordinate format to be used internally in the model (i.e. this is not the input format
                of the ground truth labels). Can be either 'centroids' for the format `(cx, cy, w, h)` (box center coordinates, width, and height),
                'corners' for the format `(xmin, ymin, xmax, ymax)`, or 'minmax' for the format `(xmin, xmax, ymin, ymax)`.
            normalize_coords (bool, optional): Set to `True` if the model uses relative instead of absolute coordinates,
                i.e. if the model predicts box coordinates within [0,1] instead of absolute coordinates.
        '''
        if K.backend() != 'tensorflow':
            raise TypeError("This layer only supports TensorFlow at the moment, but you are using the {} backend.".format(K.backend()))

        if (this_scale < 0) or (next_scale < 0) or (this_scale > 1):
            raise ValueError("`this_scale` must be in [0, 1] and `next_scale` must be >0, but `this_scale` == {}, `next_scale` == {}".format(this_scale, next_scale))

        if len(variances) != 4:
            raise ValueError("4 variance values must be passed, but {} values were received.".format(len(variances)))
        variances = np.array(variances)
        if np.any(variances <= 0):
            raise ValueError("All variances must be >0, but the variances given are {}".format(variances))

        self.img_height = img_height
        self.img_width = img_width
        self.this_scale = this_scale
        self.next_scale = next_scale
        self.aspect_ratios = aspect_ratios
        self.two_boxes_for_ar1 = two_boxes_for_ar1
        self.this_steps = this_steps
        self.this_offsets = this_offsets
        self.clip_boxes = clip_boxes
        self.variances = variances
        self.coords = coords
        self.normalize_coords = normalize_coords
        # Compute the number of boxes per cell.
        if (1 in aspect_ratios) and two_boxes_for_ar1:
            self.n_boxes = len(aspect_ratios) + 1
        else:
            self.n_boxes = len(aspect_ratios)
        super(AnchorBoxes, self).__init__(**kwargs)

    def build(self, input_shape):
        self.input_spec = [InputSpec(shape=input_shape)]
        super(AnchorBoxes, self).build(input_shape)

    def call(self, x, mask=None):
        '''
        Return an anchor box tensor based on the shape of the input tensor.

        The logic implemented here is identical to the logic in the module `ssd_box_encode_decode_utils.py`.

        Note that this tensor does not participate in any graph computations at runtime. It is being created
        as a constant once during graph creation and is just being output along with the rest of the model output
        during runtime. Because of this, all logic is implemented as Numpy array operations and it is sufficient
        to convert the resulting Numpy array into a Keras tensor at the very end before outputting it.

        Arguments:
            x (tensor): 4D tensor of shape `(batch, channels, height, width)` if `dim_ordering = 'th'`
                or `(batch, height, width, channels)` if `dim_ordering = 'tf'`. The input for this
                layer must be the output of the localization predictor layer.
        '''

        # Compute box width and height for each aspect ratio.
        # The shorter side of the image will be used to compute `w` and `h` using `scale` and `aspect_ratios`.
        size = min(self.img_height, self.img_width)
        # Compute the box widths and heights for all aspect ratios.
        wh_list = []
        for ar in self.aspect_ratios:
            if (ar == 1):
                # Compute the regular anchor box for aspect ratio 1.
                box_height = box_width = self.this_scale * size
                wh_list.append((box_width, box_height))
                if self.two_boxes_for_ar1:
                    # Compute one slightly larger version using the geometric mean of this scale value and the next.
                    box_height = box_width = np.sqrt(self.this_scale * self.next_scale) * size
                    wh_list.append((box_width, box_height))
            else:
                box_height = self.this_scale * size / np.sqrt(ar)
                box_width = self.this_scale * size * np.sqrt(ar)
                wh_list.append((box_width, box_height))
        wh_list = np.array(wh_list)

        # We need the shape of the input tensor.
        if K.image_dim_ordering() == 'tf':
            batch_size, feature_map_height, feature_map_width, feature_map_channels = x._keras_shape
        else: # Not yet relevant since TensorFlow is the only supported backend right now, but it can't harm to have this in here for the future
            batch_size, feature_map_channels, feature_map_height, feature_map_width = x._keras_shape

        # Compute the grid of box center points. They are identical for all aspect ratios.

        # Compute the step sizes, i.e. how far apart the anchor box center points will be vertically and horizontally.
        if (self.this_steps is None):
            step_height = self.img_height / feature_map_height
            step_width = self.img_width / feature_map_width
        else:
            if isinstance(self.this_steps, (list, tuple)) and (len(self.this_steps) == 2):
                step_height = self.this_steps[0]
                step_width = self.this_steps[1]
            elif isinstance(self.this_steps, (int, float)):
                step_height = self.this_steps
                step_width = self.this_steps
        # Compute the offsets, i.e. at what pixel values the first anchor box center point will be from the top and from the left of the image.
        if (self.this_offsets is None):
            offset_height = 0.5
            offset_width = 0.5
        else:
            if isinstance(self.this_offsets, (list, tuple)) and (len(self.this_offsets) == 2):
                offset_height = self.this_offsets[0]
                offset_width = self.this_offsets[1]
            elif isinstance(self.this_offsets, (int, float)):
                offset_height = self.this_offsets
                offset_width = self.this_offsets
        # Now that we have the offsets and step sizes, compute the grid of anchor box center points.
        cy = np.linspace(offset_height * step_height, (offset_height + feature_map_height - 1) * step_height, feature_map_height)
        cx = np.linspace(offset_width * step_width, (offset_width + feature_map_width - 1) * step_width, feature_map_width)
        cx_grid, cy_grid = np.meshgrid(cx, cy)
        cx_grid = np.expand_dims(cx_grid, -1) # This is necessary for np.tile() to do what we want further down
        cy_grid = np.expand_dims(cy_grid, -1) # This is necessary for np.tile() to do what we want further down

        # Create a 4D tensor template of shape `(feature_map_height, feature_map_width, n_boxes, 4)`
        # where the last dimension will contain `(cx, cy, w, h)`.
        boxes_tensor = np.zeros((feature_map_height, feature_map_width, self.n_boxes, 4))

        boxes_tensor[:, :, :, 0] = np.tile(cx_grid, (1, 1, self.n_boxes)) # Set cx
        boxes_tensor[:, :, :, 1] = np.tile(cy_grid, (1, 1, self.n_boxes)) # Set cy
        boxes_tensor[:, :, :, 2] = wh_list[:, 0] # Set w
        boxes_tensor[:, :, :, 3] = wh_list[:, 1] # Set h

        # Convert `(cx, cy, w, h)` to `(xmin, ymin, xmax, ymax)`.
        boxes_tensor = convert_coordinates(boxes_tensor, start_index=0, conversion='centroids2corners')

        # If `clip_boxes` is enabled, clip the coordinates to lie within the image boundaries.
        if self.clip_boxes:
            x_coords = boxes_tensor[:,:,:,[0, 2]]
            x_coords[x_coords >= self.img_width] = self.img_width - 1
            x_coords[x_coords < 0] = 0
            boxes_tensor[:,:,:,[0, 2]] = x_coords
            y_coords = boxes_tensor[:,:,:,[1, 3]]
            y_coords[y_coords >= self.img_height] = self.img_height - 1
            y_coords[y_coords < 0] = 0
            boxes_tensor[:,:,:,[1, 3]] = y_coords

        # If `normalize_coords` is enabled, normalize the coordinates to be within [0,1].
        if self.normalize_coords:
            boxes_tensor[:, :, :, [0, 2]] /= self.img_width
            boxes_tensor[:, :, :, [1, 3]] /= self.img_height

        # TODO: Implement box limiting directly for `(cx, cy, w, h)` so that we don't have to unnecessarily convert back and forth.
        if self.coords == 'centroids':
            # Convert `(xmin, ymin, xmax, ymax)` back to `(cx, cy, w, h)`.
            boxes_tensor = convert_coordinates(boxes_tensor, start_index=0, conversion='corners2centroids', border_pixels='half')
        elif self.coords == 'minmax':
            # Convert `(xmin, ymin, xmax, ymax)` to `(xmin, xmax, ymin, ymax)`.
            boxes_tensor = convert_coordinates(boxes_tensor, start_index=0, conversion='corners2minmax', border_pixels='half')

        # Create a tensor to contain the variances and append it to `boxes_tensor`. This tensor has the same shape
        # as `boxes_tensor` and simply contains the same 4 variance values for every position in the last axis.
        variances_tensor = np.zeros_like(boxes_tensor) # Has shape `(feature_map_height, feature_map_width, n_boxes, 4)`
        variances_tensor += self.variances # Long live broadcasting
        # Now `boxes_tensor` becomes a tensor of shape `(feature_map_height, feature_map_width, n_boxes, 8)`.
        boxes_tensor = np.concatenate((boxes_tensor, variances_tensor), axis=-1)

        # Now prepend one dimension to `boxes_tensor` to account for the batch size and tile it along the batch dimension.
        # The result will be a 5D tensor of shape `(batch_size, feature_map_height, feature_map_width, n_boxes, 8)`.
        boxes_tensor = np.expand_dims(boxes_tensor, axis=0)
        boxes_tensor = K.tile(K.constant(boxes_tensor, dtype='float32'), (K.shape(x)[0], 1, 1, 1, 1))

        return boxes_tensor

    def compute_output_shape(self, input_shape):
        if K.image_dim_ordering() == 'tf':
            batch_size, feature_map_height, feature_map_width, feature_map_channels = input_shape
        else: # Not yet relevant since TensorFlow is the only supported backend right now, but it can't harm to have this in here for the future
            batch_size, feature_map_channels, feature_map_height, feature_map_width = input_shape
        return (batch_size, feature_map_height, feature_map_width, self.n_boxes, 8)

    def get_config(self):
        config = {
            'img_height': self.img_height,
            'img_width': self.img_width,
            'this_scale': self.this_scale,
            'next_scale': self.next_scale,
            'aspect_ratios': list(self.aspect_ratios),
            'two_boxes_for_ar1': self.two_boxes_for_ar1,
            'clip_boxes': self.clip_boxes,
            'variances': list(self.variances),
            'coords': self.coords,
            'normalize_coords': self.normalize_coords
        }
        base_config = super(AnchorBoxes, self).get_config()
        return dict(list(base_config.items()) + list(config.items()))
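Note: to make the box-size arithmetic in `AnchorBoxes.call()` concrete, here is a small standalone NumPy sketch (not part of the layer) using assumed example values `this_scale=0.2`, `next_scale=0.34`, a 300-pixel shorter image side, and the default aspect ratios:

# Hedged sketch of the wh_list computation with made-up scale values.
import numpy as np

size, this_scale, next_scale = 300, 0.2, 0.34   # assumed example values
aspect_ratios, two_boxes_for_ar1 = [0.5, 1.0, 2.0], True

wh_list = []
for ar in aspect_ratios:
    if ar == 1:
        s = this_scale * size                            # 60 x 60 box
        wh_list.append((s, s))
        if two_boxes_for_ar1:
            s = np.sqrt(this_scale * next_scale) * size  # ~78.2 x 78.2 box
            wh_list.append((s, s))
    else:
        wh_list.append((this_scale * size * np.sqrt(ar),    # width
                        this_scale * size / np.sqrt(ar)))   # height
print(np.array(wh_list).round(1))
# [[ 42.4  84.9]
#  [ 60.   60. ]
#  [ 78.2  78.2]
#  [ 84.9  42.4]]

With `two_boxes_for_ar1=True` and aspect ratio 1 present, each cell gets `len(aspect_ratios) + 1 = 4` boxes, matching the `self.n_boxes` computation in `__init__()` above.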
BIN
ssd_keras-master/keras_layers/keras_layer_AnchorBoxes.pyc
Normal file
283
ssd_keras-master/keras_layers/keras_layer_DecodeDetections.py
Normal file
@@ -0,0 +1,283 @@
'''
A custom Keras layer to decode the raw SSD prediction output. Corresponds to the
`DetectionOutput` layer type in the original Caffe implementation of SSD.

Copyright (C) 2018 Pierluigi Ferrari

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
'''

from __future__ import division
import numpy as np
import tensorflow as tf
import keras.backend as K
from keras.engine.topology import InputSpec
from keras.engine.topology import Layer

class DecodeDetections(Layer):
    '''
    A Keras layer to decode the raw SSD prediction output.

    Input shape:
        3D tensor of shape `(batch_size, n_boxes, n_classes + 12)`.

    Output shape:
        3D tensor of shape `(batch_size, top_k, 6)`.
    '''

    def __init__(self,
                 confidence_thresh=0.01,
                 iou_threshold=0.45,
                 top_k=200,
                 nms_max_output_size=400,
                 coords='centroids',
                 normalize_coords=True,
                 img_height=None,
                 img_width=None,
                 **kwargs):
        '''
        All default argument values follow the Caffe implementation.

        Arguments:
            confidence_thresh (float, optional): A float in [0,1), the minimum classification confidence in a specific
                positive class in order to be considered for the non-maximum suppression stage for the respective class.
                A lower value will result in a larger part of the selection process being done by the non-maximum suppression
                stage, while a larger value will result in a larger part of the selection process happening in the confidence
                thresholding stage.
            iou_threshold (float, optional): A float in [0,1]. All boxes with a Jaccard similarity of greater than `iou_threshold`
                with a locally maximal box will be removed from the set of predictions for a given class, where 'maximal' refers
                to the box score.
            top_k (int, optional): The number of highest scoring predictions to be kept for each batch item after the
                non-maximum suppression stage.
            nms_max_output_size (int, optional): The maximum number of predictions that will be left after performing non-maximum
                suppression.
            coords (str, optional): The box coordinate format that the model outputs. Must be 'centroids',
                i.e. the format `(cx, cy, w, h)` (box center coordinates, width, and height). Other coordinate formats are
                currently not supported.
            normalize_coords (bool, optional): Set to `True` if the model outputs relative coordinates (i.e. coordinates in [0,1])
                and you wish to transform these relative coordinates back to absolute coordinates. If the model outputs
                relative coordinates, but you do not want to convert them back to absolute coordinates, set this to `False`.
                Do not set this to `True` if the model already outputs absolute coordinates, as that would result in incorrect
                coordinates. Requires `img_height` and `img_width` if set to `True`.
            img_height (int, optional): The height of the input images. Only needed if `normalize_coords` is `True`.
            img_width (int, optional): The width of the input images. Only needed if `normalize_coords` is `True`.
        '''
        if K.backend() != 'tensorflow':
            raise TypeError("This layer only supports TensorFlow at the moment, but you are using the {} backend.".format(K.backend()))

        if normalize_coords and ((img_height is None) or (img_width is None)):
            raise ValueError("If relative box coordinates are supposed to be converted to absolute coordinates, the decoder needs the image size in order to decode the predictions, but `img_height == {}` and `img_width == {}`".format(img_height, img_width))

        if coords != 'centroids':
            raise ValueError("The DetectionOutput layer currently only supports the 'centroids' coordinate format.")

        # We need these members for the config.
        self.confidence_thresh = confidence_thresh
        self.iou_threshold = iou_threshold
        self.top_k = top_k
        self.normalize_coords = normalize_coords
        self.img_height = img_height
        self.img_width = img_width
        self.coords = coords
        self.nms_max_output_size = nms_max_output_size

        # We need these members for TensorFlow.
        self.tf_confidence_thresh = tf.constant(self.confidence_thresh, name='confidence_thresh')
        self.tf_iou_threshold = tf.constant(self.iou_threshold, name='iou_threshold')
        self.tf_top_k = tf.constant(self.top_k, name='top_k')
        self.tf_normalize_coords = tf.constant(self.normalize_coords, name='normalize_coords')
        self.tf_img_height = tf.constant(self.img_height, dtype=tf.float32, name='img_height')
        self.tf_img_width = tf.constant(self.img_width, dtype=tf.float32, name='img_width')
        self.tf_nms_max_output_size = tf.constant(self.nms_max_output_size, name='nms_max_output_size')

        super(DecodeDetections, self).__init__(**kwargs)

    def build(self, input_shape):
        self.input_spec = [InputSpec(shape=input_shape)]
        super(DecodeDetections, self).build(input_shape)

    def call(self, y_pred, mask=None):
        '''
        Returns:
            3D tensor of shape `(batch_size, top_k, 6)`. The second axis is zero-padded
            to always yield `top_k` predictions per batch item. The last axis contains
            the coordinates for each predicted box in the format
            `[class_id, confidence, xmin, ymin, xmax, ymax]`.
        '''

        #####################################################################################
        # 1. Convert the box coordinates from predicted anchor box offsets to predicted
        #    absolute coordinates
        #####################################################################################

        # Convert anchor box offsets to image offsets.
        cx = y_pred[...,-12] * y_pred[...,-4] * y_pred[...,-6] + y_pred[...,-8] # cx = cx_pred * cx_variance * w_anchor + cx_anchor
        cy = y_pred[...,-11] * y_pred[...,-3] * y_pred[...,-5] + y_pred[...,-7] # cy = cy_pred * cy_variance * h_anchor + cy_anchor
        w = tf.exp(y_pred[...,-10] * y_pred[...,-2]) * y_pred[...,-6] # w = exp(w_pred * variance_w) * w_anchor
        h = tf.exp(y_pred[...,-9] * y_pred[...,-1]) * y_pred[...,-5] # h = exp(h_pred * variance_h) * h_anchor

        # Convert 'centroids' to 'corners'.
        xmin = cx - 0.5 * w
        ymin = cy - 0.5 * h
        xmax = cx + 0.5 * w
        ymax = cy + 0.5 * h

        # If the model predicts box coordinates relative to the image dimensions and they are supposed
        # to be converted back to absolute coordinates, do that.
        def normalized_coords():
            xmin1 = tf.expand_dims(xmin * self.tf_img_width, axis=-1)
            ymin1 = tf.expand_dims(ymin * self.tf_img_height, axis=-1)
            xmax1 = tf.expand_dims(xmax * self.tf_img_width, axis=-1)
            ymax1 = tf.expand_dims(ymax * self.tf_img_height, axis=-1)
            return xmin1, ymin1, xmax1, ymax1
        def non_normalized_coords():
            return tf.expand_dims(xmin, axis=-1), tf.expand_dims(ymin, axis=-1), tf.expand_dims(xmax, axis=-1), tf.expand_dims(ymax, axis=-1)

        xmin, ymin, xmax, ymax = tf.cond(self.tf_normalize_coords, normalized_coords, non_normalized_coords)

        # Concatenate the one-hot class confidences and the converted box coordinates to form the decoded predictions tensor.
        y_pred = tf.concat(values=[y_pred[...,:-12], xmin, ymin, xmax, ymax], axis=-1)

        #####################################################################################
        # 2. Perform confidence thresholding, per-class non-maximum suppression, and
        #    top-k filtering.
        #####################################################################################

        batch_size = tf.shape(y_pred)[0] # Output dtype: tf.int32
        n_boxes = tf.shape(y_pred)[1]
        n_classes = y_pred.shape[2] - 4
        class_indices = tf.range(1, n_classes)

        # Create a function that filters the predictions for the given batch item. Specifically, it performs:
        # - confidence thresholding
        # - non-maximum suppression (NMS)
        # - top-k filtering
        def filter_predictions(batch_item):

            # Create a function that filters the predictions for one single class.
            def filter_single_class(index):

                # From a tensor of shape (n_boxes, n_classes + 4 coordinates) extract
                # a tensor of shape (n_boxes, 1 + 4 coordinates) that contains the
                # confidence values for just one class, determined by `index`.
                confidences = tf.expand_dims(batch_item[..., index], axis=-1)
                class_id = tf.fill(dims=tf.shape(confidences), value=tf.to_float(index))
                box_coordinates = batch_item[...,-4:]

                single_class = tf.concat([class_id, confidences, box_coordinates], axis=-1)

                # Apply confidence thresholding with respect to the class defined by `index`.
                threshold_met = single_class[:,1] > self.tf_confidence_thresh
                single_class = tf.boolean_mask(tensor=single_class,
                                               mask=threshold_met)

                # If any boxes made the threshold, perform NMS.
                def perform_nms():
                    scores = single_class[...,1]

                    # `tf.image.non_max_suppression()` needs the box coordinates in the format `(ymin, xmin, ymax, xmax)`.
                    xmin = tf.expand_dims(single_class[...,-4], axis=-1)
                    ymin = tf.expand_dims(single_class[...,-3], axis=-1)
                    xmax = tf.expand_dims(single_class[...,-2], axis=-1)
                    ymax = tf.expand_dims(single_class[...,-1], axis=-1)
                    boxes = tf.concat(values=[ymin, xmin, ymax, xmax], axis=-1)

                    maxima_indices = tf.image.non_max_suppression(boxes=boxes,
                                                                  scores=scores,
                                                                  max_output_size=self.tf_nms_max_output_size,
                                                                  iou_threshold=self.iou_threshold,
                                                                  name='non_maximum_suppression')
                    maxima = tf.gather(params=single_class,
                                       indices=maxima_indices,
                                       axis=0)
                    return maxima

                def no_confident_predictions():
                    return tf.constant(value=0.0, shape=(1,6))

                single_class_nms = tf.cond(tf.equal(tf.size(single_class), 0), no_confident_predictions, perform_nms)

                # Make sure `single_class` is exactly `self.nms_max_output_size` elements long.
                padded_single_class = tf.pad(tensor=single_class_nms,
                                             paddings=[[0, self.tf_nms_max_output_size - tf.shape(single_class_nms)[0]], [0, 0]],
                                             mode='CONSTANT',
                                             constant_values=0.0)

                return padded_single_class

            # Iterate `filter_single_class()` over all class indices.
            filtered_single_classes = tf.map_fn(fn=lambda i: filter_single_class(i),
                                                elems=tf.range(1,n_classes),
                                                dtype=tf.float32,
                                                parallel_iterations=128,
                                                back_prop=False,
                                                swap_memory=False,
                                                infer_shape=True,
                                                name='loop_over_classes')

            # Concatenate the filtered results for all individual classes to one tensor.
            filtered_predictions = tf.reshape(tensor=filtered_single_classes, shape=(-1,6))

            # Perform top-k filtering for this batch item or pad it in case there are
            # fewer than `self.top_k` boxes left at this point. Either way, produce a
            # tensor of length `self.top_k`. By the time we return the final results tensor
            # for the whole batch, all batch items must have the same number of predicted
            # boxes so that the tensor dimensions are homogeneous. If fewer than `self.top_k`
            # predictions are left after the filtering process above, we pad the missing
            # predictions with zeros as dummy entries.
            def top_k():
                return tf.gather(params=filtered_predictions,
                                 indices=tf.nn.top_k(filtered_predictions[:, 1], k=self.tf_top_k, sorted=True).indices,
                                 axis=0)
            def pad_and_top_k():
                padded_predictions = tf.pad(tensor=filtered_predictions,
                                            paddings=[[0, self.tf_top_k - tf.shape(filtered_predictions)[0]], [0, 0]],
                                            mode='CONSTANT',
                                            constant_values=0.0)
                return tf.gather(params=padded_predictions,
                                 indices=tf.nn.top_k(padded_predictions[:, 1], k=self.tf_top_k, sorted=True).indices,
                                 axis=0)

            top_k_boxes = tf.cond(tf.greater_equal(tf.shape(filtered_predictions)[0], self.tf_top_k), top_k, pad_and_top_k)

            return top_k_boxes

        # Iterate `filter_predictions()` over all batch items.
        output_tensor = tf.map_fn(fn=lambda x: filter_predictions(x),
                                  elems=y_pred,
                                  dtype=None,
                                  parallel_iterations=128,
                                  back_prop=False,
                                  swap_memory=False,
                                  infer_shape=True,
                                  name='loop_over_batch')

        return output_tensor

    def compute_output_shape(self, input_shape):
        batch_size, n_boxes, last_axis = input_shape
        return (batch_size, self.tf_top_k, 6) # Last axis: (class_ID, confidence, 4 box coordinates)

    def get_config(self):
        config = {
            'confidence_thresh': self.confidence_thresh,
            'iou_threshold': self.iou_threshold,
            'top_k': self.top_k,
            'nms_max_output_size': self.nms_max_output_size,
            'coords': self.coords,
            'normalize_coords': self.normalize_coords,
            'img_height': self.img_height,
            'img_width': self.img_width,
        }
        base_config = super(DecodeDetections, self).get_config()
        return dict(list(base_config.items()) + list(config.items()))
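Note: the offset decoding in `DecodeDetections.call()` comes down to four scalar formulas per box. A NumPy sketch with made-up numbers, verifying the `cx`, `cy`, `w`, `h` equations from the comments above:

# Hedged sketch: decode one box by hand. The last twelve entries of a
# prediction row are [cx_pred, cy_pred, w_pred, h_pred, cx_anchor, cy_anchor,
# w_anchor, h_anchor, var_cx, var_cy, var_w, var_h]. All values are invented.
import numpy as np

cx_pred, cy_pred, w_pred, h_pred = 0.5, -0.2, 0.1, 0.3
cx_a, cy_a, w_a, h_a             = 0.50, 0.50, 0.20, 0.40
var_cx, var_cy, var_w, var_h     = 0.1, 0.1, 0.2, 0.2

cx = cx_pred * var_cx * w_a + cx_a   # 0.51
cy = cy_pred * var_cy * h_a + cy_a   # 0.492
w  = np.exp(w_pred * var_w) * w_a    # ~0.204
h  = np.exp(h_pred * var_h) * h_a    # ~0.425

# 'centroids' to 'corners', as in the layer:
xmin, ymin = cx - 0.5 * w, cy - 0.5 * h
xmax, ymax = cx + 0.5 * w, cy + 0.5 * h
print(round(xmin, 3), round(ymin, 3), round(xmax, 3), round(ymax, 3))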
BIN
ssd_keras-master/keras_layers/keras_layer_DecodeDetections.pyc
Normal file
266
ssd_keras-master/keras_layers/keras_layer_DecodeDetectionsFast.py
Normal file
@@ -0,0 +1,266 @@
'''
A custom Keras layer to decode the raw SSD prediction output. This is a modified
and more efficient version of the `DetectionOutput` layer type in the original Caffe
implementation of SSD. For a faithful replication of the original layer, please
refer to the `DecodeDetections` layer.

Copyright (C) 2018 Pierluigi Ferrari

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
'''

from __future__ import division
import numpy as np
import tensorflow as tf
import keras.backend as K
from keras.engine.topology import InputSpec
from keras.engine.topology import Layer

class DecodeDetectionsFast(Layer):
    '''
    A Keras layer to decode the raw SSD prediction output.

    Input shape:
        3D tensor of shape `(batch_size, n_boxes, n_classes + 12)`.

    Output shape:
        3D tensor of shape `(batch_size, top_k, 6)`.
    '''

    def __init__(self,
                 confidence_thresh=0.01,
                 iou_threshold=0.45,
                 top_k=200,
                 nms_max_output_size=400,
                 coords='centroids',
                 normalize_coords=True,
                 img_height=None,
                 img_width=None,
                 **kwargs):
        '''
        All default argument values follow the Caffe implementation.

        Arguments:
            confidence_thresh (float, optional): A float in [0,1), the minimum classification confidence in a specific
                positive class in order to be considered for the non-maximum suppression stage for the respective class.
                A lower value will result in a larger part of the selection process being done by the non-maximum suppression
                stage, while a larger value will result in a larger part of the selection process happening in the confidence
                thresholding stage.
            iou_threshold (float, optional): A float in [0,1]. All boxes with a Jaccard similarity of greater than `iou_threshold`
                with a locally maximal box will be removed from the set of predictions for a given class, where 'maximal' refers
                to the box score.
            top_k (int, optional): The number of highest scoring predictions to be kept for each batch item after the
                non-maximum suppression stage.
            nms_max_output_size (int, optional): The maximum number of predictions that will be left after performing non-maximum
                suppression.
            coords (str, optional): The box coordinate format that the model outputs. Must be 'centroids',
                i.e. the format `(cx, cy, w, h)` (box center coordinates, width, and height). Other coordinate formats are
                currently not supported.
            normalize_coords (bool, optional): Set to `True` if the model outputs relative coordinates (i.e. coordinates in [0,1])
                and you wish to transform these relative coordinates back to absolute coordinates. If the model outputs
                relative coordinates, but you do not want to convert them back to absolute coordinates, set this to `False`.
                Do not set this to `True` if the model already outputs absolute coordinates, as that would result in incorrect
                coordinates. Requires `img_height` and `img_width` if set to `True`.
            img_height (int, optional): The height of the input images. Only needed if `normalize_coords` is `True`.
            img_width (int, optional): The width of the input images. Only needed if `normalize_coords` is `True`.
        '''
        if K.backend() != 'tensorflow':
            raise TypeError("This layer only supports TensorFlow at the moment, but you are using the {} backend.".format(K.backend()))

        if normalize_coords and ((img_height is None) or (img_width is None)):
            raise ValueError("If relative box coordinates are supposed to be converted to absolute coordinates, the decoder needs the image size in order to decode the predictions, but `img_height == {}` and `img_width == {}`".format(img_height, img_width))

        if coords != 'centroids':
            raise ValueError("The DetectionOutput layer currently only supports the 'centroids' coordinate format.")

        # We need these members for the config.
        self.confidence_thresh = confidence_thresh
        self.iou_threshold = iou_threshold
        self.top_k = top_k
        self.normalize_coords = normalize_coords
        self.img_height = img_height
        self.img_width = img_width
        self.coords = coords
        self.nms_max_output_size = nms_max_output_size

        # We need these members for TensorFlow.
        self.tf_confidence_thresh = tf.constant(self.confidence_thresh, name='confidence_thresh')
        self.tf_iou_threshold = tf.constant(self.iou_threshold, name='iou_threshold')
        self.tf_top_k = tf.constant(self.top_k, name='top_k')
        self.tf_normalize_coords = tf.constant(self.normalize_coords, name='normalize_coords')
        self.tf_img_height = tf.constant(self.img_height, dtype=tf.float32, name='img_height')
        self.tf_img_width = tf.constant(self.img_width, dtype=tf.float32, name='img_width')
        self.tf_nms_max_output_size = tf.constant(self.nms_max_output_size, name='nms_max_output_size')

        super(DecodeDetectionsFast, self).__init__(**kwargs)

    def build(self, input_shape):
        self.input_spec = [InputSpec(shape=input_shape)]
        super(DecodeDetectionsFast, self).build(input_shape)

    def call(self, y_pred, mask=None):
        '''
        Returns:
            3D tensor of shape `(batch_size, top_k, 6)`. The second axis is zero-padded
            to always yield `top_k` predictions per batch item. The last axis contains
            the coordinates for each predicted box in the format
            `[class_id, confidence, xmin, ymin, xmax, ymax]`.
        '''

        #####################################################################################
        # 1. Convert the box coordinates from predicted anchor box offsets to predicted
        #    absolute coordinates
        #####################################################################################

        # Extract the predicted class IDs as the indices of the highest confidence values.
        class_ids = tf.expand_dims(tf.to_float(tf.argmax(y_pred[...,:-12], axis=-1)), axis=-1)
        # Extract the confidences of the maximal classes.
        confidences = tf.reduce_max(y_pred[...,:-12], axis=-1, keep_dims=True)

        # Convert anchor box offsets to image offsets.
        cx = y_pred[...,-12] * y_pred[...,-4] * y_pred[...,-6] + y_pred[...,-8] # cx = cx_pred * cx_variance * w_anchor + cx_anchor
        cy = y_pred[...,-11] * y_pred[...,-3] * y_pred[...,-5] + y_pred[...,-7] # cy = cy_pred * cy_variance * h_anchor + cy_anchor
        w = tf.exp(y_pred[...,-10] * y_pred[...,-2]) * y_pred[...,-6] # w = exp(w_pred * variance_w) * w_anchor
        h = tf.exp(y_pred[...,-9] * y_pred[...,-1]) * y_pred[...,-5] # h = exp(h_pred * variance_h) * h_anchor

        # Convert 'centroids' to 'corners'.
        xmin = cx - 0.5 * w
        ymin = cy - 0.5 * h
        xmax = cx + 0.5 * w
        ymax = cy + 0.5 * h

        # If the model predicts box coordinates relative to the image dimensions and they are supposed
        # to be converted back to absolute coordinates, do that.
        def normalized_coords():
            xmin1 = tf.expand_dims(xmin * self.tf_img_width, axis=-1)
            ymin1 = tf.expand_dims(ymin * self.tf_img_height, axis=-1)
            xmax1 = tf.expand_dims(xmax * self.tf_img_width, axis=-1)
            ymax1 = tf.expand_dims(ymax * self.tf_img_height, axis=-1)
            return xmin1, ymin1, xmax1, ymax1
        def non_normalized_coords():
            return tf.expand_dims(xmin, axis=-1), tf.expand_dims(ymin, axis=-1), tf.expand_dims(xmax, axis=-1), tf.expand_dims(ymax, axis=-1)

        xmin, ymin, xmax, ymax = tf.cond(self.tf_normalize_coords, normalized_coords, non_normalized_coords)

        # Concatenate the class IDs, the confidences, and the converted box coordinates to form the decoded predictions tensor.
        y_pred = tf.concat(values=[class_ids, confidences, xmin, ymin, xmax, ymax], axis=-1)

        #####################################################################################
        # 2. Perform confidence thresholding, non-maximum suppression, and top-k filtering.
        #####################################################################################

        batch_size = tf.shape(y_pred)[0] # Output dtype: tf.int32
        n_boxes = tf.shape(y_pred)[1]
        n_classes = y_pred.shape[2] - 4
        class_indices = tf.range(1, n_classes)

        # Create a function that filters the predictions for the given batch item. Specifically, it performs:
        # - confidence thresholding
        # - non-maximum suppression (NMS)
        # - top-k filtering
        def filter_predictions(batch_item):

            # Keep only the non-background boxes.
            positive_boxes = tf.not_equal(batch_item[...,0], 0.0)
            predictions = tf.boolean_mask(tensor=batch_item,
                                          mask=positive_boxes)

            def perform_confidence_thresholding():
                # Apply confidence thresholding.
                threshold_met = predictions[:,1] > self.tf_confidence_thresh
                return tf.boolean_mask(tensor=predictions,
                                       mask=threshold_met)
            def no_positive_boxes():
                return tf.constant(value=0.0, shape=(1,6))

            # If there are any positive predictions, perform confidence thresholding.
            predictions_conf_thresh = tf.cond(tf.equal(tf.size(predictions), 0), no_positive_boxes, perform_confidence_thresholding)

            def perform_nms():
                scores = predictions_conf_thresh[...,1]

                # `tf.image.non_max_suppression()` needs the box coordinates in the format `(ymin, xmin, ymax, xmax)`.
                xmin = tf.expand_dims(predictions_conf_thresh[...,-4], axis=-1)
                ymin = tf.expand_dims(predictions_conf_thresh[...,-3], axis=-1)
                xmax = tf.expand_dims(predictions_conf_thresh[...,-2], axis=-1)
                ymax = tf.expand_dims(predictions_conf_thresh[...,-1], axis=-1)
                boxes = tf.concat(values=[ymin, xmin, ymax, xmax], axis=-1)

                maxima_indices = tf.image.non_max_suppression(boxes=boxes,
                                                              scores=scores,
                                                              max_output_size=self.tf_nms_max_output_size,
                                                              iou_threshold=self.iou_threshold,
                                                              name='non_maximum_suppression')
                maxima = tf.gather(params=predictions_conf_thresh,
                                   indices=maxima_indices,
                                   axis=0)
                return maxima
            def no_confident_predictions():
                return tf.constant(value=0.0, shape=(1,6))

            # If any boxes made the threshold, perform NMS.
            predictions_nms = tf.cond(tf.equal(tf.size(predictions_conf_thresh), 0), no_confident_predictions, perform_nms)

            # Perform top-k filtering for this batch item or pad it in case there are
            # fewer than `self.top_k` boxes left at this point. Either way, produce a
            # tensor of length `self.top_k`. By the time we return the final results tensor
            # for the whole batch, all batch items must have the same number of predicted
            # boxes so that the tensor dimensions are homogeneous. If fewer than `self.top_k`
            # predictions are left after the filtering process above, we pad the missing
            # predictions with zeros as dummy entries.
            def top_k():
                return tf.gather(params=predictions_nms,
                                 indices=tf.nn.top_k(predictions_nms[:, 1], k=self.tf_top_k, sorted=True).indices,
                                 axis=0)
            def pad_and_top_k():
                padded_predictions = tf.pad(tensor=predictions_nms,
                                            paddings=[[0, self.tf_top_k - tf.shape(predictions_nms)[0]], [0, 0]],
                                            mode='CONSTANT',
                                            constant_values=0.0)
                return tf.gather(params=padded_predictions,
                                 indices=tf.nn.top_k(padded_predictions[:, 1], k=self.tf_top_k, sorted=True).indices,
                                 axis=0)

            top_k_boxes = tf.cond(tf.greater_equal(tf.shape(predictions_nms)[0], self.tf_top_k), top_k, pad_and_top_k)

            return top_k_boxes

        # Iterate `filter_predictions()` over all batch items.
        output_tensor = tf.map_fn(fn=lambda x: filter_predictions(x),
                                  elems=y_pred,
                                  dtype=None,
                                  parallel_iterations=128,
                                  back_prop=False,
                                  swap_memory=False,
                                  infer_shape=True,
                                  name='loop_over_batch')

        return output_tensor

    def compute_output_shape(self, input_shape):
        batch_size, n_boxes, last_axis = input_shape
        return (batch_size, self.tf_top_k, 6) # Last axis: (class_ID, confidence, 4 box coordinates)

    def get_config(self):
        config = {
            'confidence_thresh': self.confidence_thresh,
            'iou_threshold': self.iou_threshold,
            'top_k': self.top_k,
            'nms_max_output_size': self.nms_max_output_size,
            'coords': self.coords,
            'normalize_coords': self.normalize_coords,
            'img_height': self.img_height,
            'img_width': self.img_width,
        }
        base_config = super(DecodeDetectionsFast, self).get_config()
        return dict(list(base_config.items()) + list(config.items()))
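Note: the difference between the two decoders is where the class decision happens. `DecodeDetections` thresholds and runs NMS once per class, while `DecodeDetectionsFast` first collapses each box to its single highest-scoring class, so background boxes can be dropped up front and only one thresholding/NMS pass is needed; the trade-off, per the file docstring, is that this is no longer a faithful replication of the Caffe `DetectionOutput` behavior (a box can no longer appear under two classes). A toy NumPy sketch of that argmax step, with all scores made up:

# Hedged sketch of the class_ids / confidences extraction in the fast decoder.
import numpy as np

# Assumed toy softmax scores for 3 boxes over 4 classes (class 0 = background).
y_cls = np.array([[0.8, 0.1, 0.05, 0.05],
                  [0.2, 0.1, 0.6,  0.1 ],
                  [0.1, 0.7, 0.1,  0.1 ]])

class_ids   = y_cls.argmax(axis=-1)        # [0, 2, 1], mirrors tf.argmax
confidences = y_cls.max(axis=-1)           # [0.8, 0.6, 0.7], mirrors tf.reduce_max
keep        = class_ids != 0               # drop background boxes up front
print(class_ids[keep], confidences[keep])  # [2 1] [0.6 0.7]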
70
ssd_keras-master/keras_layers/keras_layer_L2Normalization.py
Normal file
@@ -0,0 +1,70 @@
'''
A custom Keras layer to perform L2-normalization.

Copyright (C) 2018 Pierluigi Ferrari

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
'''

from __future__ import division
import numpy as np
import keras.backend as K
from keras.engine.topology import InputSpec
from keras.engine.topology import Layer

class L2Normalization(Layer):
    '''
    Performs L2 normalization on the input tensor with a learnable scaling parameter
    as described in the paper "ParseNet: Looking Wider to See Better" (see references)
    and as used in the original SSD model.

    Arguments:
        gamma_init (int): The initial scaling parameter. Defaults to 20 following the
            SSD paper.

    Input shape:
        4D tensor of shape `(batch, channels, height, width)` if `dim_ordering = 'th'`
        or `(batch, height, width, channels)` if `dim_ordering = 'tf'`.

    Returns:
        The scaled tensor. Same shape as the input tensor.

    References:
        http://cs.unc.edu/~wliu/papers/parsenet.pdf
    '''

    def __init__(self, gamma_init=20, **kwargs):
        if K.image_dim_ordering() == 'tf':
            self.axis = 3
        else:
            self.axis = 1
        self.gamma_init = gamma_init
        super(L2Normalization, self).__init__(**kwargs)

    def build(self, input_shape):
        self.input_spec = [InputSpec(shape=input_shape)]
        gamma = self.gamma_init * np.ones((input_shape[self.axis],))
        self.gamma = K.variable(gamma, name='{}_gamma'.format(self.name))
        self.trainable_weights = [self.gamma]
        super(L2Normalization, self).build(input_shape)

    def call(self, x, mask=None):
        output = K.l2_normalize(x, self.axis)
        return output * self.gamma

    def get_config(self):
        config = {
            'gamma_init': self.gamma_init
        }
        base_config = super(L2Normalization, self).get_config()
        return dict(list(base_config.items()) + list(config.items()))
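Note: a quick NumPy sketch of what `L2Normalization` computes along the channel axis, using the paper's initial `gamma` of 20 and a hand-picked example vector:

# Hedged sketch: L2-normalize one channel vector, then rescale by gamma.
import numpy as np

x = np.array([3.0, 4.0])             # one feature vector along the channel axis
gamma = 20.0                         # initial scale, as in the SSD paper
out = x / np.linalg.norm(x) * gamma  # -> [12., 16.]
print(out, np.linalg.norm(out))      # the output's norm equals gamma (20.0)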
BIN
ssd_keras-master/keras_layers/keras_layer_L2Normalization.pyc
Normal file
0
ssd_keras-master/keras_loss_function/__init__.py
Normal file
BIN
ssd_keras-master/keras_loss_function/__init__.pyc
Normal file
211
ssd_keras-master/keras_loss_function/keras_ssd_loss.py
Normal file
@@ -0,0 +1,211 @@
|
||||
'''
|
||||
The Keras-compatible loss function for the SSD model. Currently supports TensorFlow only.
|
||||
|
||||
Copyright (C) 2018 Pierluigi Ferrari
|
||||
|
||||
Licensed under the Apache License, Version 2.0 (the "License");
|
||||
you may not use this file except in compliance with the License.
|
||||
You may obtain a copy of the License at
|
||||
|
||||
http://www.apache.org/licenses/LICENSE-2.0
|
||||
|
||||
Unless required by applicable law or agreed to in writing, software
|
||||
distributed under the License is distributed on an "AS IS" BASIS,
|
||||
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
|
||||
See the License for the specific language governing permissions and
|
||||
limitations under the License.
|
||||
'''
|
||||
|
||||
from __future__ import division
|
||||
import tensorflow as tf
|
||||
|
||||
class SSDLoss:
|
||||
'''
|
||||
The SSD loss, see https://arxiv.org/abs/1512.02325.
|
||||
'''
|
||||
|
||||
def __init__(self,
|
||||
neg_pos_ratio=3,
|
||||
n_neg_min=0,
|
||||
alpha=1.0):
|
||||
'''
|
||||
Arguments:
|
||||
neg_pos_ratio (int, optional): The maximum ratio of negative (i.e. background)
|
||||
to positive ground truth boxes to include in the loss computation.
|
||||
There are no actual background ground truth boxes of course, but `y_true`
|
||||
contains anchor boxes labeled with the background class. Since
|
||||
the number of background boxes in `y_true` will usually exceed
|
||||
the number of positive boxes by far, it is necessary to balance
|
||||
their influence on the loss. Defaults to 3 following the paper.
|
||||
n_neg_min (int, optional): The minimum number of negative ground truth boxes to
|
||||
enter the loss computation *per batch*. This argument can be used to make
|
||||
sure that the model learns from a minimum number of negatives in batches
|
||||
in which there are very few, or even none at all, positive ground truth
|
||||
boxes. It defaults to 0 and if used, it should be set to a value that
|
||||
stands in reasonable proportion to the batch size used for training.
|
||||
alpha (float, optional): A factor to weight the localization loss in the
|
||||
computation of the total loss. Defaults to 1.0 following the paper.
|
||||
'''
|
||||
self.neg_pos_ratio = neg_pos_ratio
|
||||
self.n_neg_min = n_neg_min
|
||||
self.alpha = alpha
|
||||
|
||||
def smooth_L1_loss(self, y_true, y_pred):
|
||||
'''
|
||||
Compute smooth L1 loss, see references.
|
||||
|
||||
Arguments:
|
||||
y_true (nD tensor): A TensorFlow tensor of any shape containing the ground truth data.
|
||||
In this context, the expected tensor has shape `(batch_size, #boxes, 4)` and
|
||||
contains the ground truth bounding box coordinates, where the last dimension
|
||||
contains `(xmin, xmax, ymin, ymax)`.
|
||||
y_pred (nD tensor): A TensorFlow tensor of identical structure to `y_true` containing
|
||||
the predicted data, in this context the predicted bounding box coordinates.
|
||||
|
||||
Returns:
|
||||
The smooth L1 loss, a nD-1 Tensorflow tensor. In this context a 2D tensor
|
||||
of shape (batch, n_boxes_total).
|
||||
|
||||
References:
|
||||
https://arxiv.org/abs/1504.08083
|
||||
'''
|
||||
absolute_loss = tf.abs(y_true - y_pred)
|
||||
square_loss = 0.5 * (y_true - y_pred)**2
|
||||
l1_loss = tf.where(tf.less(absolute_loss, 1.0), square_loss, absolute_loss - 0.5)
|
||||
return tf.reduce_sum(l1_loss, axis=-1)
|
||||
|
||||
def log_loss(self, y_true, y_pred):
|
||||
'''
|
||||
Compute the softmax log loss.
|
||||
|
||||
Arguments:
|
||||
y_true (nD tensor): A TensorFlow tensor of any shape containing the ground truth data.
|
||||
In this context, the expected tensor has shape (batch_size, #boxes, #classes)
|
||||
and contains the ground truth bounding box categories.
|
||||
y_pred (nD tensor): A TensorFlow tensor of identical structure to `y_true` containing
|
||||
the predicted data, in this context the predicted bounding box categories.
|
||||
|
||||
Returns:
|
||||
The softmax log loss, a nD-1 Tensorflow tensor. In this context a 2D tensor
|
||||
of shape (batch, n_boxes_total).
|
||||
'''
|
||||
# Make sure that `y_pred` doesn't contain any zeros (which would break the log function)
|
||||
y_pred = tf.maximum(y_pred, 1e-15)
|
||||
# Compute the log loss
|
||||
log_loss = -tf.reduce_sum(y_true * tf.log(y_pred), axis=-1)
|
||||
return log_loss
|
||||
|
||||
    def compute_loss(self, y_true, y_pred):
        '''
        Compute the loss of the SSD model prediction against the ground truth.

        Arguments:
            y_true (array): A Numpy array of shape `(batch_size, #boxes, #classes + 12)`,
                where `#boxes` is the total number of boxes that the model predicts
                per image. Be careful to make sure that the index of each given
                box in `y_true` is the same as the index for the corresponding
                box in `y_pred`. The last axis must have length `#classes + 12` and contain
                `[classes one-hot encoded, 4 ground truth box coordinate offsets, 8 arbitrary entries]`
                in this order, including the background class. The last eight entries of the
                last axis are not used by this function and therefore their contents are
                irrelevant; they only exist so that `y_true` has the same shape as `y_pred`,
                where the last four entries of the last axis contain the anchor box
                coordinates, which are needed during inference. Important: Boxes that
                you want the cost function to ignore need to have a one-hot
                class vector of all zeros.
            y_pred (Keras tensor): The model prediction. The shape is identical
                to that of `y_true`, i.e. `(batch_size, #boxes, #classes + 12)`.
                The last axis must contain entries in the format
                `[classes one-hot encoded, 4 predicted box coordinate offsets, 8 arbitrary entries]`.

        Returns:
            A scalar, the total multitask loss for classification and localization.
        '''
        self.neg_pos_ratio = tf.constant(self.neg_pos_ratio)
        self.n_neg_min = tf.constant(self.n_neg_min)
        self.alpha = tf.constant(self.alpha)

        batch_size = tf.shape(y_pred)[0] # Output dtype: tf.int32
        n_boxes = tf.shape(y_pred)[1] # Output dtype: tf.int32, note that `n_boxes` in this context denotes the total number of boxes per image, not the number of boxes per cell.

        # 1: Compute the losses for class and box predictions for every box.

        classification_loss = tf.to_float(self.log_loss(y_true[:,:,:-12], y_pred[:,:,:-12])) # Output shape: (batch_size, n_boxes)
        localization_loss = tf.to_float(self.smooth_L1_loss(y_true[:,:,-12:-8], y_pred[:,:,-12:-8])) # Output shape: (batch_size, n_boxes)

        # 2: Compute the classification losses for the positive and negative targets.

        # Create masks for the positive and negative ground truth classes.
        negatives = y_true[:,:,0] # Tensor of shape (batch_size, n_boxes)
        positives = tf.to_float(tf.reduce_max(y_true[:,:,1:-12], axis=-1)) # Tensor of shape (batch_size, n_boxes)

        # Count the number of positive boxes (classes 1 to n) in y_true across the whole batch.
        n_positive = tf.reduce_sum(positives)

        # Now mask all negative boxes and sum up the losses for the positive boxes PER batch item
        # (Keras loss functions must output one scalar loss value PER batch item, rather than just
        # one scalar for the entire batch; that's why we're not summing across all axes).
        pos_class_loss = tf.reduce_sum(classification_loss * positives, axis=-1) # Tensor of shape (batch_size,)

        # Compute the classification loss for the negative default boxes (if there are any).

        # First, compute the classification loss for all negative boxes.
        neg_class_loss_all = classification_loss * negatives # Tensor of shape (batch_size, n_boxes)
        n_neg_losses = tf.count_nonzero(neg_class_loss_all, dtype=tf.int32) # The number of non-zero loss entries in `neg_class_loss_all`
        # What's the point of `n_neg_losses`? For the next step, which will be to compute which negative boxes enter the classification
        # loss, we don't just want to know how many negative ground truth boxes there are, but for how many of those there actually is
        # a positive (i.e. non-zero) loss. This is necessary because `tf.nn.top_k()` in the function below will pick the top k boxes with
        # the highest losses no matter what, even if it receives a vector where all losses are zero. In the unlikely event that all negative
        # classification losses ARE actually zero though, this behavior might lead to `tf.nn.top_k()` returning the indices of positive
        # boxes, leading to an incorrect negative classification loss computation, and hence an incorrect overall loss computation.
        # We therefore need to make sure that `n_negative_keep`, which assumes the role of the `k` argument in `tf.nn.top_k()`,
        # is at most the number of negative boxes for which there is a positive classification loss.

        # Compute the number of negative examples we want to account for in the loss.
        # We'll keep at most `self.neg_pos_ratio` times the number of positives in `y_true`, but at least `self.n_neg_min` (unless `n_neg_losses` is smaller).
        n_negative_keep = tf.minimum(tf.maximum(self.neg_pos_ratio * tf.to_int32(n_positive), self.n_neg_min), n_neg_losses)
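        # Illustrative example (not in the original file): with `neg_pos_ratio=3`, `n_neg_min=0`,
        # 10 positive boxes in the batch, and 500 negatives with non-zero loss, this keeps
        # min(max(3 * 10, 0), 500) = 30 negative boxes.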

        # In the unlikely case when either (1) there are no negative ground truth boxes at all
        # or (2) the classification loss for all negative boxes is zero, return zero as the `neg_class_loss`.
        def f1():
            return tf.zeros([batch_size])
        # Otherwise compute the negative loss.
        def f2():
            # Now we'll identify the top-k (where k == `n_negative_keep`) boxes with the highest confidence loss that
            # belong to the background class in the ground truth data. Note that this doesn't necessarily mean that the model
            # predicted the wrong class for those boxes, it just means that the loss for those boxes is the highest.

            # To do this, we reshape `neg_class_loss_all` to 1D...
            neg_class_loss_all_1D = tf.reshape(neg_class_loss_all, [-1]) # Tensor of shape (batch_size * n_boxes,)
            # ...and then we get the indices for the `n_negative_keep` boxes with the highest loss out of those...
            values, indices = tf.nn.top_k(neg_class_loss_all_1D,
                                          k=n_negative_keep,
                                          sorted=False) # We don't need them sorted.
            # ...and with these indices we'll create a mask...
            negatives_keep = tf.scatter_nd(indices=tf.expand_dims(indices, axis=1),
                                           updates=tf.ones_like(indices, dtype=tf.int32),
                                           shape=tf.shape(neg_class_loss_all_1D)) # Tensor of shape (batch_size * n_boxes,)
            negatives_keep = tf.to_float(tf.reshape(negatives_keep, [batch_size, n_boxes])) # Tensor of shape (batch_size, n_boxes)
            # ...and use it to keep only those boxes and mask all other classification losses.
            neg_class_loss = tf.reduce_sum(classification_loss * negatives_keep, axis=-1) # Tensor of shape (batch_size,)
            return neg_class_loss
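        # Illustration of the mask construction in `f2()` above (not part of the original file):
        # `tf.scatter_nd(indices=[[1],[3]], updates=[1,1], shape=[5])` yields `[0, 1, 0, 1, 0]`,
        # i.e. a binary keep-mask over the flattened negative classification losses.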

        neg_class_loss = tf.cond(tf.equal(n_neg_losses, tf.constant(0)), f1, f2)

        class_loss = pos_class_loss + neg_class_loss # Tensor of shape (batch_size,)

        # 3: Compute the localization loss for the positive targets.
        # We don't compute a localization loss for negative predicted boxes (obviously: there are no ground truth boxes they would correspond to).

        loc_loss = tf.reduce_sum(localization_loss * positives, axis=-1) # Tensor of shape (batch_size,)

        # 4: Compute the total loss.

        total_loss = (class_loss + self.alpha * loc_loss) / tf.maximum(1.0, n_positive) # In case `n_positive == 0`
        # Keras divides the loss by the batch size, but in our case the relevant quantity to
        # average the loss over is the number of positive boxes in the batch (by which we're
        # dividing in the line above), not the batch size. So in order to revert Keras'
        # averaging over the batch size, we have to multiply by it.
        total_loss = total_loss * tf.to_float(batch_size)

        return total_loss
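
# Usage sketch (illustrative, not part of the original file): the loss is typically passed
# to Keras via its bound `compute_loss` method, along the lines of:
#
#   ssd_loss = SSDLoss(neg_pos_ratio=3, n_neg_min=0, alpha=1.0)
#   model.compile(optimizer='adam', loss=ssd_loss.compute_loss)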
BIN
ssd_keras-master/keras_loss_function/keras_ssd_loss.pyc
Normal file
398
ssd_keras-master/log.csv
Normal file
@@ -0,0 +1,398 @@
epoch,loss,val_loss
0,20.277508449554443,18.43082230275991
0,7.1915742305224075,6.3664290333280755
1,6.165657311318146,5.740384768223276
2,5.619835971168131,5.055156362981212
3,5.369787324809428,4.892946821913427
4,5.132327414380266,4.604732026761892
5,5.0042940591924046,5.3367882135449625
6,4.817910179700142,4.068967317586043
7,4.781344171415022,4.1435956740622615
0,4.711332150380216,3.9899120714713114
1,4.565739538037641,3.8868639851346307
2,4.5467505189074835,4.518684427008337
3,4.446662645487534,3.6808233204909735
4,4.384432893490333,3.9689779205224953
5,4.338269265632533,3.632280271783167
6,4.232728542852971,3.5473593521848015
7,4.24265526459042,3.675496294182174
8,4.198724102928925,3.8537149546584306
9,4.149862735920051,3.242039015268793
10,4.086929438042281,3.2605271822092483
11,4.080999140106535,3.4492100918293
12,4.051774456474609,3.4600228681856273
13,4.047840290988972,3.3476012737167125
14,4.004921493658581,3.253551969005137
15,3.980387693464584,3.1475591296809062
16,3.963608845837807,3.439066876367647
17,3.9319142337899247,3.227249937106152
18,3.9267380777162972,3.1900331236391652
19,3.88652819715875,3.4196941712194557
20,3.891526617915775,3.2850187503561683
21,3.8810042729401117,3.178472664550859
22,3.845480888771085,3.1901466000080108
23,3.8601040031731895,3.313066850292439
24,3.833493468026303,3.1854778224959666
25,3.846581113314497,3.3080512863762523
26,3.7922811536337204,3.180536364243955
27,3.796920354224469,3.230332650749051
28,3.77545154190615,3.0828077124941107
29,3.7792578078768884,3.088481028274614
30,3.799028399090284,3.040389155903641
31,3.7627443589116534,3.5030247830858037
32,3.7849467292635994,3.430109476434941
33,3.7617942428636812,3.076172747368715
34,3.69820216201435,3.0636714777776173
35,3.7106582014515714,3.1230773734316535
36,3.7186973696831696,3.2554605943086194
37,3.73952524356691,3.322282877552266
38,3.7127712956183494,3.1296832254833107
39,3.743936704607518,2.9867530497969415
40,3.6970554582580717,3.1789982390160465
41,3.675703220584339,3.181230039730364
42,3.6800708012869046,2.9879249061492024
43,3.6996077292521954,2.909196970170858
44,3.6874807460968984,3.0076349156608386
0,5.301829584752811,3.880189075713255
1,4.237945272776155,3.4012826725658107
2,3.967824310967916,3.3003988957161807
3,3.811831304539573,4.506175486311621
4,3.738135123222297,3.005752024164005
5,3.6835490122348116,3.1180185431728558
6,3.641814648534126,3.1133831091559663
7,3.62354097024078,2.9153411725467566
8,3.6078073618738418,3.366496021869231
9,3.5808043392312707,2.916329422021399
10,3.558084192422922,3.240870023990164
11,3.543319643807441,3.101310088415535
12,3.5348065467450422,3.1885890494803992
13,3.5254843150241024,3.081872761176557
14,3.532030360467253,3.140263960093868
15,3.502065125688435,2.9188486405051486
0,5.393483727917134,3.9214612931864603
1,4.453533147422382,3.54439757006509
2,4.242004539589649,3.4558666370109634
3,4.149720608257483,3.3706597017755313
4,4.070950873899982,3.365175929799372
5,4.036489350869629,3.2572230537570253
6,3.9832210350476616,3.244548196865588
7,3.9685796719093296,3.186030752999442
8,3.9331882864690937,3.1441271540583395
9,3.9003949890268834,3.15957534821666
10,3.869919748394882,3.108536195487392
11,3.8835768105307977,3.199020989184477
12,3.859471693538142,3.1070882893095213
13,3.831863939446656,3.0932964833172
14,3.837622393809239,3.06195428563624
15,3.840526257394262,3.1301962093431124
16,3.806362711088162,3.022412871852213
17,3.7940347837546606,3.0083183234078543
0,3.583134603498268,2.899540112699781
1,3.5758411770470437,2.8950582454885754
2,3.552036837656122,3.1472871506943996
3,3.549508606013809,3.1411996200133343
4,3.536157440913235,2.8483173701836138
0,3.792046067396762,3.302008090408481
1,3.7974349303578765,3.048314040972262
2,3.774148417477707,3.0506933468458604
3,3.770723174057516,2.995274811155942
4,3.7590958569089965,2.9874864899868867
5,3.756503374499173,2.9651731917809467
6,3.7571751423402726,3.040447744666314
7,3.7372909294386574,2.9443872574884065
8,3.735867359026198,3.062159067051751
9,3.739980499226432,3.0531008007575053
10,3.720392432353275,3.0481183667328895
11,3.7236807229637274,3.054165705065338
12,3.738948112931461,2.9650489200134666
13,3.725341242685843,3.0501689881937843
14,3.717509788593112,2.906775436851443
15,3.7082020335921086,2.8870353331614513
16,3.7062981972233824,2.949263564859118
17,3.696328950256506,2.898153125753208
18,3.694157539690318,2.9727682158411763
19,3.6942759911594907,2.936929242586603
20,3.688544740044238,2.9775942695262483
21,3.685629239042291,3.008222121559844
22,3.6829704219200865,2.963830918925149
23,3.687984784862804,2.9122491843116527
24,3.6889919993408293,2.920590233389212
25,3.6744058603034384,3.156900228261948
26,3.678084367126283,2.9845731163024904
27,3.6721215580784996,2.864051256106824
28,3.6761251689142522,2.9282362472281163
29,3.670322015591761,2.916268544233575
30,3.6746563016959897,2.93533816483556
31,3.68822167135601,3.1514753251659626
32,3.673546909383746,2.9957992443989734
33,3.6622909284476606,2.9045850520109644
34,3.6821291304096255,2.846033621831816
35,3.6746097554417245,2.946126491439586
36,3.6777817256709335,2.895232820973104
37,3.6627973835325474,2.8787644779925445
38,3.656972606852946,2.910535082330509
39,3.6593537592859406,2.9323528041158404
40,3.6681329787519275,2.8517043751113267
41,3.651395040133656,2.896727514206147
42,3.6524612816863242,2.9148669353918155
43,3.700482932643524,2.8747942129324895
44,3.6568134403923254,2.9226642251501276
45,3.6591445048575175,3.0673245993195746
46,3.667327405720154,2.923382118916025
47,3.666819140911102,2.8185447474280183
48,3.6595909573863317,2.8957767840307587
49,3.6621842928973347,2.8596115811990233
50,3.6531581074987307,2.8383416561447845
51,3.6436384217574638,2.8588484636618166
52,3.653585070944212,2.9522264416363773
53,3.6567384985471123,2.869638296122454
54,3.642349756571655,2.9472995919840677
55,3.6503870386117447,2.907968194436054
56,3.6526537667205465,2.97333710225261
0,3.527430093659335,2.9499444086697637
1,3.518461085146007,2.7458512868321674
2,3.497357860100584,2.750564945644262
3,3.505551182790645,2.934770630172321
4,3.5010816638422906,2.97169387428128
5,3.4826972408190953,2.9497152687700425
6,3.482018192963834,2.769856216822352
7,3.483267085125539,3.1796632373089695
8,3.4743894621160525,2.8287636573582278
9,3.476382410010232,2.7349996355723363
10,3.4668052105305263,3.3569845280476978
11,3.464654709591174,2.956850599500598
12,3.469194402020336,2.9216046754559692
13,3.4582013188650005,2.766444167117683
14,3.4473263411923116,2.574555371805113
15,3.440576969020528,2.961924656763369
16,3.440138483498881,2.5872522278221286
17,3.4416094603475087,2.94846072853828
18,3.431826221428119,2.8613412939285747
19,3.426628774009799,2.703764006556297
20,3.418753977854678,2.794771757904364
21,3.422769560654093,2.614489094541997
22,3.4175067899222435,3.1974430852641866
23,3.4211014375507407,2.886513024483408
0,7.251550525006652,6.727467104464161
1,6.5805361751809714,6.240762918117095
2,6.426467575960234,6.261519120201773
3,6.382249749480933,6.350825642858233
4,6.346947526265681,6.231373463066257
5,6.340085077037663,6.2704033127123
6,6.330737964940444,6.153748500079525
7,6.277891544860601,6.231535480581984
8,6.251803111067415,6.1931941764208736
9,6.227573697961121,6.1739476390760775
10,6.275304704672471,6.228755848821328
11,6.2669886230789125,6.165582716902908
12,6.26808624998182,6.197941570963178
13,6.244138752343505,6.2477391894982786
14,6.226582288056612,6.1598160204838734
15,6.240093465811573,6.190422949109759
16,6.226800398378633,6.153926610241131
17,6.2345370232274755,6.151287238792498
18,6.169805461343378,6.16343318907582
19,6.222987758064456,6.15657544605586
20,6.226215513777733,6.116615871896549
21,6.1693542201332745,6.131834213733673
22,6.179976459885388,6.121268307749106
23,6.176489705658332,6.11803888705312
24,6.173434678544104,6.119943920398245
25,6.18218496430628,6.11959202433119
26,6.18542869777903,6.127611294644219
27,6.211702589542791,6.17060423977521
28,6.180811053234339,6.125907942786509
29,6.218389826829731,6.138787643155273
30,6.196371093794331,6.146416762896947
31,6.194937759243325,6.137278949557518
32,6.19617432137914,6.117240555602677
33,6.183772964755073,6.095507074229571
34,6.194296064190194,6.1405050451901495
35,6.1723592400137335,6.105950779063361
36,6.174787776814774,6.0992349732652
37,6.179421859563514,6.131365781170981
38,6.206091299973801,6.113531703194793
39,6.196451108533144,6.130458958051642
40,6.175900908626988,6.109128736549494
41,6.194291211727261,6.113921468866114
42,6.174372794448212,6.096925194433757
43,6.190038766126334,6.118585459577794
44,6.198391766541452,6.129865546883369
45,6.167305888028443,6.117949859998664
46,6.191963202090934,6.125024612679773
47,6.200261698535457,6.133790407521384
48,6.168389475045353,6.152726840948572
49,6.1765353440582755,6.131584920445267
50,6.2232111624162645,6.099260960938979
51,6.197849844019115,6.152765401674777
52,6.204225545700453,6.084189028399331
53,6.184185870978422,6.141087080483534
54,6.17963873746153,6.105953947792248
55,6.161203468418494,6.137251816039183
56,6.196522537269443,6.117956788977798
57,6.171350480107404,6.110516171017472
58,6.190189365617,6.14508030217521
59,6.173887720591202,6.144299315992667
60,6.172330596484616,6.110945329617481
61,6.208208630987257,6.116943371952797
62,6.118527266681567,6.113059694937297
63,6.16644262248762,6.105053848952664
64,6.198519827031158,6.110208214083497
65,6.159561837555282,6.1219949465138574
66,6.154563741190173,6.131545047346426
67,6.154950792051479,6.12576116649472
68,6.166894907002337,6.118150112434309
69,6.207425719556958,6.139764730662716
70,6.158974324407429,6.1250802998153535
71,6.177289243172854,6.1134030548650395
72,6.177155931112543,6.133135798464016
73,6.204698515486717,6.099255515945201
74,6.142999435611628,6.093320363468053
75,6.1286770001064985,6.118610319020797
76,6.195084075715206,6.1120577796624636
77,6.188022490739077,6.125176494997375
78,6.173007156901806,6.131874611621
79,6.169041640301794,6.136714840640827
80,6.158187964781932,6.088659851551056
81,6.124840645731054,6.073482194530721
82,6.11550829057917,6.0627874065175344
83,6.142901296035387,6.071128609764333
84,6.134051843394339,6.059427362023568
85,6.131704535519704,6.076201498459796
86,6.131018532524816,6.066524108332031
87,6.129356051648408,6.077126537366789
88,6.119815099205821,6.066800057766389
89,6.135372443350591,6.076053387072622
90,6.143837644260377,6.0712920576455645
91,6.128566016821563,6.073624474102137
92,6.127937101376616,6.072794317153035
93,6.107040068831481,6.070746099608285
94,6.114805160044693,6.065637336336836
95,6.0848008258445185,6.077359912711747
96,6.120510688652285,6.07685818613792
97,6.121718007607199,6.077014412150091
98,6.138589511550031,6.0753674102559385
99,6.14711362022683,6.072682453807519
0,5.459651564954003,4.123017392645077
1,4.404505207740189,3.7492737925782498
2,4.218900287955977,4.037354177358199
3,4.113248738984554,3.6995143434952715
4,4.0643949411929805,3.5690866557919247
5,4.025469528097724,3.5282344099940084
6,4.0457330943727445,3.557335000719343
7,3.9765755680961963,3.4300471677585525
8,3.9456495689745363,3.430577338277077
9,3.915462172458488,3.672232191513996
10,3.908008457826215,3.3030753601813805
11,3.882619246215927,3.3868675887827973
12,3.8698588563729848,3.4117591986364246
13,3.8457697521586476,3.333784429394469
14,3.829451695427381,3.344102716445923
15,3.829135890411407,3.4384743941560085
16,3.8185325401698296,3.2764437681314895
17,3.811728405532498,3.3070398575919016
18,3.8193986879577944,3.3324410565045417
19,3.8045236736306745,3.2836114091289286
20,3.7939121548769945,3.187099374848969
21,3.7803091051688136,3.2528416243377998
22,3.7838520857264712,3.217187368042615
23,3.7684858936724477,3.1406039195158044
24,3.7643722954691268,3.263700286222964
25,3.764850974855544,3.3180053218530148
26,3.762447085700412,3.216772758328185
27,3.7541283817721425,3.2951196048697646
28,3.7537426449527898,3.2274136846892687
29,3.747767803558582,3.1421889299275922
30,3.7473671009458767,3.2217603358443903
31,3.7368718368898564,3.1999600636229224
32,3.7414575453173877,3.2511923570049053
33,3.7405168471323598,3.144407963266178
34,3.7356711041752884,3.186742841662193
35,3.7488686152503843,3.2660124095605343
36,3.732873413276759,3.1449162584421586
37,3.721955984815495,3.2085911212648663
38,3.7279464378143787,3.1323465240244963
39,3.7387744518253365,3.342787105404601
40,3.7224520102824785,3.129538404892902
41,3.729480919002857,3.1475857473879443
42,3.7171482909510605,3.2321544336786077
43,3.7299101766625076,3.268986857667261
44,3.7153538607347905,3.2292985235914893
45,3.710317350398387,3.193034345860384
46,3.7254012536669867,3.2186355395219763
0,5.233777807004288,3.754912186447455
1,4.155797876627342,3.3218673182506953
2,3.9582901459886544,3.3553187786802954
3,3.854809833242829,3.0148061607321917
4,3.774991210410646,2.9839462475873986
5,3.7435560775237327,2.944469572962547
6,3.690073798276478,2.9661303657414964
7,3.688818962637007,3.0197564858806376
8,3.6428677247337298,2.9265284273575762
9,3.6327320056962416,2.859319869936729
10,3.5973561518366832,2.8247028306065776
0,3.5895758565068245,2.878430740298057
1,3.57021487172842,2.891466063382674
2,3.5890834237337113,2.810688891021573
3,3.5526506499171258,2.8535458037318016
4,3.557103377425671,2.819768620607804
5,3.5469331957697867,2.8853546847129357
6,3.5281683020591736,2.8020037130433684
7,3.5180265937447546,2.760811321102843
8,3.502203633582592,2.810385329772015
9,3.4997954434514047,2.799852768936936
10,3.4855113650679588,3.1717292711686116
11,3.4743440980196,2.7339165886080994
12,3.478519773006439,2.7316556148139797
13,3.470843297624588,2.7262015390396117
14,3.468432496213913,2.74120696826857
15,3.4654084459781647,2.9365515257387744
16,3.4587265821099282,2.7102422758024565
17,3.442611147677898,2.7034222851967327
18,3.451740815258026,2.759180706374499
19,3.4330322801709174,2.8289515509897347
20,3.433631085264683,2.7586020536811984
21,3.42998209284544,2.7625791699545723
22,3.438184221959114,2.709267522266933
23,3.443043806731701,2.69548738421226
24,3.443766810965538,2.8324346519003107
25,3.419732032930851,2.6618910677578986
26,3.4263635221004485,2.670782587868827
27,3.4174978087067602,2.6961302435154817
28,3.4232222255468367,2.6739595896857127
29,3.418972552371025,2.657863086875604
30,3.4127560631990432,2.6398553523238824
31,3.4195866104125976,2.6925482973760486
32,3.426458643066883,2.678463907339135
33,3.421615189695358,2.887123303413391
34,3.409783862519264,2.6411300960852175
35,3.4169762951254845,2.7320803192683627
36,3.405258295547962,2.6527417176110406
37,3.4024106793880464,2.6393382494790214
38,3.4015627622365954,2.7767718032914765
39,3.4063776480317114,2.685369894845145
40,3.393300777006149,2.6538100697069753
41,3.4112252692580225,2.6793857584194263
42,3.4120474227547644,2.726417605049756
43,3.3938982912421225,2.6654360306019687
44,3.4041917283177376,2.743247573035104
45,3.408869186782837,2.637516905920846
46,3.3951859700441362,2.6712587169725066
47,3.4072072798848154,2.649881097248622
48,3.3960764342546463,2.700681756953804
49,3.3881560341000556,2.6843594738901877
50,3.389593525660038,2.6199345495262922
51,3.382925266957283,2.6239259885281934
52,3.3866692927956583,2.6355316166001925
53,3.3969139186263084,2.72334972177233
54,3.3867322647333147,2.7168657478021117
55,3.3895327091932295,2.738639141296854
56,3.3796878326773645,2.638875687462943
57,3.3830816565036774,2.6640367179014244
58,3.382064008331299,2.682919617380415
59,3.3827162971138955,2.7199460838278946
60,3.3851185901761056,2.6911930183488497
61,3.3796840319156645,2.6435422468185426
62,3.3814005301952363,2.710524764060974
63,3.3771395704865457,2.6270531114266844
64,3.4597128042459486,2.7650137408898803
0
ssd_keras-master/misc_utils/__init__.py
Normal file
177
ssd_keras-master/misc_utils/tensor_sampling_utils.py
Normal file
@@ -0,0 +1,177 @@
'''
Utilities that are useful to sub- or up-sample weights tensors.

Copyright (C) 2018 Pierluigi Ferrari

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
'''

import numpy as np

def sample_tensors(weights_list, sampling_instructions, axes=None, init=None, mean=0.0, stddev=0.005):
    '''
    Can sub-sample and/or up-sample individual dimensions of the tensors in the given list
    of input tensors.

    It is possible to sub-sample some dimensions and up-sample other dimensions at the same time.

    The tensors in the list will be sampled consistently, i.e. for any given dimension that
    corresponds among all tensors in the list, the same elements will be picked for every tensor
    along that dimension.

    For dimensions that are being sub-sampled, you can either provide a list of the indices
    that should be picked, or you can provide the number of elements to be sub-sampled, in which
    case the elements will be chosen at random.

    For dimensions that are being up-sampled, "filler" elements will be inserted at random
    positions along the respective dimension. These filler elements will be initialized either
    with zero or from a normal distribution with selectable mean and standard deviation.

    Arguments:
        weights_list (list): A list of Numpy arrays. Each array represents one of the tensors
            to be sampled. The tensor with the greatest number of dimensions must be the first
            element in the list. For example, in the case of the weights of a 2D convolutional
            layer, the kernel must be the first element in the list and the bias the second,
            not the other way around. For all tensors in the list after the first tensor, the
            lengths of each of their axes must be identical to the length of some axis of the
            first tensor.
        sampling_instructions (list): A list that contains the sampling instructions for each
            dimension of the first tensor. If the first tensor has `n` dimensions, then this
            must be a list of length `n`. That means, sampling instructions for every dimension
            of the first tensor must still be given even if not all dimensions should be changed.
            The elements of this list can be either lists of integers or integers. If the sampling
            instruction for a given dimension is a list of integers, then these integers represent
            the indices of the elements of that dimension that will be sub-sampled. If the sampling
            instruction for a given dimension is an integer, then that number of elements will be
            sampled along said dimension. If the integer is greater than the number of elements
            of the input tensors in that dimension, that dimension will be up-sampled. If the integer
            is smaller than the number of elements of the input tensors in that dimension, that
            dimension will be sub-sampled. If the integer is equal to the number of elements
            of the input tensors in that dimension, that dimension will remain the same.
        axes (list, optional): Only relevant if `weights_list` contains more than one tensor.
            This list contains a list for each additional tensor in `weights_list` beyond the first.
            Each of these lists contains integers that determine to which axes of the first tensor
            the axes of the respective tensor correspond. For example, let the first tensor be a
            4D tensor and the second tensor in the list be a 2D tensor. If the first element of
            `axes` is the list `[2,3]`, then that means that the two axes of the second tensor
            correspond to the last two axes of the first tensor, in the same order. The point of
            this list is for the program to know, if a given dimension of the first tensor is to
            be sub- or up-sampled, which dimensions of the other tensors in the list must be
            sub- or up-sampled accordingly.
        init (list, optional): Only relevant for up-sampling. Must be `None` or a list of strings
            that determines for each tensor in `weights_list` how the newly inserted values should
            be initialized. The possible values are 'gaussian' for initialization from a normal
            distribution with the selected mean and standard deviation (see the following two arguments),
            or 'zeros' for zero-initialization. If `None`, all initializations default to
            'gaussian'.
        mean (float, optional): Only relevant for up-sampling. The mean of the values that will
            be inserted into the tensors at random in the case of up-sampling.
        stddev (float, optional): Only relevant for up-sampling. The standard deviation of the
            values that will be inserted into the tensors at random in the case of up-sampling.

    Returns:
        A list containing the sampled tensors in the same order in which they were given.
    '''

    first_tensor = weights_list[0]

    if (not isinstance(sampling_instructions, (list, tuple))) or (len(sampling_instructions) != first_tensor.ndim):
        raise ValueError("The sampling instructions must be a list whose length is the number of dimensions of the first tensor in `weights_list`.")

    if (init is not None) and len(init) != len(weights_list):
        raise ValueError("`init` must either be `None` or a list of strings that has the same length as `weights_list`.")

    up_sample = [] # Store the dimensions along which we need to up-sample.
    out_shape = [] # Store the shape of the output tensor here.
    # Store two stages of the new (sub-sampled and/or up-sampled) weights tensors in the following two lists.
    subsampled_weights_list = [] # Tensors after sub-sampling, but before up-sampling (if any).
    upsampled_weights_list = [] # Sub-sampled tensors after up-sampling (if any), i.e. final output tensors.

    # Create the slicing arrays from the sampling instructions.
    sampling_slices = []
    for i, sampling_inst in enumerate(sampling_instructions):
        if isinstance(sampling_inst, (list, tuple)):
            amax = np.amax(np.array(sampling_inst))
            if amax >= first_tensor.shape[i]:
                raise ValueError("The sampling instructions for dimension {} contain index {}, which is out of bounds for that dimension's length.".format(i, amax))
            sampling_slices.append(np.array(sampling_inst))
            out_shape.append(len(sampling_inst))
        elif isinstance(sampling_inst, int):
            out_shape.append(sampling_inst)
            if sampling_inst == first_tensor.shape[i]:
                # Nothing to sample here, we're keeping the original number of elements along this axis.
                sampling_slice = np.arange(sampling_inst)
                sampling_slices.append(sampling_slice)
            elif sampling_inst < first_tensor.shape[i]:
                # We want to SUB-sample this dimension. Randomly pick `sampling_inst` many elements from it.
                sampling_slice1 = np.array([0]) # We will always sample class 0, the background class.
                # Sample the rest of the classes.
                sampling_slice2 = np.sort(np.random.choice(np.arange(1, first_tensor.shape[i]), sampling_inst - 1, replace=False))
                sampling_slice = np.concatenate([sampling_slice1, sampling_slice2])
                sampling_slices.append(sampling_slice)
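                # Illustrative example (not part of the original file): with 10 elements along
                # this axis and `sampling_inst == 4`, this keeps index 0 plus three randomly
                # chosen indices from 1..9, e.g. `[0, 2, 5, 8]`.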
            else:
                # We want to UP-sample. Pick all elements from this dimension.
                sampling_slice = np.arange(first_tensor.shape[i])
                sampling_slices.append(sampling_slice)
                up_sample.append(i)
        else:
            raise ValueError("Each element of the sampling instructions must be either an integer or a list/tuple of integers, but received `{}`".format(type(sampling_inst)))

    # Process the first tensor.
    subsampled_first_tensor = np.copy(first_tensor[np.ix_(*sampling_slices)])
    subsampled_weights_list.append(subsampled_first_tensor)

    # Process the other tensors.
    if len(weights_list) > 1:
        for j in range(1, len(weights_list)):
            this_sampling_slices = [sampling_slices[i] for i in axes[j-1]] # Get the sampling slices for this tensor.
            subsampled_weights_list.append(np.copy(weights_list[j][np.ix_(*this_sampling_slices)]))

    if up_sample:
        # Take care of the dimensions that are to be up-sampled.

        out_shape = np.array(out_shape)

        # Process the first tensor.
        if init is None or init[0] == 'gaussian':
            upsampled_first_tensor = np.random.normal(loc=mean, scale=stddev, size=out_shape)
        elif init[0] == 'zeros':
            upsampled_first_tensor = np.zeros(out_shape)
        else:
            raise ValueError("Valid initializations are 'gaussian' and 'zeros', but received '{}'.".format(init[0]))
        # Pick the indices of the elements in `upsampled_first_tensor` that should be occupied by `subsampled_first_tensor`.
        up_sample_slices = [np.arange(k) for k in subsampled_first_tensor.shape]
        for i in up_sample:
            # Randomly select across which indices of this dimension to scatter the elements of `subsampled_first_tensor` in this dimension.
            up_sample_slice1 = np.array([0])
            up_sample_slice2 = np.sort(np.random.choice(np.arange(1, upsampled_first_tensor.shape[i]), subsampled_first_tensor.shape[i] - 1, replace=False))
            up_sample_slices[i] = np.concatenate([up_sample_slice1, up_sample_slice2])
        upsampled_first_tensor[np.ix_(*up_sample_slices)] = subsampled_first_tensor
        upsampled_weights_list.append(upsampled_first_tensor)

        # Process the other tensors.
        if len(weights_list) > 1:
            for j in range(1, len(weights_list)):
                if init is None or init[j] == 'gaussian':
                    upsampled_tensor = np.random.normal(loc=mean, scale=stddev, size=out_shape[axes[j-1]])
                elif init[j] == 'zeros':
                    upsampled_tensor = np.zeros(out_shape[axes[j-1]])
                else:
                    raise ValueError("Valid initializations are 'gaussian' and 'zeros', but received '{}'.".format(init[j]))
                this_up_sample_slices = [up_sample_slices[i] for i in axes[j-1]] # Get the up-sampling slices for this tensor.
                upsampled_tensor[np.ix_(*this_up_sample_slices)] = subsampled_weights_list[j]
                upsampled_weights_list.append(upsampled_tensor)

        return upsampled_weights_list
    else:
        return subsampled_weights_list
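
# Usage sketch (illustrative, not part of the original file; the shapes below are made-up
# assumptions): sub-sample a convolutional classifier layer along its output-channel axis,
# keeping channel 0 (the background class) and sampling the kernel and bias consistently:
#
#   kernel = np.random.normal(size=(3, 3, 512, 273))
#   bias = np.random.normal(size=(273,))
#   new_kernel, new_bias = sample_tensors(weights_list=[kernel, bias],
#                                         sampling_instructions=[3, 3, 512, 84],
#                                         axes=[[3]],
#                                         init=['gaussian', 'zeros'])
#   # new_kernel.shape -> (3, 3, 512, 84), new_bias.shape -> (84,)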