This commit is contained in:
dl-desktop
2020-02-06 16:47:03 -03:00
parent 6328265287
commit b586f22bf0
318 changed files with 25111 additions and 664 deletions

1
keras-yolo3-master/.gitattributes vendored Executable file

@@ -0,0 +1 @@
*.h5 filter=lfs diff=lfs merge=lfs -text

4
keras-yolo3-master/.gitignore vendored Executable file

@@ -0,0 +1,4 @@
*.jpg
*.jpeg
*.weights
*.h5

21
keras-yolo3-master/LICENSE Executable file

@@ -0,0 +1,21 @@
MIT License
Copyright (c) 2017 Ngoc Anh Huynh
Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.


@@ -0,0 +1 @@
https://github.com/experiencor/keras-yolo3

113
keras-yolo3-master/README.md Executable file

@@ -0,0 +1,113 @@
# YOLO3 (Detection, Training, and Evaluation)
## Dataset and Model
Dataset | mAP | Demo | Config | Model
:---:|:---:|:---:|:---:|:---:
Kangaroo Detection (1 class) (https://github.com/experiencor/kangaroo) | 95% | https://youtu.be/URO3UDHvoLY | check zoo | http://bit.do/ekQFj
Raccoon Detection (1 class) (https://github.com/experiencor/raccoon_dataset) | 98% | https://youtu.be/lxLyLIL7OsU | check zoo | http://bit.do/ekQFf
Red Blood Cell Detection (3 classes) (https://github.com/experiencor/BCCD_Dataset) | 84% | https://imgur.com/a/uJl2lRI | check zoo | http://bit.do/ekQFc
VOC (20 classes) (http://host.robots.ox.ac.uk/pascal/VOC/voc2012/) | 72% | https://youtu.be/0RmOI6hcfBI | check zoo | http://bit.do/ekQE5
## Todo list:
- [x] Yolo3 detection
- [x] Yolo3 training (warmup and multi-scale)
- [x] mAP Evaluation
- [x] Multi-GPU training
- [x] Evaluation on VOC
- [ ] Evaluation on COCO
- [ ] MobileNet, DenseNet, ResNet, and VGG backends
## Detection
Grab the pretrained weights of yolo3 from https://pjreddie.com/media/files/yolov3.weights.
```python yolo3_one_file_to_detect_them_all.py -w yolov3.weights -i dog.jpg```
## Training
### 1. Data preparation
Download the Raccoon dataset from https://github.com/experiencor/raccoon_dataset.
Organize the dataset into 4 folders:
+ train_image_folder <= the folder that contains the train images.
+ train_annot_folder <= the folder that contains the train annotations in VOC format.
+ valid_image_folder <= the folder that contains the validation images.
+ valid_annot_folder <= the folder that contains the validation annotations in VOC format.
There is a one-to-one correspondence by file name between images and annotations. If the validation set is empty, the training set will be automatically split into training and validation sets with a ratio of 0.8.
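A minimal layout for the Raccoon dataset could look like this (the folder names are placeholders that the configuration file points at):
```
raccoon_dataset/
├── train_image_folder/   raccoon-1.jpg, raccoon-2.jpg, ...
├── train_annot_folder/   raccoon-1.xml, raccoon-2.xml, ...
├── valid_image_folder/
└── valid_annot_folder/
```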
### 2. Edit the configuration file
The configuration file is a json file, which looks like this:
```python
{
"model" : {
"min_input_size": 352,
"max_input_size": 448,
"anchors": [10,13, 16,30, 33,23, 30,61, 62,45, 59,119, 116,90, 156,198, 373,326],
"labels": ["raccoon"]
},
"train": {
"train_image_folder": "/home/andy/data/raccoon_dataset/images/",
"train_annot_folder": "/home/andy/data/raccoon_dataset/anns/",
"train_times": 10, # the number of time to cycle through the training set, useful for small datasets
"pretrained_weights": "", # specify the path of the pretrained weights, but it's fine to start from scratch
"batch_size": 16, # the number of images to read in each batch
"learning_rate": 1e-4, # the base learning rate of the default Adam rate scheduler
"nb_epoch": 50, # number of epoches
"warmup_epochs": 3, # the number of initial epochs during which the sizes of the 5 boxes in each cell is forced to match the sizes of the 5 anchors, this trick seems to improve precision emperically
"ignore_thresh": 0.5,
"gpus": "0,1",
"saved_weights_name": "raccoon.h5",
"debug": true # turn on/off the line that prints current confidence, position, size, class losses and recall
},
"valid": {
"valid_image_folder": "",
"valid_annot_folder": "",
"valid_times": 1
}
}
```
The ```labels``` setting lists the labels to be trained on. Only images whose labels are listed are fed to the network; the rest of the images are simply ignored. This way, a Dog Detector can easily be trained using the VOC or COCO dataset by setting ```labels``` to ```['dog']```.
Download the pretrained weights for the backend at:
https://1drv.ms/u/s!ApLdDEW3ut5fgQXa7GzSlG-mdza6
**These weights must be placed in the root folder of the repository. They are the pretrained weights for the backend only and will be loaded during model creation. The code does not work without these weights.**
### 3. Generate anchors for your dataset (optional)
`python gen_anchors.py -c config.json`
Copy the generated anchors printed on the terminal to the ```anchors``` setting in ```config.json```.
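For reference, the printed line has the form below (these particular numbers are made up); paste the bracketed list into the ```anchors``` setting of the `model` section:
```
anchors: [17,18, 28,24, 63,66, 96,116, 118,172, 163,307, 276,178, 328,340, 448,108]
```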
### 4. Start the training process
`python train.py -c config.json`
By the end of this process, the code will write the weights of the best model to the file best_weights.h5 (or whatever name is specified in the "saved_weights_name" setting in config.json). The training process stops when the loss on the validation set has not improved for 3 consecutive epochs.
### 5. Perform detection using trained weights on image, set of images, video, or webcam
`python predict.py -c config.json -i /path/to/image/or/video`
It carries out detection on the image and writes the image with the detected bounding boxes to the same folder.
## Evaluation
`python evaluate.py -c config.json`
Compute the mAP performance of the model defined in `saved_weights_name` on the validation dataset defined in `valid_image_folder` and `valid_annot_folder`.


70
keras-yolo3-master/callbacks.py Executable file

@@ -0,0 +1,70 @@
from keras.callbacks import TensorBoard, ModelCheckpoint
import tensorflow as tf
import numpy as np
import warnings
class CustomTensorBoard(TensorBoard):
""" to log the loss after each batch
"""
def __init__(self, log_every=1, **kwargs):
super(CustomTensorBoard, self).__init__(**kwargs)
self.log_every = log_every
self.counter = 0
def on_batch_end(self, batch, logs=None):
self.counter+=1
if self.counter%self.log_every==0:
for name, value in logs.items():
if name in ['batch', 'size']:
continue
summary = tf.Summary()
summary_value = summary.value.add()
summary_value.simple_value = value.item()
summary_value.tag = name
self.writer.add_summary(summary, self.counter)
self.writer.flush()
super(CustomTensorBoard, self).on_batch_end(batch, logs)
class CustomModelCheckpoint(ModelCheckpoint):
""" to save the template model, not the multi-GPU model
"""
def __init__(self, model_to_save, **kwargs):
super(CustomModelCheckpoint, self).__init__(**kwargs)
self.model_to_save = model_to_save
def on_epoch_end(self, epoch, logs=None):
logs = logs or {}
self.epochs_since_last_save += 1
if self.epochs_since_last_save >= self.period:
self.epochs_since_last_save = 0
filepath = self.filepath.format(epoch=epoch + 1, **logs)
if self.save_best_only:
current = logs.get(self.monitor)
if current is None:
warnings.warn('Can save best model only with %s available, '
'skipping.' % (self.monitor), RuntimeWarning)
else:
if self.monitor_op(current, self.best):
if self.verbose > 0:
print('\nEpoch %05d: %s improved from %0.5f to %0.5f,'
' saving model to %s'
% (epoch + 1, self.monitor, self.best,
current, filepath))
self.best = current
if self.save_weights_only:
self.model_to_save.save_weights(filepath, overwrite=True)
else:
self.model_to_save.save(filepath, overwrite=True)
else:
if self.verbose > 0:
print('\nEpoch %05d: %s did not improve from %0.5f' %
(epoch + 1, self.monitor, self.best))
else:
if self.verbose > 0:
print('\nEpoch %05d: saving model to %s' % (epoch + 1, filepath))
if self.save_weights_only:
self.model_to_save.save_weights(filepath, overwrite=True)
else:
self.model_to_save.save(filepath, overwrite=True)
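For context, a minimal sketch of how these two callbacks might be wired into a Keras training loop; the stand-in `Dense` model and random data are assumptions, while `train.py` further down shows the repository's actual usage:
```python
import numpy as np
from keras.models import Sequential
from keras.layers import Dense
from callbacks import CustomTensorBoard, CustomModelCheckpoint

# stand-in model; in this repo the model to save is the single-GPU template model
model = Sequential([Dense(1, input_shape=(4,))])
model.compile(loss='mse', optimizer='adam')

callbacks = [
    CustomTensorBoard(log_dir='logs/', log_every=10),  # batch-level loss logging
    CustomModelCheckpoint(model_to_save=model,         # save this model, not a multi-GPU wrapper
                          filepath='best_weights.h5', monitor='loss',
                          save_best_only=True, mode='min', period=1),
]
model.fit(np.random.rand(32, 4), np.random.rand(32, 1), epochs=2, callbacks=callbacks)
```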

BIN
keras-yolo3-master/callbacks.pyc Executable file

Binary file not shown.


@@ -0,0 +1,49 @@
{
"model" : {
"min_input_size": 448,
"max_input_size": 448,
"anchors": [26,32, 45,119, 54,18, 94,59, 109,183, 200,21, 203,91, 210,253, 249,157],
"labels": ["Gun", "Knife", "Razor", "Shuriken"],
"backend": "full_yolo_backend.h5"
},
"train": {
"train_image_folder": "../Experimento_6/Training/images/",
"train_annot_folder": "../Experimento_6/Training/anns/",
"cache_name": "experimento_6_gpu.pkl",
"train_times": 1,
"batch_size": 2,
"learning_rate": 1e-4,
"nb_epochs": 100,
"warmup_epochs": 10,
"ignore_thresh": 0.5,
"gpus": "0,1",
"grid_scales": [1,1,1],
"obj_scale": 5,
"noobj_scale": 1,
"xywh_scale": 1,
"class_scale": 1,
"tensorboard_dir": "log_experimento_3_gpu",
"saved_weights_name": "../Experimento_5/Resultados_yolo3/full_yolo/experimento_5_yolo3_full_yolo.h5",
"debug": true
},
"valid": {
"valid_image_folder": "../Experimento_6/Training/images/",
"valid_annot_folder": "../Experimento_6/Training/anns/",
"cache_name": "val_6.pkl",
"valid_times": 1
},
"test": {
"test_image_folder": "../Experimento_3/Baggages/Testing_678/images/",
"test_annot_folder": "../Experimento_3/Baggages/Testing_678/anns/",
"cache_name": "experimento_3_test678.pkl",
"test_times": 1
}
}


@@ -0,0 +1,49 @@
{
"model" : {
"min_input_size": 448,
"max_input_size": 448,
"anchors": [5,7, 10,14, 26,32, 45,119, 54,18, 94,59, 109,183, 200,21, 203,91],
"labels": ["1", "2", "3", "4", "5", "6", "7", "8"],
"backend": "full_yolo_backend.h5"
},
"train": {
"train_image_folder": "../model-definition/Train&Test_B/images/",
"train_annot_folder": "../model-definition/Train&Test_B/anns/",
"cache_name": "experimento_fault_gpu.pkl",
"train_times": 1,
"batch_size": 2,
"learning_rate": 1e-4,
"nb_epochs": 100,
"warmup_epochs": 10,
"ignore_thresh": 0.5,
"gpus": "0,1",
"grid_scales": [1,1,1],
"obj_scale": 5,
"noobj_scale": 1,
"xywh_scale": 1,
"class_scale": 1,
"tensorboard_dir": "log_experimento_fault_gpu",
"saved_weights_name": "../model-definition/experimento_yolo3_full_fault.h5",
"debug": true
},
"valid": {
"valid_image_folder": "../model-definition/Train&Test_B/images/",
"valid_annot_folder": "../model-definition/Train&Test_B/anns/",
"cache_name": "val_fault.pkl",
"valid_times": 1
},
"test": {
"test_image_folder": "../model-definition/Train&Test_B/images/",
"test_annot_folder": "../model-definition/Train&Test_B/anns/",
"cache_name": "test_fault.pkl",
"test_times": 1
}
}


@@ -0,0 +1,49 @@
{
"model" : {
"min_input_size": 400,
"max_input_size": 400,
"anchors": [5,7, 10,14, 15, 15, 26,32, 45,119, 54,18, 94,59, 109,183, 200,21],
"labels": ["1"],
"backend": "full_yolo_backend.h5"
},
"train": {
"train_image_folder": "../Train&Test_S/Train/images/",
"train_annot_folder": "../Train&Test_S/Train/anns/",
"cache_name": "../Experimento_fault_1/Resultados_yolo3/full_yolo/experimento_fault_1_gpu.pkl",
"train_times": 1,
"batch_size": 2,
"learning_rate": 1e-4,
"nb_epochs": 200,
"warmup_epochs": 15,
"ignore_thresh": 0.5,
"gpus": "0,1",
"grid_scales": [1,1,1],
"obj_scale": 5,
"noobj_scale": 1,
"xywh_scale": 1,
"class_scale": 1,
"tensorboard_dir": "log_experimento_fault_gpu",
"saved_weights_name": "../Experimento_fault_1/Resultados_yolo3/full_yolo/experimento_yolo3_full_fault.h5",
"debug": true
},
"valid": {
"valid_image_folder": "../Train&Test_S/Test/images/",
"valid_annot_folder": "../Train&Test_S/Test/anns/",
"cache_name": "../Experimento_fault_1/Resultados_yolo3/full_yolo/val_fault_1.pkl",
"valid_times": 1
},
"test": {
"test_image_folder": "../Train&Test_S/Test/images/",
"test_annot_folder": "../Train&Test_S/Test/anns/",
"cache_name": "../Experimento_fault_1/Resultados_yolo3/full_yolo/test_fault_1.pkl",
"test_times": 1
}
}


@@ -0,0 +1,49 @@
{
"model" : {
"min_input_size": 400,
"max_input_size": 400,
"anchors": [5,7, 10,14, 15, 15, 26,32, 45,119, 54,18, 94,59, 109,183, 200,21],
"labels": ["panel"],
"backend": "full_yolo_backend.h5"
},
"train": {
"train_image_folder": "../Train&Test_A/Train/images/",
"train_annot_folder": "../Train&Test_A/Train/anns/",
"cache_name": "../Resultados_yolo3_panel/experimento_fault_1_gpu.pkl",
"train_times": 1,
"batch_size": 2,
"learning_rate": 1e-4,
"nb_epochs": 200,
"warmup_epochs": 15,
"ignore_thresh": 0.5,
"gpus": "0,1",
"grid_scales": [1,1,1],
"obj_scale": 5,
"noobj_scale": 1,
"xywh_scale": 1,
"class_scale": 1,
"tensorboard_dir": "log_experimento_fault_gpu",
"saved_weights_name": "../Resultados_yolo3_panel/experimento_yolo3_full_panel.h5",
"debug": true
},
"valid": {
"valid_image_folder": "../Train&Test_A/Test/images/",
"valid_annot_folder": "../Train&Test_A/Test/anns/",
"cache_name": "../Resultados_yolo3_panel//val_fault_1.pkl",
"valid_times": 1
},
"test": {
"test_image_folder": "../Train&Test_A/Test/images/",
"test_annot_folder": "../Train&Test_A/Test/anns/",
"cache_name": "../Resultados_yolo3_panel//test_fault_1.pkl",
"test_times": 1
}
}

80
keras-yolo3-master/evaluate.py Executable file

@@ -0,0 +1,80 @@
#! /usr/bin/env python
import argparse
import os
import numpy as np
import json
from voc import parse_voc_annotation
from yolo import create_yolov3_model
from generator import BatchGenerator
from utils.utils import normalize, evaluate
from keras.callbacks import EarlyStopping, ModelCheckpoint
from keras.optimizers import Adam
from keras.models import load_model
def _main_(args):
config_path = args.conf
with open(config_path) as config_buffer:
config = json.loads(config_buffer.read())
###############################
# Create the validation generator
###############################
valid_ints, labels = parse_voc_annotation(
config['test']['test_annot_folder'],
config['test']['test_image_folder'],
config['test']['cache_name'],
config['model']['labels']
)
labels = labels.keys() if len(config['model']['labels']) == 0 else config['model']['labels']
labels = sorted(labels)
valid_generator = BatchGenerator(
instances = valid_ints,
anchors = config['model']['anchors'],
labels = labels,
downsample = 32, # ratio between network input's size and network output's size, 32 for YOLOv3
max_box_per_image = 0,
batch_size = config['train']['batch_size'],
min_net_size = config['model']['min_input_size'],
max_net_size = config['model']['max_input_size'],
shuffle = True,
jitter = 0.0,
norm = normalize
)
###############################
# Load the model and do evaluation
###############################
os.environ['CUDA_VISIBLE_DEVICES'] = config['train']['gpus']
infer_model = load_model(config['train']['saved_weights_name'])
# compute mAP for all the classes
average_precisions = evaluate(infer_model, valid_generator)
# print the score
total_instances = []
precisions = []
print(average_precisions.items())
for label, (average_precision, num_annotations) in average_precisions.items():
print('{:.0f} instances of class'.format(num_annotations),
labels[label], 'with average precision: {:.4f}'.format(average_precision))
total_instances.append(num_annotations)
precisions.append(average_precision)
if sum(total_instances) == 0:
print('No test instances found.')
return
print('mAP using the weighted average of precisions among classes: {:.4f}'.format(sum([a * b for a, b in zip(total_instances, precisions)]) / sum(total_instances)))
print('mAP: {:.4f}'.format(sum(precisions) / sum(x > 0 for x in total_instances)))
if __name__ == '__main__':
argparser = argparse.ArgumentParser(description='Evaluate YOLO_v3 model on any dataset')
argparser.add_argument('-c', '--conf', help='path to configuration file')
args = argparser.parse_args()
_main_(args)
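To make the two summary mAP lines above concrete, here is a toy calculation with made-up AP values and instance counts:
```python
# average_precisions maps class index -> (AP, number of annotations)
average_precisions = {0: (0.90, 50.0), 1: (0.60, 10.0)}  # illustrative values
instances  = [n  for _, n  in average_precisions.values()]
precisions = [ap for ap, _ in average_precisions.values()]
print(sum(a * b for a, b in zip(instances, precisions)) / sum(instances))  # 0.85 (weighted)
print(sum(precisions) / sum(n > 0 for n in instances))                     # 0.75 (unweighted)
```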

132
keras-yolo3-master/gen_anchors.py Executable file

@@ -0,0 +1,132 @@
import random
import argparse
import numpy as np
from voc import parse_voc_annotation
import json
def IOU(ann, centroids):
w, h = ann
similarities = []
for centroid in centroids:
c_w, c_h = centroid
if c_w >= w and c_h >= h:
similarity = w*h/(c_w*c_h)
elif c_w >= w and c_h <= h:
similarity = w*c_h/(w*h + (c_w-w)*c_h)
elif c_w <= w and c_h >= h:
similarity = c_w*h/(w*h + c_w*(c_h-h))
else: #means both w,h are bigger than c_w and c_h respectively
similarity = (c_w*c_h)/(w*h)
similarities.append(similarity) # will become (k,) shape
return np.array(similarities)
def avg_IOU(anns, centroids):
n,d = anns.shape
total = 0.
for i in range(anns.shape[0]):
total += max(IOU(anns[i], centroids))
return total/n
def print_anchors(centroids):
out_string = ''
anchors = centroids.copy()
widths = anchors[:, 0]
sorted_indices = np.argsort(widths)
for i in sorted_indices:
out_string += str(int(anchors[i,0]*416)) + ',' + str(int(anchors[i,1]*416)) + ', '
print('anchors: [' + out_string[:-2] + ']')
def run_kmeans(ann_dims, anchor_num):
ann_num = ann_dims.shape[0]
iterations = 0
prev_assignments = np.ones(ann_num)*(-1)
iteration = 0
old_distances = np.zeros((ann_num, anchor_num))
indices = [random.randrange(ann_dims.shape[0]) for i in range(anchor_num)]
centroids = ann_dims[indices]
anchor_dim = ann_dims.shape[1]
while True:
distances = []
iteration += 1
for i in range(ann_num):
d = 1 - IOU(ann_dims[i], centroids)
distances.append(d)
distances = np.array(distances) # distances.shape = (ann_num, anchor_num)
print("iteration {}: dists = {}".format(iteration, np.sum(np.abs(old_distances-distances))))
#assign samples to centroids
assignments = np.argmin(distances,axis=1)
if (assignments == prev_assignments).all() :
return centroids
#calculate new centroids
centroid_sums = np.zeros((anchor_num, anchor_dim), dtype=float)
for i in range(ann_num):
centroid_sums[assignments[i]]+=ann_dims[i]
for j in range(anchor_num):
centroids[j] = centroid_sums[j]/(np.sum(assignments==j) + 1e-6)
prev_assignments = assignments.copy()
old_distances = distances.copy()
def _main_(args):
config_path = args.conf
num_anchors = args.anchors
with open(config_path) as config_buffer:
config = json.loads(config_buffer.read())
train_imgs, train_labels = parse_voc_annotation(
config['train']['train_annot_folder'],
config['train']['train_image_folder'],
config['train']['cache_name'],
config['model']['labels']
)
# run k_mean to find the anchors
annotation_dims = []
for image in train_imgs:
print(image['filename'])
for obj in image['object']:
relative_w = (float(obj['xmax']) - float(obj['xmin']))/image['width']
relative_h = (float(obj['ymax']) - float(obj['ymin']))/image['height']
annotation_dims.append(tuple(map(float, (relative_w, relative_h))))
annotation_dims = np.array(annotation_dims)
centroids = run_kmeans(annotation_dims, num_anchors)
# write anchors to file
print('\naverage IOU for', num_anchors, 'anchors:', '%0.2f' % avg_IOU(annotation_dims, centroids))
print_anchors(centroids)
if __name__ == '__main__':
argparser = argparse.ArgumentParser()
argparser.add_argument(
'-c',
'--conf',
default='config.json',
help='path to configuration file')
argparser.add_argument(
'-a',
'--anchors',
default=9,
help='number of anchors to use')
args = argparser.parse_args()
_main_(args)
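As a toy sketch of what these functions do (the box dimensions below are invented), two small boxes and two large ones cluster into two anchor shapes:
```python
import numpy as np
from gen_anchors import IOU, run_kmeans

# invented relative (width, height) pairs: two small boxes, two large ones
dims = np.array([[0.10, 0.15], [0.12, 0.14], [0.40, 0.60], [0.38, 0.55]])

print(IOU(dims[0], dims))              # similarity of the first box to every box
centroids = run_kmeans(dims, 2)        # cluster into 2 anchor shapes
print((centroids * 416).astype(int))   # scaled to a 416x416 network, as print_anchors does
```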

228
keras-yolo3-master/generator.py Executable file

@@ -0,0 +1,228 @@
import cv2
import copy
import numpy as np
from keras.utils import Sequence
from utils.bbox import BoundBox, bbox_iou
from utils.image import apply_random_scale_and_crop, random_distort_image, random_flip, correct_bounding_boxes
class BatchGenerator(Sequence):
def __init__(self,
instances,
anchors,
labels,
downsample=32, # ratio between network input's size and network output's size, 32 for YOLOv3
max_box_per_image=30,
batch_size=1,
min_net_size=320,
max_net_size=608,
shuffle=True,
jitter=True,
norm=None
):
self.instances = instances
self.batch_size = batch_size
self.labels = labels
self.downsample = downsample
self.max_box_per_image = max_box_per_image
self.min_net_size = (min_net_size//self.downsample)*self.downsample
self.max_net_size = (max_net_size//self.downsample)*self.downsample
self.shuffle = shuffle
self.jitter = jitter
self.norm = norm
self.anchors = [BoundBox(0, 0, anchors[2*i], anchors[2*i+1]) for i in range(len(anchors)//2)]
self.net_h = 416
self.net_w = 416
if shuffle: np.random.shuffle(self.instances)
def __len__(self):
return int(np.ceil(float(len(self.instances))/self.batch_size))
def __getitem__(self, idx):
# get image input size, change every 10 batches
net_h, net_w = self._get_net_size(idx)
base_grid_h, base_grid_w = net_h//self.downsample, net_w//self.downsample
# determine the first and the last indices of the batch
l_bound = idx*self.batch_size
r_bound = (idx+1)*self.batch_size
if r_bound > len(self.instances):
r_bound = len(self.instances)
l_bound = r_bound - self.batch_size
x_batch = np.zeros((r_bound - l_bound, net_h, net_w, 3)) # input images
t_batch = np.zeros((r_bound - l_bound, 1, 1, 1, self.max_box_per_image, 4)) # list of groundtruth boxes
# initialize the inputs and the outputs
yolo_1 = np.zeros((r_bound - l_bound, 1*base_grid_h, 1*base_grid_w, len(self.anchors)//3, 4+1+len(self.labels))) # desired network output 1
yolo_2 = np.zeros((r_bound - l_bound, 2*base_grid_h, 2*base_grid_w, len(self.anchors)//3, 4+1+len(self.labels))) # desired network output 2
yolo_3 = np.zeros((r_bound - l_bound, 4*base_grid_h, 4*base_grid_w, len(self.anchors)//3, 4+1+len(self.labels))) # desired network output 3
yolos = [yolo_3, yolo_2, yolo_1]
dummy_yolo_1 = np.zeros((r_bound - l_bound, 1))
dummy_yolo_2 = np.zeros((r_bound - l_bound, 1))
dummy_yolo_3 = np.zeros((r_bound - l_bound, 1))
instance_count = 0
true_box_index = 0
# do the logic to fill in the inputs and the output
for train_instance in self.instances[l_bound:r_bound]:
# augment input image and fix object's position and size
img, all_objs = self._aug_image(train_instance, net_h, net_w)
for obj in all_objs:
# find the best anchor box for this object
max_anchor = None
max_index = -1
max_iou = -1
shifted_box = BoundBox(0,
0,
obj['xmax']-obj['xmin'],
obj['ymax']-obj['ymin'])
for i in range(len(self.anchors)):
anchor = self.anchors[i]
iou = bbox_iou(shifted_box, anchor)
if max_iou < iou:
max_anchor = anchor
max_index = i
max_iou = iou
# determine the yolo to be responsible for this bounding box
yolo = yolos[max_index//3]
grid_h, grid_w = yolo.shape[1:3]
# determine the position of the bounding box on the grid
center_x = .5*(obj['xmin'] + obj['xmax'])
center_x = center_x / float(net_w) * grid_w # sigma(t_x) + c_x
center_y = .5*(obj['ymin'] + obj['ymax'])
center_y = center_y / float(net_h) * grid_h # sigma(t_y) + c_y
# determine the sizes of the bounding box
w = np.log((obj['xmax'] - obj['xmin']) / float(max_anchor.xmax)) # t_w
h = np.log((obj['ymax'] - obj['ymin']) / float(max_anchor.ymax)) # t_h
box = [center_x, center_y, w, h]
# determine the index of the label
obj_indx = self.labels.index(obj['name'])
# determine the location of the cell responsible for this object
grid_x = int(np.floor(center_x))
grid_y = int(np.floor(center_y))
# assign ground truth x, y, w, h, confidence and class probs to y_batch
yolo[instance_count, grid_y, grid_x, max_index%3] = 0
yolo[instance_count, grid_y, grid_x, max_index%3, 0:4] = box
yolo[instance_count, grid_y, grid_x, max_index%3, 4 ] = 1.
yolo[instance_count, grid_y, grid_x, max_index%3, 5+obj_indx] = 1
# assign the true box to t_batch
true_box = [center_x, center_y, obj['xmax'] - obj['xmin'], obj['ymax'] - obj['ymin']]
t_batch[instance_count, 0, 0, 0, true_box_index] = true_box
true_box_index += 1
true_box_index = true_box_index % self.max_box_per_image
# assign input image to x_batch
if self.norm != None:
x_batch[instance_count] = self.norm(img)
else:
# plot image and bounding boxes for sanity check
for obj in all_objs:
cv2.rectangle(img, (obj['xmin'],obj['ymin']), (obj['xmax'],obj['ymax']), (255,0,0), 3)
cv2.putText(img, obj['name'],
(obj['xmin']+2, obj['ymin']+12),
0, 1.2e-3 * img.shape[0],
(0,255,0), 2)
x_batch[instance_count] = img
# increase instance counter in the current batch
instance_count += 1
return [x_batch, t_batch, yolo_1, yolo_2, yolo_3], [dummy_yolo_1, dummy_yolo_2, dummy_yolo_3]
def _get_net_size(self, idx):
if idx%10 == 0:
net_size = self.downsample*np.random.randint(self.min_net_size/self.downsample, \
self.max_net_size/self.downsample+1)
#print("resizing: ", net_size, net_size)
self.net_h, self.net_w = net_size, net_size
return self.net_h, self.net_w
def _aug_image(self, instance, net_h, net_w):
image_name = instance['filename']
image = cv2.imread(image_name) # OpenCV loads images in BGR order
if image is None: raise FileNotFoundError('Cannot find ' + image_name)
image = image[:,:,::-1] # convert BGR to RGB
image_h, image_w, _ = image.shape
# determine the amount of scaling and cropping
dw = self.jitter * image_w
dh = self.jitter * image_h
new_ar = (image_w + np.random.uniform(-dw, dw)) / (image_h + np.random.uniform(-dh, dh))
scale = np.random.uniform(0.25, 2)
if (new_ar < 1):
new_h = int(scale * net_h)
new_w = int(net_h * new_ar)
else:
new_w = int(scale * net_w)
new_h = int(net_w / new_ar)
dx = int(np.random.uniform(0, net_w - new_w))
dy = int(np.random.uniform(0, net_h - new_h))
# apply scaling and cropping
im_sized = apply_random_scale_and_crop(image, new_w, new_h, net_w, net_h, dx, dy)
# randomly distort hsv space
im_sized = random_distort_image(im_sized)
# randomly flip
flip = np.random.randint(2)
im_sized = random_flip(im_sized, flip)
# correct the size and pos of bounding boxes
all_objs = correct_bounding_boxes(instance['object'], new_w, new_h, net_w, net_h, dx, dy, flip, image_w, image_h)
return im_sized, all_objs
def on_epoch_end(self):
if self.shuffle: np.random.shuffle(self.instances)
def num_classes(self):
return len(self.labels)
def size(self):
return len(self.instances)
def get_anchors(self):
anchors = []
for anchor in self.anchors:
anchors += [anchor.xmax, anchor.ymax]
return anchors
def load_annotation(self, i):
annots = []
for obj in self.instances[i]['object']:
annot = [obj['xmin'], obj['ymin'], obj['xmax'], obj['ymax'], self.labels.index(obj['name'])]
annots += [annot]
if len(annots) == 0: annots = [[]]
return np.array(annots)
def load_image(self, i):
return cv2.imread(self.instances[i]['filename'])
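One detail worth pausing on: `__getitem__` above stores a ground-truth width as `t_w = log(box_w / anchor_w)` (and likewise for heights), which `decode_netout` in `utils/utils.py` inverts. A tiny numeric check with invented sizes:
```python
import numpy as np

anchor_w, box_w = 116.0, 150.0        # invented pixel widths
t_w = np.log(box_w / anchor_w)        # what the generator writes into the y-tensor
print(anchor_w * np.exp(t_w))         # 150.0 -- decode_netout's inverse mapping
```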

BIN
keras-yolo3-master/generator.pyc Executable file

Binary file not shown.


144
keras-yolo3-master/predict.py Executable file

@@ -0,0 +1,144 @@
#! /usr/bin/env python
import time
import os
import argparse
import json
import cv2
from utils.utils import get_yolo_boxes, makedirs
from utils.bbox import draw_boxes
from keras.models import load_model
from tqdm import tqdm
import numpy as np
def _main_(args):
config_path = args.conf
input_path = args.input
output_path = args.output
with open(config_path) as config_buffer:
config = json.load(config_buffer)
makedirs(output_path)
###############################
# Set some parameters
###############################
net_h, net_w = 416, 416 # a multiple of 32, the smaller the faster
obj_thresh, nms_thresh = 0.5, 0.45
###############################
# Load the model
###############################
os.environ['CUDA_VISIBLE_DEVICES'] = config['train']['gpus']
infer_model = load_model(config['train']['saved_weights_name'])
###############################
# Predict bounding boxes
###############################
if 'webcam' in input_path: # do detection on the first webcam
video_reader = cv2.VideoCapture(0)
# the main loop
batch_size = 1
images = []
while True:
ret_val, image = video_reader.read()
if ret_val == True: images += [image]
if (len(images)==batch_size) or (ret_val==False and len(images)>0):
batch_boxes = get_yolo_boxes(infer_model, images, net_h, net_w, config['model']['anchors'], obj_thresh, nms_thresh)
for i in range(len(images)):
draw_boxes(images[i], batch_boxes[i], config['model']['labels'], obj_thresh)
cv2.imshow('video with bboxes', images[i])
images = []
if cv2.waitKey(1) == 27:
break # esc to quit
cv2.destroyAllWindows()
elif input_path[-4:] == '.mp4': # do detection on a video
video_out = output_path + input_path.split('/')[-1]
video_reader = cv2.VideoCapture(input_path)
nb_frames = int(video_reader.get(cv2.CAP_PROP_FRAME_COUNT))
frame_h = int(video_reader.get(cv2.CAP_PROP_FRAME_HEIGHT))
frame_w = int(video_reader.get(cv2.CAP_PROP_FRAME_WIDTH))
video_writer = cv2.VideoWriter(video_out,
cv2.VideoWriter_fourcc(*'MPEG'),
50.0,
(frame_w, frame_h))
# the main loop
batch_size = 1
images = []
start_point = 0 #%
show_window = False
for i in tqdm(range(nb_frames)):
_, image = video_reader.read()
if (float(i+1)/nb_frames) > start_point/100.:
images += [image]
if (i%batch_size == 0) or (i == (nb_frames-1) and len(images) > 0):
# predict the bounding boxes
batch_boxes = get_yolo_boxes(infer_model, images, net_h, net_w, config['model']['anchors'], obj_thresh, nms_thresh)
for i in range(len(images)):
# draw bounding boxes on the image using labels
draw_boxes(images[i], batch_boxes[i], config['model']['labels'], obj_thresh)
# show the video with detection bounding boxes
if show_window: cv2.imshow('video with bboxes', images[i])
# write result to the output video
video_writer.write(images[i])
images = []
if show_window and cv2.waitKey(1) == 27: break # esc to quit
if show_window: cv2.destroyAllWindows()
video_reader.release()
video_writer.release()
else: # do detection on an image or a set of images
image_paths = []
if os.path.isdir(input_path):
for inp_file in os.listdir(input_path):
image_paths += [input_path + inp_file]
else:
image_paths += [input_path]
image_paths = [inp_file for inp_file in image_paths if inp_file.lower().endswith(('.jpg', '.jpeg', '.png'))]
# the main loop
times = []
for image_path in image_paths:
image = cv2.imread(image_path)
print(image_path)
start = time.time()
# predict the bounding boxes
boxes = get_yolo_boxes(infer_model, [image], net_h, net_w, config['model']['anchors'], obj_thresh, nms_thresh)[0]
print('Elapsed time = {}'.format(time.time() - start))
times.append(time.time() - start)
# draw bounding boxes on the image using labels
draw_boxes(image, boxes, config['model']['labels'], obj_thresh)
# write the image with bounding boxes to file
cv2.imwrite(output_path + image_path.split('/')[-1], np.uint8(image))
with open(os.path.join(args.output, 'time.txt'), 'w') as f:
    f.write('Average time: ' + str(np.mean(times)))
if __name__ == '__main__':
argparser = argparse.ArgumentParser(description='Predict with a trained yolo model')
argparser.add_argument('-c', '--conf', help='path to configuration file')
argparser.add_argument('-i', '--input', help='path to an image, a directory of images, a video, or webcam')
argparser.add_argument('-o', '--output', default='output/', help='path to output directory')
args = argparser.parse_args()
_main_(args)

294
keras-yolo3-master/train.py Executable file

@@ -0,0 +1,294 @@
#! /usr/bin/env python
import argparse
import os
import numpy as np
import json
from voc import parse_voc_annotation
from yolo import create_yolov3_model, dummy_loss
from generator import BatchGenerator
from utils.utils import normalize, evaluate, makedirs
from keras.callbacks import EarlyStopping, ReduceLROnPlateau
from keras.optimizers import Adam
from callbacks import CustomModelCheckpoint, CustomTensorBoard
from utils.multi_gpu_model import multi_gpu_model
import tensorflow as tf
import keras
from keras.models import load_model
def create_training_instances(
train_annot_folder,
train_image_folder,
train_cache,
valid_annot_folder,
valid_image_folder,
valid_cache,
labels,
):
# parse annotations of the training set
train_ints, train_labels = parse_voc_annotation(train_annot_folder, train_image_folder, train_cache, labels)
# parse annotations of the validation set, if any, otherwise split the training set
if os.path.exists(valid_annot_folder):
valid_ints, valid_labels = parse_voc_annotation(valid_annot_folder, valid_image_folder, valid_cache, labels)
else:
print("valid_annot_folder not exists. Spliting the trainining set.")
train_valid_split = int(0.8*len(train_ints))
np.random.seed(0)
np.random.shuffle(train_ints)
np.random.seed()
valid_ints = train_ints[train_valid_split:]
train_ints = train_ints[:train_valid_split]
# compare the seen labels with the given labels in config.json
if len(labels) > 0:
overlap_labels = set(labels).intersection(set(train_labels.keys()))
print('Seen labels: \t' + str(train_labels) + '\n')
print('Given labels: \t' + str(labels))
# return None, None, None if some given label is not in the dataset
if len(overlap_labels) < len(labels):
print('Some labels have no annotations! Please revise the list of labels in the config.json.')
return None, None, None
else:
print('No labels are provided. Train on all seen labels.')
print(train_labels)
labels = train_labels.keys()
max_box_per_image = max([len(inst['object']) for inst in (train_ints + valid_ints)])
return train_ints, valid_ints, sorted(labels), max_box_per_image
def create_callbacks(saved_weights_name, tensorboard_logs, model_to_save):
makedirs(tensorboard_logs)
early_stop = EarlyStopping(
monitor = 'loss',
min_delta = 0.01,
patience = 25,
mode = 'min',
verbose = 1
)
checkpoint = CustomModelCheckpoint(
model_to_save = model_to_save,
filepath = saved_weights_name,# + '{epoch:02d}.h5',
monitor = 'loss',
verbose = 1,
save_best_only = True,
mode = 'min',
period = 1
)
reduce_on_plateau = ReduceLROnPlateau(
monitor = 'loss',
factor = 0.5,
patience = 15,
verbose = 1,
mode = 'min',
epsilon = 0.01,
cooldown = 0,
min_lr = 0
)
tensorboard = CustomTensorBoard(
log_dir = tensorboard_logs,
write_graph = True,
write_images = True,
)
return [early_stop, checkpoint, reduce_on_plateau, tensorboard]
def create_model(
nb_class,
anchors,
max_box_per_image,
max_grid, batch_size,
warmup_batches,
ignore_thresh,
multi_gpu,
saved_weights_name,
lr,
grid_scales,
obj_scale,
noobj_scale,
xywh_scale,
class_scale,
backend
):
if multi_gpu > 1:
with tf.device('/cpu:0'):
template_model, infer_model = create_yolov3_model(
nb_class = nb_class,
anchors = anchors,
max_box_per_image = max_box_per_image,
max_grid = max_grid,
batch_size = batch_size//multi_gpu,
warmup_batches = warmup_batches,
ignore_thresh = ignore_thresh,
grid_scales = grid_scales,
obj_scale = obj_scale,
noobj_scale = noobj_scale,
xywh_scale = xywh_scale,
class_scale = class_scale
)
else:
template_model, infer_model = create_yolov3_model(
nb_class = nb_class,
anchors = anchors,
max_box_per_image = max_box_per_image,
max_grid = max_grid,
batch_size = batch_size,
warmup_batches = warmup_batches,
ignore_thresh = ignore_thresh,
grid_scales = grid_scales,
obj_scale = obj_scale,
noobj_scale = noobj_scale,
xywh_scale = xywh_scale,
class_scale = class_scale
)
# load the pretrained weight if exists, otherwise load the backend weight only
if os.path.exists(saved_weights_name):
print("\nLoading pretrained weights.\n")
template_model.load_weights(saved_weights_name)
else:
template_model.load_weights(backend, by_name=True)
if multi_gpu > 1:
train_model = multi_gpu_model(template_model, gpus=multi_gpu)
else:
train_model = template_model
optimizer = Adam(lr=lr, clipnorm=0.001)
train_model.compile(loss=dummy_loss, optimizer=optimizer)
return train_model, infer_model
def _main_(args):
config_path = args.conf
with open(config_path) as config_buffer:
config = json.loads(config_buffer.read())
###############################
# Parse the annotations
###############################
train_ints, valid_ints, labels, max_box_per_image = create_training_instances(
config['train']['train_annot_folder'],
config['train']['train_image_folder'],
config['train']['cache_name'],
config['valid']['valid_annot_folder'],
config['valid']['valid_image_folder'],
config['valid']['cache_name'],
config['model']['labels']
)
print('\nTraining on: \t' + str(labels) + '\n')
###############################
# Create the generators
###############################
train_generator = BatchGenerator(
instances = train_ints,
anchors = config['model']['anchors'],
labels = labels,
downsample = 32, # ratio between network input's size and network output's size, 32 for YOLOv3
max_box_per_image = max_box_per_image,
batch_size = config['train']['batch_size'],
min_net_size = config['model']['min_input_size'],
max_net_size = config['model']['max_input_size'],
shuffle = True,
jitter = 0.3,
norm = normalize
)
valid_generator = BatchGenerator(
instances = valid_ints,
anchors = config['model']['anchors'],
labels = labels,
downsample = 32, # ratio between network input's size and network output's size, 32 for YOLOv3
max_box_per_image = max_box_per_image,
batch_size = config['train']['batch_size'],
min_net_size = config['model']['min_input_size'],
max_net_size = config['model']['max_input_size'],
shuffle = True,
jitter = 0.0,
norm = normalize
)
###############################
# Create the model
###############################
if os.path.exists(config['train']['saved_weights_name']):
config['train']['warmup_epochs'] = 0
warmup_batches = config['train']['warmup_epochs'] * (config['train']['train_times']*len(train_generator))
os.environ['CUDA_VISIBLE_DEVICES'] = config['train']['gpus']
multi_gpu = len(config['train']['gpus'].split(','))
print('multi_gpu:' + str(multi_gpu))
train_model, infer_model = create_model(
nb_class = len(labels),
anchors = config['model']['anchors'],
max_box_per_image = max_box_per_image,
max_grid = [config['model']['max_input_size'], config['model']['max_input_size']],
batch_size = config['train']['batch_size'],
warmup_batches = warmup_batches,
ignore_thresh = config['train']['ignore_thresh'],
multi_gpu = multi_gpu,
saved_weights_name = config['train']['saved_weights_name'],
lr = config['train']['learning_rate'],
grid_scales = config['train']['grid_scales'],
obj_scale = config['train']['obj_scale'],
noobj_scale = config['train']['noobj_scale'],
xywh_scale = config['train']['xywh_scale'],
class_scale = config['train']['class_scale'],
backend = config['model']['backend']
)
###############################
# Kick off the training
###############################
callbacks = create_callbacks(config['train']['saved_weights_name'], config['train']['tensorboard_dir'], infer_model)
train_model.fit_generator(
generator = train_generator,
steps_per_epoch = len(train_generator) * config['train']['train_times'],
epochs = config['train']['nb_epochs'] + config['train']['warmup_epochs'],
verbose = 2 if config['train']['debug'] else 1,
callbacks = callbacks,
workers = 4,
max_queue_size = 8
)
# make a GPU version of infer_model for evaluation
if multi_gpu > 1:
infer_model = load_model(config['train']['saved_weights_name'])
###############################
# Run the evaluation
###############################
# compute mAP for all the classes
average_precisions = evaluate(infer_model, valid_generator)
# print the score
total_instances = []
precisions = []
for label, (average_precision, num_annotations) in average_precisions.items():
print('{:.0f} instances of class'.format(num_annotations),
labels[label], 'with average precision: {:.4f}'.format(average_precision))
total_instances.append(num_annotations)
precisions.append(average_precision)
if sum(total_instances) == 0:
print('No test instances found.')
return
print('mAP using the weighted average of precisions among classes: {:.4f}'.format(sum([a * b for a, b in zip(total_instances, precisions)]) / sum(total_instances)))
print('mAP: {:.4f}'.format(sum(precisions) / sum(x > 0 for x in total_instances)))
if __name__ == '__main__':
argparser = argparse.ArgumentParser(description='train and evaluate YOLO_v3 model on any dataset')
argparser.add_argument('-c', '--conf', help='path to configuration file')
args = argparser.parse_args()
_main_(args)



89
keras-yolo3-master/utils/bbox.py Executable file

@@ -0,0 +1,89 @@
import numpy as np
import os
import cv2
from .colors import get_color
class BoundBox:
def __init__(self, xmin, ymin, xmax, ymax, c = None, classes = None):
self.xmin = xmin
self.ymin = ymin
self.xmax = xmax
self.ymax = ymax
self.c = c
self.classes = classes
self.label = -1
self.score = -1
def get_label(self):
if self.label == -1:
self.label = np.argmax(self.classes)
return self.label
def get_score(self):
if self.score == -1:
self.score = self.classes[self.get_label()]
return self.score
def _interval_overlap(interval_a, interval_b):
x1, x2 = interval_a
x3, x4 = interval_b
if x3 < x1:
if x4 < x1:
return 0
else:
return min(x2,x4) - x1
else:
if x2 < x3:
return 0
else:
return min(x2,x4) - x3
def bbox_iou(box1, box2):
intersect_w = _interval_overlap([box1.xmin, box1.xmax], [box2.xmin, box2.xmax])
intersect_h = _interval_overlap([box1.ymin, box1.ymax], [box2.ymin, box2.ymax])
intersect = intersect_w * intersect_h
w1, h1 = box1.xmax-box1.xmin, box1.ymax-box1.ymin
w2, h2 = box2.xmax-box2.xmin, box2.ymax-box2.ymin
union = w1*h1 + w2*h2 - intersect
return float(intersect) / union
def draw_boxes(image, boxes, labels, obj_thresh, quiet=True):
for box in boxes:
label_str = ''
label = -1
for i in range(len(labels)):
if box.classes[i] > obj_thresh:
if label_str != '': label_str += ', '
label_str += (labels[i] + ' ' + str(round(box.get_score()*100,0)) + '%')
label = i
if not quiet: print(label_str)
if label >= 0:
text_size = cv2.getTextSize(label_str, cv2.FONT_HERSHEY_SIMPLEX, 1.1e-4 * image.shape[0], 2)
width, height = text_size[0][0], text_size[0][1]
region = np.array([[box.xmin-3, box.ymin],
[box.xmin-3, box.ymin-height-16],
[box.xmin+width+6, box.ymin-height-16],
[box.xmin+width+6, box.ymin]], dtype='int32')
cv2.rectangle(img=image, pt1=(box.xmin,box.ymin), pt2=(box.xmax,box.ymax), color=get_color(label), thickness=1)
cv2.fillPoly(img=image, pts=[region], color=get_color(label))
cv2.putText(img=image,
text=label_str,
org=(box.xmin+6, box.ymin - 6),
fontFace=cv2.FONT_HERSHEY_SIMPLEX,
fontScale=0.7e-3 * image.shape[0],
color=(0,0,0),
thickness=2)
return image
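A quick sanity check of `bbox_iou` on two invented boxes overlapping in a 10x10 region:
```python
from utils.bbox import BoundBox, bbox_iou

a = BoundBox(0, 0, 20, 20)     # 400-pixel box
b = BoundBox(10, 10, 30, 30)   # 400-pixel box, overlapping a 10x10 corner of a
print(bbox_iou(a, b))          # 100 / (400 + 400 - 100) = 0.142857...
```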

BIN
keras-yolo3-master/utils/bbox.pyc Executable file

Binary file not shown.

96
keras-yolo3-master/utils/colors.py Executable file

@@ -0,0 +1,96 @@
def get_color(label):
""" Return a color from a set of predefined colors. Contains 80 colors in total.
code originally from https://github.com/fizyr/keras-retinanet/
Args
label: The label to get the color for.
Returns
A list of three values representing an RGB color.
"""
if label < len(colors):
return colors[label]
else:
print('Label {} has no color, returning default.'.format(label))
return (0, 255, 0)
colors = [
[31 , 0 , 255] ,
[0 , 159 , 255] ,
[255 , 95 , 0] ,
[255 , 19 , 0] ,
[255 , 0 , 0] ,
[255 , 38 , 0] ,
[0 , 255 , 25] ,
[255 , 0 , 133] ,
[255 , 172 , 0] ,
[108 , 0 , 255] ,
[0 , 82 , 255] ,
[0 , 255 , 6] ,
[255 , 0 , 152] ,
[223 , 0 , 255] ,
[12 , 0 , 255] ,
[0 , 255 , 178] ,
[108 , 255 , 0] ,
[184 , 0 , 255] ,
[255 , 0 , 76] ,
[146 , 255 , 0] ,
[51 , 0 , 255] ,
[0 , 197 , 255] ,
[255 , 248 , 0] ,
[255 , 0 , 19] ,
[255 , 0 , 38] ,
[89 , 255 , 0] ,
[127 , 255 , 0] ,
[255 , 153 , 0] ,
[0 , 255 , 255] ,
[0 , 255 , 216] ,
[0 , 255 , 121] ,
[255 , 0 , 248] ,
[70 , 0 , 255] ,
[0 , 255 , 159] ,
[0 , 216 , 255] ,
[0 , 6 , 255] ,
[0 , 63 , 255] ,
[31 , 255 , 0] ,
[255 , 57 , 0] ,
[255 , 0 , 210] ,
[0 , 255 , 102] ,
[242 , 255 , 0] ,
[255 , 191 , 0] ,
[0 , 255 , 63] ,
[255 , 0 , 95] ,
[146 , 0 , 255] ,
[184 , 255 , 0] ,
[255 , 114 , 0] ,
[0 , 255 , 235] ,
[255 , 229 , 0] ,
[0 , 178 , 255] ,
[255 , 0 , 114] ,
[255 , 0 , 57] ,
[0 , 140 , 255] ,
[0 , 121 , 255] ,
[12 , 255 , 0] ,
[255 , 210 , 0] ,
[0 , 255 , 44] ,
[165 , 255 , 0] ,
[0 , 25 , 255] ,
[0 , 255 , 140] ,
[0 , 101 , 255] ,
[0 , 255 , 82] ,
[223 , 255 , 0] ,
[242 , 0 , 255] ,
[89 , 0 , 255] ,
[165 , 0 , 255] ,
[70 , 255 , 0] ,
[255 , 0 , 172] ,
[255 , 76 , 0] ,
[203 , 255 , 0] ,
[204 , 0 , 255] ,
[255 , 0 , 229] ,
[255 , 133 , 0] ,
[127 , 0 , 255] ,
[0 , 235 , 255] ,
[0 , 255 , 197] ,
[255 , 0 , 191] ,
[0 , 44 , 255] ,
[50 , 255 , 0]
]


86
keras-yolo3-master/utils/image.py Executable file

@@ -0,0 +1,86 @@
import cv2
import numpy as np
import copy
def _rand_scale(scale):
scale = np.random.uniform(1, scale)
return scale if (np.random.randint(2) == 0) else 1./scale
def _constrain(min_v, max_v, value):
if value < min_v: return min_v
if value > max_v: return max_v
return value
def random_flip(image, flip):
if flip == 1: return cv2.flip(image, 1)
return image
def correct_bounding_boxes(boxes, new_w, new_h, net_w, net_h, dx, dy, flip, image_w, image_h):
boxes = copy.deepcopy(boxes)
# randomize boxes' order
np.random.shuffle(boxes)
# correct sizes and positions
sx, sy = float(new_w)/image_w, float(new_h)/image_h
zero_boxes = []
for i in range(len(boxes)):
boxes[i]['xmin'] = int(_constrain(0, net_w, boxes[i]['xmin']*sx + dx))
boxes[i]['xmax'] = int(_constrain(0, net_w, boxes[i]['xmax']*sx + dx))
boxes[i]['ymin'] = int(_constrain(0, net_h, boxes[i]['ymin']*sy + dy))
boxes[i]['ymax'] = int(_constrain(0, net_h, boxes[i]['ymax']*sy + dy))
if boxes[i]['xmax'] <= boxes[i]['xmin'] or boxes[i]['ymax'] <= boxes[i]['ymin']:
zero_boxes += [i]
continue
if flip == 1:
swap = boxes[i]['xmin']
boxes[i]['xmin'] = net_w - boxes[i]['xmax']
boxes[i]['xmax'] = net_w - swap
boxes = [boxes[i] for i in range(len(boxes)) if i not in zero_boxes]
return boxes
def random_distort_image(image, hue=18, saturation=1.5, exposure=1.5):
# determine scale factors
dhue = np.random.uniform(-hue, hue)
dsat = _rand_scale(saturation)
dexp = _rand_scale(exposure)
# convert RGB space to HSV space
image = cv2.cvtColor(image, cv2.COLOR_RGB2HSV).astype('float')
# change saturation and exposure
image[:,:,1] *= dsat
image[:,:,2] *= dexp
# change hue
image[:,:,0] += dhue
image[:,:,0] -= (image[:,:,0] > 180)*180
image[:,:,0] += (image[:,:,0] < 0) *180
# convert back to RGB from HSV
return cv2.cvtColor(image.astype('uint8'), cv2.COLOR_HSV2RGB)
def apply_random_scale_and_crop(image, new_w, new_h, net_w, net_h, dx, dy):
im_sized = cv2.resize(image, (new_w, new_h))
if dx > 0:
im_sized = np.pad(im_sized, ((0,0), (dx,0), (0,0)), mode='constant', constant_values=127)
else:
im_sized = im_sized[:,-dx:,:]
if (new_w + dx) < net_w:
im_sized = np.pad(im_sized, ((0,0), (0, net_w - (new_w+dx)), (0,0)), mode='constant', constant_values=127)
if dy > 0:
im_sized = np.pad(im_sized, ((dy,0), (0,0), (0,0)), mode='constant', constant_values=127)
else:
im_sized = im_sized[-dy:,:,:]
if (new_h + dy) < net_h:
im_sized = np.pad(im_sized, ((0, net_h - (new_h+dy)), (0,0), (0,0)), mode='constant', constant_values=127)
return im_sized[:net_h, :net_w,:]
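For a concrete (invented) set of offsets, the scale-and-crop helper resizes the image and then pads with gray (127) up to the network size:
```python
import numpy as np
from utils.image import apply_random_scale_and_crop

img = np.zeros((100, 200, 3), dtype='uint8')   # toy 200x100 RGB image
out = apply_random_scale_and_crop(img, new_w=208, new_h=104,
                                  net_w=416, net_h=416, dx=104, dy=156)
print(out.shape)   # (416, 416, 3)
```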


62
keras-yolo3-master/utils/multi_gpu_model.py Executable file

@@ -0,0 +1,62 @@
from keras.layers import Lambda, concatenate
from keras.models import Model
import tensorflow as tf
def multi_gpu_model(model, gpus):
if isinstance(gpus, (list, tuple)):
num_gpus = len(gpus)
target_gpu_ids = gpus
else:
num_gpus = gpus
target_gpu_ids = range(num_gpus)
def get_slice(data, i, parts):
shape = tf.shape(data)
batch_size = shape[:1]
input_shape = shape[1:]
step = batch_size // parts
if i == num_gpus - 1:
size = batch_size - step * i
else:
size = step
size = tf.concat([size, input_shape], axis=0)
stride = tf.concat([step, input_shape * 0], axis=0)
start = stride * i
return tf.slice(data, start, size)
all_outputs = []
for i in range(len(model.outputs)):
all_outputs.append([])
# Place a copy of the model on each GPU,
# each getting a slice of the inputs.
for i, gpu_id in enumerate(target_gpu_ids):
with tf.device('/gpu:%d' % gpu_id):
with tf.name_scope('replica_%d' % gpu_id):
inputs = []
# Retrieve a slice of the input.
for x in model.inputs:
input_shape = tuple(x.get_shape().as_list())[1:]
slice_i = Lambda(get_slice,
output_shape=input_shape,
arguments={'i': i,
'parts': num_gpus})(x)
inputs.append(slice_i)
# Apply model on slice
# (creating a model replica on the target device).
outputs = model(inputs)
if not isinstance(outputs, list):
outputs = [outputs]
# Save the outputs for merging back together later.
for o in range(len(outputs)):
all_outputs[o].append(outputs[o])
# Merge outputs on CPU.
with tf.device('/cpu:0'):
merged = []
for name, outputs in zip(model.output_names, all_outputs):
merged.append(concatenate(outputs,
axis=0, name=name))
return Model(model.inputs, merged)
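A minimal usage sketch, assuming two visible GPUs (the toy `Dense` model is a stand-in for the YOLOv3 template model):
```python
from keras.models import Sequential
from keras.layers import Dense
from utils.multi_gpu_model import multi_gpu_model

model = Sequential([Dense(1, input_shape=(4,))])
parallel = multi_gpu_model(model, gpus=2)   # replicas on /gpu:0 and /gpu:1
parallel.compile(loss='mse', optimizer='adam')
```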


323
keras-yolo3-master/utils/utils.py Executable file

@@ -0,0 +1,323 @@
import cv2
import numpy as np
import os
from .bbox import BoundBox, bbox_iou
from scipy.special import expit
def _sigmoid(x):
return expit(x)
def makedirs(path):
try:
os.makedirs(path)
except OSError:
if not os.path.isdir(path):
raise
def evaluate(model,
generator,
iou_threshold=0.5,
obj_thresh=0.5,
nms_thresh=0.45,
net_h=416,
net_w=416,
save_path=None):
""" Evaluate a given dataset using a given model.
code originally from https://github.com/fizyr/keras-retinanet
# Arguments
model : The model to evaluate.
generator : The generator that represents the dataset to evaluate.
iou_threshold : The threshold used to consider when a detection is positive or negative.
obj_thresh : The threshold used to distinguish between object and non-object
nms_thresh : The threshold used to determine whether two detections are duplicates
net_h : The height of the input image to the model, higher value results in better accuracy
net_w : The width of the input image to the model
save_path : The path to save images with visualized detections to.
# Returns
A dict mapping class names to mAP scores.
"""
# gather all detections and annotations
all_detections = [[None for i in range(generator.num_classes())] for j in range(generator.size())]
all_annotations = [[None for i in range(generator.num_classes())] for j in range(generator.size())]
for i in range(generator.size()):
raw_image = [generator.load_image(i)]
# make the boxes and the labels
pred_boxes = get_yolo_boxes(model, raw_image, net_h, net_w, generator.get_anchors(), obj_thresh, nms_thresh)[0]
score = np.array([box.get_score() for box in pred_boxes])
pred_labels = np.array([box.label for box in pred_boxes])
if len(pred_boxes) > 0:
pred_boxes = np.array([[box.xmin, box.ymin, box.xmax, box.ymax, box.get_score()] for box in pred_boxes])
else:
pred_boxes = np.array([[]])
# sort the boxes and the labels according to scores
score_sort = np.argsort(-score)
pred_labels = pred_labels[score_sort]
pred_boxes = pred_boxes[score_sort]
# copy detections to all_detections
for label in range(generator.num_classes()):
all_detections[i][label] = pred_boxes[pred_labels == label, :]
annotations = generator.load_annotation(i)
# copy detections to all_annotations
for label in range(generator.num_classes()):
all_annotations[i][label] = annotations[annotations[:, 4] == label, :4].copy()
# compute mAP by comparing all detections and all annotations
average_precisions = {}
for label in range(generator.num_classes()):
false_positives = np.zeros((0,))
true_positives = np.zeros((0,))
scores = np.zeros((0,))
num_annotations = 0.0
for i in range(generator.size()):
detections = all_detections[i][label]
annotations = all_annotations[i][label]
num_annotations += annotations.shape[0]
detected_annotations = []
for d in detections:
scores = np.append(scores, d[4])
if annotations.shape[0] == 0:
false_positives = np.append(false_positives, 1)
true_positives = np.append(true_positives, 0)
continue
overlaps = compute_overlap(np.expand_dims(d, axis=0), annotations)
assigned_annotation = np.argmax(overlaps, axis=1)
max_overlap = overlaps[0, assigned_annotation]
if max_overlap >= iou_threshold and assigned_annotation not in detected_annotations:
false_positives = np.append(false_positives, 0)
true_positives = np.append(true_positives, 1)
detected_annotations.append(assigned_annotation)
else:
false_positives = np.append(false_positives, 1)
true_positives = np.append(true_positives, 0)
# no annotations -> AP for this class is 0 (is this correct?)
if num_annotations == 0:
average_precisions[label] = 0
continue
# sort by score
indices = np.argsort(-scores)
false_positives = false_positives[indices]
true_positives = true_positives[indices]
# compute false positives and true positives
false_positives = np.cumsum(false_positives)
true_positives = np.cumsum(true_positives)
# compute recall and precision
recall = true_positives / num_annotations
precision = true_positives / np.maximum(true_positives + false_positives, np.finfo(np.float64).eps)
# compute average precision
average_precision = compute_ap(recall, precision)
average_precisions[label] = average_precision,num_annotations
return average_precisions
def correct_yolo_boxes(boxes, image_h, image_w, net_h, net_w):
if (float(net_w)/image_w) < (float(net_h)/image_h):
new_w = net_w
new_h = (image_h*net_w)/image_w
else:
new_h = net_h
new_w = (image_w*net_h)/image_h
for i in range(len(boxes)):
x_offset, x_scale = (net_w - new_w)/2./net_w, float(new_w)/net_w
y_offset, y_scale = (net_h - new_h)/2./net_h, float(new_h)/net_h
boxes[i].xmin = int((boxes[i].xmin - x_offset) / x_scale * image_w)
boxes[i].xmax = int((boxes[i].xmax - x_offset) / x_scale * image_w)
boxes[i].ymin = int((boxes[i].ymin - y_offset) / y_scale * image_h)
boxes[i].ymax = int((boxes[i].ymax - y_offset) / y_scale * image_h)
def do_nms(boxes, nms_thresh):
if len(boxes) > 0:
nb_class = len(boxes[0].classes)
else:
return
for c in range(nb_class):
sorted_indices = np.argsort([-box.classes[c] for box in boxes])
for i in range(len(sorted_indices)):
index_i = sorted_indices[i]
if boxes[index_i].classes[c] == 0: continue
for j in range(i+1, len(sorted_indices)):
index_j = sorted_indices[j]
if bbox_iou(boxes[index_i], boxes[index_j]) >= nms_thresh:
boxes[index_j].classes[c] = 0
def decode_netout(netout, anchors, obj_thresh, net_h, net_w):
grid_h, grid_w = netout.shape[:2]
nb_box = 3
netout = netout.reshape((grid_h, grid_w, nb_box, -1))
nb_class = netout.shape[-1] - 5
boxes = []
netout[..., :2] = _sigmoid(netout[..., :2])
netout[..., 4] = _sigmoid(netout[..., 4])
netout[..., 5:] = netout[..., 4][..., np.newaxis] * _softmax(netout[..., 5:])
netout[..., 5:] *= netout[..., 5:] > obj_thresh
for i in range(grid_h*grid_w):
row = i // grid_w
col = i % grid_w
for b in range(nb_box):
# 4th element is objectness score
objectness = netout[row, col, b, 4]
if(objectness <= obj_thresh): continue
# first 4 elements are x, y, w, and h
x, y, w, h = netout[row,col,b,:4]
x = (col + x) / grid_w # center position, unit: image width
y = (row + y) / grid_h # center position, unit: image height
w = anchors[2 * b + 0] * np.exp(w) / net_w # unit: image width
h = anchors[2 * b + 1] * np.exp(h) / net_h # unit: image height
# last elements are class probabilities
classes = netout[row,col,b,5:]
box = BoundBox(x-w/2, y-h/2, x+w/2, y+h/2, objectness, classes)
boxes.append(box)
return boxes
def preprocess_input(image, net_h, net_w):
new_h, new_w, _ = image.shape
# determine the new size of the image
if (float(net_w)/new_w) < (float(net_h)/new_h):
new_h = (new_h * net_w)//new_w
new_w = net_w
else:
new_w = (new_w * net_h)//new_h
new_h = net_h
# resize the image to the new size
resized = cv2.resize(image[:,:,::-1]/255., (new_w, new_h))
# embed the image into the standard letter box
new_image = np.ones((net_h, net_w, 3)) * 0.5
new_image[(net_h-new_h)//2:(net_h+new_h)//2, (net_w-new_w)//2:(net_w+new_w)//2, :] = resized
new_image = np.expand_dims(new_image, 0)
return new_image
def normalize(image):
return image/255.
def get_yolo_boxes(model, images, net_h, net_w, anchors, obj_thresh, nms_thresh):
image_h, image_w, _ = images[0].shape
nb_images = len(images)
batch_input = np.zeros((nb_images, net_h, net_w, 3))
# preprocess the input
for i in range(nb_images):
batch_input[i] = preprocess_input(images[i], net_h, net_w)
# run the prediction
batch_output = model.predict_on_batch(batch_input)
batch_boxes = [None]*nb_images
for i in range(nb_images):
yolos = [batch_output[0][i], batch_output[1][i], batch_output[2][i]]
boxes = []
# decode the output of the network
for j in range(len(yolos)):
yolo_anchors = anchors[(2-j)*6:(3-j)*6] # config['model']['anchors']
boxes += decode_netout(yolos[j], yolo_anchors, obj_thresh, net_h, net_w)
# correct the sizes of the bounding boxes
correct_yolo_boxes(boxes, image_h, image_w, net_h, net_w)
# suppress non-maximal boxes
do_nms(boxes, nms_thresh)
batch_boxes[i] = boxes
return batch_boxes
def compute_overlap(a, b):
"""
Code originally from https://github.com/rbgirshick/py-faster-rcnn.
Parameters
----------
a: (N, 4) ndarray of float
b: (K, 4) ndarray of float
Returns
-------
overlaps: (N, K) ndarray of overlap between boxes and query_boxes
"""
area = (b[:, 2] - b[:, 0]) * (b[:, 3] - b[:, 1])
iw = np.minimum(np.expand_dims(a[:, 2], axis=1), b[:, 2]) - np.maximum(np.expand_dims(a[:, 0], 1), b[:, 0])
ih = np.minimum(np.expand_dims(a[:, 3], axis=1), b[:, 3]) - np.maximum(np.expand_dims(a[:, 1], 1), b[:, 1])
iw = np.maximum(iw, 0)
ih = np.maximum(ih, 0)
ua = np.expand_dims((a[:, 2] - a[:, 0]) * (a[:, 3] - a[:, 1]), axis=1) + area - iw * ih
ua = np.maximum(ua, np.finfo(float).eps)
intersection = iw * ih
return intersection / ua
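# Illustration (not from the original source): for a = np.array([[0., 0., 2., 2.]])
# and b = np.array([[1., 1., 3., 3.]]) the intersection area is 1.0 and the
# union is 4 + 4 - 1 = 7, so compute_overlap(a, b) returns [[1/7]].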
def compute_ap(recall, precision):
""" Compute the average precision, given the recall and precision curves.
Code originally from https://github.com/rbgirshick/py-faster-rcnn.
# Arguments
recall: The recall curve (list).
precision: The precision curve (list).
# Returns
The average precision as computed in py-faster-rcnn.
"""
# correct AP calculation
# first append sentinel values at the end
mrec = np.concatenate(([0.], recall, [1.]))
mpre = np.concatenate(([0.], precision, [0.]))
# compute the precision envelope
for i in range(mpre.size - 1, 0, -1):
mpre[i - 1] = np.maximum(mpre[i - 1], mpre[i])
# to calculate area under PR curve, look for points
# where X axis (recall) changes value
i = np.where(mrec[1:] != mrec[:-1])[0]
# and sum (\Delta recall) * prec
ap = np.sum((mrec[i + 1] - mrec[i]) * mpre[i + 1])
return ap
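# Illustration (not from the original source): compute_ap([0.5, 1.0], [1.0, 0.5])
# builds the envelope mpre = [1.0, 1.0, 0.5, 0.0] over mrec = [0.0, 0.5, 1.0, 1.0]
# and integrates 0.5*1.0 + 0.5*0.5 = 0.75.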
def _softmax(x, axis=-1):
x = x - np.amax(x, axis, keepdims=True)
e_x = np.exp(x)
return e_x / e_x.sum(axis, keepdims=True)
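Taken together, the helpers above form the whole inference path: letterbox the image, run the network, decode each head, rescale the boxes to the original image, then apply NMS. A minimal sketch, assuming a trained inference model has been saved to disk (the path, image name, and anchor values below are illustrative, borrowed from the raccoon zoo config):

import cv2
from keras.models import load_model

anchors = [17,18, 28,24, 36,34, 42,44, 56,51, 72,66, 90,95, 92,154, 139,281]

infer_model = load_model('raccoon.h5')   # assumed: previously saved weights
image = cv2.imread('raccoon-001.jpg')    # illustrative input image

boxes = get_yolo_boxes(infer_model, [image], 416, 416, anchors, 0.5, 0.45)[0]
for box in boxes:
    if box.get_score() > 0.5:
        print(box.get_label(), box.get_score(), (box.xmin, box.ymin, box.xmax, box.ymax))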

Binary file not shown.

67
keras-yolo3-master/voc.py Executable file

@@ -0,0 +1,67 @@
import numpy as np
import os
import xml.etree.ElementTree as ET
import pickle
def parse_voc_annotation(ann_dir, img_dir, cache_name, labels=[]):
if os.path.exists(cache_name):
with open(cache_name, 'rb') as handle:
cache = pickle.load(handle)
all_insts, seen_labels = cache['all_insts'], cache['seen_labels']
else:
all_insts = []
seen_labels = {}
for ann in sorted(os.listdir(ann_dir)):
img = {'object':[]}
try:
tree = ET.parse(ann_dir + ann)
except Exception as e:
print(e)
print('Ignore this bad annotation: ' + ann_dir + ann)
continue
for elem in tree.iter():
if 'filename' in elem.tag:
img['filename'] = img_dir + elem.text
if 'width' in elem.tag:
img['width'] = int(elem.text)
if 'height' in elem.tag:
img['height'] = int(elem.text)
if 'object' in elem.tag or 'part' in elem.tag:
obj = {}
for attr in list(elem):
if 'name' in attr.tag:
obj['name'] = attr.text
if obj['name'] in seen_labels:
seen_labels[obj['name']] += 1
else:
seen_labels[obj['name']] = 1
if len(labels) > 0 and obj['name'] not in labels:
break
else:
img['object'] += [obj]
if 'bndbox' in attr.tag:
for dim in list(attr):
if 'xmin' in dim.tag:
obj['xmin'] = int(round(float(dim.text)))
if 'ymin' in dim.tag:
obj['ymin'] = int(round(float(dim.text)))
if 'xmax' in dim.tag:
obj['xmax'] = int(round(float(dim.text)))
if 'ymax' in dim.tag:
obj['ymax'] = int(round(float(dim.text)))
if len(img['object']) > 0:
all_insts += [img]
cache = {'all_insts': all_insts, 'seen_labels': seen_labels}
with open(cache_name, 'wb') as handle:
pickle.dump(cache, handle, protocol=pickle.HIGHEST_PROTOCOL)
return all_insts, seen_labels
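A usage sketch for the parser above: on the first call it walks the annotation folder and pickles the result to cache_name, and later calls load the cache instead (the paths here are illustrative):

train_insts, seen_labels = parse_voc_annotation(
    '/data/raccoon_dataset/annotations/',  # illustrative annotation folder
    '/data/raccoon_dataset/images/',       # illustrative image folder
    'raccoon_train.pkl',
    labels=['raccoon'])
print(len(train_insts), 'annotated images,', seen_labels)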

BIN
keras-yolo3-master/voc.pyc Executable file

Binary file not shown.

364
keras-yolo3-master/yolo.py Executable file

@@ -0,0 +1,364 @@
from keras.layers import Conv2D, Input, BatchNormalization, LeakyReLU, ZeroPadding2D, UpSampling2D, Lambda
from keras.layers.merge import add, concatenate
from keras.models import Model
from keras.engine.topology import Layer
import tensorflow as tf
class YoloLayer(Layer):
def __init__(self, anchors, max_grid, batch_size, warmup_batches, ignore_thresh,
grid_scale, obj_scale, noobj_scale, xywh_scale, class_scale,
**kwargs):
# make the model settings persistent
self.ignore_thresh = ignore_thresh
self.warmup_batches = warmup_batches
self.anchors = tf.constant(anchors, dtype='float', shape=[1,1,1,3,2])
self.grid_scale = grid_scale
self.obj_scale = obj_scale
self.noobj_scale = noobj_scale
self.xywh_scale = xywh_scale
self.class_scale = class_scale
# make a persistent mesh grid
max_grid_h, max_grid_w = max_grid
cell_x = tf.to_float(tf.reshape(tf.tile(tf.range(max_grid_w), [max_grid_h]), (1, max_grid_h, max_grid_w, 1, 1)))
cell_y = tf.transpose(cell_x, (0,2,1,3,4))
self.cell_grid = tf.tile(tf.concat([cell_x,cell_y],-1), [batch_size, 1, 1, 3, 1])
super(YoloLayer, self).__init__(**kwargs)
def build(self, input_shape):
super(YoloLayer, self).build(input_shape) # Be sure to call this somewhere!
def call(self, x):
input_image, y_pred, y_true, true_boxes = x
# adjust the shape of the y_predict [batch, grid_h, grid_w, 3, 4+1+nb_class]
y_pred = tf.reshape(y_pred, tf.concat([tf.shape(y_pred)[:3], tf.constant([3, -1])], axis=0))
# initialize the masks
object_mask = tf.expand_dims(y_true[..., 4], 4)
# the variable to keep track of number of batches processed
batch_seen = tf.Variable(0.)
# compute grid factor and net factor
grid_h = tf.shape(y_true)[1]
grid_w = tf.shape(y_true)[2]
grid_factor = tf.reshape(tf.cast([grid_w, grid_h], tf.float32), [1,1,1,1,2])
net_h = tf.shape(input_image)[1]
net_w = tf.shape(input_image)[2]
net_factor = tf.reshape(tf.cast([net_w, net_h], tf.float32), [1,1,1,1,2])
"""
Adjust prediction
"""
pred_box_xy = (self.cell_grid[:,:grid_h,:grid_w,:,:] + tf.sigmoid(y_pred[..., :2])) # sigma(t_xy) + c_xy
pred_box_wh = y_pred[..., 2:4] # t_wh
pred_box_conf = tf.expand_dims(tf.sigmoid(y_pred[..., 4]), 4) # adjust confidence
pred_box_class = y_pred[..., 5:] # adjust class probabilities
"""
Adjust ground truth
"""
true_box_xy = y_true[..., 0:2] # (sigma(t_xy) + c_xy)
true_box_wh = y_true[..., 2:4] # t_wh
true_box_conf = tf.expand_dims(y_true[..., 4], 4)
true_box_class = tf.argmax(y_true[..., 5:], -1)
"""
Compare each predicted box to all true boxes
"""
# initially, drag all objectness of all boxes to 0
conf_delta = pred_box_conf - 0
# then, ignore the boxes which have good overlap with some true box
true_xy = true_boxes[..., 0:2] / grid_factor
true_wh = true_boxes[..., 2:4] / net_factor
true_wh_half = true_wh / 2.
true_mins = true_xy - true_wh_half
true_maxes = true_xy + true_wh_half
pred_xy = tf.expand_dims(pred_box_xy / grid_factor, 4)
pred_wh = tf.expand_dims(tf.exp(pred_box_wh) * self.anchors / net_factor, 4)
pred_wh_half = pred_wh / 2.
pred_mins = pred_xy - pred_wh_half
pred_maxes = pred_xy + pred_wh_half
intersect_mins = tf.maximum(pred_mins, true_mins)
intersect_maxes = tf.minimum(pred_maxes, true_maxes)
intersect_wh = tf.maximum(intersect_maxes - intersect_mins, 0.)
intersect_areas = intersect_wh[..., 0] * intersect_wh[..., 1]
true_areas = true_wh[..., 0] * true_wh[..., 1]
pred_areas = pred_wh[..., 0] * pred_wh[..., 1]
union_areas = pred_areas + true_areas - intersect_areas
iou_scores = tf.truediv(intersect_areas, union_areas)
best_ious = tf.reduce_max(iou_scores, axis=4)
conf_delta *= tf.expand_dims(tf.to_float(best_ious < self.ignore_thresh), 4)
"""
Compute some online statistics
"""
true_xy = true_box_xy / grid_factor
true_wh = tf.exp(true_box_wh) * self.anchors / net_factor
true_wh_half = true_wh / 2.
true_mins = true_xy - true_wh_half
true_maxes = true_xy + true_wh_half
pred_xy = pred_box_xy / grid_factor
pred_wh = tf.exp(pred_box_wh) * self.anchors / net_factor
pred_wh_half = pred_wh / 2.
pred_mins = pred_xy - pred_wh_half
pred_maxes = pred_xy + pred_wh_half
intersect_mins = tf.maximum(pred_mins, true_mins)
intersect_maxes = tf.minimum(pred_maxes, true_maxes)
intersect_wh = tf.maximum(intersect_maxes - intersect_mins, 0.)
intersect_areas = intersect_wh[..., 0] * intersect_wh[..., 1]
true_areas = true_wh[..., 0] * true_wh[..., 1]
pred_areas = pred_wh[..., 0] * pred_wh[..., 1]
union_areas = pred_areas + true_areas - intersect_areas
iou_scores = tf.truediv(intersect_areas, union_areas)
iou_scores = object_mask * tf.expand_dims(iou_scores, 4)
count = tf.reduce_sum(object_mask)
count_noobj = tf.reduce_sum(1 - object_mask)
detect_mask = tf.to_float((pred_box_conf*object_mask) >= 0.5)
class_mask = tf.expand_dims(tf.to_float(tf.equal(tf.argmax(pred_box_class, -1), true_box_class)), 4)
recall50 = tf.reduce_sum(tf.to_float(iou_scores >= 0.5 ) * detect_mask * class_mask) / (count + 1e-3)
recall75 = tf.reduce_sum(tf.to_float(iou_scores >= 0.75) * detect_mask * class_mask) / (count + 1e-3)
avg_iou = tf.reduce_sum(iou_scores) / (count + 1e-3)
avg_obj = tf.reduce_sum(pred_box_conf * object_mask) / (count + 1e-3)
avg_noobj = tf.reduce_sum(pred_box_conf * (1-object_mask)) / (count_noobj + 1e-3)
avg_cat = tf.reduce_sum(object_mask * class_mask) / (count + 1e-3)
"""
Warm-up training
"""
batch_seen = tf.assign_add(batch_seen, 1.)
true_box_xy, true_box_wh, xywh_mask = tf.cond(tf.less(batch_seen, self.warmup_batches+1),
lambda: [true_box_xy + (0.5 + self.cell_grid[:,:grid_h,:grid_w,:,:]) * (1-object_mask),
true_box_wh + tf.zeros_like(true_box_wh) * (1-object_mask),
tf.ones_like(object_mask)],
lambda: [true_box_xy,
true_box_wh,
object_mask])
"""
Compare each true box to all anchor boxes
"""
wh_scale = tf.exp(true_box_wh) * self.anchors / net_factor
wh_scale = tf.expand_dims(2 - wh_scale[..., 0] * wh_scale[..., 1], axis=4) # the smaller the box, the bigger the scale
xy_delta = xywh_mask * (pred_box_xy-true_box_xy) * wh_scale * self.xywh_scale
wh_delta = xywh_mask * (pred_box_wh-true_box_wh) * wh_scale * self.xywh_scale
conf_delta = object_mask * (pred_box_conf-true_box_conf) * self.obj_scale + (1-object_mask) * conf_delta * self.noobj_scale
class_delta = object_mask * \
tf.expand_dims(tf.nn.sparse_softmax_cross_entropy_with_logits(labels=true_box_class, logits=pred_box_class), 4) * \
self.class_scale
loss_xy = tf.reduce_sum(tf.square(xy_delta), list(range(1,5)))
loss_wh = tf.reduce_sum(tf.square(wh_delta), list(range(1,5)))
loss_conf = tf.reduce_sum(tf.square(conf_delta), list(range(1,5)))
loss_class = tf.reduce_sum(class_delta, list(range(1,5)))
loss = loss_xy + loss_wh + loss_conf + loss_class
#loss = tf.Print(loss, [grid_h, avg_obj], message='avg_obj \t\t', summarize=1000)
#loss = tf.Print(loss, [grid_h, avg_noobj], message='avg_noobj \t\t', summarize=1000)
#loss = tf.Print(loss, [grid_h, avg_iou], message='avg_iou \t\t', summarize=1000)
#loss = tf.Print(loss, [grid_h, avg_cat], message='avg_cat \t\t', summarize=1000)
#loss = tf.Print(loss, [grid_h, recall50], message='recall50 \t', summarize=1000)
#loss = tf.Print(loss, [grid_h, recall75], message='recall75 \t', summarize=1000)
#loss = tf.Print(loss, [grid_h, count], message='count \t', summarize=1000)
#loss = tf.Print(loss, [grid_h, tf.reduce_sum(loss_xy),
# tf.reduce_sum(loss_wh),
# tf.reduce_sum(loss_conf),
# tf.reduce_sum(loss_class)], message='loss xy, wh, conf, class: \t', summarize=1000)
return loss*self.grid_scale
def compute_output_shape(self, input_shape):
return [(None, 1)]
def _conv_block(inp, convs, do_skip=True):
x = inp
count = 0
for conv in convs:
if count == (len(convs) - 2) and do_skip:
skip_connection = x
count += 1
if conv['stride'] > 1: x = ZeroPadding2D(((1,0),(1,0)))(x) # unlike tensorflow's default, darknet pads on the left and top
x = Conv2D(conv['filter'],
conv['kernel'],
strides=conv['stride'],
padding='valid' if conv['stride'] > 1 else 'same', # unlike tensorflow's default, darknet pads on the left and top
name='conv_' + str(conv['layer_idx']),
use_bias=False if conv['bnorm'] else True)(x)
if conv['bnorm']: x = BatchNormalization(epsilon=0.001, name='bnorm_' + str(conv['layer_idx']))(x)
if conv['leaky']: x = LeakyReLU(alpha=0.1, name='leaky_' + str(conv['layer_idx']))(x)
return add([skip_connection, x]) if do_skip else x
def create_yolov3_model(
nb_class,
anchors,
max_box_per_image,
max_grid,
batch_size,
warmup_batches,
ignore_thresh,
grid_scales,
obj_scale,
noobj_scale,
xywh_scale,
class_scale
):
input_image = Input(shape=(None, None, 3)) # net_h, net_w, 3
true_boxes = Input(shape=(1, 1, 1, max_box_per_image, 4))
true_yolo_1 = Input(shape=(None, None, len(anchors)//6, 4+1+nb_class)) # grid_h, grid_w, nb_anchor, 5+nb_class
true_yolo_2 = Input(shape=(None, None, len(anchors)//6, 4+1+nb_class)) # grid_h, grid_w, nb_anchor, 5+nb_class
true_yolo_3 = Input(shape=(None, None, len(anchors)//6, 4+1+nb_class)) # grid_h, grid_w, nb_anchor, 5+nb_class
# Layer 0 => 4
x = _conv_block(input_image, [{'filter': 32, 'kernel': 3, 'stride': 1, 'bnorm': True, 'leaky': True, 'layer_idx': 0},
{'filter': 64, 'kernel': 3, 'stride': 2, 'bnorm': True, 'leaky': True, 'layer_idx': 1},
{'filter': 32, 'kernel': 1, 'stride': 1, 'bnorm': True, 'leaky': True, 'layer_idx': 2},
{'filter': 64, 'kernel': 3, 'stride': 1, 'bnorm': True, 'leaky': True, 'layer_idx': 3}])
# Layer 5 => 8
x = _conv_block(x, [{'filter': 128, 'kernel': 3, 'stride': 2, 'bnorm': True, 'leaky': True, 'layer_idx': 5},
{'filter': 64, 'kernel': 1, 'stride': 1, 'bnorm': True, 'leaky': True, 'layer_idx': 6},
{'filter': 128, 'kernel': 3, 'stride': 1, 'bnorm': True, 'leaky': True, 'layer_idx': 7}])
# Layer 9 => 11
x = _conv_block(x, [{'filter': 64, 'kernel': 1, 'stride': 1, 'bnorm': True, 'leaky': True, 'layer_idx': 9},
{'filter': 128, 'kernel': 3, 'stride': 1, 'bnorm': True, 'leaky': True, 'layer_idx': 10}])
# Layer 12 => 15
x = _conv_block(x, [{'filter': 256, 'kernel': 3, 'stride': 2, 'bnorm': True, 'leaky': True, 'layer_idx': 12},
{'filter': 128, 'kernel': 1, 'stride': 1, 'bnorm': True, 'leaky': True, 'layer_idx': 13},
{'filter': 256, 'kernel': 3, 'stride': 1, 'bnorm': True, 'leaky': True, 'layer_idx': 14}])
# Layer 16 => 36
for i in range(7):
x = _conv_block(x, [{'filter': 128, 'kernel': 1, 'stride': 1, 'bnorm': True, 'leaky': True, 'layer_idx': 16+i*3},
{'filter': 256, 'kernel': 3, 'stride': 1, 'bnorm': True, 'leaky': True, 'layer_idx': 17+i*3}])
skip_36 = x
# Layer 37 => 40
x = _conv_block(x, [{'filter': 512, 'kernel': 3, 'stride': 2, 'bnorm': True, 'leaky': True, 'layer_idx': 37},
{'filter': 256, 'kernel': 1, 'stride': 1, 'bnorm': True, 'leaky': True, 'layer_idx': 38},
{'filter': 512, 'kernel': 3, 'stride': 1, 'bnorm': True, 'leaky': True, 'layer_idx': 39}])
# Layer 41 => 61
for i in range(7):
x = _conv_block(x, [{'filter': 256, 'kernel': 1, 'stride': 1, 'bnorm': True, 'leaky': True, 'layer_idx': 41+i*3},
{'filter': 512, 'kernel': 3, 'stride': 1, 'bnorm': True, 'leaky': True, 'layer_idx': 42+i*3}])
skip_61 = x
# Layer 62 => 65
x = _conv_block(x, [{'filter': 1024, 'kernel': 3, 'stride': 2, 'bnorm': True, 'leaky': True, 'layer_idx': 62},
{'filter': 512, 'kernel': 1, 'stride': 1, 'bnorm': True, 'leaky': True, 'layer_idx': 63},
{'filter': 1024, 'kernel': 3, 'stride': 1, 'bnorm': True, 'leaky': True, 'layer_idx': 64}])
# Layer 66 => 74
for i in range(3):
x = _conv_block(x, [{'filter': 512, 'kernel': 1, 'stride': 1, 'bnorm': True, 'leaky': True, 'layer_idx': 66+i*3},
{'filter': 1024, 'kernel': 3, 'stride': 1, 'bnorm': True, 'leaky': True, 'layer_idx': 67+i*3}])
# Layer 75 => 79
x = _conv_block(x, [{'filter': 512, 'kernel': 1, 'stride': 1, 'bnorm': True, 'leaky': True, 'layer_idx': 75},
{'filter': 1024, 'kernel': 3, 'stride': 1, 'bnorm': True, 'leaky': True, 'layer_idx': 76},
{'filter': 512, 'kernel': 1, 'stride': 1, 'bnorm': True, 'leaky': True, 'layer_idx': 77},
{'filter': 1024, 'kernel': 3, 'stride': 1, 'bnorm': True, 'leaky': True, 'layer_idx': 78},
{'filter': 512, 'kernel': 1, 'stride': 1, 'bnorm': True, 'leaky': True, 'layer_idx': 79}], do_skip=False)
# Layer 80 => 82
pred_yolo_1 = _conv_block(x, [{'filter': 1024, 'kernel': 3, 'stride': 1, 'bnorm': True, 'leaky': True, 'layer_idx': 80},
{'filter': (3*(5+nb_class)), 'kernel': 1, 'stride': 1, 'bnorm': False, 'leaky': False, 'layer_idx': 81}], do_skip=False)
loss_yolo_1 = YoloLayer(anchors[12:],
[1*num for num in max_grid],
batch_size,
warmup_batches,
ignore_thresh,
grid_scales[0],
obj_scale,
noobj_scale,
xywh_scale,
class_scale)([input_image, pred_yolo_1, true_yolo_1, true_boxes])
# Layer 83 => 86
x = _conv_block(x, [{'filter': 256, 'kernel': 1, 'stride': 1, 'bnorm': True, 'leaky': True, 'layer_idx': 84}], do_skip=False)
x = UpSampling2D(2)(x)
x = concatenate([x, skip_61])
# Layer 87 => 91
x = _conv_block(x, [{'filter': 256, 'kernel': 1, 'stride': 1, 'bnorm': True, 'leaky': True, 'layer_idx': 87},
{'filter': 512, 'kernel': 3, 'stride': 1, 'bnorm': True, 'leaky': True, 'layer_idx': 88},
{'filter': 256, 'kernel': 1, 'stride': 1, 'bnorm': True, 'leaky': True, 'layer_idx': 89},
{'filter': 512, 'kernel': 3, 'stride': 1, 'bnorm': True, 'leaky': True, 'layer_idx': 90},
{'filter': 256, 'kernel': 1, 'stride': 1, 'bnorm': True, 'leaky': True, 'layer_idx': 91}], do_skip=False)
# Layer 92 => 94
pred_yolo_2 = _conv_block(x, [{'filter': 512, 'kernel': 3, 'stride': 1, 'bnorm': True, 'leaky': True, 'layer_idx': 92},
{'filter': (3*(5+nb_class)), 'kernel': 1, 'stride': 1, 'bnorm': False, 'leaky': False, 'layer_idx': 93}], do_skip=False)
loss_yolo_2 = YoloLayer(anchors[6:12],
[2*num for num in max_grid],
batch_size,
warmup_batches,
ignore_thresh,
grid_scales[1],
obj_scale,
noobj_scale,
xywh_scale,
class_scale)([input_image, pred_yolo_2, true_yolo_2, true_boxes])
# Layer 95 => 98
x = _conv_block(x, [{'filter': 128, 'kernel': 1, 'stride': 1, 'bnorm': True, 'leaky': True, 'layer_idx': 96}], do_skip=False)
x = UpSampling2D(2)(x)
x = concatenate([x, skip_36])
# Layer 99 => 106
pred_yolo_3 = _conv_block(x, [{'filter': 128, 'kernel': 1, 'stride': 1, 'bnorm': True, 'leaky': True, 'layer_idx': 99},
{'filter': 256, 'kernel': 3, 'stride': 1, 'bnorm': True, 'leaky': True, 'layer_idx': 100},
{'filter': 128, 'kernel': 1, 'stride': 1, 'bnorm': True, 'leaky': True, 'layer_idx': 101},
{'filter': 256, 'kernel': 3, 'stride': 1, 'bnorm': True, 'leaky': True, 'layer_idx': 102},
{'filter': 128, 'kernel': 1, 'stride': 1, 'bnorm': True, 'leaky': True, 'layer_idx': 103},
{'filter': 256, 'kernel': 3, 'stride': 1, 'bnorm': True, 'leaky': True, 'layer_idx': 104},
{'filter': (3*(5+nb_class)), 'kernel': 1, 'stride': 1, 'bnorm': False, 'leaky': False, 'layer_idx': 105}], do_skip=False)
loss_yolo_3 = YoloLayer(anchors[:6],
[4*num for num in max_grid],
batch_size,
warmup_batches,
ignore_thresh,
grid_scales[2],
obj_scale,
noobj_scale,
xywh_scale,
class_scale)([input_image, pred_yolo_3, true_yolo_3, true_boxes])
train_model = Model([input_image, true_boxes, true_yolo_1, true_yolo_2, true_yolo_3], [loss_yolo_1, loss_yolo_2, loss_yolo_3])
infer_model = Model(input_image, [pred_yolo_1, pred_yolo_2, pred_yolo_3])
return [train_model, infer_model]
def dummy_loss(y_true, y_pred):
return tf.sqrt(tf.reduce_sum(y_pred))
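A hedged sketch of how these pieces are meant to be wired up: the three YoloLayer outputs of train_model already *are* the loss values, so the model is compiled against dummy_loss, which simply passes them through, while infer_model shares the same weights and emits the raw heads for decoding. The hyper-parameter values below mirror the raccoon zoo config and are otherwise illustrative:

from keras.optimizers import Adam

train_model, infer_model = create_yolov3_model(
    nb_class=1,
    anchors=[17,18, 28,24, 36,34, 42,44, 56,51, 72,66, 90,95, 92,154, 139,281],
    max_box_per_image=30,        # assumed upper bound on boxes per image
    max_grid=[448, 448],         # max_input_size from the config
    batch_size=16,
    warmup_batches=0,            # set > 0 to enable warm-up training
    ignore_thresh=0.5,
    grid_scales=[1, 1, 1],
    obj_scale=5, noobj_scale=1, xywh_scale=1, class_scale=1)

train_model.compile(loss=dummy_loss, optimizer=Adam(lr=1e-4))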

BIN
keras-yolo3-master/yolo.pyc Executable file

Binary file not shown.

434
keras-yolo3-master/yolo3_one_file_to_detect_them_all.py Executable file

@@ -0,0 +1,434 @@
import argparse
import os
import numpy as np
from keras.layers import Conv2D, Input, BatchNormalization, LeakyReLU, ZeroPadding2D, UpSampling2D
from keras.layers.merge import add, concatenate
from keras.models import Model
import struct
import sys
import cv2
np.set_printoptions(threshold=sys.maxsize)  # print full arrays; threshold=np.nan is rejected by newer NumPy
os.environ["CUDA_DEVICE_ORDER"]="PCI_BUS_ID"
os.environ["CUDA_VISIBLE_DEVICES"]="0"
argparser = argparse.ArgumentParser(
description='test yolov3 network with coco weights')
argparser.add_argument(
'-w',
'--weights',
help='path to weights file')
argparser.add_argument(
'-i',
'--image',
help='path to image file')
class WeightReader:
def __init__(self, weight_file):
with open(weight_file, 'rb') as w_f:
major, = struct.unpack('i', w_f.read(4))
minor, = struct.unpack('i', w_f.read(4))
revision, = struct.unpack('i', w_f.read(4))
if (major*10 + minor) >= 2 and major < 1000 and minor < 1000:
w_f.read(8)
else:
w_f.read(4)
transpose = (major > 1000) or (minor > 1000)
binary = w_f.read()
self.offset = 0
self.all_weights = np.frombuffer(binary, dtype='float32')
def read_bytes(self, size):
self.offset = self.offset + size
return self.all_weights[self.offset-size:self.offset]
def load_weights(self, model):
for i in range(106):
try:
conv_layer = model.get_layer('conv_' + str(i))
print("loading weights of convolution #" + str(i))
if i not in [81, 93, 105]:
norm_layer = model.get_layer('bnorm_' + str(i))
size = np.prod(norm_layer.get_weights()[0].shape)
beta = self.read_bytes(size) # bias
gamma = self.read_bytes(size) # scale
mean = self.read_bytes(size) # mean
var = self.read_bytes(size) # variance
norm_layer.set_weights([gamma, beta, mean, var])
if len(conv_layer.get_weights()) > 1:
bias = self.read_bytes(np.prod(conv_layer.get_weights()[1].shape))
kernel = self.read_bytes(np.prod(conv_layer.get_weights()[0].shape))
kernel = kernel.reshape(list(reversed(conv_layer.get_weights()[0].shape)))
kernel = kernel.transpose([2,3,1,0])
conv_layer.set_weights([kernel, bias])
else:
kernel = self.read_bytes(np.prod(conv_layer.get_weights()[0].shape))
kernel = kernel.reshape(list(reversed(conv_layer.get_weights()[0].shape)))
kernel = kernel.transpose([2,3,1,0])
conv_layer.set_weights([kernel])
except ValueError:
print("no convolution #" + str(i))
def reset(self):
self.offset = 0
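# Layout notes (added for clarity): for every convolution with batch norm, the
# darknet file stores beta, gamma, mean and variance (one float32 per filter
# each) ahead of the kernel, which is stored filters-first and is reshaped and
# transposed above into Keras' channels-last order; the three detection
# convolutions (81, 93, 105) carry a plain bias instead of batch norm.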
class BoundBox:
def __init__(self, xmin, ymin, xmax, ymax, objness = None, classes = None):
self.xmin = xmin
self.ymin = ymin
self.xmax = xmax
self.ymax = ymax
self.objness = objness
self.classes = classes
self.label = -1
self.score = -1
def get_label(self):
if self.label == -1:
self.label = np.argmax(self.classes)
return self.label
def get_score(self):
if self.score == -1:
self.score = self.classes[self.get_label()]
return self.score
def _conv_block(inp, convs, skip=True):
x = inp
count = 0
for conv in convs:
if count == (len(convs) - 2) and skip:
skip_connection = x
count += 1
if conv['stride'] > 1: x = ZeroPadding2D(((1,0),(1,0)))(x) # peculiar padding: darknet pads on the left and top
x = Conv2D(conv['filter'],
conv['kernel'],
strides=conv['stride'],
padding='valid' if conv['stride'] > 1 else 'same', # peculiar padding: darknet pads on the left and top
name='conv_' + str(conv['layer_idx']),
use_bias=False if conv['bnorm'] else True)(x)
if conv['bnorm']: x = BatchNormalization(epsilon=0.001, name='bnorm_' + str(conv['layer_idx']))(x)
if conv['leaky']: x = LeakyReLU(alpha=0.1, name='leaky_' + str(conv['layer_idx']))(x)
return add([skip_connection, x]) if skip else x
def _interval_overlap(interval_a, interval_b):
x1, x2 = interval_a
x3, x4 = interval_b
if x3 < x1:
if x4 < x1:
return 0
else:
return min(x2,x4) - x1
else:
if x2 < x3:
return 0
else:
return min(x2,x4) - x3
def _sigmoid(x):
return 1. / (1. + np.exp(-x))
def bbox_iou(box1, box2):
intersect_w = _interval_overlap([box1.xmin, box1.xmax], [box2.xmin, box2.xmax])
intersect_h = _interval_overlap([box1.ymin, box1.ymax], [box2.ymin, box2.ymax])
intersect = intersect_w * intersect_h
w1, h1 = box1.xmax-box1.xmin, box1.ymax-box1.ymin
w2, h2 = box2.xmax-box2.xmin, box2.ymax-box2.ymin
union = w1*h1 + w2*h2 - intersect
return float(intersect) / union
def make_yolov3_model():
input_image = Input(shape=(None, None, 3))
# Layer 0 => 4
x = _conv_block(input_image, [{'filter': 32, 'kernel': 3, 'stride': 1, 'bnorm': True, 'leaky': True, 'layer_idx': 0},
{'filter': 64, 'kernel': 3, 'stride': 2, 'bnorm': True, 'leaky': True, 'layer_idx': 1},
{'filter': 32, 'kernel': 1, 'stride': 1, 'bnorm': True, 'leaky': True, 'layer_idx': 2},
{'filter': 64, 'kernel': 3, 'stride': 1, 'bnorm': True, 'leaky': True, 'layer_idx': 3}])
# Layer 5 => 8
x = _conv_block(x, [{'filter': 128, 'kernel': 3, 'stride': 2, 'bnorm': True, 'leaky': True, 'layer_idx': 5},
{'filter': 64, 'kernel': 1, 'stride': 1, 'bnorm': True, 'leaky': True, 'layer_idx': 6},
{'filter': 128, 'kernel': 3, 'stride': 1, 'bnorm': True, 'leaky': True, 'layer_idx': 7}])
# Layer 9 => 11
x = _conv_block(x, [{'filter': 64, 'kernel': 1, 'stride': 1, 'bnorm': True, 'leaky': True, 'layer_idx': 9},
{'filter': 128, 'kernel': 3, 'stride': 1, 'bnorm': True, 'leaky': True, 'layer_idx': 10}])
# Layer 12 => 15
x = _conv_block(x, [{'filter': 256, 'kernel': 3, 'stride': 2, 'bnorm': True, 'leaky': True, 'layer_idx': 12},
{'filter': 128, 'kernel': 1, 'stride': 1, 'bnorm': True, 'leaky': True, 'layer_idx': 13},
{'filter': 256, 'kernel': 3, 'stride': 1, 'bnorm': True, 'leaky': True, 'layer_idx': 14}])
# Layer 16 => 36
for i in range(7):
x = _conv_block(x, [{'filter': 128, 'kernel': 1, 'stride': 1, 'bnorm': True, 'leaky': True, 'layer_idx': 16+i*3},
{'filter': 256, 'kernel': 3, 'stride': 1, 'bnorm': True, 'leaky': True, 'layer_idx': 17+i*3}])
skip_36 = x
# Layer 37 => 40
x = _conv_block(x, [{'filter': 512, 'kernel': 3, 'stride': 2, 'bnorm': True, 'leaky': True, 'layer_idx': 37},
{'filter': 256, 'kernel': 1, 'stride': 1, 'bnorm': True, 'leaky': True, 'layer_idx': 38},
{'filter': 512, 'kernel': 3, 'stride': 1, 'bnorm': True, 'leaky': True, 'layer_idx': 39}])
# Layer 41 => 61
for i in range(7):
x = _conv_block(x, [{'filter': 256, 'kernel': 1, 'stride': 1, 'bnorm': True, 'leaky': True, 'layer_idx': 41+i*3},
{'filter': 512, 'kernel': 3, 'stride': 1, 'bnorm': True, 'leaky': True, 'layer_idx': 42+i*3}])
skip_61 = x
# Layer 62 => 65
x = _conv_block(x, [{'filter': 1024, 'kernel': 3, 'stride': 2, 'bnorm': True, 'leaky': True, 'layer_idx': 62},
{'filter': 512, 'kernel': 1, 'stride': 1, 'bnorm': True, 'leaky': True, 'layer_idx': 63},
{'filter': 1024, 'kernel': 3, 'stride': 1, 'bnorm': True, 'leaky': True, 'layer_idx': 64}])
# Layer 66 => 74
for i in range(3):
x = _conv_block(x, [{'filter': 512, 'kernel': 1, 'stride': 1, 'bnorm': True, 'leaky': True, 'layer_idx': 66+i*3},
{'filter': 1024, 'kernel': 3, 'stride': 1, 'bnorm': True, 'leaky': True, 'layer_idx': 67+i*3}])
# Layer 75 => 79
x = _conv_block(x, [{'filter': 512, 'kernel': 1, 'stride': 1, 'bnorm': True, 'leaky': True, 'layer_idx': 75},
{'filter': 1024, 'kernel': 3, 'stride': 1, 'bnorm': True, 'leaky': True, 'layer_idx': 76},
{'filter': 512, 'kernel': 1, 'stride': 1, 'bnorm': True, 'leaky': True, 'layer_idx': 77},
{'filter': 1024, 'kernel': 3, 'stride': 1, 'bnorm': True, 'leaky': True, 'layer_idx': 78},
{'filter': 512, 'kernel': 1, 'stride': 1, 'bnorm': True, 'leaky': True, 'layer_idx': 79}], skip=False)
# Layer 80 => 82
yolo_82 = _conv_block(x, [{'filter': 1024, 'kernel': 3, 'stride': 1, 'bnorm': True, 'leaky': True, 'layer_idx': 80},
{'filter': 255, 'kernel': 1, 'stride': 1, 'bnorm': False, 'leaky': False, 'layer_idx': 81}], skip=False)
# Layer 83 => 86
x = _conv_block(x, [{'filter': 256, 'kernel': 1, 'stride': 1, 'bnorm': True, 'leaky': True, 'layer_idx': 84}], skip=False)
x = UpSampling2D(2)(x)
x = concatenate([x, skip_61])
# Layer 87 => 91
x = _conv_block(x, [{'filter': 256, 'kernel': 1, 'stride': 1, 'bnorm': True, 'leaky': True, 'layer_idx': 87},
{'filter': 512, 'kernel': 3, 'stride': 1, 'bnorm': True, 'leaky': True, 'layer_idx': 88},
{'filter': 256, 'kernel': 1, 'stride': 1, 'bnorm': True, 'leaky': True, 'layer_idx': 89},
{'filter': 512, 'kernel': 3, 'stride': 1, 'bnorm': True, 'leaky': True, 'layer_idx': 90},
{'filter': 256, 'kernel': 1, 'stride': 1, 'bnorm': True, 'leaky': True, 'layer_idx': 91}], skip=False)
# Layer 92 => 94
yolo_94 = _conv_block(x, [{'filter': 512, 'kernel': 3, 'stride': 1, 'bnorm': True, 'leaky': True, 'layer_idx': 92},
{'filter': 255, 'kernel': 1, 'stride': 1, 'bnorm': False, 'leaky': False, 'layer_idx': 93}], skip=False)
# Layer 95 => 98
x = _conv_block(x, [{'filter': 128, 'kernel': 1, 'stride': 1, 'bnorm': True, 'leaky': True, 'layer_idx': 96}], skip=False)
x = UpSampling2D(2)(x)
x = concatenate([x, skip_36])
# Layer 99 => 106
yolo_106 = _conv_block(x, [{'filter': 128, 'kernel': 1, 'stride': 1, 'bnorm': True, 'leaky': True, 'layer_idx': 99},
{'filter': 256, 'kernel': 3, 'stride': 1, 'bnorm': True, 'leaky': True, 'layer_idx': 100},
{'filter': 128, 'kernel': 1, 'stride': 1, 'bnorm': True, 'leaky': True, 'layer_idx': 101},
{'filter': 256, 'kernel': 3, 'stride': 1, 'bnorm': True, 'leaky': True, 'layer_idx': 102},
{'filter': 128, 'kernel': 1, 'stride': 1, 'bnorm': True, 'leaky': True, 'layer_idx': 103},
{'filter': 256, 'kernel': 3, 'stride': 1, 'bnorm': True, 'leaky': True, 'layer_idx': 104},
{'filter': 255, 'kernel': 1, 'stride': 1, 'bnorm': False, 'leaky': False, 'layer_idx': 105}], skip=False)
model = Model(input_image, [yolo_82, yolo_94, yolo_106])
return model
def preprocess_input(image, net_h, net_w):
new_h, new_w, _ = image.shape
# determine the new size of the image
if (float(net_w)/new_w) < (float(net_h)/new_h):
new_h = (new_h * net_w)/new_w
new_w = net_w
else:
new_w = (new_w * net_h)/new_h
new_h = net_h
# resize the image to the new size
resized = cv2.resize(image[:,:,::-1]/255., (int(new_w), int(new_h)))
# embed the image into the standard letter box
new_image = np.ones((net_h, net_w, 3)) * 0.5
new_image[int((net_h-new_h)//2):int((net_h+new_h)//2), int((net_w-new_w)//2):int((net_w+new_w)//2), :] = resized
new_image = np.expand_dims(new_image, 0)
return new_image
def decode_netout(netout, anchors, obj_thresh, nms_thresh, net_h, net_w):
grid_h, grid_w = netout.shape[:2]
nb_box = 3
netout = netout.reshape((grid_h, grid_w, nb_box, -1))
nb_class = netout.shape[-1] - 5
boxes = []
netout[..., :2] = _sigmoid(netout[..., :2])
netout[..., 4:] = _sigmoid(netout[..., 4:])
netout[..., 5:] = netout[..., 4][..., np.newaxis] * netout[..., 5:]
netout[..., 5:] *= netout[..., 5:] > obj_thresh
for i in range(grid_h*grid_w):
row = i / grid_w
col = i % grid_w
for b in range(nb_box):
# 4th element is objectness score
objectness = netout[int(row)][int(col)][b][4]
if objectness <= obj_thresh: continue
# first 4 elements are x, y, w, and h
x, y, w, h = netout[int(row)][int(col)][b][:4]
x = (col + x) / grid_w # center position, unit: image width
y = (row + y) / grid_h # center position, unit: image height
w = anchors[2 * b + 0] * np.exp(w) / net_w # unit: image width
h = anchors[2 * b + 1] * np.exp(h) / net_h # unit: image height
# last elements are class probabilities
classes = netout[int(row)][col][b][5:]
box = BoundBox(x-w/2, y-h/2, x+w/2, y+h/2, objectness, classes)
boxes.append(box)
return boxes
def correct_yolo_boxes(boxes, image_h, image_w, net_h, net_w):
if (float(net_w)/image_w) < (float(net_h)/image_h):
new_w = net_w
new_h = (image_h*net_w)/image_w
else:
new_h = net_h
new_w = (image_w*net_h)/image_h
for i in range(len(boxes)):
x_offset, x_scale = (net_w - new_w)/2./net_w, float(new_w)/net_w
y_offset, y_scale = (net_h - new_h)/2./net_h, float(new_h)/net_h
boxes[i].xmin = int((boxes[i].xmin - x_offset) / x_scale * image_w)
boxes[i].xmax = int((boxes[i].xmax - x_offset) / x_scale * image_w)
boxes[i].ymin = int((boxes[i].ymin - y_offset) / y_scale * image_h)
boxes[i].ymax = int((boxes[i].ymax - y_offset) / y_scale * image_h)
def do_nms(boxes, nms_thresh):
if len(boxes) > 0:
nb_class = len(boxes[0].classes)
else:
return
for c in range(nb_class):
sorted_indices = np.argsort([-box.classes[c] for box in boxes])
for i in range(len(sorted_indices)):
index_i = sorted_indices[i]
if boxes[index_i].classes[c] == 0: continue
for j in range(i+1, len(sorted_indices)):
index_j = sorted_indices[j]
if bbox_iou(boxes[index_i], boxes[index_j]) >= nms_thresh:
boxes[index_j].classes[c] = 0
def draw_boxes(image, boxes, labels, obj_thresh):
for box in boxes:
label_str = ''
label = -1
for i in range(len(labels)):
if box.classes[i] > obj_thresh:
label_str += labels[i]
label = i
print(labels[i] + ': ' + str(box.classes[i]*100) + '%')
if label >= 0:
cv2.rectangle(image, (box.xmin,box.ymin), (box.xmax,box.ymax), (0,255,0), 3)
cv2.putText(image,
label_str + ' ' + str(box.get_score()),
(box.xmin, box.ymin - 13),
cv2.FONT_HERSHEY_SIMPLEX,
1e-3 * image.shape[0],
(0,255,0), 2)
return image
def _main_(args):
weights_path = args.weights
image_path = args.image
# set some parameters
net_h, net_w = 416, 416
obj_thresh, nms_thresh = 0.5, 0.45
anchors = [[116,90, 156,198, 373,326], [30,61, 62,45, 59,119], [10,13, 16,30, 33,23]]
labels = ["person", "bicycle", "car", "motorbike", "aeroplane", "bus", "train", "truck", \
"boat", "traffic light", "fire hydrant", "stop sign", "parking meter", "bench", \
"bird", "cat", "dog", "horse", "sheep", "cow", "elephant", "bear", "zebra", "giraffe", \
"backpack", "umbrella", "handbag", "tie", "suitcase", "frisbee", "skis", "snowboard", \
"sports ball", "kite", "baseball bat", "baseball glove", "skateboard", "surfboard", \
"tennis racket", "bottle", "wine glass", "cup", "fork", "knife", "spoon", "bowl", "banana", \
"apple", "sandwich", "orange", "broccoli", "carrot", "hot dog", "pizza", "donut", "cake", \
"chair", "sofa", "pottedplant", "bed", "diningtable", "toilet", "tvmonitor", "laptop", "mouse", \
"remote", "keyboard", "cell phone", "microwave", "oven", "toaster", "sink", "refrigerator", \
"book", "clock", "vase", "scissors", "teddy bear", "hair drier", "toothbrush"]
# make the yolov3 model to predict 80 classes on COCO
yolov3 = make_yolov3_model()
# load the weights trained on COCO into the model
weight_reader = WeightReader(weights_path)
weight_reader.load_weights(yolov3)
yolov3.save('yolo_infer_coco.h5')
# preprocess the image
image = cv2.imread(image_path)
image_h, image_w, _ = image.shape
new_image = preprocess_input(image, net_h, net_w)
# run the prediction
yolos = yolov3.predict(new_image)
boxes = []
for i in range(len(yolos)):
# decode the output of the network
boxes += decode_netout(yolos[i][0], anchors[i], obj_thresh, nms_thresh, net_h, net_w)
# correct the sizes of the bounding boxes
correct_yolo_boxes(boxes, image_h, image_w, net_h, net_w)
# suppress non-maximal boxes
do_nms(boxes, nms_thresh)
# draw bounding boxes on the image using labels
draw_boxes(image, boxes, labels, obj_thresh)
# write the image with bounding boxes to file
cv2.imwrite(image_path[:-4] + '_detected' + image_path[-4:], (image).astype('uint8'))
if __name__ == '__main__':
args = argparser.parse_args()
_main_(args)

40
keras-yolo3-master/zoo/config_kangaroo.json Executable file

@@ -0,0 +1,40 @@
{
"model" : {
"min_input_size": 288,
"max_input_size": 448,
"anchors": [55,69, 75,234, 133,240, 136,129, 142,363, 203,290, 228,184, 285,359, 341,260],
"labels": ["kangaroo"]
},
"train": {
"train_image_folder": "/home/andy/Desktop/github/kangaroo/images/",
"train_annot_folder": "/home/andy/Desktop/github/kangaroo/annots/",
"cache_name": "kangaroo_train.pkl",
"train_times": 3,
"batch_size": 16,
"learning_rate": 1e-4,
"nb_epochs": 100,
"warmup_epochs": 3,
"ignore_thresh": 0.5,
"gpus": "0,1",
"grid_scales": [1,1,1],
"obj_scale": 5,
"noobj_scale": 1,
"xywh_scale": 1,
"class_scale": 1,
"tensorboard_dir": "log_kangaroo",
"saved_weights_name": "kangaroo.h5",
"debug": true
},
"valid": {
"valid_image_folder": "",
"valid_annot_folder": "",
"cache_name": "",
"valid_times": 1
}
}
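All four zoo configs share this schema: "model" sets the input-size range, the nine anchor (w, h) pairs (three per output scale, smallest first), and the label list, while "train" and "valid" point at the data and set the loss scales. A minimal sketch of reading one (the path is illustrative):

import json

with open('zoo/config_kangaroo.json') as config_buffer:  # illustrative path
    config = json.load(config_buffer)

anchors = config['model']['anchors']
labels = config['model']['labels']
print(len(anchors) // 6, 'anchor pairs per scale for', len(labels), 'label(s)')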

40
keras-yolo3-master/zoo/config_raccoon.json Executable file

@@ -0,0 +1,40 @@
{
"model" : {
"min_input_size": 288,
"max_input_size": 448,
"anchors": [17,18, 28,24, 36,34, 42,44, 56,51, 72,66, 90,95, 92,154, 139,281],
"labels": ["raccoon"]
},
"train": {
"train_image_folder": "/home/andy/Desktop/github/raccoon_dataset/images/",
"train_annot_folder": "/home/andy/Desktop/github/raccoon_dataset/annotations/",
"cache_name": "raccoon_train.pkl",
"train_times": 3,
"batch_size": 16,
"learning_rate": 1e-4,
"nb_epochs": 100,
"warmup_epochs": 3,
"ignore_thresh": 0.5,
"gpus": "0,1",
"grid_scales": [1,1,1],
"obj_scale": 5,
"noobj_scale": 1,
"xywh_scale": 1,
"class_scale": 1,
"tensorboard_dir": "log_raccoon",
"saved_weights_name": "raccoon.h5",
"debug": true
},
"valid": {
"valid_image_folder": "",
"valid_annot_folder": "",
"cache_name": "",
"valid_times": 1
}
}

40
keras-yolo3-master/zoo/config_rbc.json Executable file

@@ -0,0 +1,40 @@
{
"model" : {
"min_input_size": 224,
"max_input_size": 480,
"anchors": [25,33, 52,94, 56,71, 67,83, 68,98, 73,65, 81,96, 116,134, 147,182],
"labels": ["Platelets", "RBC", "WBC"]
},
"train": {
"train_image_folder": "/home/experiencor/data/BCCD_Dataset/BCCD/JPEGImages/",
"train_annot_folder": "/home/experiencor/data/BCCD_Dataset/BCCD/Annotations/",
"cache_name": "rbc_train.pkl",
"train_times": 3,
"batch_size": 16,
"learning_rate": 1e-4,
"nb_epochs": 100,
"warmup_epochs": 3,
"ignore_thresh": 0.5,
"gpus": "0,1",
"grid_scales": [1,1,1],
"obj_scale": 5,
"noobj_scale": 1,
"xywh_scale": 1,
"class_scale": 1,
"tensorboard_dir": "log_rbc",
"saved_weights_name": "rbc.h5",
"debug": true
},
"valid": {
"valid_image_folder": "",
"valid_annot_folder": "",
"cache_name": "",
"valid_times": 1
}
}

40
keras-yolo3-master/zoo/config_voc.json Executable file

@@ -0,0 +1,40 @@
{
"model" : {
"min_input_size": 224,
"max_input_size": 480,
"anchors": [24,34, 46,84, 68,185, 116,286, 122,97, 171,180, 214,327, 326,193, 359,359],
"labels": ["aeroplane", "bicycle", "bird", "boat", "bottle", "bus", "car", "cat", "chair", "cow", "diningtable", "dog", "horse", "motorbike", "person", "pottedplant", "sheep", "sofa", "train", "tvmonitor"]
},
"train": {
"train_image_folder": "/home/experiencor/data/pascal/train/images/",
"train_annot_folder": "/home/experiencor/data/pascal/train/annots/",
"cache_name": "voc_train.pkl",
"train_times": 1,
"batch_size": 8,
"learning_rate": 1e-5,
"nb_epochs": 100,
"warmup_epochs": 3,
"ignore_thresh": 0.5,
"gpus": "0",
"grid_scales": [1,1,1],
"obj_scale": 5,
"noobj_scale": 1,
"xywh_scale": 1,
"class_scale": 1,
"tensorboard_dir": "log_voc",
"saved_weights_name": "voc.h5",
"debug": true
},
"valid": {
"valid_image_folder": "/home/experiencor/data/pascal/valid/images/",
"valid_annot_folder": "/home/experiencor/data/pascal/valid/annots/",
"cache_name": "voc_valid.pkl",
"valid_times": 1
}
}