Notes on "Single Shot MultiBox Detector"
Notes on "Single Shot MultiBox Detector" by Liu et al. (2016):
This paper introduces the Single Shot MultiBox Detector (SSD), a feed-forward convolutional neural network that produces a fixed-size collection of bounding boxes and scores for the presence of object class instances in those boxes, followed by a non-maximum suppression step to produce the final detections. Compared to YOLO, this algorithm has the following features:
- Multi-scale feature maps: Unlike YOLO, which operates on a single feature map, SSD uses multi-scale feature maps; convolutional layers of progressively decreasing size are added for detection, which allows detecting objects at multiple scales.
- Convolutional predictors for detection: The extra feature layers produce the detection predictions. Each feature layer (at a fixed size) applies small convolutional filters that output either a class score or a bounding box offset, where offsets are measured relative to a default box position at each feature map location.
- Default boxes and aspect ratios: A set of default bounding boxes is associated with each feature map cell, so the position of each default box relative to its cell is fixed.
- Matching strategy: Ground truth boxes are compared with the default boxes using the Jaccard overlap coefficient. Each ground truth box is first matched to the default box with the best overlap; every default box whose overlap with a ground truth box exceeds 0.5 is then matched as well. Allowing several matches per ground truth box simplifies the learning problem.
- Training objective: The overall objective is a weighted sum of the localization loss (Smooth L1 loss on the offsets of the center, width and height of the bounding box) and the confidence loss (softmax loss over the class confidences).
- Scales and aspect ratios for the boxes: Rules are presented for choosing the sets of default boxes in terms of scale and aspect ratio (the scales are evenly spaced between a smallest and a largest value). The optimal selection is still an open question.
- Hard negative mining: Because many default boxes are tried out, most of them end up as negatives after matching, which creates a strong imbalance between negatives and positives. The negative default boxes are sorted by confidence loss and only the highest ones are kept, so that the ratio between negatives and positives is at most 3:1; this was found to make the training more stable.
- Data augmentation: To increase the robustness with respect to object sizes and shapes, the data is augmented by randomly sampling patches from the training images.
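The matching strategy in the list above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the function names `jaccard` and `match_default_boxes`, the `(xmin, ymin, xmax, ymax)` box layout and the example boxes are all assumptions made for this sketch.

```python
def jaccard(box_a, box_b):
    """Jaccard overlap (intersection over union) of two boxes.

    Boxes are (xmin, ymin, xmax, ymax); this layout is an assumption.
    """
    inter_w = max(0.0, min(box_a[2], box_b[2]) - max(box_a[0], box_b[0]))
    inter_h = max(0.0, min(box_a[3], box_b[3]) - max(box_a[1], box_b[1]))
    inter = inter_w * inter_h
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def match_default_boxes(ground_truth, default_boxes, threshold=0.5):
    """Return {default_box_index: ground_truth_index} for matched boxes."""
    if not ground_truth or not default_boxes:
        return {}
    matches = {}
    # Step 1: every ground truth box gets the default box it overlaps most.
    for g, gt in enumerate(ground_truth):
        overlaps = [jaccard(gt, d) for d in default_boxes]
        best = max(range(len(default_boxes)), key=overlaps.__getitem__)
        matches[best] = g
    # Step 2: any remaining default box with overlap above the threshold
    # is matched to the ground truth box it overlaps most.
    for d, box in enumerate(default_boxes):
        if d in matches:
            continue
        overlaps = [jaccard(gt, box) for gt in ground_truth]
        g = max(range(len(ground_truth)), key=overlaps.__getitem__)
        if overlaps[g] > threshold:
            matches[d] = g
    return matches


ground_truth = [(0, 0, 10, 10)]
default_boxes = [(0, 0, 10, 10), (1, 1, 10, 10), (20, 20, 30, 30), (0, 0, 5, 5)]
matches = match_default_boxes(ground_truth, default_boxes)
```

In this toy example the first two default boxes are matched (overlaps 1.0 and 0.81), while the other two (overlaps 0 and 0.25) stay negatives.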
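The training objective can be illustrated the same way. The sketch below assumes the objective is normalized by the number of matched default boxes and uses a weight `alpha` of 1 on the localization term; the function names and the list-based data layout are hypothetical and chosen only for readability.

```python
import math


def smooth_l1(x):
    """Smooth L1: quadratic near zero, linear for |x| >= 1."""
    ax = abs(x)
    return 0.5 * x * x if ax < 1.0 else ax - 0.5


def softmax_cross_entropy(logits, target):
    """Softmax loss for one box; target is the true class index."""
    m = max(logits)  # subtract the max for numerical stability
    log_sum = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum - logits[target]


def ssd_loss(loc_pred, loc_target, conf_logits, conf_target, alpha=1.0):
    """Weighted sum of confidence and localization loss.

    loc_pred / loc_target: (cx, cy, w, h) offsets for matched boxes only.
    conf_logits / conf_target: class logits and class indices for the
    boxes that contribute to the confidence loss.
    """
    n = len(loc_pred)  # number of matched default boxes
    if n == 0:
        return 0.0
    loc = sum(
        smooth_l1(p - t)
        for pred, tgt in zip(loc_pred, loc_target)
        for p, t in zip(pred, tgt)
    )
    conf = sum(softmax_cross_entropy(z, y) for z, y in zip(conf_logits, conf_target))
    return (conf + alpha * loc) / n


loss = ssd_loss(
    loc_pred=[(0.0, 0.0, 0.0, 0.0)],
    loc_target=[(0.5, 0.0, 0.0, 2.0)],
    conf_logits=[[0.0, 0.0]],
    conf_target=[0],
)
```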
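For the evenly spaced scales, a small sketch may help. It assumes scales are expressed as a fraction of the image size, spaced linearly between a smallest value of 0.2 and a largest value of 0.9, and that a box of scale s and aspect ratio a has width s*sqrt(a) and height s/sqrt(a). The function names and the choice of six feature maps are illustrative assumptions.

```python
import math


def default_box_scales(num_maps, s_min=0.2, s_max=0.9):
    """Evenly spaced scales, one per feature map, from s_min to s_max."""
    if num_maps == 1:
        return [s_min]
    step = (s_max - s_min) / (num_maps - 1)
    return [s_min + step * k for k in range(num_maps)]


def default_box_shapes(scale, aspect_ratios=(1.0, 2.0, 0.5)):
    """(width, height) of a default box for each aspect ratio at one scale."""
    return [(scale * math.sqrt(a), scale / math.sqrt(a)) for a in aspect_ratios]


scales = default_box_scales(6)
shapes = default_box_shapes(scales[0])
```

Every shape at a given scale has the same area (the square of the scale); only the aspect ratio changes.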
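Hard negative mining amounts to a sort and a cut. The sketch below assumes a per-box confidence loss has already been computed; `hard_negative_indices` and the example values are hypothetical.

```python
def hard_negative_indices(conf_losses, is_positive, neg_pos_ratio=3):
    """Pick the negatives with the highest confidence loss.

    At most neg_pos_ratio negatives are kept per positive box.
    """
    num_pos = sum(1 for pos in is_positive if pos)
    max_neg = neg_pos_ratio * num_pos
    negatives = [i for i, pos in enumerate(is_positive) if not pos]
    # Hardest negatives first: highest confidence loss at the front.
    negatives.sort(key=lambda i: conf_losses[i], reverse=True)
    return negatives[:max_neg]


conf_losses = [0.1, 2.0, 0.5, 0.05, 1.5, 0.3, 0.9]
is_positive = [True, False, False, False, False, False, False]
kept = hard_negative_indices(conf_losses, is_positive)
```

With one positive box, only the three negatives with the largest loss (indices 1, 4 and 6) contribute to the confidence loss.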
The PASCAL VOC2007, PASCAL VOC2012 and COCO data sets are used to compare SSD with various algorithms. SSD is found to be, at least in the experimental setup the authors used, competitive in both accuracy and speed with other state-of-the-art object detection algorithms.
Find the overview below.
Q&A:
Q1) Different overlap coefficients?
A) https://stats.stackexchange.com/questions/238684/what-are-the-difference-between-dice-jaccard-and-overlap-coefficients
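As a quick illustration of how the three coefficients differ, here is a sketch on plain Python sets. The function names and example sets are made up for illustration; the definitions used are the standard ones: Jaccard is intersection over union, Dice is twice the intersection over the sum of the sizes, and the overlap coefficient is the intersection over the size of the smaller set.

```python
def jaccard_coefficient(a, b):
    """|A ∩ B| / |A ∪ B|"""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def dice_coefficient(a, b):
    """2 |A ∩ B| / (|A| + |B|)"""
    total = len(a) + len(b)
    return 2 * len(a & b) / total if total else 0.0


def overlap_coefficient(a, b):
    """|A ∩ B| / min(|A|, |B|)"""
    smaller = min(len(a), len(b))
    return len(a & b) / smaller if smaller else 0.0


a = {1, 2, 3, 4}
b = {3, 4, 5, 6, 7, 8}
j = jaccard_coefficient(a, b)
d = dice_coefficient(a, b)
o = overlap_coefficient(a, b)
```

For these sets the values are 0.25 (Jaccard), 0.4 (Dice) and 0.5 (overlap); Dice and Jaccard are related by D = 2J / (1 + J), so they always rank pairs the same way.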