Posts

Showing posts from February, 2018

Notes on "Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks"

Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks by Ren et al (2016). Region-proposal-based approaches to object detection tend to achieve state-of-the-art performance on many benchmark datasets. Faster R-CNN uses a convolutional network to propose regions, so region proposal and detection share computation. As a result, Faster R-CNN is faster than R-CNN and Fast R-CNN (though it is still not real-time capable!). See the slides below, which are taken from a Coursera course on convolutional neural networks. Faster R-CNN uses two convolutional neural networks: one for region proposal and the other for detection. As shown below, the classifier uses the proposals; that is, the Region Proposal Network (RPN) tells the classifier where to look. To generate region proposals, a small convolutional network is slid over the feature map, taking as input an n by n spatial window of the input feature map (each sliding window is mapped to a lower-dimensional featu
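A minimal NumPy sketch of the sliding-window step described above, assuming a (H, W, C) feature map. Each n × n window is flattened and projected to a d-dimensional feature (equivalent to an n × n convolution), followed by ReLU and two sibling outputs per location: 2k objectness scores and 4k box offsets for k anchors. The function name `rpn_head`, the random weights and the explicit loops are illustrative assumptions; the defaults n = 3, d = 256 (the ZF backbone setting) and k = 9 follow the values reported in the paper.

```python
import numpy as np

def rpn_head(feature_map, n=3, d=256, k=9, seed=0):
    """Hypothetical RPN head sketch: slide an n x n window over a (H, W, C) map.

    Weights are random placeholders, not trained parameters.
    Returns (H, W, 2k) objectness scores and (H, W, 4k) box offsets.
    """
    rng = np.random.default_rng(seed)
    H, W, C = feature_map.shape
    pad = n // 2
    # Zero-pad spatially so every location gets a full n x n window.
    padded = np.pad(feature_map, ((pad, pad), (pad, pad), (0, 0)))
    w_inter = rng.standard_normal((n * n * C, d)) * 0.01  # window -> d-dim feature
    w_cls = rng.standard_normal((d, 2 * k)) * 0.01        # object / not-object per anchor
    w_reg = rng.standard_normal((d, 4 * k)) * 0.01        # 4 box offsets per anchor
    scores = np.zeros((H, W, 2 * k))
    deltas = np.zeros((H, W, 4 * k))
    for i in range(H):
        for j in range(W):
            window = padded[i:i + n, j:j + n, :].reshape(-1)
            feat = np.maximum(window @ w_inter, 0.0)  # lower-dimensional feature + ReLU
            scores[i, j] = feat @ w_cls
            deltas[i, j] = feat @ w_reg
    return scores, deltas

fmap = np.random.default_rng(1).random((5, 7, 16))
scores, deltas = rpn_head(fmap)
print(scores.shape, deltas.shape)  # (5, 7, 18) (5, 7, 36)
```

In a real network the window projection is a single n × n convolution and the two sibling outputs are 1 × 1 convolutions; the loops here only make the "one window per location" idea explicit.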

Notes on "Focal Loss for Dense Object Detection" (RetinaNet)

Focal Loss for Dense Object Detection by Lin et al (2017). The central idea of this paper is a new loss function for training one-stage detectors that copes effectively with class imbalance (typically found in one-stage detectors such as SSD). The authors also introduce RetinaNet, a detector trained with the proposed loss, which outperforms state-of-the-art detection algorithms, including two-stage detectors. Focal loss function: p_t is defined as p if y = 1 and 1 − p otherwise, where p is the model's estimated probability for the class y = 1. With this notation, CE(p_t) = −log(p_t) is the cross entropy. As shown below, the cross entropy is the blue curve (γ = 0). Adding the modulating factor (1 − p_t)^γ, which gives the focal loss FL(p_t) = −(1 − p_t)^γ log(p_t), makes the loss much smaller for well-classified examples and puts more focus on hard, misclassified examples. In this way the many easy background examples no longer dominate training, which enables highly accurate dense object detectors to be trained. Just like α-balanced cross entropy is the comm
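The definitions above translate directly into code. This is a minimal NumPy sketch of the binary focal loss, assuming `p` holds predicted probabilities for the class y = 1; the α-balancing weight is left out, and the name `focal_loss` and the clipping constant `eps` are illustrative assumptions. With γ = 0 it reduces to the ordinary cross entropy.

```python
import numpy as np

def focal_loss(p, y, gamma=2.0, eps=1e-12):
    """Per-example binary focal loss FL(p_t) = -(1 - p_t)^gamma * log(p_t).

    p: predicted probabilities for class y = 1; y: labels in {0, 1}.
    No alpha weighting in this sketch.
    """
    p = np.clip(np.asarray(p, dtype=float), eps, 1.0 - eps)  # avoid log(0)
    y = np.asarray(y)
    p_t = np.where(y == 1, p, 1.0 - p)  # p if y = 1, else 1 - p
    return -((1.0 - p_t) ** gamma) * np.log(p_t)

p = np.array([0.9, 0.6, 0.1])  # easy, medium and hard positives
y = np.array([1, 1, 1])
print(focal_loss(p, y, gamma=0.0))  # equals plain cross entropy -log(p)
print(focal_loss(p, y, gamma=2.0))
```

For an easy example with p_t = 0.9 and γ = 2, the factor (1 − 0.9)² = 0.01 shrinks the loss 100-fold, while a hard example with p_t = 0.1 keeps 81% of its cross-entropy loss.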

Notes on "Single Shot MultiBox Detector"

Notes on Single Shot MultiBox Detector by Liu et al (2016): This paper introduces the Single Shot MultiBox Detector (SSD), a feedforward convolutional neural network that produces a fixed-size collection of bounding boxes and scores for the presence of object class instances in those boxes, followed by a non-maximum suppression step to produce the final detections. Compared to YOLO, SSD has the following features. Multi-scale feature maps: unlike YOLO, which operates on a single feature map, SSD uses multi-scale feature maps; convolutional layers of progressively decreasing size are added for detection, which allows detection at multiple scales. Convolutional predictors for detection: the extra feature layers produce the detection predictions; each added feature layer (of fixed size) produces either class scores or bounding-box offsets, where the offsets are measured relative to a default box position at each feature map location. Default boxes and aspect ratios: A set of default bounding boxes is associated with each f
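The non-maximum suppression step mentioned above can be sketched as a greedy loop: keep the highest-scoring box, discard the remaining boxes whose IoU (Jaccard overlap) with it exceeds a threshold, and repeat. This is a minimal single-class NumPy sketch, assuming boxes in (x1, y1, x2, y2) corner format; the function names `iou` and `nms` and the 0.45 default threshold are illustrative assumptions. In a multi-class detector like SSD the suppression would be run separately for each class.

```python
import numpy as np

def iou(box, boxes):
    """IoU between one (x1, y1, x2, y2) box and an (N, 4) array of boxes."""
    x1 = np.maximum(box[0], boxes[:, 0])
    y1 = np.maximum(box[1], boxes[:, 1])
    x2 = np.minimum(box[2], boxes[:, 2])
    y2 = np.minimum(box[3], boxes[:, 3])
    inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
    area = (box[2] - box[0]) * (box[3] - box[1])
    areas = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
    return inter / (area + areas - inter)

def nms(boxes, scores, iou_threshold=0.45):
    """Greedy NMS: return indices of kept boxes, highest score first."""
    boxes = np.asarray(boxes, dtype=float)
    scores = np.asarray(scores, dtype=float)
    order = np.argsort(scores)[::-1]  # indices sorted by descending score
    keep = []
    while order.size > 0:
        best = order[0]
        keep.append(int(best))
        rest = order[1:]
        # Drop every remaining box that overlaps the kept box too much.
        order = rest[iou(boxes[best], boxes[rest]) <= iou_threshold]
    return keep

boxes = [[0, 0, 10, 10], [1, 1, 11, 11], [20, 20, 30, 30]]
print(nms(boxes, [0.9, 0.8, 0.7], iou_threshold=0.5))  # [0, 2]
```

The second box overlaps the first with IoU 81/119 ≈ 0.68, so it is suppressed, while the third box does not overlap either and survives.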

Notes on "Speed/accuracy trade-offs for modern convolutional object detectors"

"Speed/accuracy trade-offs for modern convolutional object detectors" by Huang et al (2017). The aim of this paper is to evaluate modern convolutional object detection systems using speed and accuracy as the metrics. The paper separates "meta-architectures" from base feature extractors so that their different combinations can be evaluated. Three meta-architectures are used, namely Single Shot Detector (SSD), Faster R-CNN and R-FCN; see the following posts to get familiar with these object detection algorithms. For these meta-architectures, six feature extractors are considered: VGG-16, ResNet-101, Inception v2, Inception v3, Inception ResNet and MobileNet. Other architecture configurations, such as the number of proposals and the output stride settings for ResNet and Inception ResNet, are also set up in a reasonable way (see the paper for details :P). Furthermore the loss function configura
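One way to read a speed/accuracy comparison is to keep only the configurations that are not beaten on both axes at once (a Pareto frontier). The sketch below is a hypothetical illustration, not the paper's evaluation code: it enumerates the 3 × 6 grid of candidate meta-architecture and feature-extractor pairings named above, then filters a dictionary of (GPU time in ms, mAP) pairs. The function name `pareto_frontier` is an assumption, and the numbers in the usage example are made-up placeholders, not results reported by Huang et al.

```python
from itertools import product

META_ARCHITECTURES = ["SSD", "Faster R-CNN", "R-FCN"]
FEATURE_EXTRACTORS = ["VGG-16", "ResNet-101", "Inception v2",
                      "Inception v3", "Inception ResNet", "MobileNet"]

# Every candidate pairing of a meta-architecture with a feature extractor.
CONFIGS = [f"{meta} + {fe}" for meta, fe in product(META_ARCHITECTURES, FEATURE_EXTRACTORS)]

def pareto_frontier(results):
    """results: name -> (time_ms, mAP). Lower time and higher mAP are better.

    A config is dominated if another one is at least as fast and at least
    as accurate, and strictly better on one of the two axes.
    """
    frontier = []
    for name, (t, m) in results.items():
        dominated = any(
            t2 <= t and m2 >= m and (t2 < t or m2 > m)
            for other, (t2, m2) in results.items()
            if other != name
        )
        if not dominated:
            frontier.append(name)
    return sorted(frontier, key=lambda n: results[n][0])  # fastest first

# Hypothetical placeholder numbers, NOT from the paper.
toy = {"A": (30, 20.0), "B": (80, 30.0), "C": (100, 28.0), "D": (200, 35.0)}
print(len(CONFIGS))          # 18
print(pareto_frontier(toy))  # ['A', 'B', 'D']
```

Configuration C drops out because B is both faster and more accurate; the remaining points trace the trade-off curve from fast-but-less-accurate to slow-but-more-accurate.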

Variational Autoencoders

Variational Autoencoders: 1. Scaling Variational Inference and Unbiased Estimates. See slide 1. Bayesian methods are often thought to be suited mostly to small datasets, since they are computationally expensive and are useful for extracting as much information as possible from limited data. This view changed when Bayesian methods met deep learning. The learning goal of this post is also presented in slide 2. The rest of the slides focus on unbiased estimation, since building unbiased estimates of the gradients of a neural network can be essential. An estimator is called unbiased if its expected value equals the true quantity being estimated (here, the mean of the distribution we want to approximate). Sometimes it is non-trivial to tell whether an estimator is unbiased; in that case one needs to reduce the particular problem to the expected value of some function, which is estimated by the average of its samples. This idea of an unbiased Monte Carlo (MC) estimate is illustrated in
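A minimal NumPy sketch of the Monte Carlo idea: estimate E[f(x)] by the average of f over samples. As an illustrative assumption, take x ~ N(0, 1) and f(x) = x², so the true value is 1. Any single estimate from 10 samples is noisy, but the estimator is unbiased, so the average over many independent estimates approaches the true value. The helper name `mc_estimate` is hypothetical.

```python
import numpy as np

def mc_estimate(f, sample, n, rng):
    """Monte Carlo estimate of E[f(x)]: average f over n samples of x."""
    x = sample(rng, n)
    return float(np.mean(f(x)))

rng = np.random.default_rng(0)
f = lambda x: x ** 2                          # E[x^2] = 1 for x ~ N(0, 1)
sample = lambda rng, n: rng.standard_normal(n)

# Many independent, small-sample estimates of the same expectation.
estimates = np.array([mc_estimate(f, sample, 10, rng) for _ in range(20000)])
print(estimates[:3])     # individual estimates scatter around 1
print(estimates.mean())  # close to the true value 1
```

The spread of individual estimates is the variance of the estimator; unbiasedness only says that this spread is centred on the true value, which is why stochastic gradient methods can still make progress with noisy unbiased gradient estimates.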