Face detection is the task of finding the location of faces in a single image or a video and determining the extent of each face. Detecting multi-view faces in images with complex and noisy backgrounds remains a challenging problem. In the last few decades, many techniques and algorithms have been applied to this problem, e.g. neural networks, appearance-based learning, view-based learning and AdaBoost, and all of these have recently played a crucial role. Most of these techniques are appearance based and rely on the analysis of face and non-face samples. They produce good results in detecting frontal faces, but detecting non-frontal faces remains challenging. One of the major challenges in face detection is handling random pose variations in images. As Fig. 1 shows, real-world face images have significant variations in orientation, facial expression, pose, lighting conditions, etc. This paper reviews various techniques and algorithms for detecting rotated and non-frontal (profile) faces and investigates the development of systems that can reliably detect faces with different poses and orientations. As the number of proposed techniques has increased, surveying and evaluating them has become increasingly important.
Keywords: Face detection, Multiview, Boosting, Profile face, Rotation invariant.
Face detection can be defined as the process of extracting faces from given images: the system should positively identify a certain region as a face. According to Yang et al. 1, 2, face detection can be described as the process of finding the regions of the input image where faces are present. To facilitate further processing, faces first need to be accurately located and registered. Face detection therefore plays a crucial role in any face detection and processing system.
Fig. 1. Example of multi-view face images.
The face detection problem is challenging as it needs
to account for all possible variations caused by changes in illumination,
facial features, occlusions, etc. In spite of all these challenges, tremendous
progress has been observed in the last decade and many such systems have shown
impressive real-time performance.
Real-world statistics show that most faces in images and videos are non-frontal 3. Detecting such multi-view faces in photos and videos remains a challenging problem because of the complex variations within the multi-view face classes. The first approach for real-time face detection 4 was proposed by Viola and Jones. This approach used the boosting algorithm known as AdaBoost 5 to identify a sequence of rectangle features that indicate the presence of a face. There are also many other techniques that can be applied accurately and successfully to detect frontal faces in a wide variety of images 3, 6, 7, 8. During the last decade, a number of promising frontal face detection algorithms have been developed and published.
Viola and Jones 9 also developed a fast frontal face detection system that uses Haar-like features. A cascade of boosted classifiers is built on an over-complete set of Haar-like features, integrating two designs, feature selection and classification, in the same framework. Fig. 1 shows examples of face images with multi-view faces, in which huge variations in pose, facial expression, face orientation and lighting conditions are clearly visible. Poggio 10 developed a classifier based on a difference feature vector computed between a local image pattern and a distribution-based model.
The rest of the paper is organized as follows. Section 2 describes how the Viola-Jones framework can be extended to rotated and profile faces. Section 3 discusses the vector boosting method. Section 4 presents a few other prominent multi-view face detection methods. Conclusions and future directions are provided in Section 5.
Viola-Jones framework for face detector
Various approaches have been proposed for face detection in a wide variety of images. Although they can successfully detect upright frontal faces, detecting rotated or pose-variant faces is difficult with these methods. Viola and Jones proposed the first method to detect faces in real-world images in real time.
A. Viola-Jones Face Detector
The most influential real-world face detector of the last few decades is the seminal work of Viola and Jones 10, 11. The basic principle of the Viola-Jones algorithm is to scan a sub-window across a given input image. The conventional image processing approach would be to rescale the input image to different sizes and then run a fixed-size detector over each of them. This approach is very time-consuming because of the computation needed to produce the differently sized images. Viola and Jones instead rescale the detector itself, which is cheap because its features can be evaluated at any scale in constant time using the integral image.
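The scanning procedure above can be sketched as follows. This is a minimal illustration, not the original implementation: it assumes a square 24x24 base window (the base size used by Viola and Jones), a hypothetical scale factor of 1.25 between levels, and a shift between window positions that grows with the window. The function only enumerates candidate sub-windows; it does not classify them.

```python
from typing import Iterator, Tuple


def scan_windows(width: int, height: int, base: int = 24,
                 scale_step: float = 1.25,
                 shift: float = 1.0) -> Iterator[Tuple[int, int, int]]:
    """Yield (x, y, size) for every square sub-window to be classified.

    The image is never resized: the detector window grows by `scale_step`
    per level, and the shift between positions grows with it.
    """
    scale = 1.0
    while True:
        size = int(base * scale)
        if size > width or size > height:
            break  # the window no longer fits inside the image
        step = max(1, round(shift * scale))
        for y in range(0, height - size + 1, step):
            for x in range(0, width - size + 1, step):
                yield x, y, size
        scale *= scale_step
```

For a 100x80 image this enumerates windows of 24, 30, 37, 46, 58 and 73 pixels. Each window would then be passed to the classifier, which evaluates its features on the integral image at the window's own scale.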
The Viola-Jones face detector introduced three main ideas that make it practical to build a successful face detector that runs in real time: the integral image, classifier learning with AdaBoost, and the attentional cascade structure for a low false-alarm rate.
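A short calculation shows why the cascade keeps false alarms low. A window is accepted only if it passes every stage, so the overall detection rate D and false-positive rate F of a K-stage cascade are the products of the per-stage rates d_i and f_i, each measured on the windows that reach that stage. With illustrative values of d_i = 0.99 and f_i = 0.3 for each of 10 stages, D = 0.99^10 ≈ 0.904 while F = 0.3^10 ≈ 5.9 × 10^-6. The sketch below, whose stage format is a hypothetical choice, also shows the early rejection that makes the cascade fast: most background windows are discarded by the first stages and never reach the later ones.

```python
import math
from typing import Any, Callable, List, Sequence, Tuple

# Hypothetical stage format: (weak classifiers, stage threshold).
Stage = Tuple[List[Callable[[Any], float]], float]


def cascade_classify(window: Any, stages: Sequence[Stage]) -> bool:
    """Return True only if the window passes every stage."""
    for weak_classifiers, threshold in stages:
        score = sum(h(window) for h in weak_classifiers)
        if score < threshold:
            return False  # early rejection: later stages are never evaluated
    return True


def cascade_rates(stage_detection: Sequence[float],
                  stage_false_pos: Sequence[float]) -> Tuple[float, float]:
    """Overall (detection rate, false-positive rate) of the cascade."""
    return math.prod(stage_detection), math.prod(stage_false_pos)
```

Calling `cascade_rates([0.99] * 10, [0.3] * 10)` reproduces the numbers above. Each stage can therefore afford a high false-positive rate as long as its detection rate stays very close to 1.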
Fig. 2. Illustration of the integral image and Haar-like
rectangle features (a-f)
B. The Integral Image
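The integral image, also known as a summed-area table, is an algorithm for quickly and efficiently computing the sum of values in a rectangular sub-window of a grid. The integral image is constructed as follows:

$$ii(x, y) = \sum_{x' \le x,\; y' \le y} i(x', y')$$

where ii(x, y) is the integral image at pixel location (x, y) and i(x', y') is the original image. With the integral image, computing the sum of any rectangular area is extremely efficient, as shown in Fig. 2. The sum of the pixels in the rectangular region ABCD can be calculated as:

$$\sum_{(x, y) \in ABCD} i(x, y) = ii(D) + ii(A) - ii(B) - ii(C)$$

where ii(A), ii(B), ii(C) and ii(D) are the integral-image values at the top-left, top-right, bottom-left and bottom-right corners of the rectangle, the first three taken just outside it.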
The integral image can be used to compute simple Haar-like rectangular features, as shown in Fig. 2.
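The integral image and a Haar-like feature built on it can be sketched in a few lines. This is an illustration, not the original code: the table is padded with a zero row and column so that rectangles touching the image border need no special cases, and the left-minus-right two-rectangle feature is one hypothetical choice among the feature types of Fig. 2.

```python
from typing import List

Grid = List[List[int]]


def integral_image(img: Grid) -> Grid:
    """Summed-area table with a zero row/column prepended.

    ii[y + 1][x + 1] holds the sum of img[y'][x'] for all y' <= y, x' <= x.
    """
    h, w = len(img), len(img[0])
    ii = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        for x in range(w):
            ii[y + 1][x + 1] = (img[y][x] + ii[y][x + 1]
                                + ii[y + 1][x] - ii[y][x])
    return ii


def rect_sum(ii: Grid, x: int, y: int, w: int, h: int) -> int:
    """Sum of the w x h rectangle whose top-left pixel is (x, y).

    Four lookups: ii(D) + ii(A) - ii(B) - ii(C), with A, B, C, D at the
    top-left, top-right, bottom-left and bottom-right corners.
    """
    a = ii[y][x]
    b = ii[y][x + w]
    c = ii[y + h][x]
    d = ii[y + h][x + w]
    return d + a - b - c


def two_rect_feature(ii: Grid, x: int, y: int, w: int, h: int) -> int:
    """Hypothetical two-rectangle Haar-like feature: left half minus
    right half of a w x h window (w assumed even)."""
    half = w // 2
    return rect_sum(ii, x, y, half, h) - rect_sum(ii, x + half, y, half, h)
```

Once the table is built in a single pass, every feature costs a constant number of lookups regardless of its size. This is what allows the detector, rather than the image, to be rescaled.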
C. Learning Algorithms
Boosting is a method of finding a highly accurate hypothesis by combining many “weak” hypotheses. For an introduction to boosting, the reader may refer to 12 and 13. The adaptive boosting (AdaBoost) algorithm is generally considered the first step towards more practical boosting algorithms 14, 15. The work in 16 presents a generalized version of the AdaBoost algorithm, usually referred to as RealBoost, which is given in Fig. 3. Various works 17, 18, 19, 20 have advocated that RealBoost yields better performance than the original AdaBoost algorithm. AdaBoost with decision trees as the weak learners is often referred to as the best out-of-the-box classifier. When AdaBoost is used with decision tree learning, the information gathered at each stage about the relative hardness of each training sample is fed into the tree-growing algorithm.
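Input: T, the total number of weak classifiers to be trained.
Initialize: set the example score $F^0(x_i) = \frac{1}{2}\ln(N_+/N_-)$, where $N_+$ and $N_-$ are the number of positive and negative examples in the data set.
AdaBoost learning: for t = 1, ..., T:
1. For each Haar-like feature in the data set, find the optimal threshold H and confidence scores $c_1$ and $c_2$ that minimize the Z score.
2. Select the best feature, i.e. the one with the minimum Z score.
Fig. 3. AdaBoost learning pseudocode.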
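A compact sketch of this training loop is given below. It assumes RealBoost with two-bin decision stumps (domain-partitioning weak classifiers in Schapire and Singer's sense). Each stump splits one feature at a threshold H into two bins and outputs a confidence of ½ ln(W_+/W_-) per bin. The stump minimizing Z = 2 Σ √(W_+ W_-) is selected, where W_+ and W_- are the total weights of the positive and negative examples that fall in each bin. Several choices here are assumptions of the sketch, not details from Fig. 3: plain feature vectors stand in for Haar-like feature values, the weights are taken as w_i = exp(-z_i F(x_i)), the candidate thresholds are the observed feature values, and a small eps smooths empty bins.

```python
import math
from typing import List, Sequence, Tuple

# (feature index j, threshold H, confidence c1 for x_j < H, c2 for x_j >= H)
Stump = Tuple[int, float, float, float]


def train_realboost(X: Sequence[Sequence[float]], z: Sequence[int], T: int,
                    eps: float = 1e-6) -> Tuple[float, List[Stump]]:
    """Train T two-bin stumps; z holds labels +1 (face) / -1 (non-face).
    Both classes are assumed to be present."""
    n_pos = sum(1 for zi in z if zi > 0)
    n_neg = len(z) - n_pos
    f0 = 0.5 * math.log(n_pos / n_neg)      # initial example score F^0
    F = [f0] * len(z)                       # current score of each example
    stumps: List[Stump] = []
    for _ in range(T):
        w = [math.exp(-zi * Fi) for zi, Fi in zip(z, F)]  # example weights
        best = None                         # (Z, j, H, c1, c2)
        for j in range(len(X[0])):
            for H in sorted({x[j] for x in X}):           # candidate thresholds
                wp, wn = [0.0, 0.0], [0.0, 0.0]  # +/- weight in each bin
                for x, zi, wi in zip(X, z, w):
                    b = 0 if x[j] < H else 1
                    if zi > 0:
                        wp[b] += wi
                    else:
                        wn[b] += wi
                Z = 2.0 * sum(math.sqrt(wp[b] * wn[b]) for b in (0, 1))
                if best is None or Z < best[0]:
                    c1, c2 = (0.5 * math.log((wp[b] + eps) / (wn[b] + eps))
                              for b in (0, 1))
                    best = (Z, j, H, c1, c2)
        _, j, H, c1, c2 = best
        stumps.append((j, H, c1, c2))
        F = [Fi + (c1 if x[j] < H else c2) for Fi, x in zip(F, X)]
    return f0, stumps


def score(f0: float, stumps: Sequence[Stump], x: Sequence[float]) -> float:
    """Final classifier F^T(x); a positive value means 'face'."""
    return f0 + sum(c1 if x[j] < H else c2 for j, H, c1, c2 in stumps)
```

With equal numbers of positives and negatives, F^0 is 0. In a cascade, the sum returned by `score` would be compared against a stage threshold rather than against 0.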
Multi-view face detection
Huang et al. 21 proposed a novel tree-structured multi-view face detector (MVFD) that uses a “coarse-to-fine” strategy, in which the entire face space is divided into smaller subspaces. They presented an extended boosting algorithm called vector boosting, which covers a large range of the face space, including both rotation in plane (RIP) and rotation off plane (ROP). Basically, it extends the Viola-Jones framework, in which separate cascades are trained for different views. Fig. 5 shows a few possible detector structures for multi-view face detection. Of these structures, the most straightforward is the parallel cascade structure proposed by Wu et al. 22, shown in Fig. 5(a). An individual classifier is learned for each face view, and a given test window is passed to all of the classifiers. After a few nodes, the cascade with the highest score finishes the classification and makes the decision. Another approach is the pyramid structure, which uses a coarse-to-fine strategy to handle ROP pose variation, as shown in Fig. 5(b). Because different face poses are similar, the pyramid method treats them as one ensemble positive class to improve the efficiency of the extracted features. As a result, a sample that passes a parent node has to be sent to all of its child nodes (see Fig. 5(b)), which considerably slows down the decision-making process.
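The parallel cascade decision rule described above can be sketched as follows. The sketch assumes that "after a few nodes" means a fixed number k of stages evaluated by every view cascade. The value of k, the view labels and the stage format are hypothetical.

```python
from typing import Any, Callable, Dict, Optional, Sequence, Tuple

# Hypothetical stage format: (stage scoring function, rejection threshold).
Stage = Tuple[Callable[[Any], float], float]


def parallel_cascade(window: Any, cascades: Dict[str, Sequence[Stage]],
                     k: int = 2) -> Optional[str]:
    """Return the view label that accepts the window, or None if rejected.

    Every view-specific cascade runs its first k stages; only the cascade
    with the highest accumulated score evaluates its remaining stages.
    """
    totals: Dict[str, float] = {}
    for view, stages in cascades.items():
        total = 0.0
        for score_fn, threshold in stages[:k]:
            s = score_fn(window)
            if s < threshold:
                break  # this view rejects the window
            total += s
        else:
            totals[view] = total  # survived the first k stages
    if not totals:
        return None
    best = max(totals, key=totals.get)
    for score_fn, threshold in cascades[best][k:]:
        if score_fn(window) < threshold:
            return None
    return best
```

The cost of the early stages grows linearly with the number of views, because every window visits all cascades. This is the overhead that the pyramid and tree structures try to reduce.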
Fig. 5. Different structures of MVFD: (a) parallel cascade, (b) pyramid, (c) tree.
Another approach, the decision tree method 23, 24, puts emphasis on the diversity between different poses, and the tree works as a pose estimator. Because the decision tree makes exclusive judgments about the pose, it has been observed to significantly reduce the time spent on pose estimation 25. However, the results are somewhat unstable, which limits the generalization ability of this structure. To overcome this, a WFS (width-first-search) tree structure balances these two aspects, improving detection in both accuracy and speed. Huang et al. 21 and Lin and Liu 26 independently proposed very similar solutions to this issue, named vector boosting and multiclass Bhattacharyya boost (MBHBoost), respectively. The idea is to have a vector-valued output for each weak classifier, which allows an example to be passed into multiple subcategory classifiers during testing; the final results are then fused from the vector outputs. Table I lists the most prominent boosting schemes for face detection.
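The width-first-search idea can be sketched as a tree traversal in which each node's classifier returns a vector of pass/fail flags, one per child. A window can therefore be forwarded to none, one or several branches. This is an illustration under stated assumptions, not Huang et al.'s implementation: the `Node` type and its classifiers are hypothetical, and the real-valued vector outputs of vector boosting are reduced here to boolean flags.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Any, Callable, List, Sequence


@dataclass
class Node:
    name: str
    # Hypothetical node classifier: one pass/fail flag per child; a leaf
    # returns a single flag meaning "accept the window as this view".
    classify: Callable[[Any], Sequence[bool]]
    children: List["Node"] = field(default_factory=list)


def wfs_detect(window: Any, root: Node) -> List[str]:
    """Width-first traversal: a window may enter several branches at once."""
    accepted: List[str] = []
    queue = deque([root])
    while queue:
        node = queue.popleft()
        flags = node.classify(window)
        if not node.children:
            if flags[0]:
                accepted.append(node.name)
            continue
        for child, ok in zip(node.children, flags):
            if ok:
                queue.append(child)  # forward only to the accepted branches
    return accepted
```

If every flag is always true, the traversal behaves like the pyramid. If exactly one flag may be true, it behaves like the decision tree. Vector-valued outputs let each node choose between these extremes for each window.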
Table I. Face/object detection
schemes to address challenges in boosting learning.
Multiview face detection
Deep Convolutional Neural Network
Convolutional neural networks (CNNs) are very popular in the field of computer vision. One of the reasons is the availability of large amounts of training data. In 1994, Vaillant et al. 27 applied neural networks to detect faces in uncluttered images. They designed a convolutional neural network that can be trained to detect the presence or absence of a face in a given image. The network is then scanned over the whole image at all possible locations. Rowley et al. 28 developed a neural network for upright frontal face detection, and later, in 1998 29, the method was extended to pose-invariant face detection. Neural networks have been adopted in many applications, such as pattern recognition, character recognition, object recognition and autonomous robot driving. The main advantage of neural networks for face detection is the feasibility of training a system to capture the complex class of face patterns. Deep CNNs are used not only for face detection but also for face alignment 30. To obtain the best performance from such methods, the number of layers, the number of nodes, the learning rates and so on have to be carefully tuned 31. A drawback of the neural network approach appears when the number of classes increases. In template matching, multiple face templates taken from different perspectives are used to characterize a single face. As stated in 32, such algorithms are not cost-effective and cannot be easily implemented.
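In the scanning step of Vaillant et al.'s approach, a trained network is evaluated at all possible image locations. This can be illustrated with a single linear filter standing in for the network. Applying the same weights at every position is a "valid", stride-1 2-D cross-correlation, which is what one convolutional layer computes before its nonlinearity. Thresholding the resulting score map gives candidate face locations. The kernel, bias and threshold are hypothetical; a real detector stacks several learned layers with nonlinearities.

```python
from typing import List, Tuple

Grid = List[List[float]]


def score_map(image: Grid, kernel: Grid, bias: float = 0.0) -> Grid:
    """Evaluate the same linear filter at every valid position (stride 1)."""
    kh, kw = len(kernel), len(kernel[0])
    rows, cols = len(image), len(image[0])
    out: Grid = []
    for y in range(rows - kh + 1):
        row = []
        for x in range(cols - kw + 1):
            s = bias
            for dy in range(kh):
                for dx in range(kw):
                    s += kernel[dy][dx] * image[y + dy][x + dx]
            row.append(s)
        out.append(row)
    return out


def detections(scores: Grid, threshold: float) -> List[Tuple[int, int]]:
    """Top-left (x, y) of every window whose score exceeds the threshold."""
    return [(x, y) for y, row in enumerate(scores)
            for x, s in enumerate(row) if s > threshold]
```

Because the weights are shared across positions, one pass over the image scores every window. An explicit sliding-window loop that ran the classifier separately on each crop would have to recompute the overlapping work.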
Farfade et al. 33 investigated multi-view face detection using a deep CNN. Their framework does not need landmark or pose annotations and can detect faces in a wide range of orientations with a single model.