As a technical consultant, I’ve worked on various client projects delivering functional solutions in a range of industries. A recent project at a large corporate organisation not only took me out of my comfort zone, but also set me on the path of tackling the many challenges of facial recognition.
This journey began when I was asked to solve a computer vision problem using OpenCV, an open-source computer vision library, and cascading classifiers, in which each classifier in the cascade uses the output of the previous classifier as additional information. In hindsight, this technology stack was ill-suited to the needs of the project. Not knowing this at the time, we pushed the technology to its limits and achieved a detection rate of over 80% on average, as well as near real-time recognition.
When the limits of these technologies ultimately became clear, a decision was taken to put the project on hold until an alternative solution could be found. After some research into possible alternatives, I decided to switch to a neural-network-based approach. In my own time I started studying object recognition using neural networks and began experimenting with the technology in order to understand how I could apply it to the original business case.
Neural networks are computing systems inspired by, but not necessarily identical to, animal brains. The benefit is that these networks can ‘learn’ to perform tasks such as image recognition by being shown positive and negative images. Take, for example, training a neural network to seek out ‘heads’ in an image. The system is trained by manually labelling ‘head’ images and feeding it negative images that contain no heads; the trained network is then used to identify ‘heads’ in other images.
The biggest challenge is that it takes hundreds of thousands of images, if not more, to try and cater for the wide range of hairstyles, clothing, caps, glasses, lighting, and camera angles when training a neural network to recognise ‘head’ images.
I then came up with the idea of combining my neural network with a basic facial recognition system that would search for facial features in order to reduce the false positives. I theorised that with this approach I could more accurately identify false positives (images wrongly identified as heads) and separate them from the true positives (images correctly identified as heads). In theory, this would allow me to feed both sets back into further training of the network, while still retaining enough variance to prevent overfitting and/or network bias. The goal was to improve the overall accuracy of the model, as well as to tweak the network parameters to reduce the need to run on expensive hardware.
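To make the idea concrete, here is a minimal sketch of that separation step. It is an illustration only, not the code used on the project: the `split_detections` helper, the `has_face` check and the `face_score` field are hypothetical names, and the secondary facial-feature check is assumed to be available as a simple yes/no function.

```python
from typing import Callable, Dict, List, Tuple

Detection = Dict[str, float]  # hypothetical record for one 'head' detection


def split_detections(
    detections: List[Detection],
    has_face: Callable[[Detection], bool],
) -> Tuple[List[Detection], List[Detection]]:
    """Separate head detections using a secondary facial-feature check.

    Returns (confirmed, suspected_false). Detections that pass the check
    are kept as positives; the rest are candidates for review and for use
    as negative examples in a later training round.
    """
    confirmed: List[Detection] = []
    suspected_false: List[Detection] = []
    for detection in detections:
        if has_face(detection):
            confirmed.append(detection)
        else:
            suspected_false.append(detection)
    return confirmed, suspected_false
```

In practice the suspected false positives would still need a manual check before being reused, since a head seen from behind shows no facial features at all.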
Following this, I implemented my own version of the approach described in Google’s FaceNet research paper. In a nutshell, the paper proposes a convolutional neural network that, in its simplest form, returns a 128-dimensional vector embedding for each face. These embeddings can then be used for facial recognition, clustering and verification. The FaceNet method proved to be accurate and handled facial occlusion very well, specifically in support of clustering and verification. This in turn confirmed my theory, helped my network to detect heads more accurately and allowed me to optimise training.
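As a rough illustration of how such embeddings are used for verification, the sketch below compares two embeddings by squared Euclidean distance after scaling each to unit length, which is the comparison the FaceNet paper describes. The function names and the threshold value of 1.0 are assumptions made for this example; a real threshold has to be tuned on validation data.

```python
import math
from typing import List, Sequence


def l2_normalise(vec: Sequence[float]) -> List[float]:
    """Scale a vector to unit length so that distances are comparable."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0:
        raise ValueError("cannot normalise a zero vector")
    return [x / norm for x in vec]


def squared_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("embeddings must have the same length")
    return sum((x - y) ** 2 for x, y in zip(a, b))


def same_person(
    emb_a: Sequence[float],
    emb_b: Sequence[float],
    threshold: float = 1.0,  # assumed value, for illustration only
) -> bool:
    """Treat two embeddings as the same face if they lie close together."""
    distance = squared_distance(l2_normalise(emb_a), l2_normalise(emb_b))
    return distance < threshold
```

Clustering and recognition build on the same distance measure: faces of one person should land close together, and faces of different people further apart.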
During the course of this journey, it came to my attention that my client had a different business case to solve, one that could leverage some of what I had learned. The new business case was to create a proof of concept for a cost-effective solution that could accurately detect faces and run facial recognition in under 5 seconds. At the time, off-the-shelf systems did not quite meet the client’s requirements in terms of cost-effectiveness.
Fortunately, I had already looked into combining basic facial detection with my solution, which allowed me to use the work I had already done as a foundation for the new project.
The most challenging part of this journey was not the facial recognition, but rather the ability to accurately and consistently detect a face. In the same way that my previous work required detecting heads, the new system needed to find faces accurately before running recognition. Low-quality, badly lit and blurry images all make it harder to detect a face. These were significant challenges, as the system cannot run facial recognition before it has detected actual faces. The recognition requirement itself was solved by extracting a facial embedding from the detected face and passing it on to the FaceNet implementation, where the embedding is matched against a database.
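The matching step can be pictured as a nearest-neighbour search over the stored embeddings. The sketch below is a simplified stand-in rather than the project code: `best_match`, the list-of-pairs database layout and the threshold of 1.0 are assumptions for illustration, and the database may hold more than one stored face per profile.

```python
from typing import List, Optional, Sequence, Tuple


def squared_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def best_match(
    query: Sequence[float],
    database: List[Tuple[str, Sequence[float]]],
    threshold: float = 1.0,  # assumed value, for illustration only
) -> Tuple[Optional[str], float]:
    """Return (profile_id, distance) for the closest stored embedding.

    The profile id is None when nothing in the database is closer than
    the threshold, meaning the face is treated as unknown.
    """
    best_id: Optional[str] = None
    best_dist = float("inf")
    for profile_id, embedding in database:
        dist = squared_distance(query, embedding)
        if dist < best_dist:
            best_id, best_dist = profile_id, dist
    if best_dist >= threshold:
        return None, best_dist
    return best_id, best_dist
```

A linear scan like this is adequate for a database of around a thousand faces; a much larger database would likely call for an indexed search instead.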
To overcome this challenge, my solution was to break down the detection process into various stages, with each stage passing through its own specialised neural network. Each of these stages looks for specific features, extracts them and feeds its results into the next stage. During the last stage, the final facial data is extracted as a 128-dimensional embedding and run through my recognition method. This approach improved not only the accuracy but also the overall detection of frontal, partial and occluded faces.
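Structurally, a staged design like this can be sketched as a chain of functions, each receiving the results so far and either adding to them or rejecting the image. The stage names below (`propose_regions`, `locate_features`, `embed_face`) are hypothetical placeholders with trivial bodies; in a real system each one would wrap its own trained network, and the actual stages used in this project are not shown here.

```python
from typing import Callable, Dict, List, Optional

State = Dict[str, object]
Stage = Callable[[State], Optional[State]]


def run_stages(image: List[List[int]], stages: List[Stage]) -> Optional[State]:
    """Pass an image through each stage in order.

    A stage returns an updated state, or None to reject the image, in
    which case the remaining stages are skipped.
    """
    state: Optional[State] = {"image": image}
    for stage in stages:
        state = stage(state)
        if state is None:
            return None
    return state


def propose_regions(state: State) -> Optional[State]:
    """Placeholder: treat the whole non-empty image as one candidate region."""
    image = state["image"]
    if not image:
        return None
    return {**state, "region": (0, 0, len(image[0]), len(image))}


def locate_features(state: State) -> Optional[State]:
    """Placeholder: a real stage would locate facial features in the region."""
    return {**state, "landmarks": []}


def embed_face(state: State) -> Optional[State]:
    """Placeholder: a real stage would return a learned 128-d embedding."""
    return {**state, "embedding": [0.0] * 128}
```

Calling `run_stages(image, [propose_regions, locate_features, embed_face])` yields the final state with an `embedding` entry, or `None` if any stage rejects the image. One benefit of this shape is that cheap early stages can discard poor candidates before the more expensive later stages run.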
The approach I took achieved great results on a relatively modest Nvidia GeForce GT 1030 graphics card. The system averaged an impressive 1.4 seconds to accurately match an unknown face in an image against a sample database of approximately 1000 faces and 487 unique profiles.
As many organisations continue to explore the possibilities of new technologies, these kinds of projects will become increasingly prevalent. The experience of creating this proof of concept has been incredibly rewarding, and has sparked what I’m sure will be a lifetime of learning on this journey into the world of AI.
Written by Jason Elder, Senior Technical Consultant at Saratoga