Face Detection Tutorial Using the Vision Framework for iOS

In this tutorial, you’ll learn how to use the Vision framework to detect faces and facial features and overlay the results on the camera feed in real time. By Yono Mittlefehldt.

Precise gene editing technology has been around since about 2012. So why don’t we all have super powers yet?!?

And what’s the greatest super power? No. Not flying. That’s far too dangerous.

The correct answer is laser heat vision!

Imagine what you could do with laser heat vision! You could save money on a microwave, easily light any candle in sight, and don’t forget the ability to burn your initials into your woodworking projects. How cool would that be?

Well, apparently real-life superpowers aren’t here yet, so you’ll have to settle for the next best thing: using your iPhone to give yourself pretend laser heat vision.

Fortunately, Apple has a framework that can help you out with this plan B.

In this tutorial, you’ll learn how to use the Vision framework to:

  • Create requests for face detection and face landmark detection.
  • Process these requests.
  • Overlay the results on the camera feed to get real-time, visual feedback.

Get ready to super power your brain and your eyes!

Getting Started

Click the Download Materials button at the top or bottom of this tutorial. Open the starter project and explore to your heart’s content.

Note: The starter project uses the camera, which means you’ll get a crash if you try to run it in the Simulator. Make sure to run this tutorial on an actual device so you can see your lovely face!

Currently, the Face Lasers app doesn’t do a whole lot. Well, it does show you your beautiful mug!

There’s also a label at the bottom that reads Face. You may have noticed that if you tap the screen, this label changes to read Lasers.

That’s exciting! Except that there don’t seem to be any lasers. That’s less exciting. Don’t worry — by the end of this tutorial, you’ll be shooting lasers out of your eyes like Super(wo)man!

You’ll also notice some useful Core Graphics extensions. You’ll make use of these throughout the tutorial to simplify your code.
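The exact contents of those extensions aren’t shown here, but based on how you’ll use them later in convert(rect:), they boil down to small conversions between CGPoint and CGSize, along these lines (a sketch only; the starter’s actual implementation may differ slightly):

import CoreGraphics

// Sketch of the kind of Core Graphics conveniences the starter provides.
extension CGSize {
  // Treat a size as a point so it can be fed to point-based conversion APIs.
  var cgPoint: CGPoint {
    return CGPoint(x: width, y: height)
  }
}

extension CGPoint {
  // And convert back to a size once the conversion is done.
  var cgSize: CGSize {
    return CGSize(width: x, height: y)
  }
}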

Vision Framework Usage Patterns

All Vision framework APIs use three constructs:

  1. Request: The request defines the type of thing you want to detect and a completion handler that will process the results. This is a subclass of VNRequest.
  2. Request handler: The request handler performs the request on the provided pixel buffer (think: image). This will be either a VNImageRequestHandler for single, one-off detections or a VNSequenceRequestHandler to process a series of images.
  3. Results: The results will be attached to the original request and passed to the completion handler defined when creating the request. They are subclasses of VNObservation.

Simple, right?
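To see how the three pieces fit together before diving into the camera pipeline, here’s a minimal sketch of a one-off detection on a single CGImage. The detectFaces(in:) function and its image parameter are purely illustrative and not part of the starter project; the Vision calls themselves are standard API, and the app you’re about to build uses the sequence-based variant instead:

import CoreGraphics
import Vision

// Illustrative one-off detection; not part of the Face Lasers app.
func detectFaces(in image: CGImage) {
  // 1. Request: what to detect, plus a completion handler for the results.
  let request = VNDetectFaceRectanglesRequest { request, error in
    // 3. Results: VNFaceObservation instances attached to the request.
    let faces = request.results as? [VNFaceObservation] ?? []
    print("Found \(faces.count) face(s)")
  }

  // 2. Request handler: performs the request on a single, static image.
  let handler = VNImageRequestHandler(cgImage: image, options: [:])
  do {
    try handler.perform([request])
  } catch {
    print(error.localizedDescription)
  }
}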

Writing Your First Face Detector

Open FaceDetectionViewController.swift and add the following property at the top of the class:

var sequenceHandler = VNSequenceRequestHandler()

This defines the request handler you’ll be feeding images to from the camera feed. You’re using a VNSequenceRequestHandler because you’ll perform face detection requests on a series of images, instead of a single static one.

Now scroll to the bottom of the file where you’ll find an empty captureOutput(_:didOutput:from:) delegate method. Fill it in with the following code:

// 1
guard let imageBuffer = CMSampleBufferGetImageBuffer(sampleBuffer) else {
  return
}

// 2
let detectFaceRequest = VNDetectFaceRectanglesRequest(completionHandler: detectedFace)

// 3
do {
  try sequenceHandler.perform(
    [detectFaceRequest], 
    on: imageBuffer, 
    orientation: .leftMirrored)
} catch {
  print(error.localizedDescription)
}

With this code you:

  1. Get the image buffer from the passed in sample buffer.
  2. Create a face detection request to detect face bounding boxes and pass the results to a completion handler.
  3. Use your previously defined sequence request handler to perform your face detection request on the image. The orientation parameter tells the request handler what the orientation of the input image is.
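For context, the code you just added is the body of captureOutput(_:didOutput:from:), the standard AVCaptureVideoDataOutputSampleBufferDelegate callback. Assembled, the whole method looks roughly like this (the signature comes from AVFoundation; only the body is what you just wrote):

// Standard AVCaptureVideoDataOutputSampleBufferDelegate callback.
func captureOutput(
  _ output: AVCaptureOutput,
  didOutput sampleBuffer: CMSampleBuffer,
  from connection: AVCaptureConnection
) {
  // Pull the pixel buffer out of the sample buffer delivered by the camera.
  guard let imageBuffer = CMSampleBufferGetImageBuffer(sampleBuffer) else {
    return
  }

  // Ask Vision for face bounding boxes and route results to detectedFace.
  let detectFaceRequest = VNDetectFaceRectanglesRequest(completionHandler: detectedFace)

  // .leftMirrored tells Vision how the front-camera buffer is oriented.
  do {
    try sequenceHandler.perform(
      [detectFaceRequest],
      on: imageBuffer,
      orientation: .leftMirrored)
  } catch {
    print(error.localizedDescription)
  }
}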

Now you may be wondering: But what about detectedFace(request:error:)? In fact, Xcode is probably wondering the same thing.

You’ll define that now.

Add the following code for detectedFace(request:error:) to the FaceDetectionViewController class, wherever you like:

func detectedFace(request: VNRequest, error: Error?) {
  // 1
  guard 
    let results = request.results as? [VNFaceObservation],
    let result = results.first 
    else {
      // 2
      faceView.clear()
      return
  }
    
  // 3
  let box = result.boundingBox
  faceView.boundingBox = convert(rect: box)
    
  // 4
  DispatchQueue.main.async {
    self.faceView.setNeedsDisplay()
  }
}

In this method you:

  1. Extract the first result from the array of face observation results.
  2. Clear the FaceView if something goes wrong or no face is detected.
  3. Set the bounding box to draw in the FaceView after converting it from the coordinates in the VNFaceObservation.
  4. Call setNeedsDisplay() to make sure the FaceView is redrawn.

The result’s bounding box coordinates are normalized between 0.0 and 1.0 relative to the input image, with the origin at the bottom-left corner. That’s why you need to convert them to something useful.
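To make that concrete, here’s an example with made-up numbers. Vision also provides VNImageRectForNormalizedRect to scale a normalized rect into image pixels, but it doesn’t flip the origin or map into the preview layer’s coordinate space, which is what the helper you’re about to write takes care of:

// Illustrative only: scale a normalized Vision rect into pixel coordinates
// for a hypothetical 1080x1920 image.
let normalized = CGRect(x: 0.2, y: 0.4, width: 0.3, height: 0.25)
let pixelRect = VNImageRectForNormalizedRect(normalized, 1080, 1920)
// pixelRect is (216.0, 768.0, 324.0, 480.0), still measured from the bottom-left.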

Unfortunately, the convert(rect:) method you just called doesn’t exist yet. Fortunately, you’re a talented programmer!

Right above where you placed the method definition for detectedFace(request:error:), add the following method definition:

func convert(rect: CGRect) -> CGRect {
  // 1
  let origin = previewLayer.layerPointConverted(fromCaptureDevicePoint: rect.origin)
  
  // 2
  let size = previewLayer.layerPointConverted(fromCaptureDevicePoint: rect.size.cgPoint)
  
  // 3
  return CGRect(origin: origin, size: size.cgSize)
}

Here you:

  1. Use a handy method from AVCaptureVideoPreviewLayer to convert a normalized origin to the preview layer’s coordinate system.
  2. Then use the same handy method along with some nifty Core Graphics extensions to convert the normalized size to the preview layer’s coordinate system.
  3. Create a CGRect using the new origin and size.

You’re probably tempted to build and run this. And if you did, you would be disappointed to see nothing on the screen except your own face, sadly free of lasers.

Currently FaceView has an empty draw(_:) method. You need to fill that in if you want to see something on screen!

Switch to FaceView.swift and add the following code to draw(_:):

// 1
guard let context = UIGraphicsGetCurrentContext() else {
  return
}

// 2
context.saveGState()

// 3
defer {
  context.restoreGState()
}
    
// 4
context.addRect(boundingBox)

// 5
UIColor.red.setStroke()

// 6
context.strokePath()

With this code, you:

  1. Get the current graphics context.
  2. Push the current graphics state onto the stack.
  3. Restore the graphics state when this method exits.
  4. Add a path describing the bounding box to the context.
  5. Set the color to red.
  6. Draw the actual path described in step four.
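For orientation, here’s a rough sketch of how draw(_:) sits alongside the boundingBox property and the clear() method you’ve already used from FaceDetectionViewController. The starter’s actual class may hold more state than this, but the shape is the same:

import UIKit

// Rough sketch of FaceView; the starter's real implementation may differ.
class FaceView: UIView {
  // The detected face rectangle, already converted to this view's coordinate space.
  var boundingBox = CGRect.zero

  // Called when no face is detected: reset the geometry and redraw.
  func clear() {
    boundingBox = .zero
    DispatchQueue.main.async {
      self.setNeedsDisplay()
    }
  }

  override func draw(_ rect: CGRect) {
    // 1. Get the current graphics context.
    guard let context = UIGraphicsGetCurrentContext() else {
      return
    }

    // 2 & 3. Save the graphics state and restore it when this method exits.
    context.saveGState()
    defer {
      context.restoreGState()
    }

    // 4-6. Outline the detected face in red.
    context.addRect(boundingBox)
    UIColor.red.setStroke()
    context.strokePath()
  }
}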

Phew! You’ve been coding for quite some time without anything to show for it on screen. It’s finally time!

Go ahead and build and run your app.

What a good looking detected face!