Mobile devices are getting better and better at solving sophisticated tasks – not only because of faster hardware, but also thanks to the trend towards on-device AI. Tasks such as face detection, barcode recognition, rectangle detection and text recognition are now supported at the operating-system level, which makes them really simple to use in your app. Here I am going to show how to detect face landmarks in real time using the Vision framework. The demo app that we're going to build is also available on GitHub.
AVCaptureSession
The first thing to do is to configure an instance of AVCaptureSession to capture the video stream from the front camera. We’re going to direct the stream to
- AVCaptureVideoPreviewLayer to preview it on the screen
- AVCaptureVideoDataOutput to perform the face landmarks detection
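The preview side isn't covered in the snippets below, so here is a minimal sketch of how the two layers might be set up in the hosting view controller. The `_captureSession` and `_shapeLayer` field names are assumptions matching the code later in the post, not necessarily the demo app's exact code:

```csharp
// Sketch (assumed setup, not the original sample): attach a preview layer
// for the camera feed plus an overlay layer for the landmark drawings.
var previewLayer = new AVCaptureVideoPreviewLayer(_captureSession)
{
    Frame = View.Frame,
    VideoGravity = AVLayerVideoGravity.ResizeAspectFill
};
View.Layer.AddSublayer(previewLayer);

// Landmarks get drawn into this layer on top of the preview.
_shapeLayer = new CAShapeLayer { Frame = View.Frame };
View.Layer.AddSublayer(_shapeLayer);
```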
Let's start with a small helper method to get the front camera AVCaptureDevice. We're using AVCaptureDeviceDiscoverySession, specifying that we're interested in the front camera.
```csharp
public AVCaptureDevice GetDevice()
{
    var videoDeviceDiscoverySession = AVCaptureDeviceDiscoverySession.Create(
        new AVCaptureDeviceType[] { AVCaptureDeviceType.BuiltInWideAngleCamera },
        AVMediaType.Video,
        AVCaptureDevicePosition.Front);

    return videoDeviceDiscoverySession.Devices.FirstOrDefault();
}
```
Now the AVCaptureSession itself.
```csharp
public void ConfigureDeviceAndStart()
{
    var device = GetDevice();

    if (device.LockForConfiguration(out var error))
    {
        if (device.IsFocusModeSupported(AVCaptureFocusMode.ContinuousAutoFocus))
        {
            device.FocusMode = AVCaptureFocusMode.ContinuousAutoFocus;
        }
        device.UnlockForConfiguration();
    }

    // Configure Input
    var input = AVCaptureDeviceInput.FromDevice(device, out var error2);
    _captureSession.AddInput(input);

    // Configure Output
    var settings = new AVVideoSettingsUncompressed()
    {
        PixelFormatType = CoreVideo.CVPixelFormatType.CV32BGRA
    };
    var videoOutput = new AVCaptureVideoDataOutput
    {
        WeakVideoSettings = settings.Dictionary,
        AlwaysDiscardsLateVideoFrames = true
    };
    var videoCaptureQueue = new DispatchQueue("Video Queue");
    videoOutput.SetSampleBufferDelegateQueue(new OutputRecorder(View, _shapeLayer), videoCaptureQueue);

    if (_captureSession.CanAddOutput(videoOutput))
    {
        _captureSession.AddOutput(videoOutput);
    }

    // Start session
    _captureSession.StartRunning();
}
```
Here we're setting up the capture session by adding instances of the AVCaptureDeviceInput and AVCaptureVideoDataOutput classes. We're setting AlwaysDiscardsLateVideoFrames to true to save some memory (it's true by default, but let's make it explicit). The important piece here is the OutputRecorder – our implementation of IAVCaptureVideoDataOutputSampleBufferDelegate, which will do the face landmarks detection.
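A session started this way should also be stopped when the screen goes away. That part isn't in the post's snippets, but a hypothetical counterpart to ConfigureDeviceAndStart could be as simple as:

```csharp
// Sketch (not part of the original sample): stop capturing when the
// view controller's view disappears.
public override void ViewWillDisappear(bool animated)
{
    base.ViewWillDisappear(animated);

    if (_captureSession.Running)
    {
        _captureSession.StopRunning();
    }
}
```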
VNSequenceRequestHandler and VNDetectFaceLandmarksRequest
At this point, we have the configured AVCaptureSession and we’re ready to process the output to detect face landmarks. To do this let’s override the DidOutputSampleBuffer method.
```csharp
public class OutputRecorder : AVCaptureVideoDataOutputSampleBufferDelegate
{
    public override void DidOutputSampleBuffer(AVCaptureOutput captureOutput, CMSampleBuffer sampleBuffer, AVCaptureConnection connection)
    {
        using (var pixelBuffer = sampleBuffer.GetImageBuffer())
        using (var ciImage = new CIImage(pixelBuffer))
        using (var imageWithOrientation = ciImage.CreateByApplyingOrientation(ImageIO.CGImagePropertyOrientation.LeftMirrored))
        {
            DetectFaceLandmarks(imageWithOrientation);
        }
        sampleBuffer.Dispose();
    }

    ...
}
```
The method is called every time a new frame is captured. We create a CIImage and pass it to the DetectFaceLandmarks method, which uses the Vision framework to detect face landmarks and draw them on the overlay layer. Note that we need to properly dispose of all objects, otherwise the app becomes unresponsive very quickly.
```csharp
VNSequenceRequestHandler _sequenceRequestHandler = new VNSequenceRequestHandler();
VNDetectFaceLandmarksRequest _detectFaceLandmarksRequest;

void DetectFaceLandmarks(CIImage imageWithOrientation)
{
    if (_detectFaceLandmarksRequest == null)
    {
        _detectFaceLandmarksRequest = new VNDetectFaceLandmarksRequest((request, error) =>
        {
            RemoveSublayers(_shapeLayer);

            if (error != null)
            {
                throw new Exception(error.LocalizedDescription);
            }

            var results = request.GetResults<VNFaceObservation>();
            foreach (var result in results)
            {
                if (result.Landmarks == null)
                {
                    continue;
                }

                var boundingBox = result.BoundingBox;
                var scaledBoundingBox = Scale(boundingBox, _view.Bounds.Size);

                InvokeOnMainThread(() =>
                {
                    DrawLandmark(result.Landmarks.FaceContour, scaledBoundingBox, false, UIColor.White);
                    DrawLandmark(result.Landmarks.LeftEye, scaledBoundingBox, true, UIColor.Green);
                    DrawLandmark(result.Landmarks.RightEye, scaledBoundingBox, true, UIColor.Green);
                    DrawLandmark(result.Landmarks.Nose, scaledBoundingBox, true, UIColor.Blue);
                    DrawLandmark(result.Landmarks.NoseCrest, scaledBoundingBox, false, UIColor.Blue);
                    DrawLandmark(result.Landmarks.InnerLips, scaledBoundingBox, true, UIColor.Yellow);
                    DrawLandmark(result.Landmarks.OuterLips, scaledBoundingBox, true, UIColor.Yellow);
                    DrawLandmark(result.Landmarks.LeftEyebrow, scaledBoundingBox, false, UIColor.Blue);
                    DrawLandmark(result.Landmarks.RightEyebrow, scaledBoundingBox, false, UIColor.Blue);
                });
            }
        });
    }

    _sequenceRequestHandler.Perform(new[] { _detectFaceLandmarksRequest }, imageWithOrientation, out var requestHandlerError);
    if (requestHandlerError != null)
    {
        throw new Exception(requestHandlerError.LocalizedDescription);
    }
}
```
The method is quite simple. First, we create a VNDetectFaceLandmarksRequest with a completion handler that iterates through all results and draws them (note that we're doing the drawing on the UI thread). Second, we use the VNSequenceRequestHandler to perform the request on the CIImage from the previous step.
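The Scale and RemoveSublayers helpers used above aren't shown in these snippets. Assuming the bounding box comes back in Vision's normalized [0..1] coordinates, they could be sketched roughly like this (an assumption based on how they're called, not the demo app's exact code):

```csharp
// Sketch of the two helpers referenced above (assumed implementations).
// Vision reports the face bounding box in normalized coordinates, so we
// scale it up to the size of the view we're drawing on.
CGRect Scale(CGRect boundingBox, CGSize size)
{
    return new CGRect(
        boundingBox.X * size.Width,
        boundingBox.Y * size.Height,
        boundingBox.Width * size.Width,
        boundingBox.Height * size.Height);
}

// Clear the landmark layers drawn for the previous frame.
void RemoveSublayers(CAShapeLayer shapeLayer)
{
    foreach (var layer in shapeLayer.Sublayers ?? Array.Empty<CALayer>())
    {
        layer.RemoveFromSuperLayer();
    }
}
```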
And lastly, the DrawLandmark method:
```csharp
void DrawLandmark(VNFaceLandmarkRegion2D feature, CGRect scaledBoundingBox, bool closed, UIColor color)
{
    // Materialize the points once so the LINQ projection isn't re-run
    // by the First()/Skip() calls below.
    var mappedPoints = feature.NormalizedPoints
        .Select(o => new CGPoint(
            x: o.X * scaledBoundingBox.Width + scaledBoundingBox.X,
            y: o.Y * scaledBoundingBox.Height + scaledBoundingBox.Y))
        .ToArray();

    using (var newLayer = new CAShapeLayer())
    {
        newLayer.Frame = _view.Frame;
        newLayer.StrokeColor = color.CGColor;
        newLayer.LineWidth = 2;
        newLayer.FillColor = UIColor.Clear.CGColor;

        using (UIBezierPath path = new UIBezierPath())
        {
            path.MoveTo(mappedPoints.First());
            foreach (var point in mappedPoints.Skip(1))
            {
                path.AddLineTo(point);
            }
            if (closed)
            {
                path.AddLineTo(mappedPoints.First());
            }
            newLayer.Path = path.CGPath;
        }

        _shapeLayer.AddSublayer(newLayer);
    }
}
```
Since the Vision framework returns normalized landmark points, we transform them to screen coordinates before drawing. The rest of the code just adds a new CAShapeLayer with the drawn path.
Conclusion
Here I showed you how simple it is to perform such a complex task as the detection of facial landmarks. If you're creating your own app that uses this feature, don't forget to add an NSCameraUsageDescription to your Info.plist. Also, keep in mind that the Vision framework is available on iOS 11+. Happy coding!
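Besides the Info.plist entry, the user also has to grant camera access at runtime. One way to check before starting the session (a sketch under the assumption that ConfigureDeviceAndStart from above is the entry point; not part of the demo app) is:

```csharp
// Sketch: ask for camera permission, then start the capture session on
// the main thread once access is granted.
AVCaptureDevice.RequestAccessForMediaType(AVMediaType.Video, granted =>
{
    if (granted)
    {
        InvokeOnMainThread(ConfigureDeviceAndStart);
    }
});
```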