ML Kit Tutorial for iOS: Recognizing Text in Images

In this ML Kit tutorial, you’ll learn how to leverage Google’s ML Kit to detect and recognize text. By David East.


A few years ago, there were two types of machine learning (ML) developers: the advanced developers and everyone else. The lower levels of ML can be hard; it’s a lot of math, and it uses big words like logistic regression, sparsity and neural nets. But it doesn’t have to be that hard.

You can also be an ML developer! At its core, ML is simple. With it, you solve a problem by teaching a software model to recognize patterns instead of hard coding each situation and corner case you can think of. However, it can be daunting to get started, and this is where you can rely on existing tools.

Machine Learning and Tooling

Just like iOS development, ML is about tooling. You wouldn’t build your own UITableView, or at least you shouldn’t; you would use a framework instead, like UIKit.

It’s the same way with ML. ML has a booming ecosystem of tooling. TensorFlow, for example, simplifies training and running models. TensorFlow Lite brings model support to iOS and Android devices.

Each of these tools requires some experience with ML. What if you’re not an ML expert but want to solve a specific problem? For these situations, there’s ML Kit.

ML Kit

ML Kit is a mobile SDK that brings Google’s ML expertise to your app. It has two main parts: APIs for common use cases and support for custom models, both of which are easy to use regardless of your ML experience.


The existing APIs currently support:

  1. Recognizing text
  2. Detecting faces
  3. Scanning barcodes
  4. Labeling images
  5. Recognizing landmarks

Each of these use cases comes with a pre-trained model wrapped in an easy-to-use API. Time to start building something!

Getting Started

In this tutorial, you’re going to build an app called Extractor. Have you ever snapped a picture of a sign or a poster just to write down the text content? It would be great if an app could just peel the text off the sign and save it for you, ready to use. You could, for example, take a picture of an addressed envelope and save the address. That’s exactly what you’ll do with this project! Get ready!

Start by downloading the project materials using the Download Materials button at the top or bottom of this tutorial.

This project uses CocoaPods to manage dependencies.

Setting Up ML Kit

Each ML Kit API has a different set of CocoaPods dependencies. This is useful because you only need to bundle the dependencies required by your app. For instance, if you’re not identifying landmarks, you don’t need that model in your app. In Extractor, you’ll use the Text Recognition API.

If you were adding the Text Recognition API to an app yourself, you would need to add the following lines to your Podfile. You don’t have to do that for the starter project, though: The lines are already in its Podfile – you can check.

pod 'Firebase/Core', '5.5.0'
pod 'Firebase/MLVision', '5.5.0'
pod 'Firebase/MLVisionTextModel', '5.5.0'

You do, however, have to open Terminal, switch to the project folder and run the following command to install the CocoaPods used in the project:

pod install

Once the CocoaPods are installed, open Extractor.xcworkspace in Xcode.

If you’re unfamiliar with CocoaPods, our CocoaPods Tutorial will help you get started.

Note: You may notice that the project folder contains a project file named Extractor.xcodeproj and a workspace file named Extractor.xcworkspace, which is the file you’re opening in Xcode. Don’t open the project file, because it doesn’t contain the additional CocoaPods project which is required to compile the app.

This project contains the following important files:

  1. ViewController.swift: The only controller in this project.
  2. +UIImage.swift: A UIImage extension to fix the orientation of images.
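The bundled extension’s exact implementation isn’t shown here, but a typical orientation fix simply redraws the image into a fresh graphics context so its pixel data matches the .up orientation. Here’s a minimal sketch of that idea; the method name fixedOrientation() is illustrative rather than the starter’s actual API:

import UIKit

extension UIImage {
  // Redraws the image so its pixel data matches the .up orientation.
  // Text recognition works on raw pixel data, so a mismatched
  // orientation can hurt detection accuracy.
  func fixedOrientation() -> UIImage {
    guard imageOrientation != .up else { return self }
    UIGraphicsBeginImageContextWithOptions(size, false, scale)
    defer { UIGraphicsEndImageContext() }
    draw(in: CGRect(origin: .zero, size: size))
    return UIGraphicsGetImageFromCurrentImageContext() ?? self
  }
}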

Setting Up a Firebase Account

To set up a Firebase account, follow the account setup section in this Getting Started With Firebase Tutorial. While the Firebase products are different, the account creation and setup are exactly the same.

The general idea is that you:

  1. Create an account.
  2. Create a project.
  3. Add an iOS app to a project.
  4. Drag the GoogleService-Info.plist to your project.
  5. Initialize Firebase in the AppDelegate.

It’s a simple process but, if you hit any snags, the guide above can help.
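For reference, step 5 boils down to a single call to FirebaseApp.configure() at launch. The starter project may already include this call, so treat the following as a sketch of what that initialization looks like rather than code you necessarily need to add:

import UIKit
import Firebase

@UIApplicationMain
class AppDelegate: UIResponder, UIApplicationDelegate {
  var window: UIWindow?

  func application(
    _ application: UIApplication,
    didFinishLaunchingWithOptions launchOptions: [UIApplication.LaunchOptionsKey: Any]?
  ) -> Bool {
    // Reads GoogleService-Info.plist and connects the app to your Firebase project.
    FirebaseApp.configure()
    return true
  }
}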

Note: You need to set up Firebase and create your own GoogleService-Info.plist for both the final and starter projects.

Build and run the app, and you’ll see that it looks like this:

Starter app

It doesn’t do anything yet except allow you to share the hard-coded text via the action button on the top right. You’ll use ML Kit to bring this app to life.

Detecting Basic Text

Get ready for your first text detection! You can begin by demonstrating to the user how to use the app.

A nice demonstration is to scan an example image when the app first boots up. There’s an image bundled in the assets folder named scanned-text, which is currently the default image displayed in the UIImageView of the view controller. You’ll use that as the example image.

But first, you need a text detector to detect the text in the image.

Creating a Text Detector

Create a file named ScaledElementProcessor.swift and add the following code:

import Firebase

class ScaledElementProcessor {

}

Great! You’re all done! Just kidding. Create a text-detector property inside the class:

let vision = Vision.vision()
var textRecognizer: VisionTextRecognizer!
  
init() {
  textRecognizer = vision.onDeviceTextRecognizer()
}

This textRecognizer is the main object you can use to detect text in images. You’ll use it to recognize the text contained in the image currently displayed by the UIImageView. Add the following detection method to the class:

func process(in imageView: UIImageView, 
  callback: @escaping (_ text: String) -> Void) {
  // 1
  guard let image = imageView.image else { return }
  // 2
  let visionImage = VisionImage(image: image)
  // 3
  textRecognizer.process(visionImage) { result, error in
    // 4
    guard 
      error == nil, 
      let result = result, 
      !result.text.isEmpty 
      else {
        callback("")
        return
    }
    // 5
    callback(result.text)
  }
}

Take a second to understand this chunk of code:

  1. Here, you check if the imageView actually contains an image. If not, simply return. Ideally, however, you would either throw or provide a graceful failure.
  2. ML Kit uses a special VisionImage type. It’s useful because it can contain specific metadata for ML Kit to process the image, such as the image’s orientation.
  3. The textRecognizer has a process method that takes in the VisionImage and asynchronously passes the recognized text, along with any error, to a completion closure.
  4. The guard bails out if there was an error, the result is nil or the recognized text is empty; in those cases, the callback receives an empty string.
  5. Lastly, the callback is triggered to relay the recognized text.
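
With the processor written, you could try it out from the view controller to see recognized text appear on screen. The property names below (imageView and scannedText) are assumptions about the starter project, so treat this as a rough sketch of the wiring rather than the tutorial’s exact next step:

// In ViewController.swift
let processor = ScaledElementProcessor()

override func viewDidLoad() {
  super.viewDidLoad()
  // Run the recognizer against the default image and display whatever it finds.
  processor.process(in: imageView) { text in
    self.scannedText = text
  }
}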