Real Time Image Classification with TensorFlow and React-Native

This is an intermediate-level tutorial. We will implement a simple mobile app that acts as a pictionary (picture dictionary) for a specific language: it captures the camera stream, recognizes objects using React-Native and Google's TensorFlow machine learning framework, and fetches the correct translation by leveraging the Google Translation API.

When learning a new spoken language (Hebrew in my case), it is always useful to have some sort of dictionary at hand to grow your vocabulary. However, most resources of this kind come in written form, so you have to hunt for a particular object's definition across many pages or searches, and getting the right information is not a straightforward task.

Before diving into the engineering of this application, I want to show you the end result:

(I must warn you that I suck at video editing, I apologize in advance!)


Our task is to create a cross-platform mobile app that is able to:

  1. Fire up the camera so we can point at the object we want to target.
  2. Recognize the object.
  3. Give us the translated word in a given language.
  4. Have no backend dependencies (just third-party APIs).
  5. Let us select from several languages.


The app's logical flow model is depicted below:

[Image: diagram of the app's logical flow]

OK, let's roll up our sleeves and get to work! I'll explain the important pieces of theory while presenting the code.

STEP 1

You may want to clone the finished code project from my github repository: https://github.com/danielwind/pictionary-tutorial

For this tutorial you will need to create a Google API key, which requires a Google Cloud account. To sign up you'll need a credit card, but nothing is charged until you actually incur usage fees: almost every service is free up to a certain quota, and you get a $300.00 credit when you start, which is more than enough for every translation you could possibly need here. Beyond that, NMT translations cost $20 per million characters. Be sure to remove or disable your key after you have finished testing!

The first step is to create the mobile app backbone. I am going to use React-Native as it gives me the ability to deploy cross-platform hybrid applications with minimal setup.

* If you do not know React-Native, its documentation is pretty detailed. Gone are the days when we relied on single-threaded, poorly performing webviews (e.g. Cordova) for hybrid applications. React Native is bridge-oriented, giving near-native performance for this sort of app. A beast!

OK, so to create the React-Native app we are going to use the great expo.io. These folks have done an outstanding job (IMHO, the most impressive framework I have seen in years) of abstracting away all the complexity that cross-platform mobile development involves (if you are a mobile dev, you know what I am talking about!). The other reason I chose this framework is that TensorFlow relies on some of its components, so we will import them right away.

From their website, I think this paragraph summarizes the Expo.io effort very well:

Expo enables you to build universal native apps using only JavaScript. Use your favorite text editor to write powerful React Native components without ever opening Xcode or Android Studio. In addition to React Native components, you'll have access to the Expo SDK, a library that provides a wide variety of native APIs on iOS and Android. Expo can also manage your assets for you, take care of push notifications, and build your final native binary for submission to the app store.

Yes, all in one place. This is brilliant.

All right, so to get started, three preliminary steps need to be taken care of:

  1. Head over to the expo.io website and create an account.
  2. Download the Expo framework to your computer (shown below).
  3. Download the Expo client app on your phone:

Android: https://play.google.com/store/apps/details?id=host.exp.exponent
iOS: https://apps.apple.com/us/app/expo-client/id982107779

Open your code editor (Visual Studio Code in my case) and open the Pictionary project. From there, open a terminal window in VS Code and type the following command. This installs the Expo CLI (command line tool) globally, which we will use to create and run the Expo project:

#installs expo-cli globally
npm install -g expo-cli

Once this command finishes, we are ready to set up our project. Clone the Pictionary repository and then navigate to its root directory. From there, let's install all dependencies (you may need to run this command with "sudo" depending on where you are running it):

# install all project dependencies
npm install

I recommend you clone the repository even if you are going to start from scratch because there are several hard-bound dependencies between Tensorflow and Expo. You can obviously look for the package.json file in the Pictionary project to refer to all those dependencies.

Once all dependencies have been installed, we are ready to run the project. Let's enter the following command to start the expo server (in the same root directory):

# runs the Expo server and starts our project
npm start

If everything goes fine, the Metro bundler (Expo's sync server) should have started and you should be presented with a web app like this (port 19002 by default):

[Screenshot: the Metro bundler web interface]

The Metro bundler is the bridge between our development environment and the mobile device (be it a real phone or a simulator installed on your computer). It pushes our updated code to the device so we can see changes in real time, instead of recompiling and re-bundling every time.

I personally prefer the option of testing on my phone right away, but you may want to run it in a simulator if no real device is available. In that case, you need to install and configure the respective simulator/emulator (by installing Xcode and/or Android Studio).

OK, to start the app on your phone, open your freshly installed Expo client Android/iOS app and scan the QR code shown in the lower-left corner of the Metro bundler web page. The bundle transfer will start and, after a few seconds, you will be able to run the Pictionary app.

STEP 2

Now that you have a running application, it's time to dive deep into its code. I'll start with a little theory behind the technologies being used.

In this application, we are leveraging the power of the TensorFlow platform. In short, TensorFlow is a machine learning framework from Google that lets you build and execute machine learning models to solve complex problems such as pattern recognition, image classification, natural language processing, etc.

TensorFlow's main API is Python-based, and there are bindings for Java, Go and C (https://www.tensorflow.org/install/lang_java), but none of these would help us with this project given our requirements.

Luckily, Google has ported the TF library to JavaScript: https://www.tensorflow.org/js/ and recently (February 2020) announced a React Native implementation of the API: https://js.tensorflow.org/api_react_native/0.3.0/

I'll describe how Tensorflow works during the code demonstration. The other library we will use is the Google Translation API, which not surprisingly translates a word/sentence from an origin language to a target language. This API is fairly straightforward and you may have seen it already in action here: https://translate.google.com/

OK. Now the code. Our app consists of a single view with all the related code embedded. Here's its full content (full comments and details below this snippet):

App.jsx

import React, { useState, useEffect } from 'react';
import { ActivityIndicator, Text, View, ScrollView, StyleSheet, Button, Platform } from 'react-native';
import Constants from 'expo-constants';
import RNPickerSelect from 'react-native-picker-select';
import { Chevron } from 'react-native-shapes';


//Permissions
import * as Permissions from 'expo-permissions';


//camera
import { Camera } from 'expo-camera';


//tensorflow
import * as tf from '@tensorflow/tfjs';
import * as mobilenet from '@tensorflow-models/mobilenet';
import {cameraWithTensors} from '@tensorflow/tfjs-react-native';


//disable yellow warnings on EXPO client!
console.disableYellowBox = true;


export default function App() {


  //------------------------------------------------
  //state variables for image/translation processing
  //------------------------------------------------
  const [translation, setTranslation] = useState('');
  const [word, setWord] = useState('');
  const [language, setLanguage] =  useState('he');
  const [translationAvailable, setTranslationAvailable] = useState(true);
  const [predictionFound, setPredictionFound] = useState(false);
  const [hasPermission, setHasPermission] = useState(null);


  //Tensorflow and Permissions
  const [mobilenetModel, setMobilenetModel] = useState(null);
  const [frameworkReady, setFrameworkReady] = useState(false);


  //defaults


  //if adding more languages, map codes from this list:
  // https://cloud.google.com/translate/docs/languages
  const availableLanguages = [
    { label: 'Hebrew', value: 'he' },
    { label: 'Arabic', value: 'ar' },
    { label: 'Mandarin Chinese', value: 'zh' }
  ];
  const GoogleTranslateAPI = "https://translation.googleapis.com/language/translate/v2";
  const GoogleAPIKey = "AIzaScvP36u3iunWo4rjcUODHFpZAT8RholowaT1";

  const TensorCamera = cameraWithTensors(Camera);
  let requestAnimationFrameId = 0;


  //performance hacks (Platform dependent)
  const textureDims = Platform.OS === "ios"? { width: 1080, height: 1920 } : { width: 1600, height: 1200 };
  const tensorDims = { width: 152, height: 200 }; 


  //-----------------------------
  // Run effect once
  // 1. Check camera permissions
  // 2. Initialize TensorFlow
  // 3. Load Mobilenet Model
  //-----------------------------
  useEffect(() => {
    if(!frameworkReady) {
      (async () => {


        
        //check permissions
        const { status } = await Camera.requestPermissionsAsync();
        console.log(`permissions status: ${status}`);
        setHasPermission(status === 'granted');


        //we must always wait for the Tensorflow API to be ready before any TF operation...
        await tf.ready();


        //load the mobilenet model and save it in state
        setMobilenetModel(await loadMobileNetModel());


        setFrameworkReady(true);
      })();
    }
  }, []);


  //--------------------------
  // Run onUnmount routine
  // for cancelling animation 
  // (if running) to avoid leaks
  //--------------------------
  useEffect(() => {
    return () => {
      cancelAnimationFrame(requestAnimationFrameId);
    };
  }, [requestAnimationFrameId]);


  //--------------------------------------------------------------
  // Helper asynchronous function to invoke the Google Translation
  // API and fetch the translated text. Excellent documentation
  // for parameters and response data structure is here 
  // (Translating text (Basic)):
  // https://cloud.google.com/translate/docs/basic/quickstart
  //
  // NOTE: Here we are using the simple GET with key model. While
  // this is simple to implement, it is recommended to do a POST
  // with an OAuth key to avoid key tampering. This approach is
  // for instructional purposes ONLY.
  //---------------------------------------------------------------
  const getTranslation = async (className) => {
    try {
      const googleTranslateApiEndpoint = `${GoogleTranslateAPI}?q=${className}&target=${language}&format=html&source=en&model=nmt&key=${GoogleAPIKey}`;
      console.log(`Attempting to hit Google API Endpoint: ${googleTranslateApiEndpoint}`);
      
      const apiCall = await fetch(googleTranslateApiEndpoint);
      if(!apiCall){ 
        console.error(`Google API did not respond adequately. Review API call.`);
        setTranslation(`Cannot get translation at this time. Please try again later`);
      }


      //get JSON data
      let response = await apiCall.json();
      if(!response.data || !response.data.translations || response.data.translations.length === 0){ 
        console.error(`Google API unexpected response. ${response}`);
        setTranslation(`Cannot get translation at this time. Please try again later`);
      }


      // we only care about the first occurrence
      console.log(`Translated text is: ${response.data.translations[0].translatedText}`);
      setTranslation(response.data.translations[0].translatedText); 
      setWord(className);
    } catch (error) {
      console.error(`Error while attempting to get translation from Google API. Error: ${error}`);
      setTranslation(`Cannot get translation at this time. Please try again later`);
    } 


    setTranslationAvailable(true);
  }


  //-----------------------------------------------------------------
  // Loads the mobilenet Tensorflow model: 
  // https://github.com/tensorflow/tfjs-models/tree/master/mobilenet
  // Parameters:
  // 
  // NOTE: Here, I suggest you play with the version and alpha params
  // as they control performance and accuracy for your app. For instance,
  // a lower alpha increases performance but decreases accuracy. More
  // information on this topic can be found in the link above.  In this
  // tutorial, I am going with the defaults: v1 and alpha 1.0
  //-----------------------------------------------------------------
  const loadMobileNetModel = async () => {
    const model = await mobilenet.load();
    return model;
  }



/*-----------------------------------------------------------------------
MobileNet tensorflow model classify operation returns an array of prediction objects with this structure: 

prediction = [ {"className": "object name", "probability": 0-1 } ]

where:
  className = The class of the object being identified. Currently, this model identifies 1000 different classes.
  probability = Number between 0 and 1 that represents the prediction's probability 

Example (with a topk parameter set to 3 => default):
  [
     {"className":"joystick","probability":0.8070220947265625},
     {"className":"screen, CRT screen","probability":0.06108357384800911},
     {"className":"monitor","probability":0.04016926884651184}
  ]

In this case, we set topk to 1 as we are only interested in the highest-probability result, for both performance and simplicity. This means the array will contain 1 prediction only!
------------------------------------------------------------------------*/
  const getPrediction = async(tensor) => {
    if(!tensor) { return; }

    //topk set to 1
    const prediction = await mobilenetModel.classify(tensor, 1);
    console.log(`prediction: ${JSON.stringify(prediction)}`);

    if(!prediction || prediction.length === 0) { return; }

    //only attempt translation when confidence is higher than 20%
    if(prediction[0].probability > 0.2) {


      //stop looping!
      cancelAnimationFrame(requestAnimationFrameId);
      setPredictionFound(true);


      //get translation!
      await getTranslation(prediction[0].className);
    }
  }

/*-----------------------------------------------------------------------
Helper function to handle the camera tensor streams. Here, to keep up reading input streams, we use requestAnimationFrame JS method to keep looping for getting better predictions (until we get one with enough confidence level).
More info on RAF: https://developer.mozilla.org/en-US/docs/Web/API/window/requestAnimationFrame
-------------------------------------------------------------------------*/
  const handleCameraStream = (imageAsTensors) => {
    const loop = async () => {
      const nextImageTensor = await imageAsTensors.next().value;
      await getPrediction(nextImageTensor);
      requestAnimationFrameId = requestAnimationFrame(loop);
    };
    if(!predictionFound) loop();
  }


  //------------------------------------------------------
  // Helper function to reset all required state variables 
  // to start a fresh new translation routine! 
  //------------------------------------------------------
  const loadNewTranslation = () => {
    setTranslation('');
    setWord('');
    setPredictionFound(false);
    setTranslationAvailable(false);
  }


  //------------------------------------------------------
  // Helper function to render the language picker
  //------------------------------------------------------
  const showLanguageDropdown = () => {
    return  <View>
              <RNPickerSelect
                placeholder={{}}
                onValueChange={(value) => setLanguage(value)}
                items={availableLanguages} 
                value={language}
                style={pickerSelectStyles}
                useNativeAndroidPickerStyle={false}
                Icon={() => {
                  return <Chevron style={{marginTop: 20, marginRight: 15}} size={1.5} color="gray" />;
                }}
              />
                
            </View>  
  }


  //----------------------------------------------
  // Helper function to show the Translation View. 
  //----------------------------------------------
  const showTranslationView = () => { 
    return  <View style={styles.translationView}>
              {
                translationAvailable ?
                  <View>
                    <ScrollView style={{height:400}}>
                      <Text style={styles.translationTextField}>{translation}</Text>
                      <Text style={styles.wordTextField}>{word}</Text>
                    </ScrollView>
                    <Button color='#9400D3' title="Check new word" onPress={() => loadNewTranslation()}/>
                  </View>
                : <ActivityIndicator size="large"/>
              }
            </View>
  }


/*-----------------------------------------------------------------------
Helper function to show the Camera View. 

NOTE: Please note we are using the TensorCamera component, which is constructed earlier in this component via cameraWithTensors(Camera). It is just a decorated expo Camera component with extra functionality to stream tensors, define texture dimensions and other goodies. For further research:
https://js.tensorflow.org/api_react_native/0.2.1/#cameraWithTensors
-----------------------------------------------------------------------*/
  const renderCameraView = () => {
    return <View style={styles.cameraView}>
                <TensorCamera
                  style={styles.camera}
                  type={Camera.Constants.Type.back}
                  zoom={0}
                  cameraTextureHeight={textureDims.height}
                  cameraTextureWidth={textureDims.width}
                  resizeHeight={tensorDims.height}
                  resizeWidth={tensorDims.width}
                  resizeDepth={3}
                  onReady={(imageAsTensors) => handleCameraStream(imageAsTensors)}
                  autorender={true}
                />
                <Text style={styles.legendTextField}>Point to any object and get its {availableLanguages.find(al => al.value === language).label } translation</Text>
            </View>;
  }


  return (
    <View style={styles.container}>
      <View style={styles.header}>
        <Text style={styles.title}>
          My Pictionary
        </Text>
      </View>


      <View style={styles.body}>
        { showLanguageDropdown() }
        {translationAvailable ? showTranslationView() : renderCameraView() }
      </View>  
    </View>
  );
}


const styles = StyleSheet.create({
  container: {
    flex: 1,
    justifyContent: 'flex-start',
    paddingTop: Constants.statusBarHeight,
    backgroundColor: '#E8E8E8',
  },
  header: {
    backgroundColor: '#41005d'
  },
  title: {
    margin: 10,
    fontSize: 18,
    fontWeight: 'bold',
    textAlign: 'center',
    color: '#ffffff'
  },
  body: {
    padding: 5,
    paddingTop: 25
  },
  cameraView: {
    display: 'flex',
    flex:1,
    flexDirection: 'column',
    justifyContent: 'flex-start',
    alignItems: 'flex-end',
    width: '100%',
    height: '100%',
    paddingTop: 10
  },
  camera : {
    width: 700/2,
    height: 800/2,
    zIndex: 1,
    borderWidth: 0,
    borderRadius: 0,
  },
  translationView: {
    marginTop: 30, 
    padding: 20,
    borderColor: '#cccccc',
    borderWidth: 1,
    borderStyle: 'solid',
    backgroundColor: '#ffffff',
    marginHorizontal: 20,
    height: 500
  },
  translationTextField: {
    fontSize:60
  },
  wordTextField: {
    textAlign:'right', 
    fontSize:20, 
    marginBottom: 50
  },
  legendTextField: {
    fontStyle: 'italic',
    color: '#888888'
  },
  inputAndroid: {
    fontSize: 16,
    paddingHorizontal: 10,
    paddingVertical: 8,
    borderWidth: 1,
    borderColor: 'purple',
    borderStyle: 'solid',
    borderRadius: 8,
    color: 'black',
    paddingRight: 30,
    backgroundColor: '#ffffff'
  },
});


const pickerSelectStyles = StyleSheet.create({
  inputIOS: {
    fontSize: 16,
    paddingVertical: 12,
    paddingHorizontal: 10,
    borderWidth: 1,
    borderColor: 'gray',
    borderRadius: 4,
    color: 'black',
    paddingRight: 30
  },
  inputAndroid: {
    fontSize: 16,
    paddingHorizontal: 10,
    paddingVertical: 8,
    borderWidth: 0.5,
    borderColor: 'grey',
    borderRadius: 3,
    color: 'black',
    paddingRight: 30,
    backgroundColor: '#cccccc'
  },
});


---------------------------------------------------------------

Pretty long, huh? Let's digest this code little by little.

Let's begin with the import statements.

import React, { useState, useEffect } from 'react';

//React Native
import { ActivityIndicator, Text, View, ScrollView, StyleSheet, Button, Platform } from 'react-native';

//Picker
import RNPickerSelect from 'react-native-picker-select';
import { Chevron } from 'react-native-shapes';


//Expo
import Constants from 'expo-constants';
import * as Permissions from 'expo-permissions';
import { Camera } from 'expo-camera';


//Tensorflow
import * as tf from '@tensorflow/tfjs';
import * as mobilenet from '@tensorflow-models/mobilenet';
import {cameraWithTensors} from '@tensorflow/tfjs-react-native';


//disable yellow warnings on EXPO client!
console.disableYellowBox = true;

We start by importing the main React package along with two hooks that help us manage the component's state and render lifecycle.

We then import react-native specific UI components for constructing the view. These are merely visual components.

We then import a special component (react-native-picker-select) which helps us render a selection component with the target languages available.

We then import some utilities from the Expo SDK. Constants is a helper object that provides system information that remains constant throughout the lifetime of the app. Permissions provides a permissions "manager" that abstracts access to sensitive functionality, such as the device's camera, location, etc. Finally, Camera is the most important one, as we rely on it entirely: it is basically a wrapper around the device's native camera component.

Now for TensorFlow. We import TensorFlow.js, the engine port on which the React Native client depends. After that, we import the model we want to work with, namely the MobileNet implementation; I'll explain in detail what this is when the time comes. Finally, we import cameraWithTensors from the TensorFlow.js React Native library to convert the camera stream output into tensors (since React Native is not an embedded browser/webview like other hybrid frameworks, we need this different approach).

Finally, we set an Expo/React Native runtime flag to disable warnings in the app (console.disableYellowBox). Warnings show up as yellow boxes over the app, which is a bit intrusive in my opinion. This is for development purposes only, and you may prefer to leave the warnings enabled. It is up to you!
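If you only want to silence the warnings while you are actively developing, a minimal sketch using React Native's __DEV__ global (my own addition, not something the tutorial does) could look like this:

//only disable the yellow warning boxes while developing
if (__DEV__) {
  console.disableYellowBox = true;
}

Either way, it is worth revisiting those warnings from time to time; they often point at real deprecations.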

OK, second part, variables definition:

export default function App() {


  //------------------------------------------------
  //state variables for image/translation processing
  //------------------------------------------------
  const [word, setWord] = useState('');
  const [translation, setTranslation] = useState('');
  const [language, setLanguage] =  useState('he');
  const [translationAvailable, setTranslationAvailable] = useState(true);
  const [predictionFound, setPredictionFound] = useState(false);
  const [hasPermission, setHasPermission] = useState(null);


  //Tensorflow and Permissions
  const [mobilenetModel, setMobilenetModel] = useState(null);
  const [frameworkReady, setFrameworkReady] = useState(false);


  //defaults


  //if adding more languages, map codes from this list:
  // https://cloud.google.com/translate/docs/languages
  const availableLanguages = [
    { label: 'Hebrew', value: 'he' },
    { label: 'Arabic', value: 'ar' },
    { label: 'Mandarin Chinese', value: 'zh' }
  ];
  const GoogleTranslateAPI = "https://translation.googleapis.com/language/translate/v2";
  const GoogleAPIKey = "AIzaScvP36u3iunWo4rjcUODHFpZAT8RholowaT1";
  
  //TF Camera Decorator
  const TensorCamera = cameraWithTensors(Camera);
  
  //RAF ID
  let requestAnimationFrameId = 0;


  //performance hacks (Platform dependent)
  const textureDims = Platform.OS === "ios"? { width: 1080, height: 1920 } : { width: 1600, height: 1200 };
  
  
  const tensorDims = { width: 152, height: 200 }; 


Here we define a set of variables for our app to work properly. The first six state variables drive the image/translation flow:

word => holds the word to be translated (in English)

translation => holds the finally translated word in the target language.

language => holds the selected language (Hebrew by default). Note that we store the ISO-639-1 code for the language as defined here: https://cloud.google.com/translate/docs/languages

translationAvailable => boolean flag for the rendering code to react once a translation response is processed.

predictionFound => boolean flag for controlling processing logic once Tensorflow has returned a valid prediction.

hasPermission => boolean flag that indicates if we have given the app permissions to access the camera.
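
Note that the sample stores hasPermission but never actually branches on it. If you wanted to show a friendly message when camera access is denied, a minimal sketch (my own suggestion, reusing the same state variable) would be to adjust the body of the component's return like so:

//sketch only: gate the camera/translation views on the permission state
{ hasPermission === false
    ? <Text>Camera access was denied. Please enable it in your device settings.</Text>
    : (translationAvailable ? showTranslationView() : renderCameraView())
}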

--------------------------------------------------------------------------------------------------------------

Then, we define a set of state variables to control the Tensorflow logic:

mobilenetModel => holds the MobileNet model that TF will use. More details below.

frameworkReady => boolean flag to indicate that the tensorflow framework is ready and loaded.

---------------------------------------------------------------------------------------------------------------

Now onto regular constants to help our code:

availableLanguages => an array of key:value pairs with the Language label and its respective ISO-639-1 value. By default, I set it up to support 3 languages (Hebrew, Arabic, Mandarin Chinese). If you want to add more languages (say Italian) you can simply add a new key value pair for it (e.g {"label": "Italian", "value": "it"}) and that's it!

GoogleTranslateAPI => holds the API URL for Google Cloud Translation API endpoint.

GoogleAPIKey => holds the Google Cloud API key (you need to generate yours by logging into the Google Cloud console). More details below.

TensorCamera => This is perhaps the most important piece, binding React Native and TensorFlow together. It uses the cameraWithTensors higher-order component from the TensorFlow.js React Native API, which converts the plain Expo Camera component into a tensor-streaming camera. A very important note from its docs: "the component allows on-the-fly resizing of the camera image to smaller dimensions, this speeds up data transfer between the native and javascript threads immensely."

requestAnimationFrameId => holds the RAF id so we can cancel the animation frame request once a prediction is acceptable (avoiding unnecessary leaks).

textureDims => The width and height for the camera texture. Platform dependent. Currently the Expo SDK does not provide a way to determine the camera's resolution, so these values are empirical; change them if you are targeting other devices such as iPads.

tensorDims => Fixed output tensor width and height (based on TF model).

--------------------------------------------------------------------------------------------------------------

OK so now let's analyze the component load process. This is the main entry point for all of our logic.

useEffect(() => {

    if(!frameworkReady) {

      (async () => {
        
        //check permissions
        const { status } = await Camera.requestPermissionsAsync();
        console.log(`permissions status: ${status}`);

        setHasPermission(status === 'granted');


        //we must always wait for the Tensorflow API to be ready before any TF operation...
        await tf.ready();


        //load the mobilenet model and save it in state
        setMobilenetModel(await loadMobileNetModel());


        setFrameworkReady(true);
      })();
    }
  }, []);

We start by using the useEffect hook with an empty dependency array so it runs only once. The code inside first checks whether the TensorFlow framework is already initialized. If not, we proceed with a self-invoking async function (async because the TF calls return promises) that does the following:

  • Asks for camera usage permission.
  • Invokes the ready() method on TensorFlow.js. This is strictly required to initialize all of the TensorFlow framework internals before it can start accepting models.
  • Once TF is ready, loads the MobileNet model and saves it in state. The code for that method is:

   const loadMobileNetModel = async () => {
    const model = await mobilenet.load();
    return model;
  }

As you can infer from the code above, it is simply a load() method on the mobilenet package. But what exactly is this mobilenet thing?

TensorFlow provides pre-trained models that you can reuse in your projects, and there are several options. We are using an "image classification" model trained on the popular ImageNet database (http://www.image-net.org/). The MobileNet name comes from the fact that the model has been adapted for lower-end devices with limited resources. In other words, it was made faster and less resource-intensive so it runs properly in constrained environments.

It is important to note that the load() method receives a configuration object that you can tweak at your convenience. For example, you can specify the alpha parameter to exchange accuracy for performance. In this tutorial I am just going with the defaults (no config object being passed). More info here: https://github.com/tensorflow/tfjs-models/tree/master/mobilenet
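For reference, here is a hedged sketch of what a non-default configuration could look like (the version and alpha values below are purely illustrative; check the mobilenet README linked above for the combinations it actually supports):

//sketch only: trade some accuracy for speed with a smaller alpha
const loadMobileNetModel = async () => {
  const model = await mobilenet.load({
    version: 1,   //MobileNet architecture version
    alpha: 0.5    //smaller alpha => faster, less accurate
  });
  return model;
}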

There is one major caveat here that you must be aware of. The Tensorflow mobilenet pre-trained model only supports 1000 classes (objects) so do not expect it to do magic! However, based on this tutorial, I have to say that it has an acceptable outcome nonetheless.

-------------------------------------------------

OK, now we are going to discuss the code based on the execution flow. We'll start by looking at the camera view component:

 const renderCameraView = () => {
    return <View style={styles.cameraView}>

                <TensorCamera
                  style={styles.camera}
                  type={Camera.Constants.Type.back}
                  zoom={0}
                  cameraTextureHeight={textureDims.height}
                  cameraTextureWidth={textureDims.width}
                  resizeHeight={tensorDims.height}
                  resizeWidth={tensorDims.width}
                  resizeDepth={3}
                  onReady={(imageAsTensors) => handleCameraStream(imageAsTensors)}
                  autorender={true}
                />
                
            </View>;
  }

This component renders the camera but, most importantly, provides a real-time stream of TensorFlow tensors that mirror the image the camera generates internally. The most important parameters here are onReady and autorender. The first is a callback that receives the tensors so we can do something with them. autorender is a flag that lets the view update automatically (if you want to do some fancy rendering, such as applying a mask or filter, set it to false; the onReady callback then receives extra parameters, updateCameraPreview and gl).
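For completeness, here is a rough sketch of what that manual-rendering path could look like with autorender set to false. This is not part of the Pictionary app: the callback parameter names follow the description above and the endFrameEXP() call comes from expo-gl, so treat the exact signature as an assumption and verify it against the tfjs-react-native docs linked earlier:

//sketch only: when autorender={false} we must refresh the preview ourselves
const handleCameraStreamManually = (imageAsTensors, updateCameraPreview, gl) => {
  const loop = async () => {
    const nextImageTensor = imageAsTensors.next().value;
    await getPrediction(nextImageTensor);

    //apply any custom mask/filter here, then push the frame to screen
    updateCameraPreview();
    gl.endFrameEXP();

    requestAnimationFrameId = requestAnimationFrame(loop);
  };
  loop();
}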

handleCameraStream()

This is our callback for processing tensors. Its code is depicted below:

const handleCameraStream = (imageAsTensors) => {
    const loop = async () => {
      const nextImageTensor = await imageAsTensors.next().value;
      await getPrediction(nextImageTensor);
      requestAnimationFrameId = requestAnimationFrame(loop);
    };
    if(!predictionFound) loop();
  }

First, we construct a recursive asynchronous loop function that keeps iterating until a prediction is found. Within that block we grab the next tensor from the stream and invoke the getPrediction function on it. Finally, we use requestAnimationFrame to schedule the next iteration, saving its ID globally so we can cancel the loop when desired.
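One thing the sample does not do is free each tensor once it has been classified; it simply relies on the loop being cancelled when a prediction is accepted. If you notice memory pressure on your device, a minimal variant of the same loop with explicit disposal (tf.dispose is part of TensorFlow.js) could look like this:

const handleCameraStream = (imageAsTensors) => {
    const loop = async () => {
      const nextImageTensor = imageAsTensors.next().value;
      await getPrediction(nextImageTensor);

      //free the tensor's memory once we are done with it
      tf.dispose(nextImageTensor);

      requestAnimationFrameId = requestAnimationFrame(loop);
    };
    if(!predictionFound) loop();
  }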

getPrediction()

This is the main function to get predictions. Its code:

const getPrediction = async(tensor) => {

    if(!tensor) { return; }

    //topk set to 1
    const prediction = await mobilenetModel.classify(tensor, 1);
    console.log(`prediction: ${JSON.stringify(prediction)}`);

    if(!prediction || prediction.length === 0) { return; }

    //only attempt translation when confidence is higher than 20%
    if(prediction[0].probability > 0.2) {


      //stop looping!
      cancelAnimationFrame(requestAnimationFrameId);
      setPredictionFound(true);


      //get translation!
      await getTranslation(prediction[0].className);
    }
  }

We start with a safety check to avoid null reference errors. Then, and this is where the magic resides, we invoke the MobileNet model's classify() method, which takes two parameters: the tensor and "topk", the number of top predictions we want returned.

The prediction returned is an array with topk number of elements (1 in our case). The data structure looks like this:

[ {"className":"joystick","probability":0.8070220947265625} ]

I think this is self-explanatory: className refers to the real-life object being classified (again, based on the ImageNet training of the model) and probability is, well, the probability of the prediction.

OK, so once we get a valid prediction (after some sanity checks) we determine whether its probability is higher than a certain threshold. You don't want to set this too low, because it will give you false positives all the time, nor too high, because it will be hard to get a prediction at all. Remember that the scene is always polluted with other objects and varying lighting (noise), which confuses the model.

After some experimentation, I determined that 20% works fine for my app. You might be more rigorous than I am, so feel free to experiment.

So, if the prediction meets our probability threshold, we stop the tensor-reading loop by cancelling the animation frame and, since we have now detected the object, we attempt to get its translation!

getTranslation()

This method takes one argument: the name of the object we detected. Since TensorFlow returns classNames in English, this is perfect for invoking the Google Translation API (you can send queries in any source language and it will work fine too, but through experimentation I have found that English as the source delivers more accurate translations).

The code:

const getTranslation = async (className) => {
    try {

      const googleTranslateApiEndpoint = `${GoogleTranslateAPI}?q=${className}&target=${language}&format=html&source=en&model=nmt&key=${GoogleAPIKey}`;

      console.log(`Attempting to hit Google API Endpoint: ${googleTranslateApiEndpoint}`);
      
      const apiCall = await fetch(googleTranslateApiEndpoint);

      if(!apiCall){ 
        console.error(`Google API did not respond adequately. Review API call.`);
        setTranslation(`Cannot get translation at this time. Please try again later`);
      }


      //get JSON data
      let response = await apiCall.json();
      if(!response.data || !response.data.translations || response.data.translations.length === 0){ 
        console.error(`Google API unexpected response. ${response}`);
        setTranslation(`Cannot get translation at this time. Please try again later`);
      }


      // we only care about the first occurrence
      console.log(`Translated text is: ${response.data.translations[0].translatedText}`);
      setTranslation(response.data.translations[0].translatedText); 
      setWord(className);
    } catch (error) {
      console.error(`Error while attempting to get translation from Google API. Error: ${error}`);
      setTranslation(`Cannot get translation at this time. Please try again later`);
    } 


    setTranslationAvailable(true);
  }

OK, let's quickly dive into this method. The first thing we do is build the Google Translate API endpoint. Note that I am NOT using the Google client libraries; instead I am just doing a simple HTTP GET call, for simplicity's sake. The approach I took is definitely NOT what you would do in production, as the API key could easily be stolen (by decompiling the app or proxying HTTP calls) and you would get charged for those unauthorized calls! (A slightly less exposed variant is sketched after the parameter list below.)

With that said, we just do a GET call to Google with a series of parameters in the query string:

q= the word we want to translate

source= the language the word is originally in (English in our case)

target= the target language we want to translate the word to (Hebrew by default).

format= the format we want the result in (html or text)

model= The translation model. Can be either base to use the Phrase-Based Machine Translation (PBMT) model, or nmt to use the Neural Machine Translation (NMT) model. 

key= your API key goes here
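
As mentioned above, a slightly less exposed variant (still not production-grade, since the key ships with the app) is to send a POST with the parameters in a JSON body instead of the query string, as described in the basic quickstart linked earlier. A minimal sketch, assuming the same GoogleTranslateAPI, GoogleAPIKey and language values from our component:

//sketch only: same v2 endpoint, parameters moved into a JSON body
const getTranslationViaPost = async (className) => {
  const response = await fetch(`${GoogleTranslateAPI}?key=${GoogleAPIKey}`, {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({
      q: className,
      source: 'en',
      target: language,
      format: 'text',
      model: 'nmt'
    })
  });
  const json = await response.json();
  return json.data.translations[0].translatedText;
}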

All right, we move on by sending the HTTP call using the JS fetch API and then parsing the response into an object. The data structure of a Google Translate API response looks like this:

{
  "data": {
    "translations": [
      {
        "translatedText": "כֶּלֶב",
        "model": "nmt"
      }
    ]
  }
}

As you can see, it is an array of translation objects (since a word may have more than one translation), with translatedText set to the text in the specific target language.

So, back to our code: after some sanity checks on the response, we store the word, the translation and the flags in our state variables to control the UI view and show the translation panel.

And that is really it! The rest of the helper functions are merely UI routines to show/hide the two important views. Nothing fancy, in all honesty.

Conclusion

After experimenting with the TensorFlow React-Native library, I am surprised at how decently it performs. My Android device is a BLU phone (6 GB RAM, 2 GHz 8-core CPU) running Android 9 (Pie). I also tested it on a Samsung Note 9 and it behaves acceptably. I frankly expected a very clunky experience; that was definitely NOT the case.

Regarding the actual image classification, I have to say I was surprised that it got it right in about 6 out of 10 attempts! Remember, we are using a pre-trained model, so this is actually quite acceptable in my opinion. Please be aware, however, that scene contamination can be problematic, since the model attempts to identify any object in the scene, so you have to be careful. In a future tutorial, I will show how to create a custom model so we can get more accurate translations for the objects we actually have around us!

As a funny anecdote, the prediction I got for my computer's mouse was: a toilet. And if you look at it, it kinda does look like one, lol! (I know, it's kind of girly and dirty, but I love it!)

[Image: the mouse in question]

I am very pleased to see Google's effort to make machine learning more accessible on lower-end devices, and to give us developers easier and broader APIs so we can get productive in no time. I have to admit that I had to dig deep into their GitHub repositories to find more information; the documentation for these technologies is still in its early days since they are so new. I hope this tutorial helps someone out there looking for this information, or at least gives some insight into where to find more!

Thanks for reading.

Comments

Hi, can another camera be used with @tensorflow/tfjs-react-native?

Joshua Wilkinson (Software Engineer at Cisco Meraki):
Thanks for this article, it's incredibly useful! I was wondering if you had any suggestions for keeping the preview live while suggestions are displayed on top. When I've tried doing this, the setState calls (setWord(), setTranslation()) cause the camera to re-render. Keep up the great work!

山口啓一 (freelance programmer, TikToker):
I also posted this in your GitHub repo. Thanks a lot for your repository. I found that onReady is only called the first time; after that, onReady is not called in TensorCamera. In your environment, is onReady called every time?


Thank you so much for the nice post. I recently followed it to set up my project, but I got a weird async promise rejection error at tf-backend-webgl.node.js callAndCheck(...) saying 'func' is not of type func().

Reason and solution: as of 2020/08/28, tfjs-react-native 0.3.0 is compatible with tfjs 2.0.0 but not the later 2.0.3, so simply doing "expo install @tensorflow/tfjs" did not work for me. I ended up editing the application's package.json to pin "@tensorflow/tfjs": "2.0.0" and the problem was solved (for now).

If anyone wants to dig deeper: reading the source code, callAndCheck in tfjs-backend-webgl has the signature

  [tensorflow/tfjs ^2.0.3] function callAndCheck(gl, func) {...}
  [tensorflow/tfjs 2.0.0] function callAndCheck(gl, debugMode, func) {...}

while tfjs-react-native invokes it as

  webgl_util.callAndCheck(gl, debugMode, () => { ... })

and that is why it breaks.

I hope people reading this nicely written article can explore what they want to do smoothly and not have to spend time resolving package dependency problems like I did.
