A Deep Dive on AWS DeepLens

Dec 6th, 2017 12:13pm by Janakiram MSV

Last week at the Amazon Web Services’ re:Invent conference, AWS and Intel introduced a new video camera, AWS DeepLens, an intelligent device that can run deep learning algorithms on captured images in real time. The key difference between DeepLens and any other AI-powered camera lies in the horsepower that makes it possible to run machine learning inference models locally without ever sending the video frames to the cloud.

Developers and non-developers rushed to attend the AWS workshop on DeepLens to walk away with a device. There, they were enticed with a hot dog to perform the infamous “Hot Dog OR Not Hot Dog” experiment. I managed to attend one of the repeat sessions, and carefully ferried the device back home. Despite the jet lag setting in after a 22-hour flight, I managed to power up DeepLens and configure it to count the number of people sitting in the living room.

This new device managed to grab attention amid the myriad announcements made at AWS re:Invent, and I am confident that DeepLens will grow its following into a vibrant ecosystem of developers and ISVs, much as Amazon Alexa did.

As an IoT, Edge, and AI enthusiast, my imagination was captured by DeepLens almost instantly. It gave shape to many hypotheses and theories on how edge computing can become intelligent. DeepLens becomes an amazing playground to test how some of the emerging technologies such as IoT, edge computing, machine learning, and serverless computing come together to address powerful scenarios.

Based on my initial experiments, I am attempting to demystify the architecture of DeepLens. As I continue to explore the depth of DeepLens, I promise to share my findings.

Configuration

DeepLens is more of a PC than a camera. It is essentially a powerful computer with an attached camera that’s only marginally better than an average webcam. The device instantly reminded me of Intel NUC, a server-grade computer with a handy form-factor, which formed the foundation of my edge computing experiments.

The PC powering DeepLens is based on an Intel Atom X5 Processor that comes with four cores and four threads. With 8GB RAM and 16GB storage, it delivers just enough juice to run machine learning (ML) algorithms. But the most fascinating part of DeepLens is the embedded GPU in the form of Intel Gen9 Graphics Engine. While this is certainly not best-of-breed hardware, it is sufficient to run local ML inferences.

The PC runs Ubuntu 16.04 LTS and can be connected to a standard keyboard, mouse and an HDMI display. You can fire up a terminal window and treat the device like any other Linux machine. Beyond the OS, there are other software components that make DeepLens an intelligent device. We will explore them in the later sections.

The camera is just another 4 megapixel webcam that barely manages to deliver 1080p resolution. Instead of connecting it to one of the available USB slots, Intel embedded it in the same PC cabinet.

Once AWS exposes the software layer, anyone would be able to emulate DeepLens on their desktops and even Raspberry Pis. The key takeaway is that the hardware in itself is not the exciting part of DeepLens.

The Secret Sauce

What’s fascinating about DeepLens is the way AWS managed to connect the dots. It is an elegantly designed software stack that spans the device and the cloud. Amazon exploited many services to make DeepLens a powerful edge computing platform.

Since developing a convolutional neural network (CNN) is hard, and also requires access to tens of thousands of images to train the model, AWS has made a few projects available out of the box. Once a DeepLens device is registered with your AWS account, you can easily push any of these projects to the device within minutes.

To appreciate the architecture of DeepLens, you need to have a basic understanding of AWS IoT, AWS Greengrass, AWS Lambda, and Amazon SageMaker.

AWS IoT is a platform that manages machine-to-machine communication (M2M) and supports ingesting device telemetry. Developers can easily connect sensors and actuators to AWS IoT and orchestrate the workflow based on a simple rules engine. For advanced scenarios, the telemetry is sent to AWS Lambda for further processing.
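To make the rules engine idea concrete, here is a minimal sketch of what a rule definition might look like. It only builds the rule as a plain Python dictionary; the topic name, the `temperature` field, the threshold, and the function ARN are all made-up placeholders, not values from AWS or from DeepLens.

```python
import json


def build_topic_rule(topic, max_temperature, function_arn):
    """Build a rule definition: a SQL-like filter plus a Lambda action.

    All argument values are supplied by the caller; nothing here talks to AWS.
    """
    return {
        # AWS IoT rules use a SQL-like statement to select messages from a topic.
        "sql": f"SELECT * FROM '{topic}' WHERE temperature > {max_temperature}",
        "awsIotSqlVersion": "2016-03-23",
        "ruleDisabled": False,
        # Each matching message is handed to a Lambda function for processing.
        "actions": [{"lambda": {"functionArn": function_arn}}],
    }


rule = build_topic_rule(
    topic="sensors/livingroom/telemetry",  # hypothetical topic
    max_temperature=30,                    # hypothetical threshold
    function_arn="arn:aws:lambda:us-east-1:123456789012:function:handle-telemetry",  # placeholder ARN
)
print(json.dumps(rule, indent=2))
```

A dictionary of this shape is what the `create_topic_rule` call of the boto3 `iot` client accepts as its `topicRulePayload` argument; registering it requires an AWS account and suitable permissions, so that step is left out of the sketch.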

AWS Greengrass is an extension of AWS IoT that’s designed to run on gateways and hubs that can aggregate telemetry in offline modes. The software can be easily configured on x64 or ARM devices that are deployed within local environments. Greengrass exposes the same M2M capabilities along with a local Lambda engine for programmability. Developers can push Node.js or Python functions to run within Greengrass. AWS Greengrass can also run ML inference models locally without relying on the cloud. In short, Greengrass is the edge computing platform from AWS.

The support for Python functions in Lambda opens up doors to bring sophisticated machine learning algorithms to the edge. Greengrass can run ML models based on MXNet and TensorFlow.

Amazon SageMaker is the new ML platform to build, train, and deploy ML models in the cloud. Developers can use Jupyter notebooks to write the models and use GPU-based EC2 infrastructure to train those models. The final model can be published as a web service or can be pushed to Greengrass for local inference.

DeepLens effectively exploits all these services to perform deep learning on the images. Essentially, data scientists use SageMaker to build and train a CNN in the cloud, and then deploy it to DeepLens for offline inference.

DeepLens Architecture

Ultimately, DeepLens is an edge computing device powered by AWS Greengrass. The camera connected to it is treated like any other sensor that can ingest telemetry. In this case, the camera is sending video frames instead of time-series data.

Each video frame is sent to an AWS Lambda function that runs the ML inference model written in Python. This function takes advantage of the locally available GPU to run the convolutional neural network on each frame. The model then emits a score which is formatted in JSON.

The output of the Lambda function, which is a JSON payload with the score, will also have additional annotations. This JSON payload is published to an AWS IoT MQTT topic like any other sensor telemetry payload.
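The per-frame flow described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the code that ships on the device: the payload shape (`timestamp`, `detections`, `label`, `score`), the topic name, and the helper names are all hypothetical, and the model and the MQTT client are replaced by stand-in functions so the example runs anywhere.

```python
import json
import time


def build_payload(scores, labels, threshold=0.5):
    """Turn raw class scores into a JSON telemetry message.

    Only classes scoring at or above the threshold are kept, best first.
    """
    detections = [
        {"label": labels[i], "score": round(float(score), 4)}
        for i, score in enumerate(scores)
        if score >= threshold
    ]
    detections.sort(key=lambda d: d["score"], reverse=True)
    return json.dumps({"timestamp": int(time.time()), "detections": detections})


def process_frame(frame, infer, publish, topic, labels):
    """Run inference on one frame and publish the result to a topic."""
    scores = infer(frame)                    # local model call (GPU on the device)
    payload = build_payload(scores, labels)
    publish(topic, payload)                  # MQTT publish, like any sensor reading
    return payload


# Stand-ins so the sketch runs without a camera, a model, or a broker.
def fake_infer(frame):
    return [0.91, 0.07, 0.02]


sent = []


def fake_publish(topic, payload):
    sent.append((topic, payload))


payload = process_frame(
    frame=None,
    infer=fake_infer,
    publish=fake_publish,
    topic="deeplens/demo/inference",         # hypothetical topic
    labels=["hot dog", "not hot dog", "unknown"],
)
print(payload)
```

On a Greengrass device, `publish` would plausibly wrap the Greengrass SDK's `iot-data` client, which offers a `publish(topic=..., payload=...)` call. Passing the model and the publisher in as arguments keeps the formatting logic testable off the device.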

Once the payload is published to the topic, it can be passed through the AWS IoT rules engine, which can invoke a Lambda function to evaluate each message. Once the message lands in Lambda, it is up to the developer to decide what to do with it.
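As an illustration of that last step, here is a minimal sketch of a cloud-side Lambda handler that counts people in a message, in the spirit of the living room experiment. The message shape (a `detections` list of `label` and `score` entries), the confidence cutoff, and the alert limit are assumptions made for this example; the real payload produced by a DeepLens project may look different.

```python
MIN_SCORE = 0.6   # hypothetical confidence cutoff
MAX_PEOPLE = 3    # hypothetical limit before raising an alert


def lambda_handler(event, context):
    """Evaluate one inference message forwarded by an IoT rule.

    `event` is assumed to be the JSON payload already parsed into a dict.
    """
    detections = event.get("detections", [])
    people = [
        d for d in detections
        if d.get("label") == "person" and d.get("score", 0) >= MIN_SCORE
    ]
    return {"people_count": len(people), "alert": len(people) > MAX_PEOPLE}


# Local invocation with a made-up message; no AWS account is needed.
sample_event = {
    "detections": [
        {"label": "person", "score": 0.9},
        {"label": "person", "score": 0.4},   # below the cutoff, ignored
        {"label": "sofa", "score": 0.8},
    ]
}
print(lambda_handler(sample_event, None))
```

From here the handler could store the count, send a notification, or trigger another service; that choice is left to the developer.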

Roadmap

Technically speaking, you don’t need a $249 device to run an offline convolutional neural network model. If you are a maker with an appetite to build things from scratch, consider getting the Google Vision Kit. It is a cheaper, DIY version of DeepLens.

Amazon deserves credit for envisioning and designing a product like DeepLens. It acts as a reference architecture for many edge computing use cases. The integration of DeepLens with the AWS Management Console, the workflow involved in importing a SageMaker model as a project, and finally pushing the inference model with one click together make DeepLens a rich edge computing platform.

The device will have to go through multiple iterations before it becomes polished and sophisticated. Once the DeepLens platform becomes stable, original device manufacturers will start embedding it in their cameras. Amazon might even ship Android and iOS versions of the DeepLens SDK to enable mobile developers to build intelligent computer vision applications.

DeepLens is proof that edge computing is here and it is real. We are certainly living in exciting times.