Machine Vision in Test Automation

Snehal Akhouri
Disney+ Hotstar
Oct 8, 2020 · 8 min read

Every once in a while we come across interesting problems that must be solved but can no longer be tackled with the usual toolset at our disposal. At our scale, at Disney+ Hotstar Quality Engineering, that happens fairly frequently, and the way we approach these problems is to look at other domains, learn from them, and implement solutions that are unique to us.

In this two-part blog, we will talk about how machine vision, leveraging artificial intelligence, can act as the human eye in checks that are too voluminous, contextual and energy-draining for a human to perform.

Let machines do the work!!

Our first problem statement was around our ads stack. We were faced with two interesting challenges that we needed to solve at speed and scale:

  • Guarantee that the correct ad content is shown to the user during a live match.
  • Guarantee that the ad formats appear visually exactly as the business (and clients) intend them to.

Live Ads Detection

Serving the right advertisement to the right customer is paramount. Any platform which cannot do this reliably all the time is setting up its clients for failure in reaching the intended target of their campaigns and disappointing its users by serving irrelevant ads.

Target users are grouped into buckets based on device type, geography, gender, age, tastes, and so on. The industry term for such buckets is “cohorts”; put simply, a cohort is a combination of attributes a target user may fall into. As a platform, you want the capability to target different customers with different advertisements, so that you and I, sitting on the same bus home while watching the Indian Premier League (IPL) on our phones, can get two different advertisements because we are effectively users with different attributes.

For leading prime-time content like the IPL, which draws millions of users in a typical match, there can potentially be thousands of cohorts, with thousands of users bucketed into each of them.

How could we test that our cohort logic was working correctly? And further, that the intended advertisement was showing on the target user's device with the correct format and content (photo, video, survey, etc.)?

To further complicate things, ads in the IPL are slotted between two overs in the live stream. A slot usually runs for 30 seconds, but it could be “crashed” if the action resumes, with multiple ads squeezed into it. Each ad in a typical slot can differ from the others: one ad might have a CTA button, an external link or some action for the user to perform, another may be just a video roll, and a third could be a combination of these.

Our advertisement tech platform is completely automated; however, the operational aspects of advertising cannot be automated, since brands and campaign managers need the freedom to keep changing their creatives and ad elements to impress and influence their intended target users, sometimes in real time.

We found that this is the part which opens the door to broken experiences for the end customer and missed opportunities for the business: a manual operational error in configuring the ad type, the creative or its ad elements. How do we detect an operational mistake such as a wrong ad creative being shown to the user, or, even if the right creative is shown, that it is missing critical ad elements (click-through button, links, or actions)? And that too while the live match is underway!

How complex is it?

To set context: if you have one person monitoring ad delivery on one device for a particular cohort throughout a live IPL match (approximately 5 hours), we would need upwards of 9,000 people monitoring live ads for the duration of the game, given the number of cohorts that we can target.

Obviously this is not a scalable approach, so we decided to take a smarter route and develop a tool which we call “LIVE Ads Detector”.

We also started to build a common test service alongside this tool, believing that if we could solve this problem, the service would help unlock many other use cases that surface in the future.

Requirements

  • Identify and detect each ad type on a live stream
  • Verify ads are rendered as expected on an actual device (CTA, WatchList and Link type ad formats on live content)
  • Real-time reporting of verification

Challenges

The two major challenges we faced were:

Detecting elements with Appium

  • Appium needs element ids, but it’s 2020 and who defines element ids anymore.
  • Appium requires the main thread to be free to issue GET commands. Our social feed and rolling ads make it impossible to free the main thread.

Network Interception Layer

  • We needed to find a middle layer which could tell our LIVE Ads Detector what ad type to expect in the next second on the player. This layer needs to alert in real time, with zero tolerance for latency.

Solution

We broke the solution into two parts:

  • MITM: to intercept VAST calls. Reading the VAST response allows us to determine the type and behaviour of the upcoming ad (see the sketch below).
  • Image Validator service (Template Matching and Text extraction service).

Figure: Working model of the Live Ads Detector

Figure: Image validator service
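
For a flavour of what the MITM part could look like, here is a minimal sketch of a mitmproxy addon that inspects VAST responses and derives the expected ad type. The URL hint, the ad-type mapping and the notify step are illustrative assumptions, not our production code.

    # vast_intercept.py: illustrative mitmproxy addon, not the production implementation.
    # Run with: mitmproxy -s vast_intercept.py
    import xml.etree.ElementTree as ET
    from mitmproxy import http

    VAST_URL_HINT = "/vast"  # placeholder: match whatever the ad server's VAST endpoint looks like

    def response(flow: http.HTTPFlow) -> None:
        # Only inspect responses that appear to carry a VAST document
        if VAST_URL_HINT not in flow.request.pretty_url:
            return
        try:
            root = ET.fromstring(flow.response.get_text())
        except ET.ParseError:
            return
        # In VAST, a Linear creative is a video roll; CompanionAds suggest a banner/overlay
        for creative in root.iter("Creative"):
            if creative.find("Linear") is not None:
                ad_type = "video"
            elif creative.find("CompanionAds") is not None:
                ad_type = "companion_banner"
            else:
                ad_type = "other"
            # Placeholder for pushing the expectation to the LIVE Ads Detector
            # before the ad actually renders on the device
            print(f"expect ad_type={ad_type} in the upcoming slot")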

In the figure above, how could we ensure that the banner ad shown is for exactly the same match as the one shown in the masthead, and that the two images are even for the match IND vs SA?

What happens when the app is not able to render the image due to a bug, or renders an incorrect image? An element-based test will still pass, but a human looking at a masthead with a wrong or corrupted image would fail it.

Training on and predicting patterns, faces and objects in a source image is the answer. Our Image Validator service is built on OpenCV, Tesseract and image processing/normalisation. We started applying the service in our regular automation, expanding our training and test data sets to gain confidence in its accuracy.

Why Machine Vision?

  • AI doesn’t care about elements. It doesn’t care whether you are using an Android driver or iOS driver in your automation.
  • Training data set creation is simpler: collect data, identify the outliers in your data, and focus on the main features and accuracy in detecting them
  • Lend eyes to AI: allow your test automation to stream real-time data to the image validator service for comparisons (sketched below)
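
As a sketch of what "lending eyes" can look like from the automation side, here is a hypothetical helper that streams a device screenshot to the image validator service and asserts on the result. The endpoint, payload fields and threshold are assumptions for illustration, not the real service contract.

    # Illustrative only: the endpoint, payload and response fields below are
    # assumptions for this sketch, not the real Image Validator contract.
    import requests

    def assert_pattern_on_screen(screenshot_path: str, pattern: str, min_confidence: float = 0.8):
        # Stream the device screenshot to the image validator service for comparison
        with open(screenshot_path, "rb") as f:
            resp = requests.post(
                "http://image-validator.internal/v1/match",  # hypothetical endpoint
                files={"source": f},
                data={"pattern": pattern},
                timeout=10,
            )
        resp.raise_for_status()
        result = resp.json()
        # The test asserts on the final verdict against its own confidence threshold
        assert result["confidence"] >= min_confidence, f"'{pattern}' not found: {result}"
        return result["bounding_box"]  # where the pattern was found on screen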

Image Validator Features

Let's have a look at the features the image validator currently supports and a functional overview of each.

Object Detection / Template Matching

We use OpenCV template matching to match patterns in a source image against a trained asset image.

Here are a few examples:

Typical automation would identify the image below of a “yellow bar” component in the UI as just another element and assert that it is displayed.

However, the nuance here is that the yellow bar is a progress bar which advances in proportion to the ad video playing. Its appearance suggests an ad is playing; its disappearance suggests the ad has ended. So, would normal automation be able to assert visually on ad progress and on the start/end of an ad? Further, the ad video could get stuck, the progress bar could malfunction or not show at all, and so on. We needed to marry visual verification to the mid-layer VAST intercepts.

Asset Image (Yellow Bar)
Screenshot while ad is playing during a live event

The source image, for us, is a screenshot taken from an actual device on which the user is viewing a match and encounters an ad.

Automation calls a pattern-matching API in the Image Validator service, which takes the screenshot as input, runs it against all the patterns we have stored in an assets folder (S3), and returns in the API response all the matches with confidence ratios and the location of each bounding box (x, y coordinates). With that, the automated test can safely assert on the final verdict based on its threshold confidence-level expectation.
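
Conceptually, the matching step behind that API is close to the OpenCV snippet below. This is a simplified, single-pattern sketch; the actual service normalises images and iterates over every stored asset.

    # Simplified version of the template-matching step (single pattern, single scale)
    import cv2

    def match_template(screenshot_path: str, asset_path: str, threshold: float = 0.8):
        source = cv2.imread(screenshot_path, cv2.IMREAD_GRAYSCALE)  # device screenshot
        template = cv2.imread(asset_path, cv2.IMREAD_GRAYSCALE)     # trained asset, e.g. the yellow bar
        result = cv2.matchTemplate(source, template, cv2.TM_CCOEFF_NORMED)
        _, max_val, _, max_loc = cv2.minMaxLoc(result)
        h, w = template.shape
        return {
            "matched": max_val >= threshold,
            "confidence": float(max_val),
            # bounding box: top-left corner plus the template's width/height
            "bounding_box": {"x": max_loc[0], "y": max_loc[1], "w": w, "h": h},
        }

The automated test can then assert on the match verdict and, if needed, use the bounding box to drive further interactions.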

Text Extraction

Video Ad with CTA text “DOWNLOAD”

This is different from pattern matching; here we need to 'extract' the text from the source image. We use OpenCV and Tesseract to achieve that.

Let’s take an example

We have an ad source image like the one above, which has an embedded link with the text “DOWNLOAD” and <app name>. Let's assume that our scenario is something like this:

When DOWNLOAD text is displayed

And if User clicks on DOWNLOAD

Then User lands on the “Google Play Store” page of the client <app name>.

[Remember: a simple operational blunder would mean an invalid text and link have been configured.]

The Image Validator service has an API for text matching, where it searches for an expected text in a source image file, in this case “DOWNLOAD”. On calling the API, the algorithm eliminates the noise from the image and extracts all the possible words, then traverses through them to find the exact or closest match along with the location of its bounding box. The API then returns the location and UI coordinates if the text is found.
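
A stripped-down sketch of that flow, using OpenCV for basic noise removal and pytesseract for word extraction, might look like this (the preprocessing shown here is deliberately minimal compared to what the service does):

    # Simplified text-extraction flow: clean up the image, extract words with
    # their positions, then look for the expected text ("DOWNLOAD" in this example)
    import cv2
    import pytesseract

    def find_text(screenshot_path: str, expected: str):
        img = cv2.imread(screenshot_path)
        gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
        # Basic noise removal before OCR; the real service applies more normalisation
        _, clean = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)
        data = pytesseract.image_to_data(clean, output_type=pytesseract.Output.DICT)
        for i, word in enumerate(data["text"]):
            if word.strip().upper() == expected.upper():
                # Return the bounding box so the test can also tap the element
                return {"x": data["left"][i], "y": data["top"][i],
                        "w": data["width"][i], "h": data["height"][i]}
        return None  # expected text not found in the source image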

This way our automated tests not only verify that some text shows up; we can also be sure that it is the exact expected text and that the link takes the user to the correct location.

It’s ALIVE!

We followed the train-and-predict model, combined AI features and built a hybrid to solve the complex problem of live ads validation at Hotstar scale. And adding live reporting was the cherry on the cake!

Sample Live Ads Report

Yes, we saved many human hours. Instead of constantly watching the stream for issues, we now have a machine doing the work; we have set up alerts, and all we need to do is monitor them and take action accordingly.

Stay tuned for Part 2 — where we will delve into more goodness!

If you want to solve problems like these and work smart, look up our open job positions on tech.hotstar.com.
