Appium Native Mobile Commands — Execute Script Overloading

Lana Begunova
Jan 5, 2024

Discover Appium’s unique method for enabling new and platform-specific commands in client libraries without the need for updates to the client library.

Execute Script Overloading

Let’s learn about something that we can call Execute Script Overloading in Appium. Basically, it’s a way of using the driver.execute_script method for purposes beyond its original intent.

  • In Selenium, there’s a command called driver.execute_script. It takes a string containing the body of a JavaScript function, plus any other parameters as arguments for that function. The idea is that we use this method to run JavaScript inside a web page we are automating.
  • Now, native apps don't run on JavaScript and HTML, so this method is of little use for Appium's native app automation. But the Appium developers found an interesting use for it: they put it to work implementing a whole host of new commands that Appium provides.
  • We call this overloading the Execute Script function, in other words, adding responsibilities to it that it wasn't originally written for. But in practice, it works well.

Why does Appium want to use the execute script function for a different purpose? Why not just leave it alone and ignore it, if it makes no sense for mobile apps?

Why Overload?

  • Early on, the Appium team realized there were lots of new features that they wanted to add to the WebDriver spec.
  • For some of these features, it made sense to extend the WebDriver spec. Appium developers created new commands and new routes, and then they added support for these new routes to the Appium client.
  • But for other features, they weren’t sure this was the right approach. Maybe the feature was experimental and might not stick around, in which case a route and parameters set in stone would be hard to change. Or maybe the feature was only available on a single platform, and it felt odd to create a route for just one platform. Or maybe they wanted the feature to be available immediately, without requiring updates to the Appium client libraries. Because they kept running into cases like this, they needed another way to make these commands available.

And the Appium team actually found a very natural way to solve this problem sitting right in front of them, in the form of the Execute Script command. They decided that certain commands are best exposed as these Execute Script mobile methods.

driver.execute_script("mobile: <commandName>", <args>)

  • The Execute Script command was designed from the get-go to allow for the execution of code, with arbitrary parameters. That is very much like what they wanted for all the Appium features that fell into one of the categories we’ve just mentioned.
  • Moreover, the Execute Script command was not currently being used with Appium, so it was available for repurposing.
  • And finally, the Execute Script command was already available in every Appium and Selenium client, with a flexible and future-proof API.
  • So they established a convention: Appium exposes certain commands via driver.execute_script by using the command name, with a special mobile: prefix, as the "JavaScript" string. The command parameters, if any are required, are passed in as the first "JavaScript" argument of the Execute Script command.
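To make the convention concrete, here is a minimal sketch of a wrapper that builds the prefixed script name and passes the parameters as a single dictionary argument. The helper name `run_mobile_command` is hypothetical and is not part of the Appium Python client; it only assumes a driver object that exposes `execute_script(script, *args)`.

```python
from typing import Any, Mapping, Optional


def run_mobile_command(driver: Any, name: str,
                       params: Optional[Mapping[str, Any]] = None) -> Any:
    """Invoke an Appium 'mobile:' extension command through execute_script.

    Hypothetical helper: it only enforces the naming convention described above.
    """
    if name.startswith("mobile:"):
        raise ValueError("pass the bare command name, e.g. 'deviceInfo'")
    script = f"mobile: {name}"
    if params is None:
        # Commands without arguments are sent with no extra "JavaScript" args
        return driver.execute_script(script)
    # By convention the parameters travel as ONE dictionary (a JSON object on the wire)
    return driver.execute_script(script, dict(params))
```

With a live session this would be called as `run_mobile_command(driver, "deviceInfo")` or `run_mobile_command(driver, "enrollBiometric", {"isEnabled": True})`.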

We might think of some use cases that are not automatable, but they actually are! Appium supports 50+ native mobile commands. Let’s look at some examples.

Why Native Mobile Commands?

The WebDriver protocol is a W3C standard. Native mobile commands, documented in the Appium docs, are APIs that fall outside the W3C WebDriver specification. Google and Apple offer platform-specific APIs; for example, iOS lets automation interact with Siri through an API that is not part of the WebDriver specification. The WebDriver spec, shaped by people such as Simon Stewart and Jonathan Lipps, is deliberately generic so that it can be adapted to diverse platforms, browsers, and IoT devices. It intentionally leaves out the nuances of particular platforms, browsers, iOS devices, or IoT platforms. Meanwhile, platform vendors like Google and Apple ship APIs in their own automation libraries for things such as performing gestures or interacting with Siri on iOS. Let’s dive into the native commands on iOS and Android.

Automating Gestures

Gestures are everywhere in mobile apps, so our test frameworks need code that performs them. The WebDriver Actions API is the usual tool, but it can be frustrating for mobile gestures. A common problem is that the server returns a 200 response, yet nothing happens on the screen. Reviewing logs and asking the community often leads to an impasse, because the logs show a successful response (200) with no sign of an error. The action never takes place on the device, and the issue gets closed without a solution.

In more complex scenarios, such as executing intricate gestures like providing a digital signature, challenges arise. For instance, a digital signature may be part of a crucial app feature, like the checkout process, making end-to-end testing unachievable. This situation necessitates the implementation of workarounds and results in unreliable tests.

A valuable solution emerged when we encountered the W3C-compliant WebDriver Actions API. As Selenium transitioned to W3C standards, Appium followed suit, aligning with the W3C APIs. Adhering to these standards required modifications to our code base. Embracing the Actions API proved to be a successful resolution, enabling the seamless scripting of the entire digital signature flow.

However, the Actions API presents its own challenges. Even seemingly simple tasks like scrolling up or down require building action sequences with many individual steps. Timing the pauses between actions also matters, because each platform, such as iOS (XCUITest) and Android, expects its own wait times. Meeting these requirements means writing intricate code.
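To see why the Actions API feels verbose, here is a sketch of the raw W3C "Perform Actions" payload for a single one-finger swipe, built as a plain Python dictionary (a pointer input source with `pointerType: touch`). The function name, the pointer id, and the default durations are illustrative assumptions; real devices may need different pauses per platform.

```python
from typing import Any, Dict


def build_touch_swipe(start_x: int, start_y: int, end_x: int, end_y: int,
                      hold_ms: int = 100, move_ms: int = 500) -> Dict[str, Any]:
    """Return a W3C 'Perform Actions' body describing a one-finger swipe."""
    return {
        "actions": [{
            "type": "pointer",
            "id": "finger1",                        # arbitrary input source id
            "parameters": {"pointerType": "touch"},
            "actions": [
                # 1. place the virtual finger at the start point and press
                {"type": "pointerMove", "duration": 0, "x": start_x, "y": start_y},
                {"type": "pointerDown", "button": 0},
                # 2. short pause; the value that works can vary by platform
                {"type": "pause", "duration": hold_ms},
                # 3. drag to the end point over move_ms milliseconds, then lift
                {"type": "pointerMove", "duration": move_ms, "x": end_x, "y": end_y},
                {"type": "pointerUp", "button": 0},
            ],
        }]
    }
```

Compare these five steps with the one-line `mobile: swipe` call in the iOS section.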

Android

Start Activity

We’ve already discussed how to start Android activities via capabilities. The execute_script call gives us another avenue for starting an activity on Android. We need two pieces of information: the package ID of the application we want to start, and the name of the activity to launch within that app.

The particular command we’re running here is called startActivity. To access it, we use the command name as the body of the execute_script JavaScript string, with the prefix mobile: in front of it. Why do we need this prefix? In case executing normal JavaScript ever does make sense with Appium, we want to make sure our special Appium commands can't be interpreted as valid JavaScript. As for the argument, we pass in a dictionary. The key is “component”, and the value is the full component name, formed by joining the app package ID and the activity name with a slash.

Let’s give this a whirl! First, we’ll launch a native Settings app as an activity via capabilities. Then, we’ll install our sample mobile AUT (ApiDemos.apk), define two of its activities, and launch them with the execute_script command.

The two activities we intend to spin up for the demo are:

  1. .view.DateWidgets2
  2. .app.FragmentAlertDialog

From the command line, the equivalent ADB command is:
adb shell am start-activity <package_name>/<activity_name>

In our Python script mobile_commands_android.py we start the date widget and alert dialog activities as follows:

# app_id: the ApiDemos package ID; app_act1/app_act2: the two activity names above
driver.execute_script("mobile: startActivity", {"component": f"{app_id}/{app_act1}"})
driver.execute_script("mobile: startActivity", {"component": f"{app_id}/{app_act2}"})

We add a static wait after each startActivity command, merely so we can watch the activity launch for demo purposes.
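Because the component string is just the package ID and the activity name joined by a slash, it is convenient to build it in one place. This sketch assumes the ApiDemos package ID is `io.appium.android.apis`; the helper names are hypothetical.

```python
from typing import Any

APIDEMOS_PACKAGE = "io.appium.android.apis"  # assumption: ApiDemos-debug.apk package ID


def build_component(package: str, activity: str) -> str:
    """Join package and activity into an Android component name.

    A leading '.' (e.g. '.view.DateWidgets2') means the activity class lives
    under the package; a fully qualified class name is passed through unchanged.
    """
    if not package or not activity:
        raise ValueError("package and activity are both required")
    return f"{package}/{activity}"


def start_activity(driver: Any, package: str, activity: str) -> Any:
    """Hypothetical helper wrapping 'mobile: startActivity' (UiAutomator2 driver)."""
    return driver.execute_script(
        "mobile: startActivity",
        {"component": build_component(package, activity)},
    )
```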

Device Information

github.com/appium/appium-uiautomator2-server/blob/master/app/src/main/java/io/appium/uiautomator2/handler/GetDeviceInfo.java

What else can we do on Android with Execute Script commands?

For example, we can retrieve device information with execute_script("mobile: deviceInfo") and print it to the console, or write it to a file if preferred.

print(driver.execute_script("mobile: deviceInfo"))
Start activities and get device info with execute_script(). HD video: https://youtu.be/36anq3JmEuk.

The mobile_commands_android.py script prints a large dictionary of device details.

Here we have information about the Android ID (udid) of the device, the API version, the manufacturer, whether the network is connected and how fast it is, the screen size, timezone and so on. Why might we need this information? Well, we might write a test which is sensitive to some of these factors, for example the timezone, and may need to behave differently based on some of them.

At the beginning of our test we could retrieve this info from the device in order to inform our test how to execute correctly. This would be especially useful in a cloud execution environment where we might not have total control over which type of device we get for a given test. And since this is just a Python dictionary, we can walk through it just like we would with any other Python dictionary in our code.
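Since the result is an ordinary dictionary, a test can read the fields it cares about before deciding how to run. This sketch assumes key names such as `manufacturer`, `timeZone`, and `apiVersion`, which is how we understand the UiAutomator2 server's GetDeviceInfo handler names them; check the actual output of your driver version. The function names and the "legacy" rule are hypothetical.

```python
from typing import Any, Dict


def summarize_device(info: Dict[str, Any]) -> Dict[str, Any]:
    """Pick out a few deviceInfo fields a test might branch on.

    Key names are assumptions; missing keys come back as None instead of raising.
    """
    api = info.get("apiVersion")
    return {
        "manufacturer": info.get("manufacturer"),
        "time_zone": info.get("timeZone"),
        # apiVersion may arrive as a string such as "33"; int() accepts str or int
        "api_level": int(api) if api is not None else None,
    }


def needs_legacy_flow(info: Dict[str, Any], min_api: int = 30) -> bool:
    """Illustrative decision: treat devices below min_api as 'legacy'."""
    level = summarize_device(info)["api_level"]
    return level is not None and level < min_api
```

In a test, `info` would come from `driver.execute_script("mobile: deviceInfo")`.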

Alerts

Now, let’s explore some alert actions. The commands “mobile: acceptAlert” and “mobile: dismissAlert” let us accept and dismiss alerts on Android devices, respectively.

In the Python script native_alerts_android.py we launch the App/Alert Dialogs screen. Then we invoke an alert and accept it. Subsequently we invoke another alert and dismiss it.

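import time
from appium.webdriver.common.appiumby import AppiumBy
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

wait = WebDriverWait(driver, 10)  # driver: an existing Appium session

# Accept an alert
alert_acceptance = wait.until(EC.presence_of_element_located(
    (AppiumBy.ACCESSIBILITY_ID, 'OK Cancel dialog with a message')))
alert_acceptance.click()
time.sleep(1)  # demo pause so the dialog is visible
driver.execute_script("mobile: acceptAlert")

# Dismiss an alert
alert_dismissal = wait.until(EC.presence_of_element_located(
    (AppiumBy.ACCESSIBILITY_ID, 'OK Cancel dialog with a long message')))
alert_dismissal.click()
time.sleep(1)  # demo pause so the dialog is visible
driver.execute_script("mobile: dismissAlert")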
Native alerts handling with execute_script(). HD video: https://youtu.be/gjWXcgaxXWo.
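Both commands can be wrapped in one small function. The UiAutomator2 driver documents an optional `buttonLabel` argument for `mobile: acceptAlert` and `mobile: dismissAlert`, for dialogs whose buttons are not the usual accept/dismiss pair; treat that argument as an assumption to confirm against your driver version. The function name is hypothetical.

```python
from typing import Any, Optional


def handle_alert(driver: Any, accept: bool = True,
                 button_label: Optional[str] = None) -> Any:
    """Accept or dismiss the current Android alert via 'mobile:' commands."""
    command = "mobile: acceptAlert" if accept else "mobile: dismissAlert"
    if button_label is None:
        # Let the driver pick the default positive/negative button
        return driver.execute_script(command)
    # Assumed optional argument: click the button with this exact label instead
    return driver.execute_script(command, {"buttonLabel": button_label})
```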

Shell Interactions

We can do some performance profiling on Android by executing the shell commands provided by ADB (Android Debug Bridge).

Android Shell Interactions: https://developer.android.com/tools/adb#shellcommands
  • List all services: adb shell service list
  • Get battery information: adb shell dumpsys battery
  • Get CPU information: adb shell dumpsys cpuinfo
  • Get memory usage overview: adb shell dumpsys meminfo <package_id>

Let’s incorporate these ADB commands into Python code. In our file android_shell_interactions.py we’ll execute all four adb commands illustrated above.

Executing ADB shell commands this way requires starting the Appium server with the --relaxed-security flag.
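A minimal sketch of how the four commands above can be sent through the UiAutomator2 driver's `mobile: shell` command, which takes a `command` string and an `args` list. It assumes the server was started with `--relaxed-security` (or with the `adb_shell` insecure feature allowed); the helper names and the placeholder package ID are hypothetical.

```python
from typing import Any, Dict, List

# The four ADB commands from the list above, split into command + arguments.
# "com.example.app" is a placeholder package ID, not a real app.
SHELL_COMMANDS: Dict[str, List[str]] = {
    "services": ["service", "list"],
    "battery": ["dumpsys", "battery"],
    "cpu": ["dumpsys", "cpuinfo"],
    "memory": ["dumpsys", "meminfo", "com.example.app"],
}


def run_adb_shell(driver: Any, argv: List[str]) -> Any:
    """Run one 'adb shell' command through Appium and return its output."""
    command, *args = argv
    return driver.execute_script("mobile: shell", {"command": command, "args": args})


def collect_profile(driver: Any) -> Dict[str, Any]:
    """Run every command above and return the outputs keyed by name."""
    return {name: run_adb_shell(driver, argv) for name, argv in SHELL_COMMANDS.items()}
```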

Upon running the script, we capture useful output which we can analyze for performance monitoring and profiling.

iOS

Gestures

Let’s continue our exploration of the impressive capabilities of these native commands. Over time, I’ve frequently heard my peers express frustrations regarding the challenges of iOS testing. However, it’s worth noting that iOS does offer valuable features for testing. We’ll delve into the native commands that iOS exposes, Appium’s creation of corresponding “mobile:” commands, and how user-friendly they prove to be in practice.

# mobile: swipe
args = {}
args["direction"] = "up"
driver.execute_script("mobile: swipe", args)
args["direction"] = "down"
driver.execute_script("mobile: swipe", args)

# mobile: scroll
args = {}
args["direction"] = "down"
driver.execute_script("mobile: scroll", args)
args["direction"] = "up"
driver.execute_script("mobile: scroll", args)

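# mobile: pinch
args = {}
args["scale"] = 5  # > 1 zooms in, between 0 and 1 zooms out
args["velocity"] = 1.0  # scale factor per second; also required by the XCUITest driver
driver.execute_script("mobile: pinch", args)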

Consider a scenario involving a scroll action. When utilizing the Actions API, imagine a situation where we have a lengthy list of data, and our objective is to keep scrolling until we reach a button positioned at the very bottom of the page. Typically, we employ a while loop or a for loop, based on the logic, to continuously scroll until the target element is present on the screen.

The main purpose of the test is to scroll down and reach the bottom to click a specific button, which is a valid testing scenario. In this case, we wrap the scrolling actions in a loop that checks whether the button is on screen yet. If not, the loop iterates, triggering another scroll, and the process continues. However, this involves repeated calls to XCUITest, which can be slow. The time spent scrolling is not a good use of resources when the goal is simply to locate the button to click it or perform another action.

The question arises: why go through the hassle of scrolling multiple times when the primary objective is to locate and interact with a specific element?

Let’s look at one of the native mobile commands that iOS exposes through XCUITest, and how simple it is. Among the native mobile commands there are many gesture-specific ones, and one noteworthy command is “mobile: scroll”.

The process is straightforward — we specify the desired direction, whether it’s “up” or “down”, and then execute the script with the “mobile: scroll” command. This action facilitates a smooth scroll all the way down, eliminating the need for loops. Consequently, we can efficiently locate a button element positioned at the bottom of the screen without the time-consuming repetition of scrolling. As a native API, “mobile: scroll” operates rather swiftly, enhancing efficiency.
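Besides `direction`, the XCUITest driver's `mobile: scroll` also accepts strategies that scroll until a target is reached, such as `name` (an accessibility id) or `predicateString`, optionally inside a scrollable container given by `elementId` (older docs call it `element`). The sketch below builds those argument dictionaries; the argument names follow the XCUITest driver documentation as we understand it, so verify them against your driver version, and the helper names are hypothetical.

```python
from typing import Any, Dict, Optional

VALID_DIRECTIONS = {"up", "down", "left", "right"}


def ios_scroll_args(direction: Optional[str] = None,
                    name: Optional[str] = None,
                    predicate: Optional[str] = None,
                    element_id: Optional[str] = None) -> Dict[str, Any]:
    """Build arguments for 'mobile: scroll' using exactly one strategy."""
    strategies = {"direction": direction, "name": name, "predicateString": predicate}
    chosen = {key: value for key, value in strategies.items() if value is not None}
    if len(chosen) != 1:
        raise ValueError("pass exactly one of direction, name or predicate")
    if direction is not None and direction not in VALID_DIRECTIONS:
        raise ValueError(f"unknown direction: {direction!r}")
    if element_id is not None:
        chosen["elementId"] = element_id  # scrollable container to act on
    return chosen


def scroll_to_name(driver: Any, accessibility_id: str,
                   container_id: Optional[str] = None) -> Any:
    """Ask XCUITest to scroll until the element with this name is reached."""
    args = ios_scroll_args(name=accessibility_id, element_id=container_id)
    return driver.execute_script("mobile: scroll", args)
```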

We’ll run the Python code in native_mobile_scroll_ios.py to demonstrate how the native “mobile: scroll” command works in practice. We are opening the native Settings app, clicking on Siri & Search, then scrolling all the way down and then again all the way up.

driver.execute_script(“mobile: scroll”, args). HD video: https://youtu.be/EOCwsBvdgP0.

Now, let’s consider the pinch gesture. I’ve faced challenges with pinching, even when using the Actions API, because it involves calculating coordinates and moving x and y in different directions to achieve the desired zoom effect. With “mobile: pinch”, the process is straightforward: we specify the desired scale (along with the pinch velocity), and that’s it.

These commands are tightly integrated with the iOS platform and are not compatible with Android; they are specific to iOS. This is a crucial consideration, particularly when employing a cross-platform framework for an app that spans multiple platforms. It prompts us to assess whether to continue using these iOS-specific commands or revert to the Actions API, ensuring a unified code base that functions seamlessly on both Android and iOS platforms. Nevertheless, achieving a shared code base is feasible with these native gestures as well; it primarily involves switching between the different APIs provided by Android and iOS.
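As a sketch of that switching, the function below picks the platform-specific command from the session's `platformName` capability. It assumes the XCUITest `mobile: scroll` arguments used above and the UiAutomator2 driver's `mobile: scrollGesture`, which needs a scroll area (`left`, `top`, `width`, `height`) plus `direction` and `percent`. The default area and the helper names are illustrative, and direction semantics may differ between the two drivers, so check them on real devices.

```python
from typing import Any, Dict, Tuple


def scroll_command(platform: str, direction: str,
                   area: Tuple[int, int, int, int] = (100, 300, 800, 1200)
                   ) -> Tuple[str, Dict[str, Any]]:
    """Return (script, args) for a one-step scroll on the given platform.

    area is (left, top, width, height) for Android; the default is an arbitrary example.
    """
    p = platform.lower()
    if p == "ios":
        return "mobile: scroll", {"direction": direction}
    if p == "android":
        left, top, width, height = area
        return "mobile: scrollGesture", {
            "left": left, "top": top, "width": width, "height": height,
            "direction": direction,
            "percent": 1.0,  # scroll by the full size of the area
        }
    raise ValueError(f"unsupported platform: {platform!r}")


def scroll(driver: Any, direction: str) -> Any:
    """Dispatch using the platformName capability reported by the session."""
    script, args = scroll_command(driver.capabilities["platformName"], direction)
    return driver.execute_script(script, args)
```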

# mobile: tap (assume an element object already exists)
args = {
    "element": element.id,
    "x": 5,
    "y": 5
}
driver.execute_script("mobile: tap", args)

# mobile: doubleTap - double-tap the screen at a specific point
args = {
    "element": element.id,
    "x": 100,
    "y": 100
}
driver.execute_script("mobile: doubleTap", args)

# mobile: twoFingerTap - two-finger-tap an element
args = {
    "element": element.id
}
driver.execute_script("mobile: twoFingerTap", args)

Consider double taps and two-finger taps, which are commonly used, for instance, for zoom-in and zoom-out actions in Google Maps, respectively. With the Actions API, the process involves creating multiple actions, merging them, and hoping the test will pass for this specific action. There’s often uncertainty: the test might pass, fail, or return a 200 code without any observable effect. At times, tapping manually becomes necessary to confirm that things work (or asking Jason Huggins to build a custom Tapster robot). Let’s explore how these actions can be executed using “mobile:” commands.

driver.execute_script("mobile: twoFingerTap", {"element": element.id})

With “mobile: twoFingerTap”, we simply specify the element where the two-finger tap should occur. The command is then passed to XCUITest, which handles the execution. Native commands like this significantly simplify our gesture code. The same streamlined approach applies to tap and doubleTap: we provide the “x” and “y” coordinates for the action.

driver.execute_script("mobile: doubleTap", {"element": element.id, "x": 100, "y": 100})

We can run Python script native_mobile_doubletap_ios.py to test zoom-in with the “mobile: doubleTap” command on the native Maps app on iOS.

driver.execute_script(“mobile: doubleTap”, {“x”: 100, “y”: 100}). HD video: https://youtu.be/QqFU-kfhloc.

Biometric Functions

Here’s an example of an execute_script call for iOS biometrics:

driver.execute_script('mobile: enrollBiometric', {'isEnabled': True})

The command is called enrollBiometric. What does this enrollBiometric command do? Well, it's something that works on iOS only, and it can either enroll or unenroll a user from the biometric system, meaning Touch ID or Face ID. For security reasons, this command only works on simulators. But we can see that in addition to the command itself which is sent as the first parameter of the method call, we are passing a Python dictionary {} as the second parameter. This is because the enrollBiometric command takes arguments. By convention, all these execute_script mobile methods take a single parameter, which on the server side of things is just a JSON object. In the Python client, we can send in a Python dictionary with keys and values that will be automatically converted to a JSON object when the command is called.

Each command will specify what sort of object should be sent in. In the case of this enrollBiometric command, it's an object with an isEnabled key, and a boolean value. If the value is set to True, then the simulator will be enrolled in the biometric system. If it's False, then it will be unenrolled.
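To illustrate the dictionary-to-JSON conversion, the snippet below builds the request body for this call as defined by the W3C WebDriver Execute Script endpoint (`POST /session/{session id}/execute/sync`): the script string goes in `script` and the positional arguments go in the `args` array. It uses only the standard library and shows the payload shape; it is not the Python client's internal code, and the function name is hypothetical.

```python
import json
from typing import Any, Dict


def execute_sync_body(script: str, *args: Any) -> str:
    """Serialize an Execute Script request body in the W3C shape."""
    body: Dict[str, Any] = {"script": script, "args": list(args)}
    return json.dumps(body)


payload = execute_sync_body("mobile: enrollBiometric", {"isEnabled": True})
print(payload)
# {"script": "mobile: enrollBiometric", "args": [{"isEnabled": true}]}
```

Note how Python's `True` becomes JSON's `true` and the single dictionary becomes the first element of `args`.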

Now, what functions do we have available to support automation of biometric security?

  • The first method we have is a mobile: execute script method, called isBiometricEnrolled. This mobile method takes no parameters and will simply return a True or False boolean value to let us know whether the device is enrolled in a biometric scheme. This is important to know, because before we attempt to automate a biometric match using Touch ID or Face ID, the device must be enrolled. So, how do we enroll it if it's not enrolled?
  • By using another mobile method, called enrollBiometric. This method does take a parameter dictionary, with a single key called isEnabled. This key should have a value of either True or False, denoting whether we want to enroll or unenroll the device into or from the biometric program, respectively. Which program will the device be enrolled in? An iOS device supports either Touch ID or Face ID, but not both, so we have to know what kind of device we’re automating in order to know whether we’ve just enrolled in Touch ID or Face ID.
  • The next step is to figure out how to actually trigger a biometric match. To do this we use the sendBiometricMatch mobile method. It takes two parameters, which combine into a single Python dictionary, with a key called type and a key called match. type refers to the type of biometric match we want, whether Touch ID or Face ID, and should be exactly equal to the string touchId or faceId. The match key should have a boolean value, where True denotes a positive match and False denotes a negative match. What do we mean by positive and negative matches? A positive match simulates a successful authentication of the biometric security. This is what we use when we want to see if our app correctly handles when a user passes the biometric test. And a negative match is used to simulate a failed authentication. It's what we use when we want to test how our app handles the situation where a user does not pass the biometric test. We might want to test that the user is not logged in in that case.
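Put together, the three commands form a short flow: check enrollment, enroll if needed, then send a match. This sketch follows the argument names described above (`isEnabled`, `type`, `match`); the function name is hypothetical, and it assumes a simulator session, since enrollment only works there.

```python
from typing import Any

VALID_TYPES = {"touchId", "faceId"}


def simulate_biometric(driver: Any, biometric_type: str, success: bool) -> None:
    """Enroll the simulator if needed, then send a positive or negative match."""
    if biometric_type not in VALID_TYPES:
        raise ValueError(f"type must be one of {sorted(VALID_TYPES)}")
    # isBiometricEnrolled takes no arguments and returns a boolean
    if not driver.execute_script("mobile: isBiometricEnrolled"):
        driver.execute_script("mobile: enrollBiometric", {"isEnabled": True})
    # match=True simulates a successful scan, match=False a failed one
    driver.execute_script("mobile: sendBiometricMatch",
                          {"type": biometric_type, "match": success})
```

A negative-path test would call `simulate_biometric(driver, "faceId", success=False)` and then assert that the app stays logged out.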

The Python script biometric_faceid_ios.py tests both positive and negative Face ID match scenarios. It uses the functions described above with an open-source sample app.

“mobile: sendBiometricMatch”. HD video: https://youtu.be/pPqyB2Uadbg.

As an extra treat, this practice code includes some application management commands:

driver.execute_script("mobile: terminateApp", {"bundleId": app_bundle_id})
# equivalent client method: driver.terminate_app(app_bundle_id)

driver.execute_script("mobile: activateApp", {"bundleId": app_bundle_id})
# equivalent client method: driver.activate_app(app_bundle_id)

App management is an independent subject on its own that deserves a separate post.

Performance Profiling


Another fascinating capability at our disposal is the ability to assess the performance of our app, specifically on iOS. iOS provides APIs that allow us to measure the performance metrics of our application. We use the mobile: startPerfRecord and mobile: stopPerfRecord commands to signal to Appium when during our script we'd like the profiling to occur.

These APIs let us start profiling, perform actions like scrolls and clicks, and then stop profiling. This produces a trace file that can be opened in Instruments, which ships with Xcode. Instruments offers a visually informative dashboard with insights into how well our app is performing.

Open the trace file in Instruments and analyze the app performance.

If we want to identify specific threads that behave unusually in our app, we can simply open the trace file in Instruments, which provides a comprehensive overview of the main thread and other activity within our app.

This feature, exposed as a “mobile:” command by the XCUITest driver, gives us access to Apple’s profiling tools, such as the Time Profiler. It lets us scrutinize events over a given time span, see which threads were invoked, check memory consumption, and identify the threads responsible for CPU spikes. Apple ships several default profiling templates, and by choosing one through the native mobile command's arguments, we obtain the trace file mentioned earlier. Opening this file in Xcode Instruments gives a holistic view of our app’s performance.

Delving into the tool allows us to examine the multitude of threads, pinpointing the exact code associated with spikes in memory, CPU usage, or other performance-related aspects. Capturing performance data during the execution of our automation becomes a straightforward process.

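import base64

trace_zip = "trace.zip"  # local path for the downloaded trace archive

args = {
    "timeout": 60000,               # max recording time in milliseconds
    "pid": "current",               # profile only the app under test
    "profileName": "Time Profiler", # Instruments template to record with
}
driver.execute_script("mobile: startPerfRecord", args)

# test case steps

args = {
    "profileName": "Time Profiler"
}
# stopPerfRecord returns the zipped trace as a base64-encoded string
b64_zip = driver.execute_script("mobile: stopPerfRecord", args)
bytes_zip = base64.b64decode(b64_zip)
with open(trace_zip, 'wb') as stream:
    stream.write(bytes_zip)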

In our practice script performance_profiling_ios.py, we specify the profileName (the Instruments template) we want to record with and call “mobile: startPerfRecord”. Then we run whatever steps our test case is supposed to perform, and stop capturing performance data.

More specifically, the recording targets the current process ID and runs for at most the given timeout; the time profile itself is captured by Xcode's tooling. We could ask for a different template, such as a memory profile, by changing profileName. The stop command yields a zipped trace (not a bare trace file) encoded as a base64 string, so we decode it and write the bytes to a trace zip file. In short, we start the recording, perform some operations, stop the recording, and then find trace.zip at the path we provided in the code.
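The start/stop pairing lends itself to a context manager, so the trace is saved even if a test step fails. This is a sketch assuming the argument names shown above (`timeout`, `pid`, `profileName`) and that `mobile: stopPerfRecord` returns the zipped trace as a base64 string when no remote upload path is given; the function names and default output path are hypothetical.

```python
import base64
import zipfile
from contextlib import contextmanager
from typing import Any, Iterator, List


@contextmanager
def perf_record(driver: Any, out_path: str = "trace.zip",
                profile_name: str = "Time Profiler",
                timeout_ms: int = 60000) -> Iterator[None]:
    """Record an Instruments profile around the enclosed test steps."""
    driver.execute_script("mobile: startPerfRecord", {
        "timeout": timeout_ms, "pid": "current", "profileName": profile_name})
    try:
        yield
    finally:
        # Always stop the recording and persist the decoded zip archive
        b64_zip = driver.execute_script("mobile: stopPerfRecord",
                                        {"profileName": profile_name})
        with open(out_path, "wb") as stream:
            stream.write(base64.b64decode(b64_zip))


def list_trace_entries(zip_path: str) -> List[str]:
    """List the archive contents before opening the trace in Instruments."""
    with zipfile.ZipFile(zip_path) as archive:
        return archive.namelist()
```

Usage would look like `with perf_record(driver): ...` wrapped around the test steps.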

Generate a trace file with ‘profileName’: ‘Time Profiler’. HD video: https://youtu.be/JG2jzJEA_2Q.

When we open this file in Instruments, it displays a clear view of how our app consumes resources (for the Time Profiler template, CPU time).

Open the trace file with the Instruments Time Profiler.

The trace is captured by Xcode's tooling, not by Appium, and only for our app's process ID rather than every process running in the background, so it is completely specific to our app.

Analyze the app performance on a granular level.

I strongly advise incorporating automated performance testing into the app development process. Even with a simple test similar to the one described above, we can identify issues that significantly impact user experiences, which might go unnoticed by traditional functional tests.

Happy testing and debugging! Share your thoughts and comments on the subject. Connect with me on LinkedIn, X, GitHub, or Insta.

Resources:

GitHub repo with code samples and more: https://github.com/lana-20/appium-native-mobile-commands

Appium Execute Mobile Command: https://appium.readthedocs.io/en/latest/en/commands/mobile-command/

Appium Execute Methods: https://appium.io/docs/en/2.0/guides/execute-methods/

Android UiAutomator2 Driver mobile startActivity command: https://github.com/appium/appium-uiautomator2-driver?tab=readme-ov-file#mobile-startactivity

UiAutomator2 Applications Management: https://github.com/appium/appium-uiautomator2-driver?tab=readme-ov-file#applications-management

ADB Shell Commands: https://developer.android.com/tools/adb#shellcommands

Automating Mobile Gestures For iOS With WebDriverAgent/XCTest Backend: https://dpgraham.github.io/docs/en/writing-running-appium/ios/ios-xctest-mobile-gestures/

iOS ‘mobile:’ Screen Swipe: https://appium.github.io/appium.io/docs/en/writing-running-appium/tutorial/swipe/ios-mobile-screen/

XCUIElement ‘Scrolling’ and ‘Performing Gestures’ Sections: https://developer.apple.com/documentation/xctest/xcuielement

About the iOS Instruments Trace Document: https://help.apple.com/instruments/mac/10.0/#/dev2843b037

Native Mobile Commands In Appium by Srinivasan Sekar & Sai Krishna AppiumConf2019: https://youtu.be/FPD6NpB8_zA?si=HZBFnVl_AzM-dEh_

Capturing Performance Data for Native iOS Apps: https://appiumpro.com/editions/12-capturing-performance-data-for-native-ios-apps

Android Espresso — Flashing Elements on Screen: https://appiumpro.com/editions/48-flashing-elements-on-screen

Automating Physical Buttons on iOS Devices: https://appiumpro.com/editions/68-automating-physical-buttons-on-ios-devices

Batching Appium Commands Using Execute Driver Script to Speed Up Tests: https://appiumpro.com/editions/85-batching-appium-commands-using-execute-driver-script-to-speed-up-tests

Learn how to do Biometric Authentication — Testing iOS Face ID with Appium: https://twitter.com/therunninglight/status/1641680364602290176


Lana Begunova

I am a QA Automation Engineer passionate about discovering new technologies and learning from it. The processes that connect people and tech spark my curiosity.