How we achieve Continuous Deployment in a native app

Published in Mercadona Tech

Jul 2, 2021

In this article I’d like to share my experience over the last few months, specifically since last October, when we set ourselves the challenge of achieving a continuous deployment workflow in a native app, a goal we finally reached a month ago. We now have a release process that is both safer and faster!

First of all, let me introduce myself and what I do: my name is Luis and I’m a software engineer at Mercadona Tech. Since October 2019 I’ve been working on DeliverApp, the app our drivers use to make the delivery process easier. With the app, our drivers can see the list of orders they have to deliver and perform various other actions: navigating to an order’s address in Google Maps, scanning an order’s labels to ensure the correct products are being delivered, or reporting an incident to ACMO (Atención al Cliente de Mercadona Online, our online customer service).

First steps: Release Trains

In the beginning I followed the same strategy we used for the Shop application: every two weeks we prepared and released a new version. This strategy is explained quite well in this talk by Antonio Escudero.

But there was a big difference between the two apps: Shop was public, whereas DeliverApp was for internal use (our only users were our drivers). So the Shop app was released to Google Play, while DeliverApp was released to a bucket in our Google Cloud Storage, which allowed us to bypass Google’s validations and deliver faster.

Once we had validated that model and felt comfortable with it, we decided to move on and release every Monday. This let us deliver more value and receive feedback earlier (“feedback loops as a lifestyle”, as our CTO Fernando Díaz repeats over and over again).

To keep the impact of each release low, we followed a phased rollout approach. We first released the new version to a single warehouse (Valencia, in this case), and two days later we released it to the rest of the warehouses. If something was wrong with the release, we could fix it without affecting every warehouse.
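The phased rollout boils down to a simple scheduling rule. The following is a minimal Python sketch, not the team’s actual implementation: it assumes each release has a publication date, that Valencia (`Vlc1`, the warehouse code used later in this post) is always the pilot, and that every other warehouse waits two days. The helper `is_available` is hypothetical.

```python
from datetime import date, timedelta

PILOT_WAREHOUSE = "Vlc1"           # Valencia receives every release first
ROLLOUT_DELAY = timedelta(days=2)  # everyone else follows two days later


def is_available(warehouse: str, released_on: date, today: date) -> bool:
    """Hypothetical helper: should `warehouse` be offered a release
    published on `released_on`, as of `today`?"""
    if today < released_on:
        return False  # not published yet
    if warehouse == PILOT_WAREHOUSE:
        return True
    return today >= released_on + ROLLOUT_DELAY
```

For a release published on Monday 5 July 2021, Valencia would get it that same day, while Barcelona (`Bcn1`) and Madrid (`Mad1`) would only get it from Wednesday 7 July.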

How do we notify the user that there’s a new version?

Since ours was an internal application that we weren’t releasing on Google Play, we had to implement our own in-app update process, which works as follows:

In the backend:

Provides an endpoint that serves the supported version range, with a minimum and a maximum allowed version.
Uses a middleware to verify that the app version is within that range and, if not, returns an HTTP 416 status.
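A minimal sketch of that backend check, written as a plain WSGI middleware in Python. It assumes that the app sends its version as an integer in a hypothetical `X-App-Version` header and that the range lives in an in-memory dict; the real service presumably reads it from wherever the versions endpoint stores it. Note that 416 is formally “Range Not Satisfiable” in HTTP, so here it is reused as an app-level signal.

```python
from typing import Callable, Dict, Iterable

# Hypothetical supported range; the real one is served by the versions endpoint.
SUPPORTED_RANGE: Dict[str, int] = {"min": 160, "max": 168}


class VersionRangeMiddleware:
    """WSGI middleware that answers 416 when the calling app version is
    outside the supported [min, max] range."""

    def __init__(self, app: Callable, supported_range: Dict[str, int]) -> None:
        self.app = app
        self.supported_range = supported_range

    def __call__(self, environ: dict, start_response: Callable) -> Iterable[bytes]:
        # WSGI exposes the hypothetical "X-App-Version" header under this key.
        raw_version = environ.get("HTTP_X_APP_VERSION", "")
        try:
            version = int(raw_version)
        except ValueError:
            version = None  # missing or malformed header: treat as unsupported

        # Read the range on every request so updates take effect immediately.
        low = self.supported_range["min"]
        high = self.supported_range["max"]
        if version is None or not (low <= version <= high):
            start_response("416 Range Not Satisfiable",
                           [("Content-Type", "text/plain")])
            return [b"Unsupported app version, please update"]
        return self.app(environ, start_response)
```

Wrapping an existing WSGI application is then a single line, for example `application = VersionRangeMiddleware(application, SUPPORTED_RANGE)`. Because the range is read on every request, lowering the maximum takes effect immediately.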

In the app:

If we receive a 416, we show a non-dismissible dialog that forces the user to update the app. This process downloads the APK from Google Cloud Storage and launches the installation (although the user still has to interact manually to complete it) (Image 1).
During daily use, the app checks whether a newer version is available and, if so, downloads the APK in the background. Then, once the user logs out, it suggests installing the update (Image 2).
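The client-side rules above can be summarized as a small decision function. It is written in Python for consistency with the other sketches here, although the real client is an Android app; `UpdateAction` and `decide_update` are hypothetical names, and versions are assumed to be plain integers.

```python
from enum import Enum
from typing import Optional


class UpdateAction(Enum):
    NONE = "none"
    FORCE_UPDATE = "force_update"        # non-dismissible dialog (Image 1)
    DOWNLOAD_IN_BACKGROUND = "download"  # fetch the newer APK quietly
    SUGGEST_ON_LOGOUT = "suggest"        # offer the install after logout (Image 2)


def decide_update(installed: int,
                  latest: int,
                  last_status: Optional[int] = None,
                  apk_downloaded: bool = False,
                  logging_out: bool = False) -> UpdateAction:
    if last_status == 416:
        # The backend rejected this version: block the app until it updates.
        return UpdateAction.FORCE_UPDATE
    if latest > installed:
        if not apk_downloaded:
            return UpdateAction.DOWNLOAD_IN_BACKGROUND
        if logging_out:
            return UpdateAction.SUGGEST_ON_LOGOUT
    return UpdateAction.NONE
```

The 416 check comes first on purpose: an out-of-range version must be blocked regardless of what else the app knows about newer releases.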

When a new app version is released, we make a request to that endpoint to update the maximum supported version.
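The release step that bumps the maximum version could look like the following Python sketch, which uses only the standard library. The endpoint URL, the `PATCH` method, the bearer token and the `max_version` payload field are all assumptions made for illustration.

```python
import json
import urllib.request

# Hypothetical endpoint; the real versions service URL is not given here.
VERSIONS_ENDPOINT = "https://versions.example.internal/deliverapp/supported-range"


def build_max_version_update(new_max: int, token: str) -> urllib.request.Request:
    """Build the request a release job could send to raise the maximum
    supported version once the new APK has been uploaded."""
    body = json.dumps({"max_version": new_max}).encode("utf-8")
    return urllib.request.Request(
        VERSIONS_ENDPOINT,
        data=body,
        method="PATCH",
        headers={"Content-Type": "application/json",
                 "Authorization": f"Bearer {token}"},
    )
```

Building the request separately from sending it keeps the step easy to test; in a pipeline it would then be passed to `urllib.request.urlopen`.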

Looking for automation and refining the release process

The next step was to start automating the process, for which we used Fastlane and Jenkins. With Fastlane we created scripts that Jenkins runs across its various stages: linting, unit tests, integration tests, and e2e tests. However, the release itself was still manual, and we needed to find a way to automate it.

We created a new lane in our Fastfile to prepare a release: it created a new tag in GitHub, after which we had to go to Jenkins and build that tag manually. It was a start: we no longer had to build and sign the APK by hand and upload it to the Google Cloud Storage bucket.

As the project grew, we released versions with a lot of changes (remember that we were still waiting until Monday to launch them). This created a problem: the more changes a version contains, the more potential bugs it can introduce. That gave us an obvious next area to improve.

We discussed this with the team, and we all agreed that, even though we were already releasing to one warehouse first and to the rest two days later, we needed more measures to feel safe when releasing new versions. The first was the obvious one: release smaller versions. The second was to have a fast, automatic rollback mechanism. The third was to use feature flags, so that we could quickly remove bugs introduced by new features without having to roll back or release a new version.

How can we rollback a native app?

Looking at other projects, we saw that they could roll back painlessly: just go to Jenkins, build the version you want and let Jenkins do its thing. Unfortunately, we couldn’t do things the same way on Android.

According to the official Android documentation, there are two attributes used to identify an APK’s version:

“versionName”: a string used as the version number shown to users.
“versionCode”: a positive integer used by the Android system to protect against downgrades by preventing users from installing an APK with a lower versionCode than the version currently installed on their device.

Does that mean a direct rollback to an earlier version is impossible? The documentation is pretty self-explanatory, but remember that we’re working with an internal application, so we were able to find a way around this and perform rollbacks.

We changed our code and the way we managed our versions. We decided to use “versionName” as the actual version and, rather than updating “versionCode” with each release, to freeze it. Let’s look at this through an example:

We release a new version v32 with “versionName” = 32 and “versionCode” = 1.
We release a new version v33 with “versionName” = 33 and “versionCode” = 1.
We realize there’s a bug in v33, so we need to roll back. This is possible because v32 has the same versionCode as v33 (“versionCode” = 1), and the Android system allows us to install an APK with the same versionCode as the one currently installed.
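To make the trick concrete, here is a simplified Python model of the downgrade rule quoted above. It only compares versionCode values and deliberately ignores everything else Android checks on install (such as the signing certificate), so treat it as an illustration rather than a description of the package installer. In the Gradle build, this scheme presumably amounts to keeping `versionCode 1` in `defaultConfig` while bumping `versionName` on every release.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Apk:
    version_name: str  # shown to users, e.g. "33"
    version_code: int  # compared by Android to block downgrades


def install_allowed(installed: Apk, candidate: Apk) -> bool:
    """Simplified downgrade rule: reject a candidate whose versionCode is
    lower than the installed one. Signature checks and every other install
    rule are deliberately left out."""
    return candidate.version_code >= installed.version_code


# Frozen versionCode: rolling back from v33 to v32 is allowed.
v32, v33 = Apk("32", 1), Apk("33", 1)
print(install_allowed(installed=v33, candidate=v32))  # True

# Usual scheme (versionCode bumped on every release): the same rollback is blocked.
print(install_allowed(installed=Apk("33", 33), candidate=Apk("32", 32)))  # False
```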

That’s how we created a rollback mechanism!

To show you a real-world example: we realized we had a bug in v168, so we rolled back to v167. Notice how v168 loses users as the out-of-range version dialog is shown!
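On the backend side, a rollback like this one presumably means lowering the maximum supported version so that v168 falls out of range and its users get the force-update dialog. A minimal Python sketch, with a hypothetical `SupportedRange` class and a minimum version of 160 picked purely for illustration:

```python
from dataclasses import dataclass


@dataclass
class SupportedRange:
    min_version: int
    max_version: int

    def accepts(self, version: int) -> bool:
        return self.min_version <= version <= self.max_version

    def roll_back_to(self, version: int) -> None:
        """Hypothetical rollback: lower the maximum so the faulty release
        falls out of range and its users are forced to update."""
        if not self.accepts(version):
            raise ValueError(f"v{version} is not currently supported")
        self.max_version = version


supported = SupportedRange(min_version=160, max_version=168)
supported.roll_back_to(167)
print(supported.accepts(168), supported.accepts(167))  # False True
```

Since versionCode is frozen, the “update” these users are forced into can be the older v167 APK.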

3rd party dependencies

I want to explain this part in some detail to give a better understanding of the process. In each warehouse there’s a Delivery Manager who is responsible for coordinating all the drivers.

When the shift starts, she holds a meeting with all of them, where she distributes the routes and tells them about any updates to the app. To explain new app features in more detail, we create a document or a video that the Delivery Manager shares with the drivers.

After a few months, a new person joined the team. She was in charge of creating these explanatory documents and videos, which were expected to be ready when the version was released, or at least when it was close to being released. A new player in the game!

After a few weeks we realized that we had to split the two processes up so as not to create a bottleneck where one of us was waiting on the other: sometimes the document wasn’t ready on Monday and we had to delay the release to Tuesday or even Wednesday, and vice versa. The two processes were tightly coupled, and we needed to find a solution. This made the need for feature flags even clearer, and I have to say that once you start using them, you’ll never look back.

Use feature flags and sleep better

When we started using feature flags, we chose Firebase Realtime Database. As the name suggests, it’s a database that propagates changes in real time: once the library is included in the app, you can attach a listener that is notified whenever a key changes its value.

Feature flags brought two improvements: first, we had a way to hide a new feature if we detected a bug in it; second, we could release versions even if the internal explanatory documentation wasn’t ready yet.

As we can see in the image below, we define one feature flag per warehouse (Vlc1 for Valencia, Bcn1 for Barcelona, Mad1 for Madrid), so we can show or hide a new feature in each warehouse individually. This reduces the impact and the risk of a release, which makes us feel safer when we turn a feature on.

Firebase Realtime Database for feature flag values
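The per-warehouse layout is essentially a map from flag name to warehouse to an on/off value. The Python sketch below models it with a small in-memory store whose listeners are notified on every change, standing in for the real-time listener that Firebase Realtime Database provides. It is not the Firebase SDK, and the flag name `new_incident_form` is hypothetical.

```python
from collections import defaultdict
from typing import Callable, Dict, List

# A listener receives (flag_name, enabled) whenever a flag changes.
Listener = Callable[[str, bool], None]


class WarehouseFlags:
    """In-memory stand-in for a flag tree such as
    {"new_incident_form": {"Vlc1": True, "Bcn1": False, "Mad1": False}}."""

    def __init__(self) -> None:
        self._values: Dict[str, Dict[str, bool]] = defaultdict(dict)
        self._listeners: Dict[str, List[Listener]] = defaultdict(list)

    def is_enabled(self, flag: str, warehouse: str) -> bool:
        # Unknown flags or warehouses default to off, i.e. the feature is hidden.
        return self._values.get(flag, {}).get(warehouse, False)

    def listen(self, warehouse: str, listener: Listener) -> None:
        """Register an app instance running in `warehouse`."""
        self._listeners[warehouse].append(listener)

    def set_flag(self, flag: str, warehouse: str, enabled: bool) -> None:
        self._values[flag][warehouse] = enabled
        # Mimic real-time propagation: only apps in that warehouse are notified.
        for listener in self._listeners[warehouse]:
            listener(flag, enabled)


flags = WarehouseFlags()
flags.listen("Vlc1", lambda flag, enabled: print(f"Vlc1: {flag} -> {enabled}"))
flags.set_flag("new_incident_form", "Vlc1", True)   # prints "Vlc1: new_incident_form -> True"
flags.set_flag("new_incident_form", "Mad1", False)  # no Vlc1 listener fires
```

Defaulting unknown entries to off means a feature stays hidden in any warehouse until someone explicitly enables it there.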

In recent weeks we’ve changed the way we manage feature flags: we stopped using Firebase Realtime Database and replaced it with our own API. We use the HTTP response headers to provide the list of enabled feature flags; if a feature flag is not included in that list, it is not enabled.

Android Studio logs for the app to see feature flags in API response headers
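Reading the flags from a response could be sketched as follows in Python. The header name `X-Feature-Flags` and the comma-separated format are assumptions, since the exact encoding isn’t described here; the key rule from the text is kept: any flag missing from the list is treated as disabled.

```python
from typing import FrozenSet, Mapping

FLAGS_HEADER = "X-Feature-Flags"  # hypothetical header name


def enabled_flags(headers: Mapping[str, str]) -> FrozenSet[str]:
    """Parse the enabled flags from a response's headers."""
    # HTTP header names are case-insensitive, so compare them lowercased.
    raw = next((value for name, value in headers.items()
                if name.lower() == FLAGS_HEADER.lower()), "")
    return frozenset(part.strip() for part in raw.split(",") if part.strip())


def is_enabled(flag: str, headers: Mapping[str, str]) -> bool:
    # Absent from the list means disabled; there is no explicit "off" value.
    return flag in enabled_flags(headers)
```

A side effect of this design is that every API response carries the current flag state, so the app never needs a separate real-time channel.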

By using feature flags, we not only achieved two of the three measures mentioned earlier for making our release process safer, but also broke our dependency on the internal explanatory documentation.

This allowed us to start releasing several times a week, not only on Mondays.

Last steps: we’re almost there!

The next step was to reduce the size of the versions even further and do what was basically manual continuous deployment: we released single-commit versions to test the whole process. After a few days of doing that, we changed our Jenkinsfile so that it released a new version with every merge to master and announced it to our whole engineering team!

To summarize the evolution of the app over the last year and a half, I’d like to share some metrics from the last 12 months. As you can see, the number of commits per release decreases as the number of releases goes up.

Conclusions

Before:

Release trains
Huge versions (at the beginning some versions had up to 30 commits!)
No rollback mechanism
No feature flags
Dependency on a third party

After:

No more worrying about when to release a version: every change is now a release
Single commit versions
Rollback mechanism
Feature flags
No dependency on any third parties
Safer release environment (sometimes we do up to 10 releases on a Friday and it’s not an issue)

It may seem like just a few changes, but I assure you it has been a long road full of learning and, of course, a few bugs along the way!

In my opinion, this can’t be achieved without knowing the product and the domain you’re working on. I wouldn’t start a project with continuous deployment from day one; I would build the tools as they become necessary. Having an automated process saves you a lot of time and removes the friction from the release process. I remember the first few times, when I had to build and sign the APK through Android Studio and then, once that was finished, upload it to the remote folder, update the number of the latest version in the Versions Service and send a message announcing the new update. In hindsight that was an insane amount of work; today I only have to merge a pull request and voilà!

On the other hand, an automated process is useless if we then have many errors and are too slow to fix them. We should feel confident that we are releasing not only quickly but also safely.

Our next challenge is to adapt our flow to use Google Play instead of our Google Cloud Storage bucket. This is because we’ll soon be working with Android Enterprise, and we want to take advantage of everything Google Play offers. But I’ll save that for the next episode!

Stay tuned!