My Debugging Bot saves 90% of our time

 HarshI.T.
5 min read · Dec 20, 2023

Starting work on a UI test automation project got me really excited. I led a team of testers who handled test cases for user interfaces. From my prior experience, though, my passion resonated more with solution architecting and research. Eager to contribute, I applied those skills to develop a bot aimed at improving my team's efficiency.

Bot Concept

Context

In the initial training sessions, we learned how automation testers use resource or accessibility identifiers to access UI elements. These identifiers, specified by developers, are used to validate accessibility and appearance. However, if developers change these identifiers in the code, the test cases break, forcing automation testers to adapt to the new identifier.

Adopting a new identifier is more complicated than it seems. Testers must check logs to determine if the test failed due to a bug or an identifier mismatch. They then inspect the view hierarchy to find and update the code with the new identifier.

Boring Maintenance Task

Here’s a step-by-step breakdown of the existing process:

  1. Tests run on a CI/CD pipeline, so testers must monitor failed cases.
  2. They download logs of failed executions for review.
  3. Identifying the root cause of failure comes next.
  4. Fixing involves updating the code with the new identifier.
  5. After updating, they test it locally and again in the CI/CD pipeline.
  6. A Pull Request is raised only upon successful execution.

I crafted the solution in three phases.

Cool

Phase 1: Read logs to cluster similar failures, reducing duplicate effort across team members.

Phase 2: Continuously monitor the developers' code to promptly reflect identifier changes in the testers' code.

Phase 3: Solve the challenges of Phase 2 by monitoring executions, so the testers' code can be updated and fixed as soon as a failure occurs.

Phase 1 : Failure Clustering

The bot runs 24×7, monitoring failures and reading the log message of every failure. About 30% of failures were common across multiple test cases.

Duplicate Efforts

The bot notifies testers when a colleague’s fix is expected to resolve their own failure, enabling them to avoid redundant efforts on the same issue.
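The clustering behind this notification can be sketched roughly as follows. The normalization rules (masking numbers and hex values so that only volatile details differ) are my own illustrative assumptions, not the bot's exact logic:

```python
import re
from collections import defaultdict

def failure_signature(log_message: str) -> str:
    """Normalize a failure message so equivalent failures cluster together.
    Volatile parts (hex addresses, timestamps, counters) are masked out."""
    sig = re.sub(r"0x[0-9a-fA-F]+", "<HEX>", log_message)
    sig = re.sub(r"\d+", "<N>", sig)
    return sig.strip().lower()

def cluster_failures(failures: dict) -> dict:
    """Group test-case names by the signature of their failure message."""
    clusters = defaultdict(list)
    for test_name, message in failures.items():
        clusters[failure_signature(message)].append(test_name)
    return dict(clusters)

failures = {
    "test_login": "Element 'btn_submit' not found at 12:01:33",
    "test_checkout": "Element 'btn_submit' not found at 14:22:07",
    "test_profile": "Timeout after 30s waiting for 'img_avatar'",
}
clusters = cluster_failures(failures)
# test_login and test_checkout land in the same cluster, so one fix
# resolves both and the second tester can be notified to skip it.
```

A fix attached to any test in a cluster can then be broadcast to everyone whose test shares that signature.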

Phase 2: Developer Code Review

The bot parses all of the developers' source code files. A Git hook was integrated with the bot so that it triggers as soon as a developer's Pull Request is merged to the main branch.

The bot keeps track of all UI element IDs. When a new Pull Request is merged, it checks for changes in IDs compared to the previous version.
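A minimal sketch of this check, assuming the identifiers are pulled out of Android layout XML with a regex (the real bot may parse the files differently):

```python
import re

# Matches android:id="@+id/name" or android:id="@id/name" in layout XML.
ID_PATTERN = re.compile(r'android:id="@\+?id/(\w+)"')

def extract_ids(xml_source: str) -> set:
    """Collect all UI element identifiers declared in a layout file."""
    return set(ID_PATTERN.findall(xml_source))

def diff_identifiers(old_ids: set, new_ids: set) -> dict:
    """Compare identifiers from the previous merge against the current one."""
    return {
        "removed": old_ids - new_ids,   # identifiers the PR dropped
        "added": new_ids - old_ids,     # identifiers the PR introduced
        "unchanged": old_ids & new_ids,
    }

old_xml = '<Button android:id="@+id/btn_submit"/><TextView android:id="@+id/lbl_title"/>'
new_xml = '<Button android:id="@+id/btn_confirm"/><TextView android:id="@+id/lbl_title"/>'
delta = diff_identifiers(extract_ids(old_xml), extract_ids(new_xml))
```

Any non-empty `removed` set flags a potential break in the test code that references those identifiers.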

👎👎👎 90% of identifier changes could not be detected, because Android developers conditionally render from multiple layout XML files. In other words, identifiers changed in the output without any visible change in the code, since some logic was driven by API responses or runtime conditions.

Really Frustrating

💡💡💡 The most effective solution is to review the output video recordings and output logs. Let's move to Phase 3.

Phase 3: Review Output Logs & Fix Code

The plan here is to pick a failure, find its previous successful execution, and compare the identifiers in their log files. The new identifier is then adopted in the tester's code.

To compare screenshots, we need vision intelligence. I used the MobileNet model's embedding layer and compared images using cosine similarity.
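Once MobileNet has produced an embedding vector per screenshot, matching a failed-run screenshot to the corresponding passed-run screenshot reduces to a nearest-neighbor search under cosine similarity. The sketch below assumes the embeddings are already extracted (the MobileNet inference step itself is omitted):

```python
from math import sqrt

def cosine_similarity(a, b):
    """Cosine similarity between two embedding vectors (lists of floats)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = sqrt(sum(x * x for x in a))
    nb = sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def match_screenshot(failed_emb, passed_embs):
    """Return (index, score) of the passed-run screenshot most similar
    to the failed-run screenshot."""
    scores = [cosine_similarity(failed_emb, e) for e in passed_embs]
    best = max(range(len(scores)), key=scores.__getitem__)
    return best, scores[best]

# Toy 2-D embeddings for illustration; real MobileNet embeddings
# have hundreds of dimensions.
idx, score = match_screenshot([1.0, 0.0], [[0.0, 1.0], [0.9, 0.1]])
```

The matched pair of screenshots pins down which step of the passed run corresponds to the failing step.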

We then map the matched screenshots to the log files with the same timestamp, so we can compare identifiers simply by reading the logs.

Prior to comparison, we need to parse the failure message to extract the identifiers that raised the assertion. We achieved this using regular expressions.
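A sketch of that extraction; the failure-message format shown here is an assumption for illustration, since the real pattern depends on the test framework's assertion text:

```python
import re

# Assumed message shape: "... identifier 'some_id' ..." — adjust the
# pattern to whatever the real test framework emits.
ASSERTION_PATTERN = re.compile(r"identifier ['\"]([\w.]+)['\"]")

def failing_identifiers(message: str) -> list:
    """Pull every quoted identifier out of an assertion failure message."""
    return ASSERTION_PATTERN.findall(message)

ids = failing_identifiers(
    "Assertion failed: identifier 'btn_submit' not found in view hierarchy"
)
```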

Now we check for the same identifier in the passed and failed executions' logs. This tells us that identifier_x has become identifier_y.

Old Identifier -> New Identifier

Wait! The old log has identifier_x, which is missing in the new log, while the new log has several identifiers (identifier_y, identifier_p, identifier_q) that are missing from the old log. How do we confirm that identifier_y is the correct replacement for identifier_x? 🤓🤓🤓

We have solution

We compared the coordinates, classes, and UI properties of each new identifier's host element against those of the missing old identifier; the candidate with the highest similarity becomes the replacement for the old identifier.
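A minimal sketch of that scoring. The element fields and the weights (0.5 for class, 0.3 for proximity, 0.2 for shared properties) are illustrative assumptions, not the bot's actual tuning:

```python
def element_similarity(old: dict, cand: dict) -> float:
    """Score a candidate replacement element against the missing old element
    using class match, coordinate distance, and shared UI properties."""
    score = 0.0
    if old["class"] == cand["class"]:
        score += 0.5
    dx, dy = old["x"] - cand["x"], old["y"] - cand["y"]
    score += 0.3 / (1.0 + (dx * dx + dy * dy) ** 0.5)  # closer -> higher
    shared = set(old.get("props", {}).items()) & set(cand.get("props", {}).items())
    total = max(len(old.get("props", {})), 1)
    score += 0.2 * len(shared) / total
    return score

def best_replacement(old: dict, candidates: list) -> dict:
    """Pick the candidate element most similar to the missing old element."""
    return max(candidates, key=lambda c: element_similarity(old, c))

old = {"class": "Button", "x": 100, "y": 200, "props": {"text": "Submit"}}
candidates = [
    {"id": "identifier_p", "class": "TextView", "x": 400, "y": 50,
     "props": {"text": "Title"}},
    {"id": "identifier_y", "class": "Button", "x": 102, "y": 198,
     "props": {"text": "Submit"}},
]
winner = best_replacement(old, candidates)
```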

Now the critical step is to update the tester's code with the new identifier. Sometimes we have to replace the old identifier outright; in other cases we have to keep the new identifier as an alternative to the old one, since the Android code conditionally shows identifier_x or identifier_y.

Here we took help from GenAI; the Mistral model has an acceptable accuracy for code generation. The bot feeds the existing code to Mistral and asks it to modify the code to adopt the new identifier. The prompt specifies whether to append or replace the identifier, and Mistral takes care of the rest.
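The prompt construction for the replace-vs-append decision might look like the sketch below. The wording and structure are my own assumptions (the actual model call is omitted), but it shows how the bot can encode the two cases from the previous step:

```python
def build_fix_prompt(source_code: str, old_id: str, new_id: str, mode: str) -> str:
    """Build the instruction sent to the code-generation model.
    mode is 'replace' (old id is gone for good) or 'append'
    (the UI conditionally shows either id, so tests must accept both)."""
    if mode == "replace":
        instruction = f"Replace every use of '{old_id}' with '{new_id}'."
    elif mode == "append":
        instruction = (f"Keep '{old_id}' but also accept '{new_id}' as an "
                       f"alternative locator wherever '{old_id}' is used.")
    else:
        raise ValueError(f"unknown mode: {mode}")
    return (
        "You are updating UI test code after an identifier change.\n"
        f"{instruction}\n"
        "Return only the modified code.\n\n"
        f"```\n{source_code}\n```"
    )

prompt = build_fix_prompt("tap('identifier_x')", "identifier_x",
                          "identifier_y", "append")
```

The returned string is what gets sent to the model; the model's response then replaces the old test file before re-running in CI/CD.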

Effortless Coding

We place the updated code back into the CI/CD pipeline for testing, and on success trigger Git commits to raise a Pull Request.

Effectiveness

The bot’s implementation significantly improved efficiency by enhancing accuracy and speed. It notably slashed triaging time from a previous 4-hour endeavor to a mere 10 minutes.

Our team meticulously oversees 400 Test methods each day, encountering an average of 40–60 failures. Previously, addressing these issues consumed nearly a week’s effort. However, the bot has condensed this timeframe dramatically, streamlining the process to a mere 2 days.

Demo

Demo Video

In the above demonstration video, the bot executes all six prescribed steps in a swift 20 seconds. This stands in stark contrast to the average 10 minutes a tester typically spends reviewing a raised pull request.

Conclusion

The implementation of this bot has proven to be a transformative addition to our workflow. Its swift actions and accurate notifications have significantly reduced redundant efforts, saving substantial time previously spent on triaging and resolving issues. By streamlining our processes, this automation tool has not only enhanced efficiency but also empowered our team to focus more on critical tasks, leading to a more productive and streamlined work environment.

The bot doesn’t solely rely on AI; rather, it integrates specific AI modules as needed, aligning with the complexity and demands of the tasks. Its primary objective remains problem-solving. To prevent unnecessary complexity, we strategically employed AI where essential, avoiding overengineering. Acknowledging the inherent nature of AI models, their accuracy may not always reach 100%, necessitating manual reviews whenever AI intervention occurs.
