What Does QA Work Look Like in the Era of AI?

Patrick Ramser
10 min read · May 28, 2023
Art generated by Midjourney using a prompt related to this article.

Try to picture with me, if you will, a world where AI is the default method of writing new software. An engineer pulls up a prompt resembling a Terminal window on their Mac and feeds requirements in, and an AI agent intelligently bootstraps, writes, and ships a solution for them. “Here’s your deployment URL!” says the prompt. Click it and see the product of your hard work. Need a new feature? The AI will take requests for new buttons, panels, modals, and anything else, adding them to your solution and rejecting any that don’t align with the scenario you’ve laid out.

It’s not hard to imagine. ChatGPT already writes essays, article synopses, business plans, and more for fledgling writers. Some engineers have even lightly orchestrated the coding process, using AI to generate code for Unity and random web apps. Hell, Copilot is seeing more and more use, and it more or less uses AI to template out large blocks of software for you.

The framework is there for portions of software engineering to slowly migrate toward AI tools writing most of this software *stuff.

(Inner-monologue: *Stuff is fine to use here, right? It encompasses writing the software in an array of languages, editing files, pushing code, and building infrastructure for everything. Nobody will yell at me for using it.)

But, what about the QA parts of this process that happen afterward?

What about the manual testing required to ensure these solutions don’t have bugs that will block users from doing their tasks? What about the UX testing to verify the editorial aspects of this software comply with both regulatory standards and basic accessibility requirements for human beings? Or how about the automation wrapping all of these new apps, web pages, libraries, and UI bits, ensuring every new deployment doesn’t regress on functionality that’s intrinsic to the core tenets of the *thing you’re building?

(Inner-monologue: Let me just add: *thing is totally valid here. The term can stand in for a multitude of options, from a piece of accounting software to a library someone else needs to get something done. Totally valid usage. Nailed it.)

Software will always be built for people. That should be a matter of fact. I mentioned this in my breakdown about the blue-collarization of the tech industry. Even if our industry evolves into a more blue-collar affair resembling factory work, software will always involve a human touch during design, test, and some (even small) portion of the engineering phase. That maps to the scenario I laid out above: someone feeds the software requirements to the AI via a prompt, the prompt answers with a solution that looks like what you’d expect, and then a human being tests the deliverable to ensure it meets the business’s expectations.

Art generated by Midjourney using a prompt related to this article.

That leads me to believe software will still require specific testing for verifying design intentions, ensuring bugs don’t creep into new features, and checking for regressions in every new build or release.

A lot of this stuff, we already do, by the way. The framework will continue on as it has with a few minor changes to our role.

Verifying design intentions

This is a funny one, since it’s going to be the most crucial in a world of AI-generated software, but it’s also the one I assume most teams aren’t already doing.

In your traditional software engineering process, a person designs a *thing, then gives that thing to a software engineer who writes it, then QA makes sure the thing doesn’t have any bugs. In some organizations, the software engineer and the designer are the same person. Ideally, they’re not, however. Ideally, a different person, one who understands the path a human will take through the software and the business needs of a new feature, will sit down and design the feature before passing it off to an engineer to write. That necessitates someone checking those original design intentions by playing the role of an end user before the feature goes out.

(Inner-monologue: I can’t anymore with this *thing. This bit is over. It was funny the first two times.)

I’ve worked at places that simply have the designer go through and UAT (user acceptance test) the feature before it goes out. I’ve worked at places that have QA do this UAT work before the feature goes out. And — as mentioned before — I’ve worked at places that skip this entirely. Those organizations assumed QA would naturally check the UAT aspect of the ticket by simply bug-testing it. You can’t exactly bug-test a thing that doesn’t critically do what you want it to, can you?

Whatever your process (and let’s be honest here: have specific UAT testing, whether it’s QA doing it or not), this step will be critical in a world of software built by AI.

QAs like me already roll their eyes when they get a feature that they aren’t certain a software engineer tested in a basic way against the basic requirements of some basic ticket. I’ll often get work that doesn’t even function — sometimes, it crashes outright. Or the required button on the ticket doesn’t even appear when you slide the browser into a mobile view. It’s very goofy.

Now, the software engineer will be a computer. There could be versions of this AI tooling that do sanity checks on new features to ensure they’re there, but the permutations of these features across different modes are going to be insane for the tool to check exhaustively. There are also simple considerations, like screen transitions and how content renders in the browser, that the computer may not totally understand until it gets better.

Consider, if you will, the way the DOM works in HTML. It’s all code that the browser interprets and the user views on their screen. Yet this code doesn’t always translate to exactly what the user sees on the page. That goes double once you start integrating style data with CSS. It means even adding a simple button may need to be checked by a human to ensure the visibility, size, and text all reflect what the designer intended and what the end user will actually see.
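To make that gap concrete, here’s a minimal Cypress sketch in TypeScript (the /profile route and data-testid selector are made up for illustration) showing how a button can “exist” in the DOM while a CSS rule hides it from actual users:

```typescript
// Minimal Cypress sketch: "exists in the DOM" is not the same as "visible".
// The route and selector below are hypothetical.
describe('edit button visibility', () => {
  it('exists in the DOM', () => {
    cy.visit('/profile');
    // Passes even if CSS sets display: none on the button.
    cy.get('[data-testid="edit-button"]').should('exist');
  });

  it('is actually visible at mobile widths', () => {
    cy.viewport(375, 667); // a phone-sized viewport
    cy.visit('/profile');
    // This is the assertion that catches the "button vanishes in
    // mobile view" bug a naive existence check would sail right past.
    cy.get('[data-testid="edit-button"]').should('be.visible');
  });
});
```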

And — heck — maybe I buried the lede a bit, but I’m assuming these AI tools will let software engineers type anything into the prompt, from “add a button to the user information page that lets the user edit their profile” to “add a blue edit button to the user profile page that is light blue on hover and which turns into an edit icon in mobile views”. Those are fundamentally different. Only one asks to display in mobile. And neither is specific enough for a computer to guarantee the exact result the designer intended.

Ensuring bugs don’t creep into new features

This will be a decently quick point to make, since it’s largely what we’re already doing as QA. The bread-and-butter of most teams is checking software for bugs that we don’t want end-users to trip over when they’re doing their jobs.

Unlike design intentions, bugs may be a tad easier for an artificial intelligence to catch before handing a deployment to QA to check over. The reason is that a lot of this tooling (at least with these GPT tools — go look it up) basically produces averages of the existing code it was trained on, shaped by your prompt. It more or less scans that code, gets a basic understanding of the format, and provides you with a solution meeting your requirements after learning that format and rewriting it.

I won’t get into the nitty gritty details, but what we can surmise here is that good-ish code in your AI training data will result in good-ish code in the solutions your AI is providing.

Art generated by Midjourney using a prompt related to this article.

But even then, let me hammer home two quick points that make me confident QA will always have a role in bug testing.

Point uno is that even good code has bugs. Ask any senior software engineer worth their salt whether the years of experience they’ve accrued have purified their software of any issues whatsoever. Every single one of them will laugh at you. A few select, highly intelligent engineers will go so far as to tell you that their experience hasn’t eliminated bugs from their code; rather, it’s made those bugs easier to identify and fix. I’m mostly focused on QA here, but you can also see that software engineers who can fix problems are probably going to be extremely important in this new world of AI-generated software.

Really quick, but point dos is just a philosophical one: one person’s feature is another person’s bug. This means even well-written software will have bugs. Some could even be user error. Maybe the designer told the engineer to write a “red only button when editing is not available on the user profile”. It sounds fine, doesn’t it? Well, what the designer meant to write was “read-only button”. The software was written to spec! It’s valid! And yet a bug still occurred out of a design error. In the QA biz, we call that job security.

Checking for regressions

At the beginning of my career, I hated regression testing. It was the bane of my existence: tediously re-checking every little bit of the tickets I’d already tested before finally releasing everything. As time went on, though, I grew to love it. By adding tons of API tests and UI automation, I was able to cover a ton of the boring, monotonous regression testing so I could focus on finding new bugs and going through features organically, like a new user would, rather than like some robotic script reader juicing the apps I’m building to squeeze out as many bugs as possible before shipping.

With the above in mind, I’m excited about what AI-generated software means for regression testing.

I haven’t seen anything like this yet (maybe I should build it), but I can imagine a world where the AI prompt both writes and hosts an application that does x and provides a test suite that checks x. Maybe when I add a new button to my software *thing in React, the tool will write a Cypress test that checks the button automatically for me.

(Inner-monologue: you get it now, right? Software *things can literally be anything. That’s especially so in the world of this new AI-built stuff, where a person can create the scaffolding for something in seconds and modify the hell out of it. Go generate your knock-off version of Facebook that only lets you dislike content. Or a Digg.com for only cute cat photos.)
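To make that concrete, here’s a rough sketch of the kind of Cypress test I’d imagine such a tool generating after I ask for that button (the route, selector, and expected behavior are all hypothetical):

```typescript
// Hypothetical auto-generated Cypress test for a newly added edit button.
// The route, selector, and expected behavior are illustrative only.
describe('user profile edit button', () => {
  beforeEach(() => {
    cy.visit('/profile');
  });

  it('renders the edit button', () => {
    cy.contains('button', 'Edit').should('be.visible');
  });

  it('opens the profile editor when clicked', () => {
    cy.contains('button', 'Edit').click();
    cy.get('[data-testid="profile-editor"]').should('be.visible');
  });
});
```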

Similar to my first point, where the AI won’t always be able to guarantee the designed content shows up correctly on the page, these tests won’t always work as expected. They’re going to be based on existing tests hosted on the internet. But it should be easy for a QA engineer to go in and fix a few assertions. The process will still save you time.

And in cases where these AI-written tests work how they should and cover the basic pieces of the features you’re shipping, QA gets room to write more complicated tests: checks for page transitions, form content, button actions, and all sorts of fun stuff that lets us regression-test the more interesting parts of the software from release to release.
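For a taste of what one of those more complicated tests might look like, here’s a hedged sketch of a hand-written, multi-step flow (the routes, field name, and values are invented for illustration):

```typescript
// Hypothetical hand-written Cypress test covering a multi-step flow:
// a page transition, form content, and the resulting state change.
describe('profile editing flow', () => {
  it('edits a display name end to end', () => {
    cy.visit('/profile');
    cy.contains('button', 'Edit').click();

    // Page transition: we should land on the editor route.
    cy.url().should('include', '/profile/edit');

    // Form content: clear the field and type a new value.
    cy.get('input[name="displayName"]').clear().type('Pat R.');
    cy.contains('button', 'Save').click();

    // Back on the profile page (regex so /profile/edit doesn't match),
    // the new name should be visible.
    cy.url().should('match', /\/profile$/);
    cy.contains('Pat R.').should('be.visible');
  });
});
```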

It rocks, doesn’t it? That’s what I want to instill in a lot of QA people about these new technologies. Don’t get freaked out about them taking your job. Get invested in them now so you can make your job even easier. Start learning the ins and outs of these tools to build better software and make QA the people steering the ship for all the wonderful new solutions organizations are going to create over the next 10–15 years because of AI.

Sort of anti-climactic after that last paragraph, but that’s what I got.

I was worried about a lot of this stuff months ago, when I started looking into it. It was easy to be. Some QA teams exist at the lowest rungs of software development shops. In the gaming industry, engineers are fighting to unionize just to even the playing field and maybe cut down on the insane hours we face release to release to make sure customers have quality solutions for their businesses.

I can’t help but be hopeful though.

These new AI tools have a real chance at disrupting our industry. Previous eras caused us to speed up and ship more and more and more, faster and faster and faster, leading to crunch and (what I’d describe as) our quality assurance process constantly eroding in favor of businesses making more and more money. I see AI as a potential playing-field leveler. It has the potential to give us a process that aligns QA more closely with design leads, that makes bug hunting even more crucial to businesses succeeding, and that can generate automation that lets us do our jobs more efficiently in the long run.

