How Agile Tames Tough Software Problems

Software at its best is magical, but when it goes wrong the problems it causes can be horrible. The Economist, in its Schumpeter column this week, writes about “the recalcitrant problems of software” and cites some well-known examples: VW’s new electric car, the ID.3; British banks; and Boeing’s 737 MAX aircraft.

The article goes on to suggest that more such major breakdowns are inevitable. Even “shiny new IT systems” will “rapidly devolve into rickety, half-understood contraptions held together with gaffer tape and a prayer.” Startups like Tesla with “no legacy systems to maintain, and fewer old bugs to root out” have an initial advantage. They “can spend more time on features that customers want.” But, according to Schumpeter, the advantage is at best temporary. “Bugs will creep in. Bodge jobs will go unfixed. Developers will leave, taking knowledge with them. Today’s feisty usurpers will become tomorrow’s clumsy incumbents, held back by their antiquated, unreliable IT—and ripe for disruption in turn.”

A worrying picture. But is it so?

Modern Software Is Different

The scene that Schumpeter depicts reflects the way that software used to be designed, built, and managed in the 20th century. And it’s the way firms still practicing 20th century management do software even today.

But it’s an anachronistic picture of modern software practices. Today, firms like Amazon, Netflix, and Microsoft design, build, and manage software in a radically different fashion, precisely to avoid the problems described by Schumpeter.

Truly modern software is dynamic and constantly upgraded on a weekly, daily, or even hourly basis. Such software is often self-healing and designed to reconfigure itself when it meets unexpected loads or events. Firms like Netflix deliberately try to make their systems fail so that they can make them ever more robust.
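
To make the idea of deliberately provoking failure concrete, here is a minimal sketch in Python. It is not Netflix's actual tooling: the names `make_replica`, `call_with_failover`, and `ReplicaDown` are hypothetical, and the "failure" is simulated with a random number rather than a real outage. It only illustrates the principle that a caller written to expect failure keeps working when a replica is knocked out on purpose.

```python
import random
from typing import Callable, List


class ReplicaDown(Exception):
    """Raised when a (simulated) replica is unavailable."""


def make_replica(name: str, failure_rate: float,
                 rng: random.Random) -> Callable[[str], str]:
    """Build a hypothetical service replica that fails on purpose.

    failure_rate is the probability (0.0 to 1.0) that a call is
    deliberately made to fail, standing in for an injected outage.
    """
    def handle(request: str) -> str:
        if rng.random() < failure_rate:
            raise ReplicaDown(name)
        return f"{name} handled {request}"
    return handle


def call_with_failover(replicas: List[Callable[[str], str]],
                       request: str) -> str:
    """Try each replica in turn and return the first successful answer."""
    for replica in replicas:
        try:
            return replica(request)
        except ReplicaDown:
            continue  # this replica is down, move on to the next one
    raise ReplicaDown("all replicas failed")


if __name__ == "__main__":
    rng = random.Random(0)
    always_down = make_replica("replica-a", 1.0, rng)  # injected failure
    healthy = make_replica("replica-b", 0.0, rng)
    print(call_with_failover([always_down, healthy], "req-1"))
```

The point of running such an experiment routinely is that weaknesses in the failover path are found on an ordinary working day rather than during a real outage.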

Such software is not built, as The Economist article depicts, by lonely individuals, who make mistake after mistake in their coding and then spend vast amounts of time trying to correct their errors. Instead, modern software is built by Agile teams who test the software as it is being written, both in terms of its immediate function and in terms of how it will interact with the rest of the system. As a result, at the end of each short cycle, Agile teams are able to produce robust bug-free software that is ready to be deployed, even in very complex systems.

The software they are building is also very different from unwieldy monolithic systems that are difficult to maintain or upgrade. The software they build is generally in the form of what are known as “micro-services”. These are small independent modules, each of which delivers a specific narrow function and interacts with other modules through carefully defined interfaces. The modules operate together as a network. This design is hard to execute but it has many advantages.

It enables load-sharing: if one module is overloaded, it can redirect the load to another module that can pick up the slack.
It facilitates maintenance and upgrading. You don’t need to mess with the whole system to change it. You can replace a module, like adding a new piece in a LEGO set. Because each module interacts by way of defined interfaces, the functioning of the overall system is unchanged.
It enables the system to cope with extraordinary loads and unexpected stresses. For instance, in 2018, when Amazon was confronted with an unimaginably vast surge in orders on one of its famous sale days, it was able to process almost 6 trillion orders in the space of a day. There were some short delays in some parts of the system as loads were redistributed. But the system didn’t crash. It bent, it self-healed, and it got the job done. Reconfiguring the system to cope with such surges in future didn’t involve rebuilding the whole system. It simply meant adding modules to cope with such eventualities.
It reduces the risk of building something that doesn't work or building something the customer doesn’t need or want.
It helps eliminate the intra-mural battles that often take place in big bureaucracies between rival silos. Thus, when Amazon introduced its marketplace of third-party sellers, it didn’t require the collaboration or permission of Amazon’s own retail operation, with which the new marketplace would be in direct competition. In a bureaucracy, there would have been a battle royal, as the upstart newcomer initiative invaded the territory of the incumbent’s system. But in Amazon’s modular approach, the new marketplace simply plugged its modules into the existing system using the existing interfaces. No debate. No battle. No permission required.
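
The first three advantages can be sketched in a few lines of Python. This is an illustrative toy, not Amazon's architecture: `OrderHandler`, `Worker`, and `Router` are hypothetical names, and capacity is modeled as a simple counter that is never released. What it shows is that when every module honors the same narrow interface, load can be redirected to a module with spare capacity, and new modules can be plugged in without touching the rest of the system.

```python
from typing import List, Protocol


class OrderHandler(Protocol):
    """The defined interface every module must honor (hypothetical)."""

    def has_capacity(self) -> bool: ...

    def handle(self, order_id: str) -> str: ...


class Worker:
    """A small module that can process a limited number of orders."""

    def __init__(self, name: str, capacity: int) -> None:
        self.name = name
        self.capacity = capacity
        self.in_flight = 0  # simplification: orders are never released

    def has_capacity(self) -> bool:
        return self.in_flight < self.capacity

    def handle(self, order_id: str) -> str:
        self.in_flight += 1
        return f"{self.name}:{order_id}"


class Router:
    """Sends each order to the first module that has spare capacity."""

    def __init__(self, modules: List[OrderHandler]) -> None:
        self.modules = list(modules)

    def add_module(self, module: OrderHandler) -> None:
        # Growing the system means adding a module, not rebuilding it.
        self.modules.append(module)

    def dispatch(self, order_id: str) -> str:
        for module in self.modules:
            if module.has_capacity():
                return module.handle(order_id)
        raise RuntimeError("no module has spare capacity")


if __name__ == "__main__":
    router = Router([Worker("a", capacity=1)])
    print(router.dispatch("order-1"))   # handled by module a
    router.add_module(Worker("b", capacity=2))
    print(router.dispatch("order-2"))   # a is full, so b picks up the slack
```

Because `Router` depends only on the interface, any object with `has_capacity` and `handle` methods can be swapped in, which is the LEGO-like property described above.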

Whereas software as a monolithic system is built to last, like a physical building, modular software is built to enable change. And in a fast-changing marketplace, an ability to change has become a necessity.

The scale of software operations today is also mind-boggling. An unexpected surge of 6 trillion operations in a single day would sink any monolithic system: it’s all in a day’s work for Amazon’s modular software.

Microsoft has calculated that the number of possible permutations and combinations of its Windows software across the hundreds of millions of different kinds, sizes and types of computers that run it is greater than the number of atoms in the universe. There is no way that Microsoft could test all of those permutations and combinations when it introduces a change. But Microsoft has made remarkable progress in making Windows more robust: the days of the dreaded “blue screen” are long gone.

And Microsoft has dramatically sped up its ability to change and upgrade its Windows software. Upgrades used to take 3-5 years. Now they can be done weekly, daily, or even faster.

In the “old days”, that is, up to just a decade ago, Microsoft would carry out upgrades by spending a year designing the specs of the upgrade and then taking another two years actually writing the code. The result was increasingly problematic. The software was always several years late and still full of bugs. Moreover, any ideas or changes that emerged during the coding period had to be set aside, no matter how good, until the next iteration, still 3-5 years away.

This had to change if Microsoft was to survive. So in 2015, Microsoft finally switched to “Windows as a service” with Windows 10. Instead of releasing a new version of Windows every three to five years, as the company did with past iterations of the operating system, Microsoft continuously updates Windows 10.

In effect, the biggest and best of the tech firms have learnt their lesson. But other firms, not so much. Many firms, like banks, manufacturers and car companies, for whom software is mistakenly seen as a sideshow, rather than their core business, are still operating as though they were living in the 20th century, building monolithic software systems that are hard to maintain or upgrade and issuing bug-filled upgrades every few years. Such systems have a built-in tendency to crash, putting the firm out of business for hours, or even days, at a time. The crashes get top management’s attention but often not sufficiently to instigate a change in approach. Instead, the software team is often instructed to add more gaffer tape to the system in the vain hope that there won’t be another crash.

Adam Barr: The Problem with Software

The Economist's Schumpeter article relies on the book The Problem With Software (2018) by Adam Barr, who gained his experience as a software developer working at Microsoft on Windows NT in the 1990s. Barr’s depiction of the software development scene reflects that world. His view of better software seems to assume building better monoliths. In fact, the terms “monolith” and “microservices” do not appear in the book. For Barr, it seems to go without saying that big software means monoliths.

He is skeptical of modular or Agile approaches to developing software and doubts whether they could possibly work, citing the lack of formal academic studies proving that these new methods work better than “the old way.” He shows no sign of noticing that his old employer, Microsoft, has abandoned “the old way” and can issue seamless upgrades in hours or days, not the years it used to take. He ignores the evidence of what Agile firms like Amazon and Netflix have been able to accomplish. He notes that his familiarity with Agile methods consists of “having read a few books about Scrum” (p.278) but offers no evidence that he ever worked in an Agile workplace.

The Agile Mindset

Lack of Agile working experience would not be a disqualifying obstacle if Barr showed signs of having internalized the Agile way of looking at and thinking about the world—an obsession with the customer, working in self-organizing teams, and operating as a network of competence rather than a vertical hierarchy of authority.

But evidence in the book of an Agile mindset is slender to non-existent. Instead, Barr evinces the mindset of Microsoft in the 1990s, which seemed to think “if we could just work harder and find better engineering practices, this time will be different: our next iteration will be bug-free and completed on time.” But it never was. The software was always several years late and full of bugs until Microsoft changed its way of thinking about developing software.

Barr does mention “software as a service” as a positive step. But there is scant recognition that for “software as a service” to succeed, the software has to be designed and built differently, in a modular fashion, so that it can be continuously maintained and upgraded in the light of customer experiences. The monolithic software of the kind that Barr helped build at Microsoft in the 1990s with cycles of 3-5 years can’t get that job done. Building modular software requires a different kind of mindset—an Agile mindset.

Both Barr and Schumpeter make fun of the interview practice in software firms of requiring candidates to show how they would approach a particular software challenge. What both authors miss is that the interview practice is designed to reveal, not so much what the candidate knows or whether the candidate can solve the particular problem, but rather how the candidate thinks about solving problems. In other words, whether the candidate has the right mindset. It’s a test that both Barr and Schumpeter would almost certainly fail. They are still using 20th century linear thinking in situations where the problems are complex, multi-dimensional, and rapidly changing.

Nor is there evidence in the book that Barr is aware of the minor miracles being accomplished by the software of Amazon or Netflix or how they succeeded. The book’s chapter on Agile is mainly spent on the tangled history of the Agile movement, along with dismissive remarks about Agile generally, such as its failure to contribute new engineering methodologies, its lack of engineering practices, and the absence of formal academic studies about outcomes. The fact is, better engineering was never the intent of the Agile pioneers: Agile was always about better ways of managing the process of developing software and protecting developers from unhelpful interventions by the company’s bureaucratic managers. The real-world outcomes of Amazon, Netflix, and Microsoft speak louder than any academic study.

The Problem with Software Was Management

Agile reflected a recognition that managers were getting in the way of completing work in a timely and professional manner. The problem with software wasn’t the software developers. The problem with software was management.

The more the managers intervened, the less good work got done. There was a recognition that the complex, rapidly changing work of software development couldn’t be managed bureaucratically, in the way one manages the building of something static like a highway. It had to be managed more nimbly, flexibly, and interactively. They called this different way of managing “Agile”.

The fact that this different way of managing happened to be a better way of managing the rest of the organization as well came as a shock to many. It turned out that work elsewhere in the firm was also complex and rapidly changing. Software was simply the first sector to encounter and recognize that.

So Agile management began to spread. As Barr sarcastically describes it, “Agile has oozed out into the world beyond software,” (p. 260) as if Agile was some kind of noxious poison, rather than the solution to many 21st century management challenges.

The “problem with software” as described by Barr and The Economist is ultimately not a software problem at all. It’s a management debacle. It reflects a reluctance to recognize that the world we live in is complex, multidimensional, and rapidly changing. This is a world in which old hierarchical ways of managing or doing software no longer work well.

The problems that the car companies, the British banks, and Boeing have had with software crashes are symptoms of this deeper issue. They will go on having software problems unless and until they embrace 21st century management.
