AI Deserts

Jennifer Pahlka
Code for America Blog
10 min read · Oct 7, 2019


Image courtesy Santiago Medem via Flickr (CC BY-SA 2.0)

Yes, it’s true: Artificial Intelligence is coming and it’s going to change the world around us. Actually, AI and ML (machine learning) are already here, and we’re failing to appropriately grapple with the ramifications, especially the ethical concerns. I don’t feel compelled to explain or support the above statements. Open any magazine, click randomly on any article on Medium, visit any public event at a think tank; chances are, the concerns raised by the age of AI are the topic. Some of it will be bunk, some of it very thoughtful, but the topic is not exactly under-discussed. What is under-discussed is how unevenly this change will happen, because we misunderstand and overestimate the preconditions for AI outside the private sector.

The first precondition for AI is data — lots of it. The private sector has digitized and enabled the collection, transport, and everyday use of massive amounts of previously inaccessible information. Google, Facebook, and, by extension, their advertising customers and the applications that use their platforms now have enormous amounts of information about us, not just because we share this data explicitly, but also because these systems monitor what else we’re doing when we’re online. That’s why when you put a pair of shoes in your cart online, an ad for those shoes follows you around everywhere.

That’s not AI, but those retargeting ads are the tip of a huge iceberg. There is a hidden world of cooperating applications, passing data back and forth through a kind of digital bloodstream, constantly making connections. The digital transformation of legacy industries combined with greenfield digital environments like social networks has resulted in systems of systems that talk to each other because of how they are architected. AI and ML are possible in these environments because they consume and learn from the vast stores of data captured and connected across applications. If AI were a plant, data would be its air, soil, and water, and in these connected ecosystems, AI’s roots can reach far and wide to access resources. Massive amounts of data are needed to make these technologies work well. Remember what Google’s Peter Norvig once said: “We don’t have better algorithms than anyone else; we just have more data.”

But digital transformation hasn’t happened evenly across our society, and it’s particularly the public and social sectors that have been left behind. Here, large-scale systems of systems that talk to each other are few and far between, and the low availability and connectedness of data mean that these sectors may become AI/ML deserts, so to speak. Visit the offices of your local homeless shelter and ask to see their data. (Of course, they won’t let you do so for privacy reasons, but pretend with me for a moment.) Their data is on paper, or trapped in a proprietary database that only one person in the organization still knows how to export from, and that person is retiring next month. They likely receive significant amounts of data from the other organizations and agencies they collaborate with in unstructured formats: email, fax, and phone calls logged by humans.

Visit the open data portal of any government website and you’ll see lots of data sets. They are usually manually exported from standalone systems on a regular basis (or once were, until someone stopped updating them after a personnel change), and they are almost all small enough to download to your laptop in a minute or less on a reasonable internet connection. The Open Data Barometer found, in a study of 115 governments’ publicly available data, that only half of it was machine readable. If a machine can’t read the data, it can’t be used to train an algorithm. AI will not grow in soil this thin and dry.

If a machine can’t read the data, it can’t be used to train an algorithm.
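To make that concrete, here is a minimal sketch of what “machine readable” buys you, with entirely hypothetical file and column names: a structured export drops straight into a standard training pipeline, while the same records trapped in a scanned PDF or a fax log can’t be used at all until someone does OCR and manual cleanup.

```python
# A minimal sketch, with hypothetical file and column names, of why machine
# readability matters: structured data flows straight into a training pipeline.
import pandas as pd
from sklearn.linear_model import LogisticRegression

# A machine-readable export: columns and types are inferable by software.
df = pd.read_csv("shelter_intakes.csv")        # hypothetical open-data export
X = df[["nights_stayed", "prior_visits"]]      # hypothetical feature columns
y = df["returned_within_90_days"]              # hypothetical outcome column

model = LogisticRegression().fit(X, y)         # training works on structured data

# A scanned PDF or a stack of faxes holding the same records is invisible to
# this pipeline; it would need OCR and manual cleanup before a single row
# could be used for training.
```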

Take for example the question of how AI will be used in the military. There’s a lot of concern, and understandably so. Killer robots making independent decisions are the stuff of dystopian science fiction. So advocates for AI in the military are likely to tout the benefits of something like predictive maintenance for aircraft. Those kinds of applications raise fewer ethical complications, and paint a positive picture of servicemen and women enabled and empowered by technology to keep our military competitive. As the story goes, AIs will consume historical datasets of aircraft maintenance logs and begin to tell us when parts need to be replaced before they fail, reducing costs and increasing our military readiness.
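As a rough illustration of what that story assumes, here is a minimal predictive-maintenance sketch. Every file name, column, and label below is invented for the example; the point is how much clean, standardized, multi-year history even this toy version presumes.

```python
# Minimal predictive-maintenance sketch, assuming a hypothetical, consistently
# structured historical maintenance log. All names here are invented.
import pandas as pd
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split

logs = pd.read_csv("maintenance_logs.csv")     # hypothetical multi-year export

features = logs[["flight_hours_since_overhaul",  # hypothetical feature columns
                 "cycles_since_inspection",
                 "part_age_days",
                 "prior_writeups"]]
label = logs["part_failed_within_30_days"]       # hypothetical label column

X_train, X_test, y_train, y_test = train_test_split(features, label, test_size=0.2)
model = GradientBoostingClassifier().fit(X_train, y_train)
print("holdout accuracy:", model.score(X_test, y_test))
```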

But go into the field and try to find those historical data sets. If you are assuming that data is held in some sort of enterprise IT system, you will be very surprised. What you are likely to find (and forgive a little creative license to make my point — this is not meant to map exactly to any given military unit) is a spreadsheet that goes back six months, to when either the computer or the personnel responsible for maintaining it was assigned to that station. Perhaps it goes back earlier than that, but the earlier data is in a different format, or on paper logs. And each unit in each of the services maintains its data separately, with no standard across them. People much smarter about AI than me will point out that computers will soon be clever enough to overcome the standards problem; they’ll simply learn to infer what different fields mean and do the normalization themselves. But they may not get that chance; gaining access to that data from each individual unit is possibly the hardest challenge, because it’s not a technical problem but a legal, bureaucratic, and human one. When we say things like “the Department of Defense has that data,” it makes it sound like any given human of sufficient rank could assemble it. That simply isn’t true in practice. And it’s not just the Department of Defense: in most government contexts, and certainly in the majority of nonprofits, the reality of “digital” looks more like the fragmented-spreadsheet scenario than the systems-of-systems scenario.

One theory goes that the benefits of AI will be so great that governments in particular will invest in the systems they need to take advantage of these technologies. In short, they’ll finally modernize. But governments have had plenty of reasons to modernize up until now, and to be fair, they’ve been trying, to the tune of $200B annually. (That’s what the US government spends on digital technology.) The problem is not how much government spends; it’s how government spends it.

Despite some bright spots of change, government remains constrained by antiquated procurement practices that focus on processes, rules, and mandated outputs rather than outcomes, and consistently result in mega-projects that take years and ultimately fail to produce working software. If you decided that the way to enable predictive maintenance of aircraft in our armed services was to build an enterprise system to capture that data (and thereby avoid the scenario in which contractors own the data and sell it back to government), you’d be staring down the barrel of a procurement process that would easily take six years, would likely get delayed past that, and would stand only a small chance of ever being rolled out beyond a small pilot, if it even worked well enough to get that far. And that might only put you at the start of collecting the necessary data. Remember, algorithms are only as good as the data we feed them. What they want is many years of historical data across as many circumstances as possible. That’s very unlikely for many of the functions government is responsible for.

All this speaks to the second precondition for AI that we often fail to account for: organizational competence at digital, rooted in enduring structures and cultures. This is important because it’s a key driver of data availability, but it’s also important in its own right if you care about ethics in AI, because government is at least theoretically accountable to a democratic process. Returning to the issue of predictive maintenance of aircraft, you might argue that our military will have this capability because the contractors will start to put sensors on the aircraft they sell. That’s certainly possible, but while sensor-enabled devices proliferate rapidly in the consumer sector because consumer devices are replaced every year or two, most hardware used by the military has an operational life measured in decades, so it’s not going to happen soon. More to the point, when or if that does happen, the contractors will almost certainly try to sell the data and predictions back to the military rather than enable the military to access them directly, because that’s a better business model for them, and government will once again fall behind in its digital competency. Many cases like this just devolve to the same problem we started with: the private sector has the capabilities and therefore the power, while government, charged with, among other things, regulating the private sector, is fighting on an increasingly uneven playing field.

There are many exceptions to the rule of slow adoption of AI in government, of course, and many of the examples one might point to suggest that government’s lag would be a feature, not a bug. For example, the data collected through PRISM, the National Security Agency’s mass surveillance program, are ripe for AI and ML applications, and have almost surely been mined using these technologies. Remember, though, that these data were collected by systems built by the private sector; the NSA just accessed them. The reason government can do this is that it was sufficiently valuable for someone else to collect and retain enormous amounts of information about people for a wide variety of reasons: as many others have pointed out, we are the product. There are many areas where the value to society of collecting and making use of data is very high but the profit incentive is insufficient. That’s where government is supposed to act, but it can’t if it’s not digitally competent or competitive.

Government will try to do AI, of course, even without sufficient data or digital competence. Virginia Eubanks gives us just one chilling example in her book Automating Inequality: a predictive algorithm used in Allegheny County, PA, aimed at projecting which children are likely to become victims of abuse, and which therefore should be removed from their families. In a classic case of failing to account for bias, the algorithm used proxies instead of actual measures of maltreatment. Eubanks explains: “One of the proxies it uses is called call re-referral. And the problem with this is that anonymous reporters and mandated reporters report black and biracial families for abuse and neglect three and a half times more often than they report white families.” There must be nuanced and thoughtful debate about whether algorithms should have any role in removing a child from their family, but let’s not judge the potential of AI from examples like this, in which government buys what’s sold as AI but is an impoverished cousin of the AIs that leverage larger data sets, do a better job of accounting for bias, and are much more thoroughly tested. This is not to say that AI in the private sector is always good (of course not!), just that an organization without core digital competence is doomed to use AI badly, with potentially disastrous consequences.
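To see how a proxy label smuggles in bias, consider a toy simulation. The numbers below are invented for illustration and are not drawn from the Allegheny case: two groups have the same underlying rate of the thing we care about, but one group gets reported far more often, and a model trained on reports learns the reporting pattern rather than the behavior.

```python
# Toy simulation of proxy-label bias (all numbers invented for illustration).
# Two groups have the same underlying rate, but one is reported 3.5x as often;
# a model trained on the reports learns the reporting bias, not the behavior.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n = 100_000
group = rng.integers(0, 2, n)                    # stand-in demographic attribute
underlying = rng.random(n) < 0.05                # same true rate in both groups
report_prob = np.where(group == 1, 0.70, 0.20)   # group 1 reported 3.5x as often
reported = underlying & (rng.random(n) < report_prob)   # the proxy label

model = LogisticRegression().fit(group.reshape(-1, 1), reported)
print(model.predict_proba([[0], [1]])[:, 1])
# The model assigns group 1 roughly 3.5x the "risk" of group 0, even though the
# underlying rate was identical by construction.
```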

This all probably sounds like great news to those eager to slow our march toward an AI future. The problem is, it won’t slow anything down. The benefits companies stand to gain by putting AI to use toward their goals (including companies that sell to the public sector) will assure that. What it does mean is that the gap between the sectors of our society is likely to grow so large that the sectors that haven’t already undergone a digital transformation — much of the public and social sectors — may literally never be able to catch up. The ethics of AI are just as complicated in the public and social sectors as they are in the private sector, but the growing divide in capabilities is problematic in its own right. Do we want vulnerable populations and national issues to be beholden to the self-interested decisions of the private sector? There are consequences we don’t fully grasp to the spread of AI; there are also consequences we don’t fully grasp to its hugely uneven advance.

AI could mean that the gap between the sectors of our society grows so large that much of the public and social sectors may literally never be able to catch up.

Efforts around “AI for good” are aimed at enabling AI to have a positive impact, and they are laudable; but targeted efforts here will, one by one, run into the barriers of poor data environments and low digital competence in government and nonprofits. What’s needed are more comprehensive efforts to deal with the root causes of the public and social sectors’ data and digital handicap — the work we need to be doing anyway to make government work for people regardless — before the AI revolution sends this whole problem into massive overdrive. That’s enormously hard work, but it’s work we must do.

We need an effective, capable public and social sector. Government and nonprofits play a critical role in what matters most in our lives: our health, our safety, our vulnerable kids, our friends and neighbors recovering from natural disasters, our veterans, our national infrastructure, our response to the climate crisis. Governments and advocates are also meant to serve as an important check on corporate power. We already live in a world with an enormous asymmetry between the capabilities of the private sector and the public and social sectors, and we are confronting a future in which that asymmetry will grow exponentially year over year.

How to address this must be among the questions we ask ourselves as we confront an AI age.

Jennifer Pahlka
Author of Recoding America: Why Government Is Failing in the Digital Age and How We Can Do Better, Fellow at the Federation of American Scientists