A.I. Is Solving the Wrong Problem

People don’t make better decisions when given more data, so why do we assume A.I. will?

On a warm day in 2008, Silicon Valley’s titans-in-the-making found themselves packed around a bulky, blond-wood conference room table. Although they are big names today, the success of their businesses was hardly assured at the time. Jeff Bezos’s Amazon operated on extremely tight margins and was not profitable. The company had just launched the cloud computing side business that would become its cash cow, but no one knew it yet. Sean Parker had been forced out of Facebook, retreating to a role as managing partner of Peter Thiel’s Founders Fund. He was a few years away from a critical investment in Spotify. To his right was Elon Musk, whose electric car company Tesla was in financial crisis and whose rocket company SpaceX had spent millions of dollars producing a rocket that had already failed to launch three times. Across from them, Larry Page was riding high on an acquisition spree at Google but was being sued by Viacom for $1 billion. Also in the room was one of Google’s former employees, a young software engineer named Evan Williams, who had just co-founded a new company called Twitter.

They committed nine hours of their valuable time, crammed around that conference room table, to listening as Nobel Prize-winning scientists taught them how human beings make decisions. They would apply this knowledge to give their technology companies an edge over the competition using what would eventually be called artificial intelligence.

No time on the agenda was devoted to the issue of data quality. The scientists in the room were behavioral economists, for whom the fantasy of an agent — either human or machine — being able to make completely rational decisions was something to critique and debunk. They didn’t talk about the problems of getting perfect data to build perfect decision-making machines because they didn’t believe that either thing was possible. Instead of eliminating human biases, they wanted to organize technology around those biases.

More and more data

For as long as the Department of Defense (DOD) has collected data, it has spent billions of dollars attempting to “clean” it. The dream of a master brain in which streams of clean and accurate data flow in and produce insight and greater situational awareness for governments and armies is as old as computers themselves — France tried it, Chile tried it, the Soviet Union tried it three times, undoubtedly China is trying it right now — but no matter how much data we gather, how fast or powerful the machines get, it always seems just out of reach.

Experts will point out that data scientists spend roughly 80% of their time cleaning data, and that the key to achieving these centralized, A.I.-powered command centers is breaking down silos between the services and creating interoperable data flows for A.I. models. The DOD spends an estimated $11–$15 billion per year on the staff that wrangles its data into some form of cleanliness. Yet after decades of investment, oversight, and standards development, we are no closer to total situational awareness through a computerized brain than we were in the 1970s. As the computers have gotten more advanced, the amount of data they are drowning in has increased too.

And it’s not just the DOD’s money that has failed to solve the problem. Despite $16 billion in investment from the heavy hitters of Silicon Valley, we are decades away from self-driving cars. Despite billions invested in A.I. moderation, the largest social media companies still rely heavily on armies of human beings to scrub the most horrific content off their platforms.

It may not be a surprise that Big Government can’t get a good return on investment from A.I., but it seems Big Tech can’t either.

Maybe we’re solving the wrong problem

When attempting to engineer a solution to a hard problem, it’s worthwhile to strip things down to first principles: What assumptions are we making, and how do those assumptions frame what problems we think we need to solve? If those assumptions were different, would we be solving different problems? How do the problems we want to solve map to outcomes we value?

The outcome we’re all hoping for from A.I. is better decision-making. Total situational awareness is attractive because we assume that giving leaders access to more data is the key to making better decisions, and making better decisions means fewer negative impacts. There’s no mystery as to why the DOD would want to prioritize technology that will allow it to prevent conflict or minimize collateral damage. There’s no confusion as to why Facebook wants to control hate speech on its platform.

But the research that has been done by scientists like the ones who mentored the tech leaders in that conference room in 2008 calls the value of knowing more into question. In real life, decision-makers optimize for conserving effort. Total situational awareness is less desirable than tools that facilitate the team effort leading up to a decision. After all, decisions are often judged by results, which includes a bit of luck as well as correct analysis. Before those results are realized, even the most careful and thoroughly constructed strategy backed by the absolute best data cannot offer a guarantee, and everyone involved knows it. For that reason, the process of making a decision is less about an objective analysis of data and more about an active negotiation between stakeholders with different tolerances for risk and priorities. Data is used not for the insight it might offer but as a shield to protect stakeholders from fallout. Perfect information — if it is even achievable — either has no benefit or actually lowers the quality of decisions by increasing the level of noise.

That seems unbelievable: Perfect information should automatically improve the decision-making process. But it doesn’t, because more information rarely changes the organizational politics behind a decision. A.I. can correctly identify content, but the decisions made based on that content are heavily informed by the norms and expectations of both the users and the organization. Facebook’s moderation policies, for example, allow images of anuses to be Photoshopped onto celebrities but not a pic of the celebrity’s actual anus. It’s easy for human beings to understand how the relationships between stakeholders make that distinction sensible: One violates the norms around free speech and public commentary; the other does not.

As long as decisions need to be made in teams, accounting for various stakeholders and their incentives, the best path to improving decision-making isn’t simply adding more sensors to get more data. You need to improve communication between stakeholders.

This raises the question: Do we need to invest billions of dollars cleaning data and sharpening our sensors in order to see benefits from A.I.?

Poorly designed A.I. is a (national) security risk

The way we talk about data quality is misleading. We speak of “clean” data as if there is one state in which data is both accurate (and bias-free) and reusable. But clean is not the same thing as accurate, and accurate is not the same thing as actionable. Problems along any one of these dimensions can impede an A.I. model’s development or interfere with the quality of its results. There are lots of reasons data coming into a model may be problematic. Some are obvious: The data is factually incorrect, corrupted, or in an unexpected format. Other problems are more nuanced: The data was captured in a particular context and is being reused inappropriately; the data is at the wrong level of granularity for the model’s purpose; or the data isn’t standardized, and the same facts are represented or described in different ways.
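To make the distinction concrete, here is a minimal sketch (with invented field names and thresholds) of how these three failure modes differ. A record can be perfectly well-formed, and thus “clean,” while still being factually impossible or too coarse for the model at hand:

```python
def is_clean(record):
    """Well-formed: expected fields present and parseable as the right types."""
    return isinstance(record.get("lat"), float) and isinstance(record.get("lon"), float)

def is_accurate(record):
    """Plausible: values fall within physically possible ranges."""
    return -90.0 <= record["lat"] <= 90.0 and -180.0 <= record["lon"] <= 180.0

def is_actionable(record, required_precision=0.001):
    """Usable for this model: granularity matches the model's purpose."""
    return record.get("precision", float("inf")) <= required_precision

# A record that passes the "clean" and "actionable" checks but is
# factually impossible: latitude cannot exceed 90 degrees.
record = {"lat": 91.2, "lon": 12.5, "precision": 0.0005}
print(is_clean(record), is_accurate(record), is_actionable(record))
# → True False True
```

A pipeline that only tests for the first property will happily feed the model data that fails the other two, which is exactly the gap an adversary can exploit.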

Solving for one of these problems with a single source of data is difficult enough. Solving all of them at a large organization, in an environment where adversaries will attempt to inject these systems with bad data in order to sabotage the models they feed, is practically impossible. In our vision for A.I., we cannot afford to ignore that while innovation creates opportunity, it also creates vulnerability. A.I. will invent new ways to attack problems but also new ways to be attacked. Just as the digitalization of power plants, public transportation, and communication systems has created cyber armies and cyber crimes, A.I. will create a new generation of adversaries that find opportunities to compete for advantage by attacking the tools rather than building a competing force. From sensor attacks to deepfakes to “location spoofing” of satellite data, the techniques to blind or misdirect the enemy by corrupting the data are developing alongside the core technologies they will sabotage.

Current A.I. systems are completely dependent on the quality of their data not because the technology is immature or broken but because we have designed them to be vulnerable in this fashion. Production A.I. systems must be designed to be resilient to bad data. If we change the problem we are attempting to solve, we can change the design to mitigate the risk of attacks on A.I. We need to make A.I. antifragile.

What is antifragile A.I.?

In systems thinking, an “antifragile” design is one that not only recovers from failure but actually grows stronger and more effective when exposed to it. When we build A.I. based on what actually improves decision-making, we create an opportunity for antifragile A.I. We know from existing research in cognitive science that good decisions are the product of proactively articulating assumptions, structuring hypothesis tests to verify those assumptions, and establishing clear channels of communication between stakeholders.

Many of the cognitive biases that trigger so-called human error are a result of a blocker on one of those three conditions. When people do not clearly articulate their assumptions, they apply solutions that are inappropriate given ground conditions. When people do not test their assumptions, they fail to adjust good decisions to changing conditions. When frontline operators are not able to efficiently share information up the chain of command and among each other, opportunities to spot changing conditions and challenge assumptions are lost to the detriment of everyone.

A.I. is so vulnerable to bad data because we overemphasize its applications in classifying and recognizing and underemphasize its applications in suggesting and contextualizing. Simply put, A.I. that makes decisions for people is A.I. that can be sabotaged easily and cheaply.

Designing antifragile A.I. is difficult because drawing the line between accepting the output of an algorithm’s analysis as a conclusion and treating it as a suggestion or prompt is a design challenge. Remember that decision-makers optimize for conserving effort, which means that given any opportunity to accept an A.I. output as a conclusion, they will take it unless the user experience is designed to make that difficult. This tendency is at the heart of the catastrophic errors we’ve already seen in applying A.I. to criminal justice and policing. The model was built to contextualize, but the user interface was built to report a conclusion; that mismatch made the product itself extremely vulnerable to bad data.

Meanwhile, medical science A.I. has been able to improve the quality of decision-making precisely because many diagnostic challenges have no single correct answer. In diagnosis, any set of symptoms has a range of possible causes with different probabilities. A clinician builds a decision tree in their head of all the possibilities they can think of and the tests that will rule out certain possibilities. The process of diagnosing a patient is a cycle of defining assumptions, ordering tests, and narrowing the set of possible answers further and further until a solution is found.
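The diagnostic cycle described above can be sketched as a simple elimination loop. This is a toy illustration, not a clinical tool; the condition names, tests, and rule-out relationships are invented for the example:

```python
# Start with every hypothesis still on the table.
hypotheses = {"flu", "strep", "mono"}

# Each test, when its result comes back negative, rules out the
# hypotheses listed for it.
RULES_OUT_IF_NEGATIVE = {
    "rapid_strep": {"strep"},
    "monospot": {"mono"},
}

def narrow(hypotheses, test_results):
    """Remove any hypothesis ruled out by a negative test result.

    test_results maps a test name to True if the result was negative.
    """
    remaining = set(hypotheses)
    for test, came_back_negative in test_results.items():
        if came_back_negative:
            remaining -= RULES_OUT_IF_NEGATIVE[test]
    return remaining

print(narrow(hypotheses, {"rapid_strep": True, "monospot": True}))
# → {'flu'}  (both tests negative, so only flu survives)
```

The point of the sketch is the shape of the process: the machine’s job is to track assumptions and which tests would discriminate between them, not to announce the answer. An A.I. that prompts the clinician with hypotheses they hadn’t considered adds entries to the starting set; it never has to be right about the final diagnosis to be useful.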

The products that are designed to assist by prompting doctors with other possibilities to add to their mental models and identify tests that might speed up the time to a successful diagnosis have demonstrated improvements in patient outcomes despite bad data. A.I. in these cases has been used to improve communication and knowledge sharing between medical professionals or to elicit new and relevant information from the patient at critical times.

The products that try to outperform doctors by, for example, classifying a tumor as cancerous or noncancerous or determining whether lung spots are Covid-19, have been plagued with bad data issues.

Strong A.I. in a world of bad data

When determining how best to leverage promising technologies like artificial intelligence, technical leaders need to first consider how they are defining the problem that needs to be solved. If the goal of A.I. is to improve decision-making, then A.I. should be steering decision-makers toward hypothesis tests, not trying to outperform experts. When A.I. attempts to outperform experts, it becomes fully dependent on the quality of the data it receives, creating a set of vulnerabilities that adversaries can cheaply and easily exploit.

When the goal of A.I. is not to best the top experts but instead to reinforce and support good decision-making practices, that technology is resilient to bad data and capable of becoming antifragile. But that A.I. doesn’t make determinations. Instead, it helps people articulate the assumptions behind a decision, communicate those assumptions to other stakeholders, and alert the decision-makers when there are significant changes to ground conditions relevant to those assumptions. A.I. can assist decision-makers in figuring out what states are possible, or under what conditions they are possible. Such a solution can strengthen a team’s overall performance by addressing existing weaknesses rather than inventing new ones for adversaries to exploit.

Author of Kill It with Fire: Manage Aging Computer Systems (and Future Proof Modern Ones)