What is the acceptable failure rate of an airplane? Well, it is not zero… no matter how hard we want to believe otherwise. There is a number, and it is a very low number. Machines, computers, and artificial intelligence are perfectly imperfect. Mistakes will be made. Poor recommendations will occur. AI will never be perfect. That does not mean it provides no value. People need to understand why machines make mistakes and set their beliefs accordingly. This means understanding the three key reasons AI fails: implicit bias, poor data, and expectations.
The first challenge is implicit bias: the unconscious perceptions people have that cloud their thoughts and actions. Consider the recent protests on racial justice and police brutality and the powerful message that Black Lives Matter. The Forbes article AI Taking A Knee: Action To Improve Equal Treatment Under The Law is a great example of how implicit bias has played a role in discrimination and just how hard (but not impossible) it is to use AI to reduce prejudice in our law enforcement and judicial systems. AI learns from people. If implicit bias is in the training, then the AI will learn this bias. Moreover, when the AI performs work, that work will reflect this bias… even if the work is for social good.
Take, for example, the Allegheny Family Screening Tool. It is meant to predict which welfare children might be at risk from foster parent abuse. The initial rollout of this solution had some challenges, though. The local Department of Human Services acknowledged that the tool might have racial and income bias. Triggers like neglect were often misconstrued because foster parents who lived in poverty were associated with inattention or mistreatment. Since learning of these problems, the department has taken tremendous steps to reduce the implicit bias in the screening tool. Elimination is much harder. When it comes to bias, how do people manage the unknown unknowns? How is social context addressed? What does “right” or “fair” behavior mean? If people cannot identify, define, and resolve these questions, then how will they teach the machine? Implicit bias is a major reason AI will be perfectly imperfect.
The second challenge is data. Data is the fuel for AI. The machine trains through ground truth (i.e., rules on how to make decisions, not the decisions themselves) and from lots of big data to learn the patterns and relationships within the data. If our data is incomplete or flawed, then AI cannot learn well. Consider COVID-19. Johns Hopkins, The COVID Tracking Project, the U.S. Centers for Disease Control and Prevention (CDC), and the World Health Organization all report different numbers. With such variation, it is very difficult for an AI to glean meaningful patterns from the data, let alone find hidden insights. More challenging still, what about incomplete or erroneous data? Imagine teaching an AI about healthcare but only providing data on women’s health. That limits how we can use AI in healthcare.
There is also the challenge that people may provide too much data. It could be irrelevant, meaningless, or even a distraction. Consider when IBM had Watson read the Urban Dictionary, and then it could not distinguish when to use normal language and when to use slang and curse words. The problem got so bad that IBM had to erase the Urban Dictionary from Watson’s memory. Similarly, an AI system needs to hear about 100 million words to become fluent in a language, while a human child seems to need only around 15 million. This implies that we may not know what data is meaningful. Thus, AI trainers may actually focus on superfluous information that could lead the AI to waste time or, even worse, identify false patterns.
The third challenge is expectations. Even though humans make mistakes, people still expect machines to be perfect. In healthcare, experts have estimated that the misdiagnosis rate may be as high as 20%, which means potentially one out of five patients is misdiagnosed. Given this data, as well as a scenario where an AI-assisted diagnosis may have an error rate of one out of one hundred thousand, most people still prefer to see only the human doctor. Why? One of the most common reasons given is that the misdiagnosis rate of the AI is too high (even though it is much lower than a human doctor’s). People expect AI to be perfect. Potentially even worse, people expect the human AI trainers to be perfect too.
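To see how wide the gap between those two error rates is, the arithmetic can be sketched in a few lines of Python. This is only an illustration: the 20% figure is the upper-end estimate quoted above, the one-in-one-hundred-thousand figure is the hypothetical AI-assisted scenario, and the patient count and names in the code are assumptions chosen for the example.

```python
from fractions import Fraction

# Rates taken from the paragraph above. Both are illustrative:
# 20% is an upper-end estimate, and the AI rate is a hypothetical scenario.
HUMAN_MISDIAGNOSIS_RATE = Fraction(1, 5)
AI_MISDIAGNOSIS_RATE = Fraction(1, 100_000)

# Assumed cohort size, picked so that both results are whole numbers.
PATIENTS = 100_000


def expected_misdiagnoses(rate: Fraction, patients: int) -> int:
    """Expected number of misdiagnosed patients at a given error rate."""
    return int(rate * patients)


human_errors = expected_misdiagnoses(HUMAN_MISDIAGNOSIS_RATE, PATIENTS)
ai_errors = expected_misdiagnoses(AI_MISDIAGNOSIS_RATE, PATIENTS)

# How many times more often the human rate produces an error.
ratio = int(HUMAN_MISDIAGNOSIS_RATE / AI_MISDIAGNOSIS_RATE)

print(f"Human doctor: {human_errors:,} expected misdiagnoses per {PATIENTS:,} patients")
print(f"AI-assisted:  {ai_errors:,} expected misdiagnoses per {PATIENTS:,} patients")
print(f"The human rate is {ratio:,} times the AI rate")
```

Under these assumptions, the same 100,000 patients would see 20,000 expected misdiagnoses at the human rate and 1 at the AI rate, which is the gap people are discounting when they call the AI's error rate too high.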
On March 23, 2016, Microsoft launched Tay (Thinking About You), a Twitter bot. Microsoft had trained its AI to the level of language and interaction of a 19-year-old American girl. In a grand social experiment, Tay was released to the world. About 16 hours and 96,000 tweets later, Microsoft had to shut Tay down because it had turned sexist and racist and was promoting Nazism. Regrettably, some individuals decided to corrupt Tay by teaching it inflammatory language. At the same time, Microsoft had not thought to teach Tay about inappropriate behavior, so it had no basis (or reason) to know that inappropriate behavior and malicious intent might exist. The grand social experiment resulted in failure and, sadly, was probably more a testament to human society than to the limitations of AI.
Implicit bias, poor data, and people’s expectations show that AI will never be perfect. It is not the magic-bullet solution many people hope for. AI can still do some extraordinary things for humans, like restoring mobility after the loss of a limb or improving food production while using fewer resources. People should not discount the value we can get. We should always remember: AI is perfectly imperfect, just like us.