AI doesn't explain itself - machine learning has a "Deus ex Machina" problem

It's become something of a pattern: software developers explain away some difficult, amazing or intractable process with "machine learning handles that," sidestepping how thoroughly or carefully this important set of steps actually works. For modern technology, it's a clumsy application of a two-thousand-year-old plot device: the "Deus ex Machina."

From Euripides and Deus ex Machina

When the gods have something to say about it

This Latin phrase originally described a plot device used in ancient Greek and Roman theatre. Many tragedians used the Deus ex Machina to resolve complicated or even seemingly hopeless situations in their plays. The phrase is loosely translated as "god from the machine," which refers to how the device was staged: an actor playing a god or goddess would be lowered onto the stage by a "mechane," the crane used for the purpose.

In its rapid evolution, ML began humbly: label some training data, then crunch through hundreds of thousands or even millions of records to find patterns for classification or prediction. Neural nets, from simple feed-forward single-layer models to an ever-growing number of more exotic deep learning architectures, are also, roughly speaking, machine learning. GPT-3, for example, has 175 billion parameters and was trained on hundreds of billions of tokens of text. So what exactly does it mean when we're told "machine learning handles that"?
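To make "crunching through records to find patterns" a little more concrete, here is a minimal sketch of learning from labeled data using a nearest-centroid classifier, about the simplest model there is (real systems use far richer ones). The features, values and function names are invented for illustration. The point is that the decision rule is derived from the labeled examples rather than written by hand, and that what it learns is statistics, not understanding.

```python
from statistics import mean

# Hypothetical labeled training data: (feature vector, label).
# Imagine features like (file entropy, count of suspicious API calls); the values are invented.
TRAINING = [
    ((7.8, 12.0), "malicious"),
    ((7.5, 9.0), "malicious"),
    ((7.9, 15.0), "malicious"),
    ((4.1, 1.0), "benign"),
    ((5.0, 0.0), "benign"),
    ((4.6, 2.0), "benign"),
]


def fit_centroids(data):
    """'Learn' one centroid (the mean feature vector) per label from labeled examples."""
    by_label = {}
    for features, label in data:
        by_label.setdefault(label, []).append(features)
    return {
        label: tuple(mean(dim) for dim in zip(*rows))
        for label, rows in by_label.items()
    }


def predict(centroids, features):
    """Return the label whose centroid is closest by squared Euclidean distance."""
    def dist(centroid):
        return sum((a - b) ** 2 for a, b in zip(features, centroid))
    return min(centroids, key=lambda label: dist(centroids[label]))


centroids = fit_centroids(TRAINING)
print(predict(centroids, (7.6, 11.0)))  # malicious
print(predict(centroids, (4.4, 1.0)))   # benign
```

Nobody wrote a rule saying "high entropy plus many suspicious calls means malicious"; the boundary falls out of the data, which is both the appeal and the reason vendors owe us an explanation of what their models actually learned.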

Examining some of these suspicious AI claims, I came to see the difference between rules-based engines and AI as the crux, because many vendors with hundreds of rules believe they have built something close to AI. Rules-based engines are like signature-based antivirus (AV). They already know what to expect. You've got a bunch of researchers looking at what has happened in the past, and based on that they write a bunch of IF-THEN rules that identify known malware. Rules are only as good as your research, and they only cover what you already know the hacker community is doing. Hackers don't share ahead of time, so you are always behind.
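A minimal sketch of the signature-and-rules approach described above, assuming a hypothetical hash database and two invented IF-THEN rules. The names KNOWN_BAD_HASHES, RULES and is_malicious are illustrative only, not any vendor's API. The engine catches exactly what it already knows about and nothing else.

```python
import hashlib

# Hypothetical signature database: hashes of payloads researchers have already seen.
KNOWN_BAD_HASHES = {
    hashlib.sha256(b"evil_payload_v1").hexdigest(),
    hashlib.sha256(b"evil_payload_v2").hexdigest(),
}

# Hypothetical IF-THEN rules written by researchers after past incidents.
RULES = [
    lambda text: "powershell -enc" in text.lower(),
    lambda text: "mimikatz" in text.lower(),
]


def is_malicious(payload: bytes) -> bool:
    """Flag a payload only if it matches a known signature or an existing rule."""
    if hashlib.sha256(payload).hexdigest() in KNOWN_BAD_HASHES:
        return True
    text = payload.decode("utf-8", errors="ignore")
    return any(rule(text) for rule in RULES)


print(is_malicious(b"evil_payload_v1"))   # True: known signature
print(is_malicious(b"run mimikatz now"))  # True: matches a rule
print(is_malicious(b"evil_payload_v3"))   # False: a new variant nobody has a rule for yet
```

The last call shows the weakness: a trivially modified variant sails through until a researcher sees it in the wild and writes another rule. Adding more rules makes the list longer, not smarter.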


What does AI transparency entail? Two vendors that got Deus ex Machina right

I worked with a vendor that made the Deus ex Machina claim but backed it up with actual detail. They were later acquired and absorbed into another company, so I can't name them. However, their solutions were underpinned by what they described as their OneMind machine learning engine. They backed up the claim of AI/ML in their product by describing how they applied recurrent convolutional neural networks (RCNNs), hidden Markov models for semi-structured data parsing, and knowledge graphs. The product provided:

Advice, like suggestions for how to organize data in a way that eased searches, and automated tasks such as reconciling formatting differences between records.
Data prep and data catalog recommendations to aid the discovery of similar datasets.
Natural Language Query support that helps users get answers to data questions directly.
Integrated governance and security help maintain compliance for self-service access to data.

Another example, Alation, is clear about all of the ways ML is used in its product:

According to the Forrester Wave: Machine Learning Data Catalogs, Q4 2020:

Alation exploits machine learning at every opportunity to improve data management, governance, and consumption by analytic citizens. Every data catalog function is underpinned by intelligence that learns from data patterns, queries, and data professional search and interaction.

My take

Machine learning algorithms are based on statistics, not psychology or neuroscience. Getting value from an ML model's inferences requires applying them in an intelligent way. Effective AI capabilities are important, but the overall design of a product like Alation's must also make the most of those technologies.

Beyond the data management field, have we found effective use of AI on the analytics side, or just "AI washing"? That's a meaty topic, but we can start with this question: where can AI be of real assistance in analytics? Here are some of the viable use cases I've seen to date:

Query suggestions
Hidden insights
Personalized answers
Ongoing feedback
Conversational AI

Conversational AI: this is an area that is still more hype than reality. I took a realistic look at use cases in "NLP brings interactive analytics forward - but what are the requirements to make augmented AI work on your project?" For now, let's leave it at this:

NLP has a limiting factor, however. It is not a database, a query processing engine, or a powerful calculation platform. It can understand your question and often provide stunning insight. It can generate a narrative of analysis in words, visualization, or generated speech, but it needs the intelligence of a powerful analytical engine to process the question. NLP can learn. It can extrapolate and help you be more creative, more expansive, and more dynamic in your explorations of data, but it can't count to ten. To complete the picture, it needs an underlying mesh of analytics engines, databases, curated data storage and an adequate metadata store to do the heavy lifting.
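The division of labor described here can be sketched as follows, under heavy assumptions: a toy regular expression stands in for the NLP model (it is not real natural language processing), and an in-memory SQLite table with invented sales figures stands in for the analytics engine and curated data store. The function names parse_question and answer are hypothetical. The language layer only turns the question into a structured intent and wraps the result in a sentence; the database does the counting.

```python
import re
import sqlite3

# Stand-in analytics engine: an in-memory SQLite table with invented sales data.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [("east", 120.0), ("west", 80.0), ("east", 45.5), ("north", 60.0)],
)


def parse_question(question):
    """Hypothetical language layer: map a question to a structured intent.

    A toy regex stands in for an NLP model; it computes nothing itself.
    """
    match = re.search(r"total sales in (\w+)", question.lower())
    if match:
        return {"region": match.group(1)}
    return None


def answer(question):
    intent = parse_question(question)
    if intent is None:
        return "Sorry, I don't understand the question."
    # The engine does the arithmetic; parameter binding keeps the query safe.
    (total,) = conn.execute(
        "SELECT COALESCE(SUM(amount), 0) FROM sales WHERE region = ?",
        (intent["region"],),
    ).fetchone()
    # The language layer turns the engine's number into a short narrative.
    return f"Total sales in {intent['region']} were {total:.2f}."


print(answer("What were total sales in East?"))  # Total sales in east were 165.50.
```

Swapping the regex for a genuine language model would make the front end far more flexible, but the sum would still come from the engine underneath.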

In other words, transparency isn't the only problem. But when AI vendors make Deus ex Machina claims, we should expect a full accounting.