
Living on the edge: How edge cases will determine the future of generative AI


In AI development, success or failure hinges to a great extent on a data science team’s ability to handle edge cases: the rare inputs and situations that cause an ML model to behave inconsistently and interrupt the usability of an AI tool. This is especially crucial as newly democratized generative AI takes center stage. Along with increased awareness come new AI strategy demands from business leaders, who now see the technology as both a competitive advantage and a game changer.

“As companies go from AI in the labs to AI in the field or production AI, the focus has gone from data- and model-centric development, in which you bring all the data you have to bear on the problem, to the need to solve edge cases,” says Raj Aikat, CTO and CPO at iMerit. “That’s really what makes or breaks an application. A successful AI application or company is not one that gets it right 99.9 percent of the time. Success is defined by the ability to get it down to the .1 percent of the time it doesn’t work — and that .1 percent is about edge cases.”


What edge cases look like in generative AI

Generative AI, which creates new data from prior data and experience, has been around for some time. For example, it’s commonly used to enhance photographs by analyzing existing pixels and generating new ones. Platforms like today’s Omniverse, and the concept of digital twins, are a more recent breed of generative AI model. And now it’s the language side of the technology, built on large language models (LLMs) and newly democratized by ChatGPT, that has captured the imagination of the public and businesses alike.

For LLMs, the most common edge cases involve irrelevant or biased text, which arises when the model makes inexact or incorrect assumptions based on the information it has ingested, especially when that information is contradictory, as it so often is on the web.

“Neural networks are infants, which come with a certain amount of hardwired information,” Aikat says. “You set one kid, or neural network, loose on the world wide web and they’re going to keep hitting snags and making misjudgments if there’s no supervision or training. With no human-in-the-loop and all the information in the world, an LLM will keep on hitting these edge cases.”

On the computer vision side, generative AI can adjust an image’s pixels so that the information is completely accurate yet looks unrealistic to the human eye. This is fine when an image is being used for automated driving, even if it isn’t visually pleasing. For digital twins, such as those used to test automated driving systems, it’s about synthetic data: taking known edge cases and composing worst-case scenarios, such as a falling pedestrian, driving in a rainstorm, or mud on the sensor.
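As a minimal sketch of that idea, the toy NumPy snippet below composes synthetic worst-case variants of a camera frame. The rain and mud effects, and every parameter value, are invented for illustration; a real digital twin pipeline would render these conditions with far more sophistication.

```python
import numpy as np

def add_rain(frame, density=0.02, rng=None):
    """Speckle bright pixels across the frame as a toy stand-in for rain."""
    rng = rng or np.random.default_rng(0)
    out = frame.copy()
    mask = rng.random(frame.shape[:2]) < density  # random rain "hits"
    out[mask] = 255
    return out

def add_mud(frame, center=(0.7, 0.8), radius=0.15):
    """Black out a circular patch as a toy stand-in for mud on the sensor."""
    out = frame.copy()
    h, w = frame.shape[:2]
    yy, xx = np.ogrid[:h, :w]
    cy, cx = int(center[0] * h), int(center[1] * w)
    r = int(radius * min(h, w))
    out[(yy - cy) ** 2 + (xx - cx) ** 2 <= r * r] = 0
    return out

# A flat gray frame stands in for a real camera capture.
clean = np.full((240, 320, 3), 128, dtype=np.uint8)
worst_case = add_mud(add_rain(clean))  # stack edge conditions into one scenario
```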


What edge case management entails

There are three pieces to edge case management: detect, triage, and retrain. The first piece, detection, is the ops side, combining machine learning and human intelligence. There is so much data, including on edge cases, that it’s virtually impossible to keep a human in the loop for every prediction. Instead, humans monitor the data for areas where the neural network gets stuck, confused, or unable to make a decision, and for data points where a human reviewer has determined that the machine could not make the right call in that circumstance.
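One common way to implement that detection step is to route low-confidence model outputs to a human reviewer. The sketch below is an assumption of how this might look, with an invented threshold; the article does not describe iMerit’s actual mechanism.

```python
import numpy as np

CONFIDENCE_FLOOR = 0.75  # illustrative threshold; tuned per application in practice

def route_prediction(probs):
    """Send uncertain model outputs to a human reviewer.

    `probs` is a model's softmax output for one input. An output the model
    is unsure about is treated as a candidate edge case.
    """
    if float(np.max(probs)) < CONFIDENCE_FLOOR:
        return "human_review"  # candidate edge case: queue for triage
    return "auto_accept"       # model is confident; no human needed

# A nearly uniform distribution signals a "stuck" network.
print(route_prediction(np.array([0.34, 0.33, 0.33])))  # -> human_review
print(route_prediction(np.array([0.95, 0.03, 0.02])))  # -> auto_accept
```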

Second, those data points are triaged in real time: a human drills into each specific incident to determine what was happening and to categorize the issue.
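A hedged sketch of what such a triage record might capture follows; the categories are hypothetical, since real taxonomies are application-specific.

```python
from dataclasses import dataclass
from enum import Enum, auto

class IssueCategory(Enum):
    """Hypothetical triage buckets for flagged incidents."""
    AMBIGUOUS_INPUT = auto()        # the input itself is unclear or contradictory
    MISSING_TRAINING_DATA = auto()  # scenario absent from the training set
    SENSOR_ARTIFACT = auto()        # e.g., mud or glare corrupting the signal
    LABELING_ERROR = auto()         # the ground truth itself was wrong

@dataclass
class TriageRecord:
    """One flagged incident, annotated by the human reviewer."""
    input_id: str
    model_confidence: float
    category: IssueCategory
    notes: str

record = TriageRecord(
    input_id="frame_00231",
    model_confidence=0.41,
    category=IssueCategory.SENSOR_ARTIFACT,
    notes="Mud occlusion on left camera; pedestrian partially hidden.",
)
```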

Finally, that data is used to retune or retrain the networks. That requires combining production data, which covers all the unusual scenarios that actually pop up in your application, with synthetic data. It’s about creating new combinations of edge cases based on what’s been learned from the edge cases already discovered, rather than having to replicate edge cases in the field, which would be both a safety and a security risk (think autonomous vehicles).
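Here is a minimal sketch of how such a retraining set might be assembled, assuming lists of sample identifiers; the 30 percent synthetic fraction is an arbitrary placeholder, not a recommendation from the article.

```python
import random

def build_retraining_set(production_data, synthetic_edge_cases,
                         synthetic_fraction=0.3):
    """Mix ordinary production samples with synthesized edge cases."""
    n_synth = int(len(production_data) * synthetic_fraction)
    mixed = production_data + random.choices(synthetic_edge_cases, k=n_synth)
    random.shuffle(mixed)  # avoid ordering artifacts during training
    return mixed

retrain_set = build_retraining_set(
    production_data=[f"frame_{i:05d}" for i in range(1000)],
    synthetic_edge_cases=["synthetic_rain_01", "synthetic_mud_02"],
)
```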

From there, once the model is retuned, you create test scenarios extracted from the edge cases: the environments and circumstances that will best test these models. All of these steps in the cycle depend on having a human in the loop, Aikat says, by the very definition of “edge case.”
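One way to realize those test scenarios, sketched under assumptions (the `model.predict` interface and the pass criterion are invented), is to freeze each triaged edge case into a regression suite that every retuned model must clear:

```python
# (input_id, expected_label) pairs distilled from triaged incidents.
EDGE_CASE_SUITE = [
    ("frame_00231", "pedestrian"),
    ("frame_00987", "cyclist"),
]

def run_edge_case_suite(model, load_input):
    """Re-run every known edge case; a retuned model must pass them all."""
    passed = sum(
        model.predict(load_input(input_id)) == expected
        for input_id, expected in EDGE_CASE_SUITE
    )
    rate = passed / len(EDGE_CASE_SUITE)
    print(f"edge-case pass rate: {rate:.0%}")
    return rate == 1.0
```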


Why human expertise remains critical

“The whole definition of edge cases means that the machine cannot understand what’s going on,” Aikat says. “Therefore, you could almost say that edge cases force a human-in-the-loop. You cannot solve the edge case problem with only a machine-in-the-loop, because the whole definition of edge cases is that the machine got stuck.”

That means balance is critical: to operate successfully, generative AI requires both unsupervised learning, which provides breadth of knowledge and data, and supervised learning, which is where the depth and relevance come from.

And as the call for generative AI gains momentum, a hue and cry around the need for regulatory guardrails is also rightfully gaining volume, and of course guardrails and moderation remain a human responsibility. Humans can address the kind of harmful bias that derails a model, as well as the very specific problems that LLMs face in particular, while taking care to include the kind of conscious bias that’s increasingly essential in a world where there are not always two sides to every story, and opinions don’t hold the same weight as facts.

That means, for example, addressing particularly sensitive topics about identity and racism from the perspective of the marginalized groups in question, the history of the Holocaust from a Jewish perspective, or enslavement from a Black perspective.

“That’s conscious bias, which should be entered into any examination of the plight of those actively harmed and traumatized. That’s something that your models have to be very sensitive and agnostic about,” he explains. “On one hand, we try to remove bias from traditional networks, but here we’re trying to introduce crucial context from the perspective of the huge populations affected, while still trying to align with historical truth — and excavate those moments where the truth gets obscured. Only the human can provide the kind of sensitive bias and context that makes these models actually function.”


ML DataOps is key to getting AI projects across the finish line

Human supervision, which is what makes edge case management possible, depends on an MLOps and ML DataOps strategy spanning experimentation, iteration, and continuous improvement, because it requires the data engineering, data science, and ML engineering teams to collaborate and work in tandem.

MLOps has to be there right from the EDA, or exploratory data analysis, stage, Aikat says. In other words, while you’re figuring out the model design, right at the start, you have to analyze exactly what the production application is, what data it requires, and what retuning and testing you’ll need to do.
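As an illustrative sketch of that early analysis, assuming labeled production-like data in a pandas DataFrame, a few lines of EDA can surface the under-represented classes where edge cases tend to hide:

```python
import pandas as pd

# Hypothetical labeled sample standing in for production-like data.
df = pd.DataFrame({
    "label": ["car"] * 950 + ["pedestrian"] * 45 + ["cyclist"] * 5,
})

# Class frequencies: rare labels are the likely sources of future edge cases.
counts = df["label"].value_counts(normalize=True)
rare = counts[counts < 0.01]  # under 1% of the data; threshold is illustrative
print(counts)
print("under-represented classes:", list(rare.index))
```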

“That’s what makes or breaks your business,” he says. “Especially when it comes to dealing with that .1 percent of the time a model doesn’t work in production. So, establish an MLOps strategy right from the start to go in ready to succeed.”

Adding an MLOps strategy after the fact often requires an experienced partner, and there are many to choose from in the middle of the current boom.

“Look at partners who see the future and not just the past and the present,” he explains. “By the time you pick your partner and go into production with them, the world will have changed. It’s moving very fast. Where we were at this time last year is a very different world of AI than where we are this year.”
