The production of AI systems that power the products we use has undergone a rapid transformation over the past decade. Companies previously poured resources into teams that developed new algorithms, but they are now more likely to use existing systems to build models that improve continuously.
As a result, the focus has shifted to data.
A data engine is a closed-loop system in which a product or service produces data in a form that can be used to continuously train an AI system, Sharma explained. Models are trained periodically and deployed back into applications, where they generate new kinds of data. This continuous cycle makes an AI system better over time.
There are three keys to building a strong data engine, Sharma said: embracing automation, identifying the right data and rapid iteration.
Building a data engine can be very cumbersome, often requiring many people to label and categorize information by hand. That information ranges from text and receipts to medical images, portions of which medical professionals label by hand to identify tumors. This is where automation comes in.
With automation, AI teams can use models that select data and send it to humans for correction. Correcting data often costs less than creating data from scratch, Sharma said.
One of Labelbox’s largest agricultural customers uses this method of model-assisted labeling.
“It becomes a very iterative closed-loop approach where models and humans are working together, ultimately enabling AI teams to label data faster,” Sharma said.
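As a rough illustration of the model-assisted labeling loop described above, the sketch below splits a model's pre-labels into a queue that is accepted as-is and a queue that goes to human reviewers for correction. The `PreLabel` class, the `route_for_review` function, the 0.9 confidence threshold and the example labels are all hypothetical; the article does not describe how Labelbox or its customers implement this step.

```python
from dataclasses import dataclass


@dataclass
class PreLabel:
    """A label proposed by a model (hypothetical structure for illustration)."""
    item_id: str
    label: str
    confidence: float  # model's probability for its predicted label, 0.0 to 1.0


def route_for_review(prelabels, threshold=0.9):
    """Split pre-labels into (accepted, needs_review) lists.

    Pre-labels at or above the threshold are accepted automatically;
    the rest are queued for a human to check and correct.
    The threshold value is an assumption, not a recommendation.
    """
    accepted, needs_review = [], []
    for prelabel in prelabels:
        if prelabel.confidence >= threshold:
            accepted.append(prelabel)
        else:
            needs_review.append(prelabel)
    return accepted, needs_review


# Example batch with made-up items and labels.
batch = [
    PreLabel("img-001", "crop", 0.97),
    PreLabel("img-002", "weed", 0.55),
    PreLabel("img-003", "crop", 0.90),
]
accepted, needs_review = route_for_review(batch)
print([p.item_id for p in needs_review])
```

In a setup like this, the human corrections would be added back to the training set, so each retraining round should leave fewer items in the review queue.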
The second major part of building a robust data engine is identifying the smallest set of data to label that can improve model performance across the data domain.
Sharma used this analogy: To understand a concept, humans don’t have to see every single example. We generally understand an idea and how it works with just a few of them.
AI systems can operate the same way, Sharma said.
“If your machine and teams are working smartly and have the right tools and workflows that enable them to choose the right data that is going to make the difference in the performance of the AI model, what we see is that most machine learning teams that are in production … they realize that they actually need less than 5% of labeled examples in the domain,” Sharma said.
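One common way to pick a small, informative subset to label is uncertainty sampling, an active-learning heuristic that sends the examples the current model is least sure about to labelers first. The sketch below uses the least-confidence variant. It is an assumption for illustration only: the article does not say which selection method Labelbox or its customers use, and the `least_confident` function and the example pool are hypothetical.

```python
def least_confident(probabilities, budget):
    """Return the ids of the `budget` items the model is least sure about.

    `probabilities` maps an item id to the model's predicted class
    probabilities for that item. An item's confidence is its highest
    class probability; lower values mean the model is less certain.
    """
    ranked = sorted(probabilities.items(), key=lambda pair: max(pair[1]))
    return [item_id for item_id, _ in ranked[:budget]]


# Made-up predictions over an unlabeled pool (two classes per item).
pool = {
    "img-001": [0.98, 0.02],
    "img-002": [0.51, 0.49],
    "img-003": [0.70, 0.30],
}
to_label = least_confident(pool, budget=2)
print(to_label)
```

Here only the two most ambiguous items are sent for labeling, while the item the model already classifies with high confidence is left out of the labeling budget.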
Labelbox has introduced a new tool called “model diagnostics” that can do just that.
The product, Sharma said, helps machine learning teams understand model performance in depth. Teams can upload model predictions at every iteration, then use the tool to visualize and analyze those predictions and form a hypothesis.
The third key is rapid iteration. Sharma said machine learning is much slower than software development, in which a developer typically writes code and tests it within minutes. A machine learning iteration can take weeks, if not months.
To increase the chances of a successful AI program, teams must shrink the length of the iteration cycle and be able to conduct as many experiments as possible.
“This is how we are seeing some of the best machine learning teams out there, accelerating their paths to production AI systems,” Sharma said.
© 2021 LeackStat.com