
Tips and tricks for deploying TinyML

 

TinyML is a general approach to shrinking AI models and applications so they can run on smaller devices, including microcontrollers, inexpensive CPUs and low-cost AI chipsets.

ABI Research predicts the number of TinyML devices will grow from 15.2 million shipments in 2020 to a total of 2.5 billion by 2030. This promises many opportunities for developers who have learned how to deploy TinyML applications.

Sang Won Lee, CEO of embedded AI platform Qeexo, said, "Most of the work is similar to building a typical ML model, but there are two extra steps with TinyML: converting the model to C code and compiling for the target hardware." This is because TinyML deployments are geared toward small microcontrollers, which are not designed to run heavyweight Python code.
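For illustration, here is a minimal Python sketch of those two extra steps, assuming a trained Keras model saved as gesture_model.h5 (a placeholder name); many TensorFlow Lite Micro tutorials perform the C-array step with the xxd utility instead.

```python
import tensorflow as tf

# Load a previously trained Keras model (placeholder file name).
model = tf.keras.models.load_model("gesture_model.h5")

# Convert to a TensorFlow Lite flatbuffer with default size/latency optimizations.
converter = tf.lite.TFLiteConverter.from_keras_model(model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
tflite_model = converter.convert()

# Save the flatbuffer, then serialize it as a C array for the firmware build.
with open("model.tflite", "wb") as f:
    f.write(tflite_model)

with open("model_data.cc", "w") as f:
    f.write("const unsigned char g_model_data[] = {\n")
    f.write(", ".join(str(b) for b in tflite_model))
    f.write("\n};\n")
    f.write(f"const unsigned int g_model_data_len = {len(tflite_model)};\n")
```

The generated C file is then compiled into the firmware alongside the microcontroller inference runtime.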

It is also essential to plan for TinyML applications delivering varying results in different environments. Lee said TinyML applications generally work with sensor data that is heavily dependent on the surrounding environment. When the environment changes, the sensor data changes as well. As a result, teams should expect to reoptimize their models for each new environment.
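Full retraining on data collected in the new environment is one option; a lighter-weight mitigation, sketched below under the assumption of a three-axis accelerometer, is to recalibrate the input normalization statistics per environment. The function names and data shapes are illustrative.

```python
import numpy as np

def calibrate(samples: np.ndarray):
    """Compute per-channel mean and standard deviation from a short
    calibration recording taken in the new environment."""
    return samples.mean(axis=0), samples.std(axis=0) + 1e-6

def normalize(window: np.ndarray, mean: np.ndarray, std: np.ndarray):
    """Scale an incoming sensor window with the environment's statistics
    before feeding it to the model."""
    return (window - mean) / std

# Stand-ins for real recordings: 10 s of 3-axis accelerometer data at 100 Hz,
# then a single 128-sample inference window.
calibration_data = np.random.randn(1000, 3)
mean, std = calibrate(calibration_data)
live_window = np.random.randn(128, 3)
model_input = normalize(live_window, mean, std)
```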

 


What is involved in getting started with TinyML?

AI developers may want to brush up on C/C++ and embedded systems programming to understand the basics of deploying TinyML software on constrained hardware.

"Some familiarity with general principles of machine learning, embedded systems programming, microcontrollers and working with hardware microcontroller boards is needed," said Qasim Iqbal, chief software architect at autonomous submarine developer Terradepth.

First, a development board is needed; good options include the Arduino Nano 33 BLE Sense, the SparkFun Edge and the STMicroelectronics STM32 Discovery Kit. Second, a laptop or desktop computer with a USB port is needed for interfacing. Third, it's fun to experiment by equipping the hardware with a microphone, accelerometer or camera. Finally, Keras and Jupyter Notebooks may be needed to train a model on a separate computer before that model is moved to a microcontroller for execution and inference.
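As a sketch of that last step, the model below is a deliberately small Keras network for classifying accelerometer windows, the kind of architecture that can later be converted and moved to a microcontroller. The window size, channel count and class labels are assumptions.

```python
import tensorflow as tf

WINDOW = 128       # samples per accelerometer window (assumption)
CHANNELS = 3       # x, y, z axes
NUM_CLASSES = 3    # e.g. idle / walk / shake (illustrative labels)

# A deliberately small network so the converted model fits on a microcontroller.
model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(WINDOW, CHANNELS)),
    tf.keras.layers.Conv1D(8, 3, activation="relu"),
    tf.keras.layers.MaxPooling1D(4),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(16, activation="relu"),
    tf.keras.layers.Dense(NUM_CLASSES, activation="softmax"),
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])

# model.fit(x_train, y_train, epochs=20, validation_split=0.2)  # with real data
model.summary()  # check the parameter count before committing to a board
```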

Iqbal also recommends learning preprocessing tools that transform raw input data into the form expected by the TensorFlow Lite interpreter. Then, a post-processing module can take the model's inferences, interpret them and make decisions. Once this is complete, an output handling stage can be implemented to respond to predictions using the device's hardware and software capabilities.
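Below is a minimal sketch of that preprocess / infer / post-process / output chain using the desktop TensorFlow Lite interpreter in Python; on a microcontroller the same stages are written in C/C++ against the TFLite Micro runtime. File names and shapes are placeholders.

```python
import numpy as np
import tensorflow as tf

interpreter = tf.lite.Interpreter(model_path="model.tflite")
interpreter.allocate_tensors()
input_info = interpreter.get_input_details()[0]
output_info = interpreter.get_output_details()[0]

def preprocess(raw_window: np.ndarray) -> np.ndarray:
    """Reshape and cast raw sensor data to match the model's input tensor."""
    return raw_window.reshape(input_info["shape"]).astype(input_info["dtype"])

def postprocess(scores: np.ndarray) -> int:
    """Interpret the model's output, here by picking the top-scoring class."""
    return int(np.argmax(scores))

raw = np.random.randn(128, 3).astype(np.float32)   # stand-in for a sensor read
interpreter.set_tensor(input_info["index"], preprocess(raw))
interpreter.invoke()
prediction = postprocess(interpreter.get_tensor(output_info["index"]))
print("predicted class:", prediction)  # output handling: blink an LED, send a packet, etc.
```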

Before getting too serious, developers can build a few demo projects to understand the implications of various TinyML constraints. In addition to limitations on RAM and clock speed, they may also want to explore the limits of the stripped-down Linux distributions that run on their target platforms, which often lack the OS and system library support expected on larger Linux-based systems.
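One simple demo-stage check is to compare the converted model's size against the target board's flash budget before going further; the budget figure below is a placeholder to be replaced with the number from the board's datasheet.

```python
import os

FLASH_BUDGET_BYTES = 1 * 1024 * 1024   # e.g. 1 MB of flash (assumption; check the datasheet)

model_bytes = os.path.getsize("model.tflite")
print(f"model size: {model_bytes / 1024:.1f} KB")

if model_bytes > FLASH_BUDGET_BYTES:
    print("Model won't fit in flash; consider quantization or a smaller architecture.")
# RAM usage (the TFLite Micro tensor arena) must still be measured on the
# device itself, since it depends on activation sizes rather than file size.
```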

"Judicious decisions regarding the right device hardware, software support, machine learning model architecture and general software considerations are important," Iqbal said.

It's helpful to investigate whether a microcontroller will support the intended app or whether larger devices, such as Nvidia's Jetson series, might work better.

 


Combining hardware and software

Developers learning about TinyML software might consider investigating the community behind each TinyML tool before getting too attached to any particular one.

"Quite often, you won't be able to find answers to your questions in the official documentation," said Jakub Lukaszewicz, head of AI for construction technology platform AI Clearing. Lukaszewicz often found himself resorting to browsing the internet, Stack Overflow or specialized forums to find answers. If the ecosystem around the platform is sufficiently big and active, it's easier to find people who have similar problems and learn how they address them.

It's also helpful to investigate the available hardware before diving in too deeply.

"The sad news is that in the post-pandemic reality, delivery times can be long and you may be left with a limited choice of what is currently available on the shelf," Lukaszewicz said.

After getting the board, the next step is choosing the ML framework to work with. Lukaszewicz said TensorFlow Lite is currently the most popular framework, but PyTorch Mobile is gaining traction. Finally, you want to find tutorials or dummy projects using the ML framework and board of your choosing to see how the pieces fit together.

Watch out for changes in the frameworks and hardware that may create issues. Lukaszewicz has often struggled with outdated documentation and things not working as they should.

"It is often the case that the platform was tested against a given version of a framework, such as TensorFlow Lite, but struggled with the newest one," he said.

In such cases, he recommends downgrading to the latest supported version of the framework and rerunning your model.
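A small guard like the one below, with a hypothetical version string, makes that mismatch visible at conversion time rather than on the device.

```python
import tensorflow as tf

TESTED_VERSION = "2.10"  # hypothetical newest version the target platform supports

if not tf.__version__.startswith(TESTED_VERSION):
    raise RuntimeError(
        f"Converting with TensorFlow {tf.__version__}, but the target runtime "
        f"was tested against {TESTED_VERSION}.x; consider downgrading, e.g. "
        f"pip install 'tensorflow=={TESTED_VERSION}.*'"
    )
```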

Another problem is dealing with unsupported operations or insufficient memory to fit the model. Ideally, developers should take an off-the-shelf model and run it on a microcontroller without too much hassle. "Unfortunately, this is often not the case with TinyML," Lukaszewicz said.

He recommends first trying out models that have been proven to work on the board of your choice. He has often discovered that a state-of-the-art model uses mathematical operations that are not yet supported on certain devices. In such a scenario, you would have to change the network architecture, replace those operations with supported ones and retrain the model, hoping all of this does not sacrifice its quality. Reading forums and tutorials is a great way to see what does and doesn't work on a given platform.
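One way to surface such problems before touching the hardware is to restrict the converter to the built-in TFLite op set (microcontroller runtimes generally cannot fall back to full TensorFlow ops) and let the conversion fail loudly; the model file name below is a placeholder.

```python
import tensorflow as tf

model = tf.keras.models.load_model("gesture_model.h5")  # placeholder model
converter = tf.lite.TFLiteConverter.from_keras_model(model)
converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS]

try:
    tflite_model = converter.convert()
    with open("model.tflite", "wb") as f:
        f.write(tflite_model)
except Exception as err:
    # Conversion errors typically name the offending operations; the fixes are
    # the ones described above: swap those layers for supported ones and
    # retrain, or pick a model already proven to run on the target board.
    print("Conversion failed, likely due to unsupported operations:", err)
```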

© 2022 LeackStat.com