How to leverage large language models without breaking the bank

Generative AI continues to dominate headlines. At its onset, we were all taken in by the novelty. But now we’re far beyond the fun and games — we’re seeing its real impact on business. And everyone is diving in head-first.

Microsoft, AWS and Google have waged a full-on “AI arms race” in pursuit of dominance. Enterprises are hastily making pivots in fear of being left behind or missing out on a huge opportunity. New companies powered by large language models (LLMs) are emerging by the minute, fueled by VCs in pursuit of their next bet.

But with every new technology comes challenges. Model veracity, bias and the cost of training are among the topics du jour. Identity and security, although related to the misuse of models rather than issues inherent to the technology, are also starting to make headlines.

Cost of running models a major threat to innovation

Generative AI is also bringing back the good ol’ open-source versus closed-source debate. While both have their place in the enterprise, open-source models offer lower costs to deploy and run in production. They also offer greater accessibility and choice. However, we’re now seeing an abundance of open-source models but not enough progress in the technology needed to deploy them in a viable way.

All of this aside, there is an issue that still requires much more attention: The cost of running these large models in production (inference costs) poses a major threat to innovation. Generative models are exceptionally large, complex and computationally intensive, making them far more expensive to run than other kinds of machine learning models.

Imagine you create a home décor app that helps customers envision their room in different design styles. With some fine-tuning, the model Stable Diffusion can do this relatively easily. You settle on a service that charges $1.50 for 1,000 images, which might not sound like much, but what happens if the app goes viral? Let’s say you get 1 million daily active users who make ten images each. At 10 million images a day, that is $15,000 per day, so your inference costs are now about $5.5 million per year.
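To make the arithmetic concrete, here is a minimal sketch of the calculation in Python. The function name and the 365-day year are assumptions made for illustration; the price, user count and images per user come from the scenario above. Under those assumptions the result lands close to the figure quoted above.

```python
def annual_inference_cost(daily_users, images_per_user,
                          price_per_1000_images, days_per_year=365):
    """Estimate yearly inference spend for a per-image priced service.

    Hypothetical helper for illustration only; assumes flat pricing
    with no volume discounts and constant daily usage.
    """
    images_per_day = daily_users * images_per_user
    daily_cost = images_per_day / 1000 * price_per_1000_images
    return daily_cost * days_per_year


# Scenario from the text: 1 million daily users, ten images each,
# at $1.50 per 1,000 images.
cost = annual_inference_cost(1_000_000, 10, 1.50)
print(f"${cost:,.0f} per year")
```

Changing any one input, such as images per user or the per-image price, shows how quickly the yearly bill scales with usage.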

LLM cost: Inference is forever

Now, if you’re a company deploying a generative model or an LLM as the backbone of your app, your entire pricing structure, growth plan and business model must take these costs into consideration. By the time your AI application launches, training is more or less a sunk cost, but inference is forever.

There are many examples of companies running these models, and it will become increasingly difficult for them to sustain these costs long-term.

But while proprietary models have made great strides in a short period, they aren’t the only option. Open-source models are also showing great promise in the way of flexibility, performance and cost savings — and could be a viable option for many emerging companies moving forward.

Hybrid world: Open-source and proprietary models are important

There’s no doubt that we have gone from zero to 60 in a short time with proprietary models. Just in the past few months, we’ve seen OpenAI and Microsoft launch GPT-4, Bing Chat and endless plugins. Google also stepped in with the introduction of Bard. Progress in the space has been nothing short of impressive.

However, contrary to popular belief, I don’t believe gen AI is a “winner takes all” game. In fact, these models, while innovative, are just barely scratching the surface of what’s possible. And the most interesting innovation is yet to come and will be open-source. Just like we’ve seen in the software world, we’ve reached a point where companies take a hybrid approach, using proprietary and open-source models where it makes sense.

There is already proof that open source will play a major role in the proliferation of gen AI. There’s Meta’s new LLaMA 2, the latest and greatest. Then there’s LLaMA, a powerful yet small model that can be retrained for a modest amount (about $80,000) and instruction-tuned for about $600. You can run this model anywhere, even on a MacBook Pro, smartphone or Raspberry Pi.

Meanwhile, Cerebras has introduced a family of models and Databricks has rolled out Dolly, a ChatGPT-style open-source model that is also flexible and inexpensive to train.

Models, cost and the power of open source

We’re starting to see open-source models take off because of their flexibility; you can essentially run them on any hardware with the right tooling. You don’t get that level of control and flexibility with closed proprietary models.

And this all happened in just a short time, and it’s just the beginning.

We have learned great lessons from the open-source software community. If we make AI models openly accessible, we can better promote innovation. We can foster a global community of developers, researchers, and innovators to contribute, improve, and customize models for the greater good.

If we can achieve this, developers will have the choice of running the model that suits their specific needs — whether open-source or off-the-shelf or custom. In this world, the possibilities are truly endless.

LeackStat 2023