How transfer learning can boost business efficiency

Transfer learning is a technique that’s risen to prominence in the AI and machine learning community over the past several decades. It refers to storing knowledge gained while solving one problem and applying it to a different, but related, problem. So far, transfer learning has been applied to cancer subtype discovery, video game playing, text classification, medical imaging, spam filtering, and more. Prominent computer scientist Andrew Ng said in 2016 that transfer learning will be one of the major drivers of machine learning commercial success.

Transfer learning has its benefits, chief among them allowing companies to repurpose machine learning models for new problems with less training data. But transfer learning is often simpler in theory than in execution. For example, models trained on one problem and applied to another can suffer from negative transfer, where the model becomes less accurate over time.

The potential pitfalls are why it’s important that organizations considering investing in technologies using transfer learning understand the basics of the technique. With this knowledge, they’re more likely to apply transfer learning successfully — no matter the domain in question.

Origins of transfer learning

The origins of transfer learning lie in a study conducted by academics Stevo Bozinovski and Ante Fulgosi in 1976. In it, the coauthors proposed the use of transfer learning in neural networks during the model training process. Nearly a decade later, a report was given on the application of transfer learning in character recognition. But the technique isn’t thought to have entered the mainstream until around 1995, where it was presented at a workshop during the NIPS machine learning conference in Denver, Colorado.

As opposed to traditional machine learning, which occurs on specific tasks and datasets, transfer learning leverages features and weights (among other variables) from previously trained models to train new models. Features are information extracted from a dataset to simplify a model’s learning process, like the edges, shapes, and corners of signature boxes and typefaces in documents. On the other hand, weights determine how a given piece of input data will influence the output data.

Models are trained in two stages in transfer learning. First, there’s retraining, where the model is trained on a benchmark dataset representing a range of categories. Next is fine-tuning, where the model is further trained on a target task of interest. The pretraining step helps the model to learn general features that can be reused on the target task, boosting its accuracy.

Transfer learning has a wealth of use cases, particularly in image and speech recognition as well as natural language processing (NLP). For instance, a model trained for an autonomous car can likely be leveraged for an autonomous truck — at least in part. And a model that developed strategies while playing the Chinese board game Go –such as DeepMind’s AlphaZero — can likely be adapted to related games like chess.

Google and Amazon are using transfer learning in Google Translate and Alexa so that the insights gleaned through training on high-resource languages (e.g., French, German, and Spanish) can be applied to the translation of low-resource languages (Yoruba, Sindhi, and Hawaiian). Meanwhile, Yelp has used transfer learning to identify photos most likely to contain spam uploaded by users to business listings.

Código, Codificación, Equipo, De Datos

Types of transfer learning

There’s several different kinds of transfer learning, each with their own upsides: inductive, unsupervised, and transductive transfer learning. With inductive transfer learning, the source and target domains are the same, yet the source and target tasks are different. Unsupervised learning involves different tasks in similar — but not identical — source and target domains without labeled data. As for transductive transfer learning, similarities exist between the source and target tasks, but the domains are different and only the target domain doesn’t have labeled data.

Unsupervised transfer learning models are distinguished from supervised models in that they must teach themselves to classify the data, processing unlabeled data to learn from its inherent structure. Supervised transfer learning models are trained on input data annotated for a particular output until they can detect the underlying relationships between the inputs and output results.

Transfer learning can be further categorized by the components of the model being transferred. Instance transfer reuses knowledge from the source domain to the target task, for example, while parameter transfer works on the assumption that the models for related tasks share some parameters. Parameters are the features internal to a model (including weights) that are learned from the training data.

Challenges

Transfer learning has plenty in the way of advantages, namely that it speeds up the process of training on a new task. Whereas models like OpenAI’s GPT-3 and DeepMind’s AlphaStar might need powerful hardware and countless hours to train, a “fine-tuned” model created through transfer learning typically requires a fraction of the time and effort.

As PJ Kirk, digital marketing executive at data analytics firm Analytics Engines, points out, transfer learning can enable more organizations to incorporate AI and machine learning into their core business strategies. “The reduced financial, time, and infrastructural costs have made AI and machine learning more accessible than ever before,” he wrote in a blog post. “Organizations no longer need to create dedicated deep learning models and can instead capitalize upon the expertise and models of others to provide the foundation upon which their solution is built.

But transfer learning isn’t without setbacks. There’s the aforementioned negative transfer, caused from too high a dissimilarity of the problem domains or an inability of the model to train for the new domain’s dataset. And there’s another barrier: explainability. Despite recent breakthroughs, it remains difficult to understand what enables a successful transfer and which parts of models are responsible for it.

In good news on the explainability front, researchers at Google recently published a paper that shed light on transfer learning’s fundamentals. They found that features become more specialized the “denser” the model is, and that feature reuse is more prevalent in the parts of the model closer to the input data. Beyond this, they discovered that it’s possible to fine-tune pretrained models on a target task earlier than originally assumed, without sacrificing accuracy.

Man, Computer, Screen, Desktop, Imac

Benefits

Work like Google’s illustrates that the challenges around transfer learning aren’t insurmountable. In any case, the benefits certainly appear to be worth it.

Kevin Dewalt, cofounder of AI consultancy Prolego, posits that transfer learning is in equal parts efficient and economical. “Suppose your CFO only approves enough budget to generate 1,000 pictures of meals labeled with calories — a mere 1% of what your data scientist requested. Before begging for more money, you [can generate] results through transfer learning,” he wrote in a Medium post. “Unless you’re Google or Facebook, getting labeled data can be prohibitively expensive. Transfer learning techniques provide two primary business benefits: Faster experiments [and] higher ROI, [because] transfer learning can reduce the cost of ongoing data managements and boost the ROI of any machine learning project.”

While hardly a panacea, transfer learning has already been applied to many problem domains including NLP, audio and video processing, image captioning models, and more. When faced with training a new model for a related problem domain, or if there’s a minimal amount of training data, transfer learning can save invaluable time and energy.