Prompt-based learning can make language models more capable

Supervised learning, where AI models are trained on input data annotated for a particular output until they can detect the underlying relationships between the inputs and outputs, plays a major role in natural language processing (NLP). Early NLP models relied heavily on feature engineering — researchers used domain knowledge to extract key information from training datasets and provide models with the guidance needed to learn from the data. But with the advent of neural network models for NLP, the focus pivoted from feature engineering to model architecture engineering. Neural networks enabled features to be learned jointly with the training of the models themselves.

Now the paradigm in NLP is shifting again in favor of an approach some researchers call “prompt-based learning.” Given a range of carefully designed prompts, a language model trained in an unsupervised fashion — that is, on unlabeled data — can be used to solve a number of tasks. But there’s a catch with prompt-based learning — it requires finding the most appropriate prompt to allow a language model to solve the task at hand.

Researchers at Carnegie Mellon University lay out the details in a new paper.

Pretrain, prompt, and predict

Four years ago, there was another sea change in NLP model training as researchers embraced a technique called “pre-train and fine-tune.” In this framework, a model like Google’s BERT is pretrained with the ability to complete a range of different language tasks, like summarization and text generation. Because the raw textual data necessary to train language models (e.g., ebooks and online encyclopedia articles) is available in abundance, these models can be trained on large datasets — and in the process learn general-purpose language features. The pretrained language models can then be adapted to different tasks through a process of fine-tuning using task-specific optimizations.

Pretraining and fine-tuning have led to countless advances in the field of NLP. For example, OpenAI fine-tuned GPT-3 to create the model powering GitHub’s Copilot, an AI service that provides suggestions for whole lines of code. For its part, Nvidia developed an AI-powered speech transcription system by fine-tuning a large model trained on health care and life sciences research. But “pre-train and fine-tune” is increasingly giving way to “prompt-based learning,” in which tasks like Copilot’s code suggestions are reformulated to look more like those solved during the original model training. By selecting the appropriate prompts, researchers can manipulate the model’s behavior so the pretrained language model can be used to predict the desired output — sometimes without any task-specific training.

Prompt-based learning involves prompt engineering, or the process of creating a “prompting function” that results in good performance on a target application. A prompting function can produce a single prompt or multiple prompts. For example, given the task of analyzing the sentiment of the sentence “I missed the bus today,” researchers could follow it with the prompt “I felt so [blank]” and ask a language model to fill in the blank with an emotion. Or they could prepend worked examples to an incomplete sentence like “China’s capital is [blank],” producing a prompt such as “Great Britain’s capital is London. Japan’s capital is Tokyo. China’s capital is [blank].”
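To make the idea of a prompting function concrete, here is a minimal Python sketch of the two examples above. The function names `cloze_prompt` and `few_shot_prompt` are hypothetical, chosen for illustration; they only build the text a language model would be asked to complete and do not call any model.

```python
from typing import List, Tuple

BLANK = "[blank]"  # placeholder the language model is asked to fill in


def cloze_prompt(text: str, template: str = "{text} I felt so {blank}") -> str:
    """Wrap an input sentence in a fill-in-the-blank template."""
    return template.format(text=text, blank=BLANK)


def few_shot_prompt(demonstrations: List[Tuple[str, str]], query: str) -> str:
    """Prepend solved examples to an unsolved one that follows the same pattern."""
    lines = [f"{country}'s capital is {capital}." for country, capital in demonstrations]
    lines.append(f"{query}'s capital is {BLANK}.")
    return " ".join(lines)


if __name__ == "__main__":
    print(cloze_prompt("I missed the bus today."))
    print(few_shot_prompt([("Great Britain", "London"), ("Japan", "Tokyo")], "China"))
```

In practice the resulting string would be passed to a pretrained language model, and the model's completion for the blank would be read off as the answer.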

As Princeton Ph.D. student Tianyu Gao explains in an article for The Gradient: “A prompt is a piece of text inserted in the input examples so that the original task can be formulated as a (masked) language modeling problem. For example, say we want to classify the sentiment of the movie review ‘No reason to watch,’ we can append a prompt ‘It was’ to the sentence, getting ‘No reason to watch. It was [blank].’ It is natural to expect a higher probability from the language model to generate ‘terrible’ than ‘great.’”

Prompt-based methods seek to better mine the knowledge about facts, reasoning, sentiment, and more that a model acquires during pretraining. For example, for a text classification task, a researcher would need to design a template (“It was”) and the expected text responses, which are called label words (e.g., “great,” “terrible”).
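The template-plus-label-words setup can be sketched in a few lines of Python. This is an illustration under stated assumptions, not a real implementation: `classify` and `toy_score_fill` are hypothetical names, and `toy_score_fill` is a hand-written stand-in for a language model that would normally return the probability of a word filling the blank.

```python
from typing import Callable, Dict

TEMPLATE = "{text} It was {blank}."
# Label words map each class to the word the model is expected to produce.
LABEL_WORDS: Dict[str, str] = {"positive": "great", "negative": "terrible"}


def classify(text: str, score_fill: Callable[[str, str], float]) -> str:
    """Pick the class whose label word scores highest in the blank."""
    prompt = TEMPLATE.format(text=text, blank="[blank]")
    scores = {label: score_fill(prompt, word) for label, word in LABEL_WORDS.items()}
    return max(scores, key=scores.get)


def toy_score_fill(prompt: str, word: str) -> float:
    """Assumed stand-in for a language model: keyword matching, not real probabilities."""
    negative_cues = ("no reason", "boring", "waste")
    looks_negative = any(cue in prompt.lower() for cue in negative_cues)
    if word == "terrible":
        return 0.9 if looks_negative else 0.1
    return 0.1 if looks_negative else 0.9


if __name__ == "__main__":
    print(classify("No reason to watch.", toy_score_fill))
```

Swapping `toy_score_fill` for a function backed by a masked language model would leave `classify` unchanged, which is the appeal of the approach: the task-specific part is reduced to a template and a small word list.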

Some research suggests that a single prompt may be worth around 100 conventional data points, which would make prompting a far more data-efficient way to adapt a model to a task.


Challenges with prompts

Prompts can be designed either manually or through automated methods. But creating the perfect prompt requires both understanding a model’s inner workings and trial and error.

The stakes are high because the wrong prompt can surface bias from the pretraining dataset. For example, given “N/A” as an input, GPT-3 tends to output “positive” over “negative.” There’s also evidence that language models risk reinforcing undesirable stereotypes, mostly because a portion of the training data is commonly sourced from communities with prejudices around gender, race, and religious background.

Beyond bias, prompts are limited in terms of the types of tasks they can optimize for. Most prompt-based methods revolve around either text classification or generation. Information extraction, text analysis, and other, more complex tasks necessitate a less straightforward prompt design.

Even for tasks where prompt-based methods are known to be effective, a model’s performance will depend on both the template being used and the answers being considered. How to search for, or learn, the best combination of template and answer at the same time remains an open research question.
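To show why template and answer have to be chosen together, here is a naive Python baseline that tries every pairing on a small labeled development set and keeps the most accurate one. Exhaustive search is an assumption made for illustration, not a method the researchers propose, and it does not scale to large candidate sets. All names (`accuracy`, `search`, `toy_score_fill`) are hypothetical, and the scorer is a hand-written stand-in for a language model.

```python
from itertools import product
from typing import Callable, Dict, List, Tuple

Scorer = Callable[[str, str], float]


def accuracy(template: str, label_words: Dict[str, str],
             dev_set: List[Tuple[str, str]], score_fill: Scorer) -> float:
    """Fraction of dev examples classified correctly by one template/answer pair."""
    correct = 0
    for text, gold in dev_set:
        prompt = template.format(text=text, blank="[blank]")
        pred = max(label_words, key=lambda label: score_fill(prompt, label_words[label]))
        if pred == gold:
            correct += 1
    return correct / len(dev_set)


def search(templates: List[str], answer_sets: List[Dict[str, str]],
           dev_set: List[Tuple[str, str]], score_fill: Scorer) -> Tuple[str, Dict[str, str]]:
    """Try every (template, answer set) pairing and return the most accurate."""
    return max(product(templates, answer_sets),
               key=lambda pair: accuracy(pair[0], pair[1], dev_set, score_fill))


def toy_score_fill(prompt: str, word: str) -> float:
    """Assumed stand-in for a model: only 'understands' prompts containing 'It was'."""
    if "It was" not in prompt:
        return 0.0
    polarity = -1.0 if "no reason" in prompt.lower() else 1.0
    word_polarity = {"great": 1.0, "terrible": -1.0, "good": 0.5, "bad": -0.5}
    return polarity * word_polarity.get(word, 0.0)


if __name__ == "__main__":
    dev = [("No reason to watch.", "negative"), ("A delight from start to finish.", "positive")]
    templates = ["{text} Overall: {blank}", "{text} It was {blank}."]
    answers = [{"positive": "great", "negative": "terrible"},
               {"positive": "good", "negative": "bad"}]
    print(search(templates, answers, dev, toy_score_fill))
```

The cost grows with the product of the two candidate lists, and each evaluation requires running the model over the whole development set, which is one reason the joint problem is still considered open.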

Despite these barriers, studies suggest prompt-based learning is a promising research area and may remain one for years to come. As Gao notes, prompts can better mine knowledge about facts, reasoning, and sentiment from unsupervised pretrained models, ultimately squeezing more potential out of language models and helping them learn better.

“The concept of prompts and demonstrations also gives us new insights about how we can better use language models,” he wrote. “[Recent research proves that] models can well handle a wide range of tasks with only a few examples by leveraging natural-language prompts and task demonstrations as context while not updating the parameters in the underlying model.”