ChatGPT is the fastest-growing app of all time, gaining more than 100 million users just two months after its launch in November. It allows users to have human-like conversations that include reasonable-sounding and often correct answers to all sorts of questions. Like humans, it can ask for more information and explain its reasoning.
We’re now seeing the first academic research about the use of ChatGPT in finance. Two recent studies make GPT seem like a promising technology both to improve investment decision making and to explain its decisions. Perhaps the long-held dream of replacing humans in finance is coming true.
In December I wrote that “a tireless machine able to digest all information and immune to biases should be clearly superior to humans when it comes to investing. Except it’s not.” Financial management was one of the earliest goals of artificial intelligence, or AI, research because it seemed like an easy and highly rewarding task. But so far, AI has succeeded only in niche applications in finance.
GPT stands for Generative Pre-trained Transformer, a five-year-old idea that may be a game-changer in AI applications. Very broadly, there are three approaches to extracting useful information from data. With structured data, like accounting numbers or price histories, you can apply statistics and formal models. With completely unstructured data — series of bits that could be photographs or physical measurements or text or anything else — there are algorithms that can extract patterns and predict future inputs.
Language is somewhere in between. There is structure, meaning only certain letter combinations are intelligible words, and there are grammar rules for stringing words together. But there are exceptions to rules, and nuances beyond the literal text. You need a lot of domain knowledge and context to understand text. There is an old story (it has been traced back to 1956, by which time it was already old) about an AI worker who built a program to translate between English and Russian. She gave it the phrase “out of sight, out of mind” to translate to Russian, and then translated the Russian back to English and got “invisible idiot.” There are no rules of language that tell us the phrase is an aphorism about forgetfulness rather than a description of an individual, but no native speaker would make the mistake.
GPT models are the hottest current approach to working with language data, but quantitative trading and investment have used cruder language models for many years. A human researcher reads relevant information such as company statements, news stories, surveys and research reports carefully and slowly. Computers can read vast quantities of information in many languages and come up with instant conclusions. This is essential for high-frequency trading when being a millisecond sooner to determine whether a news headline is good or bad news for a stock price is the name of the game.
Most of the language models used in quantitative finance today treat text as structured data. Algorithms look for certain words, or just measure the number of words in a headline or press release. Some algorithms look for certain patterns or structures. But none of the major ones try to understand the meaning of the text, and none of them can explain why they reach their conclusions or hold further conversation on the subject.
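To make the word-counting idea concrete, here is a minimal sketch in Python. The word lists and function names are hypothetical, invented purely for illustration; real trading systems use far larger dictionaries and additional rules.

```python
import re

# Hypothetical word lists, chosen only for illustration.
POSITIVE = {"beats", "record", "upgrade", "growth", "profit"}
NEGATIVE = {"misses", "lawsuit", "downgrade", "loss", "recall"}

def keyword_score(headline: str) -> int:
    """Count positive minus negative keywords; no attempt to understand meaning."""
    words = re.findall(r"[a-z']+", headline.lower())
    return sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)

def label(headline: str) -> str:
    """Map the keyword score to a coarse good/bad/neutral label."""
    score = keyword_score(headline)
    if score > 0:
        return "good"
    if score < 0:
        return "bad"
    return "neutral"
```

A headline such as "Acme avoids loss despite lawsuit fears" is labeled "bad" by this sketch because it contains two negative keywords, even though a human reader would probably take it as good news. That is the kind of mistake an approach that ignores meaning tends to make.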
Now come two papers titled “Can ChatGPT Decipher Fedspeak?” and “Can ChatGPT Forecast Stock Price Movements?” We’re not talking about SkyNet taking over Wall Street, but whether ChatGPT beats older models — many of which treat language as structured — in making fast decisions about short texts.
The first paper asked ChatGPT to determine if an individual sentence from a Federal Reserve statement was “dovish” (suggesting the central bank was more likely to cut than raise interest rates) or “hawkish” (suggesting the opposite). A high-frequency trading algorithm might rate each sentence in the Fed release and use the output along with other data to trade federal funds futures or other instruments before the human analysts had finished reading the first word in the release.
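One simple way an algorithm might combine sentence-level ratings into a single number for a whole release is sketched below. This is an assumption for illustration, not the paper's method: the label names, the `net_hawkishness` function and the equal weighting of sentences are all hypothetical.

```python
from collections import Counter

def net_hawkishness(labels):
    """Return (hawkish - dovish) / total sentences, a score in [-1, 1].

    labels: one string per sentence, each 'hawkish', 'dovish' or 'neutral'
    (hypothetical label set; any classifier could supply them).
    """
    counts = Counter(labels)
    total = sum(counts.values())
    if total == 0:
        return 0.0  # an empty release carries no signal
    return (counts["hawkish"] - counts["dovish"]) / total

# Example: two hawkish sentences, one dovish, one neutral.
score = net_hawkishness(["hawkish", "hawkish", "dovish", "neutral"])
```

A positive score would lean hawkish and a negative one dovish. A real system would likely weight sentences by importance rather than counting them equally.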
This is not immediately useful for trading. The paper did not disclose how fast the model ran, nor whether overall interpretations of entire Fed releases agreed well with humans’ overall conclusions (whether they agreed with reality is not the point, since high-frequency traders are trying to beat the market to the new consensus, not to the theoretically correct place). But it suggests that GPT models might have turned a corner to actually understanding language. If that’s true, and one study doesn’t prove anything, they can be unleashed on a much wider range of text to generate theses, such as that inflation is likely to continue to be a problem over the next 12 months, rather than flash signals for high-frequency trading. And instead of binary buy/sell signals, ChatGPT can hold a conversation with a human analyst to improve investment decisions. Finally, if this seems to be working, a future generation of GPT models can be trained on the entire history of texts and financial price movements.
The second paper is more directly relevant for trading. It used ChatGPT to rate news headlines as good or bad for stock prices. It tested the strategy of buying a stock with good news at the open after the headline was released and selling at the close; or selling at the open and buying back at the close if the headline was bad.
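The arithmetic of that open-to-close strategy can be sketched as follows. The `Trade` class, the function names and the `cost` parameter (a round-trip transaction cost expressed as a fraction) are hypothetical, and the prices in the example are made up rather than taken from the paper.

```python
from dataclasses import dataclass

@dataclass
class Trade:
    signal: int         # +1 for a good headline, -1 for a bad one, 0 for no trade
    open_price: float   # price at the open after the headline
    close_price: float  # price at that day's close

def trade_return(t: Trade, cost: float = 0.0) -> float:
    """Open-to-close return in the direction of the signal, less a round-trip cost."""
    if t.signal == 0:
        return 0.0
    raw = t.close_price / t.open_price - 1.0
    # A short position (signal -1) profits when the price falls.
    return t.signal * raw - cost

def average_return(trades, cost: float = 0.0) -> float:
    """Average return per trade actually taken."""
    taken = [trade_return(t, cost) for t in trades if t.signal != 0]
    return sum(taken) / len(taken) if taken else 0.0

# Made-up example: one long trade, one short trade, one headline ignored.
trades = [Trade(1, 100.0, 101.0), Trade(-1, 50.0, 49.0), Trade(0, 10.0, 12.0)]
gross = average_return(trades)
net = average_return(trades, cost=0.0013)
```

Comparing `gross` with `net` shows how quickly a small per-trade edge shrinks once costs are charged on every round trip.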
The results are inconclusive. The ChatGPT signal had a 0.01 correlation with the next day’s raw stock return. But to evaluate a signal you need to compare it to the residual return after adjusting for the market return, and perhaps for known factors. A 0.01 correlation could be valuable in combination with other signals, or it might not. The tested strategy did have positive returns from October 2021 to December 2022 without transaction costs, but the authors do not provide data on whether it beat a market strategy, nor whether the positive return was statistically significant. A reported 0.13% gross profit per trade suggests it might not overcome transaction costs.
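The market adjustment can be sketched in a few lines. This is a simplified illustration, assuming a single stock, a one-factor regression on the market return and hypothetical function names; it is not the authors' procedure, and a fuller analysis would also adjust for other known factors.

```python
from statistics import mean

def beta_alpha(stock, market):
    """Ordinary least squares fit of stock = alpha + beta * market."""
    mx, my = mean(market), mean(stock)
    var = sum((m - mx) ** 2 for m in market)
    cov = sum((m - mx) * (s - my) for m, s in zip(market, stock))
    beta = cov / var
    return beta, my - beta * mx

def residual_returns(stock, market):
    """The part of each stock return not explained by the market."""
    beta, alpha = beta_alpha(stock, market)
    return [s - (alpha + beta * m) for s, m in zip(stock, market)]

def correlation(x, y):
    """Pearson correlation of two equal-length series."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5
```

The number to look at is then `correlation(signal, residual_returns(stock, market))` rather than the correlation of the signal with raw returns, where `signal` is the series of headline ratings.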
The authors also report a regression that includes future information, so it cannot be used to evaluate effectiveness for making decisions based on information known at the time. The ChatGPT signal supplies no additional information to the three decimal places the authors show, although it does seem to have some small positive value. But inconclusive does not mean failure. The study did suggest that ChatGPT was better than popular alternative models, and research on GPT and other large language models is continuing.
GPT is an AI tool that can work with humans, learn from them and teach them, rather than some incomprehensible black box. At the very least, it seems poised to replace older algorithms and to increase the use of AI in both quantitative and qualitative investing. It’s a long way from taking over Wall Street, but there’s no reason to think it can’t.
LeackStat 2023